Documentation Source Text

Check-in [16b58c9eb4]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Documentation for the content= and langaugeid= options for FTS4.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 16b58c9eb47d4fda6b8e56a77984ed62eaeef184
User & Date: drh 2012-03-05 19:14:01
Context
2012-03-05
19:40
Fix typos in the FTS4 languageid option documentation. Add entries to the 3.7.11 change log. check-in: eb26f05574 user: drh tags: trunk
19:14
Documentation for the content= and langaugeid= options for FTS4. check-in: 16b58c9eb4 user: drh tags: trunk
2012-02-27
07:06
Fix a bug in the description of the 'simple' FTS tokenizer. Underscores (codepoint 95) are divider characters not token characters. check-in: 73a0dac584 user: dan tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/fts3.in.

83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
...
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
....
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
....
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352




1353
1354
1355
1356
1357
1358
1359
....
1365
1366
1367
1368
1369
1370
1371




1372
1373
1374
1375
1376
1377
1378
....
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
....
1440
1441
1442
1443
1444
1445
1446
1447












































































































































































































































































1448
1449
1450
1451
1452
1453
1454
1455
1456
....
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736

  <li> <p>Because it stores extra information on disk in two new 
       [FTS shadow tables|shadow tables] in order to support the performance
       optimizations and extra matchinfo() options, FTS4 tables may consume more
       disk space than the equivalent table created using FTS3. Usually the overhead
       is 1-2% or less, but may be as high as 10% if the documents stored in the
       FTS table are very small. The overhead may be reduced by specifying the
       directive [matchinfo_fts3|"matchinfo=fts3"] as part of the FTS4 table
       declaration, but this comes at the expense of sacrificing some of the
       extra supported matchinfo() options.

  <li> <p>FTS4 provides hooks (the compress and uncompress 
       [FTS4 options|options]) allowing data to be stored in a compressed 
       form, reducing disk usage and IO.
</ul>
................................................................................

<p>
  Which module, FTS3 or FTS4, should you use in your application?  FTS4 is
  sometimes significantly faster than FTS3, even orders of magnitude faster
  depending on the query, though in the common case the performance of the two
  modules is similar. FTS4 also offers the enhanced [matchinfo()] outputs which
  can be useful in ranking the results of a [FTS MATCH|MATCH] operation.  On the
  other hand, in the absence of a [matchinfo_fts3|matchinfo=fts3] directive FTS4 requires a little
  more disk space than FTS3, though only a percent of two in most cases.

<p>
  For newer applications, FTS4 is recommended; though if compatibility with older 
  versions of SQLite is important, then FTS3 will usually serve just as well.  

<h2>Creating and Destroying FTS Tables</h2>
................................................................................
  table is queried from within a transaction in which the associated 
  FTS table has been modified, the results of the query are likely to reflect 
  only a (possibly empty) subset of the changes made.

<h1 id=fts4_options tags="FTS4 options">FTS4 Options</h1>

<p>
<tcl>hd_fragment *matchinfo_fts3 {matchinfo_fts3}</tcl>

  If the "CREATE VIRTUAL TABLE" statement specifies module FTS4 (not FTS3), 
  then special directives - FTS4 options - similar to the "tokenize=*" option
  may also appear in place of column names. An FTS4 option consists of the
  option name, followed by an "=" character, followed by the option value.
  The option value may optionally be enclosed in single or double quotes, with
  embedded quote characters escaped in the same way as for SQL literals. There
  may not be whitespace on either side of the "=" character. For example,
................................................................................

<codeblock>
  <i>-- Create a reduced-footprint FTS4 table.</i>
  CREATE VIRTUAL TABLE papers USING fts4(author, document, matchinfo=fts3);
</codeblock>

<p>
  FTS4 currently supports the following three options:

<table striped=1>
  <tr><th>Option<th>Interpretation
  <tr><td>matchinfo<td> This option may only be set to the value "fts3".
    Attempting to set it otherwise is an error. If this option is specified,
    then some of the extra information stored by FTS4 is omitted. This reduces
    the amount of disk space consumed by an FTS4 table until it is almost the
    same as the amount that would be used by the equivalent FTS3 table, but 
    also means that the data accessed by passing the 'l' flag to the 
    [matchinfo()] function is not available.

  <tr><td>compress<td>
    This option is used to specify the compress function. It is an error to
    specify a compress function without also specifying an uncompress
    function. [fts4 compress option|See below] for details.
    
  <tr><td>uncompress<td>
    This option is used to specify the uncompress function. It is an error to
    specify an uncompress function without also specifying a compress
    function. [fts4 compress option|See below] for details.





  <tr><td>order<td>
    <tcl>hd_fragment fts4order {FTS4 order option}</tcl>
    ^The "order" option may be set to either "DESC" or "ASC" (in upper or
    lower case). ^If it is set to "DESC", then FTS4 stores its data in such
    a way as to optimize returning results in descending order by docid.
    ^If it is set to "ASC" (the default), then the data structures are 
................................................................................
  <tr><td>prefix<td>
    This option may be set to a comma-separated list of positive non-zero 
    integers. For each integer N in the list, a separate index is created
    in the database file to optimize [prefix queries] where
    the query term is N bytes in length, not including the '*' character,
    when encoded using UTF-8. [fts4 prefix option|See below] for details.





</table>

<p>
  When using FTS4, specifying a column name that contains an "=" character
  and is not either a "tokenize=*" specification or a recognized FTS4 option
  is an error. With FTS3, the first token in the unrecognized directive is 
  interpreted as a column name. Similarly, specifying multiple "tokenize=*"
................................................................................
  CREATE VIRTUAL TABLE papers USING fts3(tokenize=porter, tokenize=simple);

  <i>-- An error. Cannot create a table with two columns named "tokenize".</i>
  CREATE VIRTUAL TABLE papers USING fts3(tokenize=porter, tokenize=simple, tokenize=icu);
</codeblock>

<tcl>hd_fragment *fts4compression {compressed FTS4 content}</tcl>
<h2 tags="fts4 compress option"> Compress/Uncompress </h2>

<p>
  The compress and uncompress options allow FTS4 content to be stored in
  the database in a compressed form. Both options should be set to the name
  of an SQL scalar function registered using [sqlite3_create_function()]
  that accepts a single argument. 

................................................................................
  pay attention to data types. Specifically, when a user reads a value from
  a compressed FTS table, the value returned by FTS is exactly the same
  as the value returned by the uncompress function, including the data type.
  If that data type is not the same as the data type of the original value as
  passed to the compress function (for example if the uncompress function is
  returning BLOB when compress was originally passed TEXT), then the users
  query may not function as expected.













































































































































































































































































<tcl>hd_fragment fts4prefix {FTS4 prefix option}</tcl>
<h2 tags="fts4 prefix option">Prefix</h2>

<p>
  ^The FTS4 prefix option causes FTS to index term prefixes of specified lengths
  in the same way that it always indexes complete terms.  ^The prefix option
  must be set to a comma separated list of positive non-zero integers. 
  ^For each value N in the list, prefixes of length N bytes (when encoded 
  using UTF-8) are indexed.  ^FTS4 uses term prefix indexes to speed up
................................................................................
  into the FTS virtual table by the user. If the user does not explicitly
  supply a "docid" value when inserting records, one is selected automatically
  by the system.

<p>
  The %_stat and %_docsize tables are only created if the FTS table uses the
  FTS4 module, not FTS3. Furthermore, the %_docsize table is omitted if the
  FTS4 table is created with the [matchinfo_fts3|"matchinfo=fts3"] directive
  specified as part of the CREATE VIRTUAL TABLE statement. If they are created,
  the schema of the two tables is as follows:
<codeblock>
  CREATE TABLE %_stat(
    id INTEGER PRIMARY KEY, 
    value BLOB
  );







|







 







|







 







<
<







 







|



|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
>
>
>
>







 







>
>
>
>







 







|







 








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

|







 







|







83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
...
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
....
1308
1309
1310
1311
1312
1313
1314


1315
1316
1317
1318
1319
1320
1321
....
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
....
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
....
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
....
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
1565
1566
1567
1568
1569
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
1596
1597
1598
1599
1600
1601
1602
1603
1604
1605
1606
1607
1608
1609
1610
1611
1612
1613
1614
1615
1616
1617
1618
1619
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
1635
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
....
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010

  <li> <p>Because it stores extra information on disk in two new 
       [FTS shadow tables|shadow tables] in order to support the performance
       optimizations and extra matchinfo() options, FTS4 tables may consume more
       disk space than the equivalent table created using FTS3. Usually the overhead
       is 1-2% or less, but may be as high as 10% if the documents stored in the
       FTS table are very small. The overhead may be reduced by specifying the
       directive [FTS4 matchinfo option|"matchinfo=fts3"] as part of the FTS4 table
       declaration, but this comes at the expense of sacrificing some of the
       extra supported matchinfo() options.

  <li> <p>FTS4 provides hooks (the compress and uncompress 
       [FTS4 options|options]) allowing data to be stored in a compressed 
       form, reducing disk usage and IO.
</ul>
................................................................................

<p>
  Which module, FTS3 or FTS4, should you use in your application?  FTS4 is
  sometimes significantly faster than FTS3, even orders of magnitude faster
  depending on the query, though in the common case the performance of the two
  modules is similar. FTS4 also offers the enhanced [matchinfo()] outputs which
  can be useful in ranking the results of a [FTS MATCH|MATCH] operation.  On the
  other hand, in the absence of a [FTS4 matchinfo option|matchinfo=fts3] directive FTS4 requires a little
  more disk space than FTS3, though only a percent of two in most cases.

<p>
  For newer applications, FTS4 is recommended; though if compatibility with older 
  versions of SQLite is important, then FTS3 will usually serve just as well.  

<h2>Creating and Destroying FTS Tables</h2>
................................................................................
  table is queried from within a transaction in which the associated 
  FTS table has been modified, the results of the query are likely to reflect 
  only a (possibly empty) subset of the changes made.

<h1 id=fts4_options tags="FTS4 options">FTS4 Options</h1>

<p>


  If the "CREATE VIRTUAL TABLE" statement specifies module FTS4 (not FTS3), 
  then special directives - FTS4 options - similar to the "tokenize=*" option
  may also appear in place of column names. An FTS4 option consists of the
  option name, followed by an "=" character, followed by the option value.
  The option value may optionally be enclosed in single or double quotes, with
  embedded quote characters escaped in the same way as for SQL literals. There
  may not be whitespace on either side of the "=" character. For example,
................................................................................

<codeblock>
  <i>-- Create a reduced-footprint FTS4 table.</i>
  CREATE VIRTUAL TABLE papers USING fts4(author, document, matchinfo=fts3);
</codeblock>

<p>
  FTS4 currently supports the following options:

<table striped=1>
  <tr><th>Option<th>Interpretation
  <tr><td>compress<td>
    ^The compress option is used to specify the compress function. ^It is an error to
    specify a compress function without also specifying an uncompress
    function. [fts4 compress option|See below] for details.

  <tr><td>content<td>
    ^The content allows the text being indexed to
    stored in a separate table distinct from the FTS4 table, or 
    or even outside of SQLite.

  <tr><td>languageid<td>
    ^The languageid option causes the FTS4 table to have an additional hidden
    integer column that identifies the language of the text contained in
    each row.  The use of the languageid option allows the same FTS4 table
    to hold text in multiple languages or scripts, each with different tokenizer
    rules, and to query each language independently of the others.
    
  <tr><td>matchinfo<td> 
    When set to the value "fts3", the matchinfo option reduces the amount of
    information stored by FTS4 with the consequence that the "l" option of
    [matchinfo()] is no longer available.

  <tr><td>order<td>
    <tcl>hd_fragment fts4order {FTS4 order option}</tcl>
    ^The "order" option may be set to either "DESC" or "ASC" (in upper or
    lower case). ^If it is set to "DESC", then FTS4 stores its data in such
    a way as to optimize returning results in descending order by docid.
    ^If it is set to "ASC" (the default), then the data structures are 
................................................................................
  <tr><td>prefix<td>
    This option may be set to a comma-separated list of positive non-zero 
    integers. For each integer N in the list, a separate index is created
    in the database file to optimize [prefix queries] where
    the query term is N bytes in length, not including the '*' character,
    when encoded using UTF-8. [fts4 prefix option|See below] for details.

  <tr><td>uncompress<td>
    This option is used to specify the uncompress function. It is an error to
    specify an uncompress function without also specifying a compress
    function. [fts4 compress option|See below] for details.
</table>

<p>
  When using FTS4, specifying a column name that contains an "=" character
  and is not either a "tokenize=*" specification or a recognized FTS4 option
  is an error. With FTS3, the first token in the unrecognized directive is 
  interpreted as a column name. Similarly, specifying multiple "tokenize=*"
................................................................................
  CREATE VIRTUAL TABLE papers USING fts3(tokenize=porter, tokenize=simple);

  <i>-- An error. Cannot create a table with two columns named "tokenize".</i>
  CREATE VIRTUAL TABLE papers USING fts3(tokenize=porter, tokenize=simple, tokenize=icu);
</codeblock>

<tcl>hd_fragment *fts4compression {compressed FTS4 content}</tcl>
<h2 tags="fts4 compress option">The compress= and uncompress= options</h2>

<p>
  The compress and uncompress options allow FTS4 content to be stored in
  the database in a compressed form. Both options should be set to the name
  of an SQL scalar function registered using [sqlite3_create_function()]
  that accepts a single argument. 

................................................................................
  pay attention to data types. Specifically, when a user reads a value from
  a compressed FTS table, the value returned by FTS is exactly the same
  as the value returned by the uncompress function, including the data type.
  If that data type is not the same as the data type of the original value as
  passed to the compress function (for example if the uncompress function is
  returning BLOB when compress was originally passed TEXT), then the users
  query may not function as expected.

<tcl>hd_fragment *fts4content {FTS4 content option}</tcl>
<h2 tags="fts4 content option">The content= option </h2>

<p>
  The content option allows FTS4 to forego storing the text being indexed.
  The content option can be used in two ways:

<ul>
<li><p> The indexed documents are not stored within the SQLite database 
        at all (a "contentless" FTS4 table), or

<li><p> The indexed documents are stored in a database table created and
        managed by the user (an "external content" FTS4 table).
</ul>

<p>
  Because the indexed documents themselves are usually much larger than 
  the full-text index, the content option can be used to achieve 
  significant space savings.

<h3> Contentless FTS4 Tables </h3>

<p>
  In order to create an FTS4 table that does not store a copy of the indexed
  documents at all, the content option should be set to an empty string.
  For example, the following SQL creates such an FTS4 table with three
  columns - "a", "b", and "c":

<codeblock>
  CREATE VIRTUAL TABLE t1 USING fts4(content="", a, b, c);
</codeblock>

<p>
  Data can be inserted into such an FTS4 table using an INSERT statements.
  However, unlike ordinary FTS4 tables, the user must supply an explicit
  integer docid value. For example:

<codeblock>
  <i>-- This statement is Ok:</i>
  INSERT INTO t1(docid, a, b, c) VALUES(1, 'a b c', 'd e f', 'g h i');

  <i>-- This statement causes an error, as no docid value has been provided:</i>
  INSERT INTO t1(a, b, c) VALUES('j k l', 'm n o', 'p q r');
</codeblock>

<p>
  It is not possible to UPDATE or DELETE a row stored in a contentless FTS4
  table. Attempting to do so is an error.

<p>
  Contentless FTS4 tables also support SELECT statements. However, it is
  an error to attempt to retrieve the value of any table column other than
  the docid column. The auxiliary function matchinfo() may be used, but
  snippet() and offsets() may not. For example:

<codeblock>
  <i>-- The following statements are Ok:</i>
  SELECT docid FROM t1 WHERE t1 MATCH 'xxx';
  SELECT docid FROM t1 WHERE a MATCH 'xxx';
  SELECT matchinfo(t1) FROM t1 WHERE t1 MATCH 'xxx';

  <i>-- The following statements all cause errors, as the value of columns</i>
  <i>-- other than docid are required to evaluate them.</i>
  SELECT * FROM t1;
  SELECT a, b FROM t1 WHERE t1 MATCH 'xxx';
  SELECT docid FROM t1 WHERE a LIKE 'xxx%';
  SELECT snippet(t1) FROM t1 WHERE t1 MATCH 'xxx';
</codeblock>

<p>
  Errors related to attempting to retrieve column values other than docid
  are runtime errors that occur within sqlite3_step(). In some cases, for
  example if the MATCH expression in a SELECT query matches zero rows, there
  may be no error at all even if a statement does refer to column values 
  other than docid.

<h3> External Content FTS4 Tables </h3>

<p>
  An "external content" FTS4 table is similar to a contentless table, except
  that if evaluation of a query requires the value of a column other than 
  docid, FTS4 attempts to retrieve that value from a table (or view, or 
  virtual table) nominated by the user (hereafter referred to as the "content
  table"). The FTS4 module never writes to the content table, and writing
  to the content table does not affect the full-text index. It is the
  responsibility of the user to ensure that the content table and the 
  full-text index are consistent.

<p>
  An external content FTS4 table is created by setting the content option
  to the name of a table (or view, or virtual table) that may be queried by
  FTS4 to retrieve column values when required. If the nominated table does
  not exist, then an external content table behaves in the same way as
  a contentless table. For example:

<codeblock>
  CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c);
  CREATE VIRTUAL TABLE t3 USING fts4(content="t2", a, c);
</codeblock>

<p>
  Assuming the nominated table does exist, then its columns must be the same 
  as or a superset of those defined for the FTS table.

<p>
  When a users query on the FTS table requires a column value other than
  docid, FTS attempts to read this value from the corresponding column of
  the row in the content table with a rowid value equal to the current FTS
  docid. Or, if such a row cannot be found in the content table, a NULL
  value is used instead. For example:

<codeblock>
  CREATE TABLE t2(id INTEGER PRIMARY KEY, a, b, c, d);
  CREATE VIRTUAL TABLE t3 USING fts4(content="t2", b, c);
  
  INSERT INTO t2 VALUES(2, 'a b', 'c d', 'e f');
  INSERT INTO t2 VALUES(3, 'g h', 'i j', 'k l');
  INSERT INTO t3(docid, b, c) SELECT id, b, c FROM t2;
  <i>-- The following query returns a single row with two columns containing
  -- the text values "i j" and "k l".
  --
  -- The query uses the full-text index to discover that the MATCH 
  -- term matches the row with docid=3. It then retrieves the values
  -- of columns b and c from the row with rowid=3 in the content table
  -- to return.
  --</i>
  SELECT * FROM t3 WHERE t3 MATCH 'k';

  <i>-- Following the UPDATE, the query still returns a single row, this
  -- time containing the text values "xxx" and "yyy". This is because the
  -- full-text index still indicates that the row with docid=3 matches
  -- the FTS4 query 'k', even though the documents stored in the content
  -- table have been modified.
  --</i>
  UPDATE t2 SET b = 'xxx', c = 'yyy' WHERE rowid = 3;
  SELECT * FROM t3 WHERE t3 MATCH 'k';

  <i>-- Following the DELETE below, the query returns one row containing two
  -- NULL values. NULL values are returned because FTS is unable to find
  -- a row with rowid=3 within the content table.
  --</i>
  DELETE FROM t2;
  SELECT * FROM t3 WHERE t3 MATCH 'k';
</codeblock>

<p>
  When a row is deleted from an external content FTS4 table, FTS4 needs to
  retrieve the column values of the row being deleted from the content table.
  This is so that FTS4 can update the full-text index entries for each token
  that occurs within the deleted row to indicate that that row has been 
  deleted. If the content table row cannot be found, or if it contains values
  inconsistent with the contents of the FTS index, the results can be difficult
  to predict. The FTS index may be left containing entries corresponding to the
  deleted row, which can lead to seemingly nonsensical results being returned
  by subsequent SELECT queries. The same applies when a row is updated, as
  internally an UPDATE is the same as a DELETE followed by an INSERT.

<p>  
  Instead of writing separately to the full-text index and the content table,
  some users may wish to use database triggers to keep the full-text index
  up to date with respect to the set of documents stored in the content table.
  For example, using the tables from earlier examples:

<codeblock>
  CREATE TRIGGER t2_bu BEFORE UPDATE ON t2 BEGIN
    DELETE FROM t3 WHERE docid=old.rowid;
  END;
  CREATE TRIGGER t2_bd BEFORE DELETE ON t2 BEGIN
    DELETE FROM t3 WHERE docid=old.rowid;
  END;

  CREATE TRIGGER t2_bu AFTER UPDATE ON t2 BEGIN
    INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
  END;
  CREATE TRIGGER t2_bd AFTER INSERT ON t2 BEGIN
    INSERT INTO t3(docid, b, c) VALUES(new.rowid, new.b, new.c);
  END;
</codeblock>

<p>
  The DELETE trigger must be fired before the actual delete takes place
  on the content table. This is so that FTS4 can still retrieve the original
  values in order to update the full-text index. And the INSERT trigger must
  be fired after the new row is inserted, so as to handle the case where the
  rowid is assigned automatically within the system. The UPDATE trigger must
  be split into two parts, one fired before and one after the update of the
  content table, for the same reasons.

<p>
  FTS4 features a special command similar to the 'optimize' command that
  deletes the entire full-text index and rebuilds it based on the current
  set of documents in the content table. Assuming again that "t3" is the
  name of the external content FTS4 table, the command is:

<codeblock>
  INSERT INTO t3(t3) VALUES('rebuild');
</codeblock>

<p>
  This command may also be used with ordinary FTS4 tables, although it may
  only be useful if the full-text index has somehow become corrupt. It is an
  error to attempt to rebuild the full-text index maintained by a contentless
  FTS4 table.


<tcl>hd_fragment *fts4langaugeid {FTS4 langaugeid option}</tcl>
<h2 tags="fts4 languageid option">The languageid= option</h2>

<p>
  When the langaugeid option is present, it specifies the name of
  another [hidden column] that is added to the FTS4
  table and which is used to specify the language stored in each row
  of the FTS4 table.  The name of the languageid hidden column must
  be distinct from all other column names in the FTS4 table.  Example:

<codeblock>
  CREATE VIRTUAL TABLE t1 USING fts4(x, y, languageid="lid")
</codeblock>

<p>
  The default value of a languageid column is 0. Any value inserted
  into a languageid column is converted to a 32-bit (not 64) signed
  integer.

<p>
  By default, FTS queries (those that use the MATCH operator)
  consider only those rows with the languageid column set to 0. To
  query for rows with other languageid values, a constraint of the
  form "<language-id> = <integer>" must be added to the queries
  WHERE clause. For example:

<codeblock>
  SELECT * FROM t1 WHERE t1 MATCH 'abc' AND lid=5;
</codeblock>

<p>
  It is not possible for a single FTS query to return rows with
  different languageid values. The results of adding WHERE clauses
  that use other operators (e.g.  lid!=5, or lid<=5) are undefined.

<p>
  If the content option is used along with the languageid option,
  then the named languageid column must exist in the content= table
  (subject to the usual rules - if a query never needs to read the
  content table then this restriction does not apply).

<p>
  When the languageid option is used, SQLite invokes the xLanguageid()
  on the sqlite3_tokenizer_module object immediately after the object
  is created in order to pass in the language id that particular
  tokenizer should use.  The xLanguageid() method will never be called
  more than once for any single tokenizer object.
  


<tcl>hd_fragment fts4matchinfo {FTS4 matchinfo option}</tcl>
<h2 tags="fts4 matchinfo option">The matchinfo= option</h2>

<p>
  The matchinfo option may only be set to the value "fts3". 
  Attempting to set matchinfo to anything other than "fts3" is an error.
  If this option is specified, then some of the extra information stored by
  FTS4 is omitted. This reduces the amount of disk space consumed by
  an FTS4 table until it is almost the same as the amount that would
  be used by the equivalent FTS3 table, but also means that the data
  accessed by passing the 'l' flag to the [matchinfo()] function is
  not available. 

<tcl>hd_fragment fts4prefix {FTS4 prefix option}</tcl>
<h2 tags="fts4 prefix option">The prefix= option</h2>

<p>
  ^The FTS4 prefix option causes FTS to index term prefixes of specified lengths
  in the same way that it always indexes complete terms.  ^The prefix option
  must be set to a comma separated list of positive non-zero integers. 
  ^For each value N in the list, prefixes of length N bytes (when encoded 
  using UTF-8) are indexed.  ^FTS4 uses term prefix indexes to speed up
................................................................................
  into the FTS virtual table by the user. If the user does not explicitly
  supply a "docid" value when inserting records, one is selected automatically
  by the system.

<p>
  The %_stat and %_docsize tables are only created if the FTS table uses the
  FTS4 module, not FTS3. Furthermore, the %_docsize table is omitted if the
  FTS4 table is created with the [FTS4 matchinfo option|"matchinfo=fts3"] directive
  specified as part of the CREATE VIRTUAL TABLE statement. If they are created,
  the schema of the two tables is as follows:
<codeblock>
  CREATE TABLE %_stat(
    id INTEGER PRIMARY KEY, 
    value BLOB
  );