Documentation Source Text

Check-in [f1d54965d0]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Enforce the 80-character line limit on the new documentation source text for FTS4 commands.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: f1d54965d02ae8bcb722f8c646f07e9fab43cebd
User & Date: drh 2012-04-02 15:00:15
Context
2012-04-02
15:49
Updates to the atomic commit document in order to reference WAL and PSOW and to improve clarity of presentation. check-in: aaf3ea1155 user: drh tags: trunk
15:00
Enforce the 80-character line limit on the new documentation source text for FTS4 commands. check-in: f1d54965d0 user: drh tags: trunk
14:47
Updates to the FTS4 document to describe recent merge=X,Y, automerge=B, and integrity-check commands. Updates to the change log for the same and for other recent changes. check-in: 2c33993bdc user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/fts3.in.

1752
1753
1754
1755
1756
1757
1758

1759
1760
1761
1762
1763
1764
1765
....
1769
1770
1771
1772
1773
1774
1775
1776

1777
1778
1779
1780

1781
1782
1783
1784

1785
1786
1787
1788

1789
1790
1791
1792

1793
1794
1795
1796

1797
1798
1799
1800
1801

1802
1803

1804
1805
1806
1807
1808
1809

1810
1811

1812
1813
1814
1815

1816
1817
1818
1819

1820
1821
1822
1823

1824
1825
1826

1827
1828
1829
1830
1831
1832
1833

1834
1835

1836
1837
1838

1839
1840
1841

1842
1843

1844
1845
1846
1847
1848

1849
1850

1851
1852
1853

1854
1855
1856
1857
1858
1859

1860
1861
1862
1863

1864
1865

1866
1867
1868
1869

1870
1871

1872
1873
1874
1875
1876

1877
1878

1879
1880

1881
1882
1883
1884
1885
1886
1887
1888
  <i>-- The following two queries are both partially optimized using the prefix</i>
  <i>-- indexes. The optimization is not as pronounced as it is for the queries</i>
  <i>-- above, but still an improvement over no prefix indexes at all.</i>
  SELECT * FROM t1 WHERE t1 MATCH 'a*';
  SELECT * FROM t1 WHERE t1 MATCH 'abc*';
</codeblock>


<h1 id=commands tags="commands">Special Commands For FTS3 and FTS4</h1>

<p>
  Special INSERT operates can be used to issue commands to FTS3 and FTS4 tables.
  Every FTS3 and FTS4 has a hidden, read-only column which is the same name as
  the table itself.  INSERTs into this hidden column are interpreted as commands
  to the FTS3/4 table.  For a table with the name "xyz" the following commands
................................................................................
<li><p>INSERT INTO xyz(xyz) VALUES('optimize');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('rebuild');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('integrity-check');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('merge=X,Y');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('automerge=B');</p>
</ul>

<tcl>hd_fragment *fts4optcmd {FTS4 "optimize" command} {"optimize" command}</tcl>

<h2 id=optimize>The "optimize" command</h2>

<p>
  The "optimize" command causes FTS3/4 to merge together all of its inverted index b-trees into

  one large and complete b-tree.  Doing an optimize will make subsequent queries
  run faster since there are fewer b-trees to search, and it may reduce disk usage
  by coalescing redundant entries.  However, for a large FTS table, running optimize can be as
  expensive as running [VACUUM].  The optimize command essentially has to read and write the

  entire FTS table, resulting in a large transaction.

<p>
  In batch-mode operation, where an FTS table is initially built up using a large number of

  INSERT operations, then queried repeatedly without further changes, it is often a good idea
  to run "optimize" after the last INSERT and before the first query.

<tcl>hd_fragment *fts4rebuidcmd {FTS4 "rebuild" command} {"rebuild" command}</tcl>

<h2 id=rebuild>The "rebuild" command</h2>

<p>
  The "rebuild" command causes SQLite to discard the entire FTS3/4 table and then rebuild it

  again from original text.  The concept is similar to [REINDEX], only that it applies to an
  FTS3/4 table instead of an ordinary index.

<p>
  The "rebuild" command should be run whenever the implementation of a custom tokenizer changes,

  so that all content can be retokenized.  The "rebuild" command is also useful when using
  the [FTS4 content option] after changes have been made to the original content table.


<tcl>hd_fragment *fts4ickcmd {FTS4 "integrity-check" command}</tcl>
<h2 id=integcheck>The "integrity-check" command</h2>

<p>
  The "integrity-check" command causes SQLite to read and verify the accuracy of all inverted

  indices in an FTS3/4 table by comparing those inverted indices against the original content.
  The "integrity-check" command silently succeeds if the inverted indices are all ok, but will

  fail with an SQLITE_CORRUPT error if any problems are found.

<p>
  The "integrity-check" command is similar in concept to [PRAGMA integrity_check].  

  In a working system, the "integrity-command" should aways be successful.  Possible causes
  of integrity-check failures include:
  <ul>
  <li> The application has made changes to the [FTS shadow tables] directly, without using the FTS3/4

       virtual table, causing the shadow tables to become out of sync with each other.
  <li> Using the [FTS4 content option] and failing to manually keep the content in sync with the
       FTS4 inverted indices.
  <li> Bugs in the FTS3/4 virtual table.  (The "integrity-check" command was original conceived

       as part of the test suite for FTS3/4.)
  <li> Corruption to the underlying SQLite database file.  (See documentation on
       [how to corrupt] and SQLite database for additional information.)

  </ul>

<tcl>hd_fragment *fts4mergecmd {FTS4 "merge" command} {"merge" command}</tcl>
<h2 id="mergecmd">The "merge=X,Y" command</h2>

<p>
  The "merge=X,Y" command (where X and Y are integers) causes SQLite to do a limited amount of work

  toward merging the various inverted index b-trees of an FTS3/4 table together into one large b-tree.
  The X value is the target number of "blocks" to be merged, and Y is the minimum number of b-tree

  segments on a level required before merging will be applied to that level.  The value of Y should
  be between 2 and 16 with a recommended value of 8.  The value of X can be any positive integer but
  values on the order of 100 to 300 are recommended.


<p>
  When an FTS table accumulates 16 b-tree segments at the same level, the next INSERT into that

  table will cause all 16 segments to be merged into a single b-tree segment at the next higher level.
  The effect of these level merges is that most INSERTs into an FTS table are very fast and take minimal

  memory, but an occasional INSERT is slow and generates a large transaction because of the need to
  do merging. This results in "spiky" performance of INSERTs.

<p>
  To avoid spiky INSERT performance, an application can run the "merge=X,Y" command periodically,

  possibly in an idle thread or idle process, to ensure that the FTS table never accumulates
  too many b-tree segments at the same level.  INSERT performance spikes can generally be avoided, and

  performance of FTS3/4 can be maximized, by running "merge=X,Y" after every few thousand
  document inserts.  Each "merge=X,Y" command will run in a separate transaction (unless they
  are grouped together using [BEGIN]...[COMMIT], of course).  The transactions can be kept

  small by choosing a value for X in the range of 100 to 300.  The idle thread that is running
  the merge commands can know when it is done by checking the difference in
  [sqlite3_total_changes()] before and after each "merge=X,Y" command and stopping the loop
  when the difference drops below two.

<tcl>hd_fragment *fts4automergecmd {FTS4 "automerge" command} {"automerge" command}</tcl>

<h2 id=automerge">The "automerge=B" command</h2>

<p>
  The "automerge=B" command (where B is either "1" or "0") disables or enables automatic incremental

  inverted index merging for an FTS3/4 table.  The default for new tables is for automatic incremental
  merging to be disabled.  The "automerge=B" command changes this setting.  The change is persistent

  and continues to be in effect for all subsequent database connections to the same database.

<p>
  Enabling automatic incremental merge causes SQLite to do a small amount of inverted index merging

  after every INSERT operation.  The amount of merging performed is designed so that the FTS3/4 table
  never reaches a point where it has 16 segments at the same level and hence has to do a large merge

  in order to complete an insert.  In other words, automatic incremental merging is designed to
  prevent spiky INSERT performance.

<p>
  The downside of automatic incremental merging is that it makes every INSERT, UPDATE, and DELETE

  operation on an FTS3/4 table run a little slower, since extra time must be used to do the incremental
  merge.  For maximum performance, it is recommended that applications disable automatic incremental

  merge and instead use the ["merge" command] in an idle process to keep the inverted indices well
  merged.  But if the structure of an application does not easily allow for idle processes, the use

  of automatic incremental merge is a very reasonable fallback solution.


<h1 id=tokenizer tags="tokenizer">Tokenizers</h1>

<p>
  An FTS tokenizer is a set of rules for extracting terms from a document 
  or basic FTS full-text query. 







>







 







|
>



|
>
|
|
|
|
>
|


|
>
|


|
>



|
>
|



|
>
|
|
>





|
>
|
|
>
|


|
>
|
|

|
>
|
|
|
|
>
|
|
|
>






|
>
|
|
>
|
|
|
>


|
>
|
|
>
|



|
>
|
|
>
|
|
|
>
|
|
|
|

|
>



|
>
|
|
>
|


|
>
|
|
>
|
|


|
>
|
|
>
|
|
>
|







1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
....
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
  <i>-- The following two queries are both partially optimized using the prefix</i>
  <i>-- indexes. The optimization is not as pronounced as it is for the queries</i>
  <i>-- above, but still an improvement over no prefix indexes at all.</i>
  SELECT * FROM t1 WHERE t1 MATCH 'a*';
  SELECT * FROM t1 WHERE t1 MATCH 'abc*';
</codeblock>

<tcl>hd_fragment *cmds {FTS4 commands}</tcl>
<h1 id=commands tags="commands">Special Commands For FTS3 and FTS4</h1>

<p>
  Special INSERT operates can be used to issue commands to FTS3 and FTS4 tables.
  Every FTS3 and FTS4 has a hidden, read-only column which is the same name as
  the table itself.  INSERTs into this hidden column are interpreted as commands
  to the FTS3/4 table.  For a table with the name "xyz" the following commands
................................................................................
<li><p>INSERT INTO xyz(xyz) VALUES('optimize');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('rebuild');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('integrity-check');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('merge=X,Y');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('automerge=B');</p>
</ul>

<tcl>hd_fragment *fts4optcmd {FTS4 "optimize" command} \
                             {"optimize" command}</tcl>
<h2 id=optimize>The "optimize" command</h2>

<p>
  The "optimize" command causes FTS3/4 to merge together all of its
  inverted index b-trees into one large and complete b-tree.  Doing
  an optimize will make subsequent queries run faster since there are
  fewer b-trees to search, and it may reduce disk usage by coalescing
  redundant entries.  However, for a large FTS table, running optimize
  can be as expensive as running [VACUUM].  The optimize command
  essentially has to read and write the entire FTS table, resulting
  in a large transaction.

<p>
  In batch-mode operation, where an FTS table is initially built up
  using a large number of INSERT operations, then queried repeatedly
  without further changes, it is often a good idea
  to run "optimize" after the last INSERT and before the first query.

<tcl>hd_fragment *fts4rebuidcmd {FTS4 "rebuild" command} \
                                {"rebuild" command}</tcl>
<h2 id=rebuild>The "rebuild" command</h2>

<p>
  The "rebuild" command causes SQLite to discard the entire FTS3/4
  table and then rebuild it again from original text.  The concept
  is similar to [REINDEX], only that it applies to an
  FTS3/4 table instead of an ordinary index.

<p>
  The "rebuild" command should be run whenever the implementation
  of a custom tokenizer changes, so that all content can be retokenized.
  The "rebuild" command is also useful when using the
  [FTS4 content option] after changes have been made to the original
  content table.

<tcl>hd_fragment *fts4ickcmd {FTS4 "integrity-check" command}</tcl>
<h2 id=integcheck>The "integrity-check" command</h2>

<p>
  The "integrity-check" command causes SQLite to read and verify
  the accuracy of all inverted indices in an FTS3/4 table by comparing
  those inverted indices against the original content. The 
  "integrity-check" command silently succeeds if the inverted
  indices are all ok, but will fail with an SQLITE_CORRUPT error
  if any problems are found.

<p>
  The "integrity-check" command is similar in concept to
 [PRAGMA integrity_check].  In a working system, the "integrity-command"
 should aways be successful.  Possible causes of integrity-check
 failures include:
  <ul>
  <li> The application has made changes to the [FTS shadow tables]
       directly, without using the FTS3/4 virtual table, causing
       the shadow tables to become out of sync with each other.
  <li> Using the [FTS4 content option] and failing to manually keep
       the content in sync with the FTS4 inverted indices.
  <li> Bugs in the FTS3/4 virtual table.  (The "integrity-check"
       command was original conceived as part of the test suite
       for FTS3/4.)
  <li> Corruption to the underlying SQLite database file.  (See
       documentation on [how to corrupt] and SQLite database for
       additional information.)
  </ul>

<tcl>hd_fragment *fts4mergecmd {FTS4 "merge" command} {"merge" command}</tcl>
<h2 id="mergecmd">The "merge=X,Y" command</h2>

<p>
  The "merge=X,Y" command (where X and Y are integers) causes SQLite
  to do a limited amount of work toward merging the various inverted
  index b-trees of an FTS3/4 table together into one large b-tree.
  The X value is the target number of "blocks" to be merged, and Y is
  the minimum number of b-tree segments on a level required before
  merging will be applied to that level.  The value of Y should
  be between 2 and 16 with a recommended value of 8.  The value of X
  can be any positive integer but values on the order of 100 to 300
  are recommended.

<p>
  When an FTS table accumulates 16 b-tree segments at the same level,
  the next INSERT into that table will cause all 16 segments to be
  merged into a single b-tree segment at the next higher level.  The
  effect of these level merges is that most INSERTs into an FTS table
  are very fast and take minimal memory, but an occasional INSERT is
  slow and generates a large transaction because of the need to
  do merging. This results in "spiky" performance of INSERTs.

<p>
  To avoid spiky INSERT performance, an application can run the
  "merge=X,Y" command periodically, possibly in an idle thread or
  idle process, to ensure that the FTS table never accumulates
  too many b-tree segments at the same level.  INSERT performance
  spikes can generally be avoided, and performance of FTS3/4 can be
  maximized, by running "merge=X,Y" after every few thousand
  document inserts.  Each "merge=X,Y" command will run in a separate
  transaction (unless they are grouped together using [BEGIN]...[COMMIT],
  of course).  The transactions can be kept small by choosing a value
  for X in the range of 100 to 300.  The idle thread that is running
  the merge commands can know when it is done by checking the difference
  in [sqlite3_total_changes()] before and after each "merge=X,Y"
  command and stopping the loop when the difference drops below two.

<tcl>hd_fragment *fts4automergecmd {FTS4 "automerge" command} \
                                   {"automerge" command}</tcl>
<h2 id=automerge">The "automerge=B" command</h2>

<p>
  The "automerge=B" command (where B is either "1" or "0") disables
  or enables automatic incremental inverted index merging for an
  FTS3/4 table.  The default for new tables is for automatic incremental
  merging to be disabled.  The "automerge=B" command changes this
  setting.  The change is persistent and continues to be in effect
  for all subsequent database connections to the same database.

<p>
  Enabling automatic incremental merge causes SQLite to do a small
  amount of inverted index merging after every INSERT operation.
  The amount of merging performed is designed so that the FTS3/4
  table never reaches a point where it has 16 segments at the same
  level and hence has to do a large merge in order to complete an
  insert.  In other words, automatic incremental merging is designed
  to prevent spiky INSERT performance.

<p>
  The downside of automatic incremental merging is that it makes
  every INSERT, UPDATE, and DELETE operation on an FTS3/4 table run
  a little slower, since extra time must be used to do the incremental
  merge.  For maximum performance, it is recommended that applications
  disable automatic incremental merge and instead use the 
  ["merge" command] in an idle process to keep the inverted indices
  well merged.  But if the structure of an application does not easily
  allow for idle processes, the use of automatic incremental merge is
  a very reasonable fallback solution.


<h1 id=tokenizer tags="tokenizer">Tokenizers</h1>

<p>
  An FTS tokenizer is a set of rules for extracting terms from a document 
  or basic FTS full-text query.