Documentation Source Text

Check-in [ae994ce63a]

Overview
Comment: Update fts3.html for recent changes to FTS.
SHA1: ae994ce63aa67a8ee18e86ef77ac0edf30f3b81c
User & Date: dan 2014-05-16 11:28:32.378
Context
2014-05-19
19:55
Mention the automerge enhancement in the release notes. (check-in: 5b3b975982 user: drh tags: trunk)
2014-05-16
11:28
Update fts3.html for recent changes to FTS. (check-in: ae994ce63a user: dan tags: trunk)
2014-05-09
22:27
Fix typo in VALUES clause documentation in lang.html. (check-in: fee01c2d5b user: drh tags: trunk)
Changes
Changes to pages/fts3.in.
  are supported:

<ul>
<li><p>INSERT INTO xyz(xyz) VALUES('optimize');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('rebuild');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('integrity-check');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('merge=X,Y');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('automerge=B');</p>
</ul>

<tcl>hd_fragment *fts4optcmd {FTS4 "optimize" command} \
                             {"optimize" command}</tcl>
<h2 id=optimize>The "optimize" command</h2>

<p>







  are supported:

<ul>
<li><p>INSERT INTO xyz(xyz) VALUES('optimize');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('rebuild');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('integrity-check');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('merge=X,Y');</p>
<li><p>INSERT INTO xyz(xyz) VALUES('automerge=N');</p>
</ul>
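<p>
  As a minimal sketch of how such special commands are issued, the
  following Python snippet uses the standard sqlite3 module against an
  in-memory FTS4 table named xyz. It assumes the underlying SQLite library
  was built with FTS3/4 enabled. The helper name run_fts_command is
  hypothetical, and only three of the commands listed above are exercised.

```python
import sqlite3


def run_fts_command(conn, table, command):
    """Send a special command to an FTS table (hypothetical helper).

    The table name is interpolated into the SQL text, so it is assumed
    to be a trusted identifier; the command itself is bound as a value.
    """
    conn.execute(f"INSERT INTO {table}({table}) VALUES(?)", (command,))


# Autocommit mode, so every statement is committed before the next one runs.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE VIRTUAL TABLE xyz USING fts4(body)")
for text in ("alpha beta", "beta gamma"):
    conn.execute("INSERT INTO xyz(body) VALUES(?)", (text,))

for command in ("optimize", "rebuild", "integrity-check"):
    run_fts_command(conn, "xyz", command)

# The table remains queryable after the maintenance commands.
matches = conn.execute(
    "SELECT count(*) FROM xyz WHERE xyz MATCH 'beta'"
).fetchone()[0]
```

<p>
  The command string is passed as a bound parameter rather than spliced
  into the statement, which keeps the SQL text identical for every command.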

<tcl>hd_fragment *fts4optcmd {FTS4 "optimize" command} \
                             {"optimize" command}</tcl>
<h2 id=optimize>The "optimize" command</h2>

<p>

  for X in the range of 100 to 300.  The idle thread that is running
  the merge commands can know when it is done by checking the difference
  in [sqlite3_total_changes()] before and after each "merge=X,Y"
  command and stopping the loop when the difference drops below two.

<tcl>hd_fragment *fts4automergecmd {FTS4 "automerge" command} \
                                   {"automerge" command}</tcl>
<h2 id=automerge">The "automerge=B" command</h2>

<p>
  The "automerge=B" command (where B is either "1" or "0") disables
  or enables automatic incremental inverted index merging for an
  FTS3/4 table.  The default for new tables is for automatic incremental
  merging to be disabled.  The "automerge=B" command changes this
  setting.  The change is persistent and continues to be in effect
  for all subsequent database connections to the same database.

<p>
  Enabling automatic incremental merge causes SQLite to do a small
  amount of inverted index merging after every INSERT operation.
  The amount of merging performed is designed so that the FTS3/4
  table never reaches a point where it has 16 segments at the same
  level and hence has to do a large merge in order to complete an
  insert.  In other words, automatic incremental merging is designed
  to prevent spiky INSERT performance.

<p>
  The downside of automatic incremental merging is that it makes
  every INSERT, UPDATE, and DELETE operation on an FTS3/4 table run
  a little slower, since extra time must be used to do the incremental
  merge.  For maximum performance, it is recommended that applications
  disable automatic incremental merge and instead use the 
  ["merge" command] in an idle process to keep the inverted indices
  well merged.  But if the structure of an application does not easily
  allow for idle processes, the use of automatic incremental merge is
  a very reasonable fallback solution.

<h1 id=tokenizer tags="tokenizer">Tokenizers</h1>

<p>
  An FTS tokenizer is a set of rules for extracting terms from a document 
  or basic FTS full-text query. 








  for X in the range of 100 to 300.  The idle thread that is running
  the merge commands can know when it is done by checking the difference
  in [sqlite3_total_changes()] before and after each "merge=X,Y"
  command and stopping the loop when the difference drops below two.
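<p>
  The stopping rule described above can be sketched in Python, where the
  Connection.total_changes attribute of the standard sqlite3 module reports
  the same counter as sqlite3_total_changes(). This is an illustration
  only: the helper name merge_until_done, the choice of X=200 and Y=8, and
  the max_rounds safety cap are all assumptions, and the SQLite library is
  assumed to be built with FTS3/4 enabled.

```python
import sqlite3


def merge_until_done(conn, table, x=200, y=8, max_rounds=1000):
    """Issue 'merge=X,Y' repeatedly (hypothetical helper).

    Stops when one command changes fewer than two rows, or after
    max_rounds commands as a safety cap. Returns the number of
    merge commands that were issued.
    """
    rounds = 0
    while rounds < max_rounds:
        before = conn.total_changes
        conn.execute(
            f"INSERT INTO {table}({table}) VALUES(?)", (f"merge={x},{y}",)
        )
        rounds += 1
        if conn.total_changes - before < 2:
            break
    return rounds


# Autocommit mode: every INSERT below is its own transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
for i in range(50):
    conn.execute("INSERT INTO docs(body) VALUES(?)", (f"document number {i}",))

rounds = merge_until_done(conn, "docs")
hits = conn.execute(
    "SELECT count(*) FROM docs WHERE docs MATCH 'document'"
).fetchone()[0]
```

<p>
  In an application this loop would run in the idle thread or process,
  ideally with a pause between rounds so that foreground writers are not
  starved.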

<tcl>hd_fragment *fts4automergecmd {FTS4 "automerge" command} \
                                   {"automerge" command}</tcl>
<h2 id=automerge">The "automerge=N" command</h2>

<p>
  The "automerge=N" command (where N is an integer between 0 and 15,
  inclusive) is used to configure an FTS3/4 table's "automerge" parameter,
  which controls automatic incremental inverted index merging. The default 
  automerge value for new tables is 0, meaning that automatic incremental 
  merging is completely disabled. If the value of the automerge parameter
  is modified using the "automerge=N" command, the new parameter value is
  stored persistently in the database and is used by all subsequently
  established database connections.

<p>
  Setting the automerge parameter to a non-zero value enables automatic
  incremental merging. This causes SQLite to do a small amount of inverted 
  index merging after every INSERT operation. The amount of merging 
  performed is designed so that the FTS3/4 table never reaches a point 
  where it has 16 segments at the same level and hence has to do a large 
  merge in order to complete an insert.  In other words, automatic 
  incremental merging is designed to prevent spiky INSERT performance.

<p>
  The downside of automatic incremental merging is that it makes
  every INSERT, UPDATE, and DELETE operation on an FTS3/4 table run
  a little slower, since extra time must be used to do the incremental
  merge.  For maximum performance, it is recommended that applications
  disable automatic incremental merge and instead use the 
  ["merge" command] in an idle process to keep the inverted indices
  well merged.  But if the structure of an application does not easily
  allow for idle processes, the use of automatic incremental merge is
  a very reasonable fallback solution.

<p>
  The actual value of the automerge parameter determines the number of
  index segments merged simultaneously by an automatic inverted index
  merge. If the value is set to N, the system waits until there are at
  least N segments on a single level before beginning to incrementally
  merge them. Setting a lower value of N causes segments to be merged more
  quickly, which may speed up full-text queries and, if the workload 
  contains UPDATE or DELETE operations as well as INSERTs, reduce the space
  on disk consumed by the full-text index. However, it also increases the
  amount of data written to disk.

<p>
  For general use in cases where the workload contains few UPDATE or DELETE
  operations, a good choice for automerge is 8. If the workload contains many
  UPDATE or DELETE commands, or if query speed is a concern, it may be
  advantageous to reduce it to 2.

<p>
  For reasons of backwards compatibility, the "automerge=1" command sets
  the automerge parameter to 8, not 1 (a value of 1 would make no sense 
  anyway, as merging data from a single segment is a no-op).
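<p>
  A minimal sketch of configuring the parameter from Python follows. The
  helper name set_automerge and its range check are assumptions made for
  illustration, the table name docs is arbitrary, and the SQLite library is
  assumed to be built with FTS3/4 enabled.

```python
import sqlite3


def set_automerge(conn, table, n):
    """Set the automerge parameter of an FTS3/4 table (hypothetical helper).

    Rejects values outside the documented range of 0 to 15 before
    sending the command to SQLite.
    """
    if not 0 <= n <= 15:
        raise ValueError("automerge must be between 0 and 15")
    conn.execute(
        f"INSERT INTO {table}({table}) VALUES(?)", (f"automerge={n}",)
    )


# Autocommit mode: every statement is its own transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")

# Enable automatic incremental merging with the general-purpose value.
set_automerge(conn, "docs", 8)
for i in range(40):
    conn.execute("INSERT INTO docs(body) VALUES(?)", (f"entry {i}",))

# Disable it again, for example before a bulk load.
set_automerge(conn, "docs", 0)

hits = conn.execute(
    "SELECT count(*) FROM docs WHERE docs MATCH 'entry'"
).fetchone()[0]
```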


<h1 id=tokenizer tags="tokenizer">Tokenizers</h1>

<p>
  An FTS tokenizer is a set of rules for extracting terms from a document 
  or basic FTS full-text query. 

    belongs to this segment b-tree. Or zero if the entire segment b-tree
    fits on the root node. If it exists, this node is always a leaf node.
  <tr><td>leaves_end_block <td>
    The blockid that corresponds to the leaf node with the largest blockid 
    that belongs to this segment b-tree. Or zero if the entire segment b-tree
    fits on the root node.
  <tr><td>end_block <td>
    The blockid that corresponds to the interior node with the largest 
    blockid that belongs to this segment b-tree.  Or zero if the entire segment
    b-tree fits on the root node. If it exists, this node is always an
    interior node.

  <tr><td>root             <td>
    Blob containing the root node of the segment b-tree.
</table>

<p>
  Apart from the root node, the nodes that make up a single segment b-tree are
  always stored using a contiguous sequence of blockids. Furthermore, the







    belongs to this segment b-tree. Or zero if the entire segment b-tree
    fits on the root node. If it exists, this node is always a leaf node.
  <tr><td>leaves_end_block <td>
    The blockid that corresponds to the leaf node with the largest blockid 
    that belongs to this segment b-tree. Or zero if the entire segment b-tree
    fits on the root node.
  <tr><td>end_block <td>
    This field may contain either an integer or a text field consisting of
    two integers separated by a space character (Unicode codepoint 0x20).
<p style="margin-left:0;margin-right:0">
    The first, or only, integer is the blockid that corresponds to the interior
    node with the largest blockid that belongs to this segment b-tree. Or zero
    if the entire segment b-tree fits on the root node. If it exists, this node
    is always an interior node.
<p style="margin-left:0;margin-right:0;margin-bottom:0">
    The second integer, if it is present, is the aggregate size of all data
    stored on leaf pages in bytes. If the value is negative, then the segment
    is the output of an unfinished incremental-merge operation, and the
    absolute value is the current size in bytes.

  <tr><td>root             <td>
    Blob containing the root node of the segment b-tree.
</table>
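<p>
  The two forms of the end_block field described above can be decoded
  along the following lines. This is a sketch based only on the
  description in the table: the names EndBlock and parse_end_block are
  hypothetical and are not part of SQLite.

```python
from typing import NamedTuple, Optional, Union


class EndBlock(NamedTuple):
    """Decoded end_block value (hypothetical type, for illustration)."""
    end_blockid: int           # largest interior-node blockid, or 0
    leaf_bytes: Optional[int]  # size of leaf data in bytes, if recorded
    merge_unfinished: bool     # True if the stored size was negative


def parse_end_block(value: Union[int, str]) -> EndBlock:
    """Decode an end_block value stored as an integer or as text."""
    if isinstance(value, int):
        return EndBlock(value, None, False)
    parts = value.split(" ")
    if len(parts) == 1:
        return EndBlock(int(parts[0]), None, False)
    if len(parts) != 2:
        raise ValueError(f"unexpected end_block value: {value!r}")
    end_blockid, size = int(parts[0]), int(parts[1])
    # A negative size marks the output of an unfinished incremental merge;
    # its absolute value is the current size in bytes.
    return EndBlock(end_blockid, abs(size), size < 0)


plain = parse_end_block(12)
sized = parse_end_block("12 4096")
partial = parse_end_block("12 -4096")
```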

<p>
  Apart from the root node, the nodes that make up a single segment b-tree are
  always stored using a contiguous sequence of blockids. Furthermore, the