Documentation Source Text

Check-in [08f6990b8a]

Overview
Comment: Add documentation for the fts5 detail= option.
SHA1: 08f6990b8a541d1086cc72cf5637a5a7e8edba87
User & Date: dan 2016-01-13 20:52:51
Context
2016-01-14 15:48  Merge the 3.10.1 documentation changes. Update the change log for 3.11.0. check-in: e9bf814a4d user: drh tags: trunk
2016-01-13 20:52  Add documentation for the fts5 detail= option. check-in: 08f6990b8a user: dan tags: trunk
2016-01-12 13:44  Merge changes from 3.10.0. check-in: 7d40b68595 user: drh tags: trunk
Changes

Changes to pages/fts5.in.

  <li> The "content" option, used to make the FTS5 table an 
       [FTS5 content option | external content or contentless table].
  <li> The "content_rowid" option, used to set the rowid field of an 
       [FTS5 external content tables | external content table].
  <li> The [FTS5 columnsize option | "columnsize" option], used to configure
       whether or not the size in tokens of each value in the FTS5 table is
       stored separately within the database.



</ul>

<h2 tags="unindexed">The UNINDEXED column option</h2>

<p>The contents of columns qualified with the UNINDEXED column option are not
added to the FTS index. This means that for the purposes of MATCH queries and
[FTS5 auxiliary functions], the column contains no matchable tokens. 
................................................................................

<p> The name of the table in which the xColumnSize values are stored
(unless columnsize=0 is specified) is "&lt;name&gt;_docsize", where 
&lt;name&gt; is the name of the FTS5 table itself. The 
<a href=https://www.sqlite.org/download.html>sqlite3_analyzer</a>
tool may be used on an existing database in order to determine how much
space might be saved by recreating an FTS5 table using columnsize=0.

<h1 tags="FTS5 auxiliary functions"> Auxiliary Functions </h1>

<p> Auxiliary functions are similar to [corefunc | SQL scalar functions],
except that they may only be used within full-text queries (those that use
the MATCH operator) on an FTS5 table. Their results are calculated based not
only on the arguments passed to them, but also on the current match and 







  <li> The "content" option, used to make the FTS5 table an 
       [FTS5 content option | external content or contentless table].
  <li> The "content_rowid" option, used to set the rowid field of an 
       [FTS5 external content tables | external content table].
  <li> The [FTS5 columnsize option | "columnsize" option], used to configure
       whether or not the size in tokens of each value in the FTS5 table is
       stored separately within the database.
  <li> The [FTS5 detail option | "detail" option]. This option may be used 
       to reduce the size of the FTS index on disk by omitting some information
       from it.  
</ul>

<h2 tags="unindexed">The UNINDEXED column option</h2>

<p>The contents of columns qualified with the UNINDEXED column option are not
added to the FTS index. This means that for the purposes of MATCH queries and
[FTS5 auxiliary functions], the column contains no matchable tokens. 
................................................................................

<p> The name of the table in which the xColumnSize values are stored
(unless columnsize=0 is specified) is "&lt;name&gt;_docsize", where 
&lt;name&gt; is the name of the FTS5 table itself. The 
<a href=https://www.sqlite.org/download.html>sqlite3_analyzer</a>
tool may be used on an existing database in order to determine how much
space might be saved by recreating an FTS5 table using columnsize=0.

<h2 tags="FTS5 detail option">The Detail Option</h2>

<p> For each term in a document, the FTS index maintained by FTS5 
stores the rowid of the document, the column number of the column that contains
the term and the offset of the term within the column value. The "detail"
option may be used to omit some of this information. This reduces the space
that the index consumes within the database file, but also reduces the
capability and efficiency of the system.
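
<p> As a rough illustration of the paragraph above, the following Python
sketch models what is recorded for each term at each detail level. It is a toy
model only: the <i>build_index</i> function, the whitespace tokenizer and the
tuple layout are assumptions made for illustration and do not reflect the
on-disk format or the tokenizers that FTS5 actually uses.

```python
from collections import defaultdict

def build_index(docs, detail="full"):
    """Build a toy inverted index (hypothetical helper, not an FTS5 API).

    docs maps rowid -> list of column values. Terms are produced by
    lower-casing and splitting on whitespace, which is an assumption.
    """
    index = defaultdict(set)
    for rowid, columns in docs.items():
        for col, text in enumerate(columns):
            for offset, term in enumerate(text.lower().split()):
                if detail == "full":
                    entry = (rowid, col, offset)   # rowid, column, offset
                elif detail == "column":
                    entry = (rowid, col)           # offset omitted
                elif detail == "none":
                    entry = (rowid,)               # column and offset omitted
                else:
                    raise ValueError("unknown detail level: " + detail)
                index[term].add(entry)
    return index

docs = {1: ["hello world", "hello again"], 2: ["world hello", ""]}
full = build_index(docs, "full")
column = build_index(docs, "column")
none = build_index(docs, "none")

print(sorted(full["hello"]))    # [(1, 0, 0), (1, 1, 0), (2, 0, 1)]
print(sorted(column["hello"]))  # [(1, 0), (1, 1), (2, 0)]
print(sorted(none["hello"]))    # [(1,), (2,)]
```

<p> In this model, fewer distinct entries are kept per term as detail is
reduced, but the discarded fields are exactly those that phrase, NEAR and
column filter queries would need.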

<p> The detail option may be set to "full" (the default value), "column" or
"none". For example:

<codeblock>
  <i>-- The following two lines are equivalent (because the default value</i>
  <i>-- of "detail" is "full"). </i>
  CREATE VIRTUAL TABLE ft1 USING fts5(a, b, c);
  CREATE VIRTUAL TABLE ft1 USING fts5(a, b, c, detail=full);

  CREATE VIRTUAL TABLE ft2 USING fts5(a, b, c, detail=column);
  CREATE VIRTUAL TABLE ft3 USING fts5(a, b, c, detail=none);
</codeblock>

<p>If the detail option is set to <b>column</b>, then for each term the FTS
index records the rowid and column number only, omitting the term offset
information. This results in the following restrictions:

<ul>
  <li> NEAR queries are not available.
  <li> Phrase queries are not available.
  <li> Assuming the table is not also a 
  [FTS5 contentless tables | contentless table], the 
  <a href=#xInstCount>xInstCount</a>, <a href=#xInst>xInst</a>, 
  <a href=#xPhraseFirst>xPhraseFirst</a> and <a href=#xPhraseNext>xPhraseNext</a>
  APIs are slower than usual. This is because, instead of reading the required
  data directly from the FTS index, they have to load and tokenize the document
  text on demand.
  <li> If the table is also a contentless table, the xInstCount, xInst, 
  xPhraseFirst and xPhraseNext APIs behave as if the current row contains no
  phrase matches at all (i.e. xInstCount() returns 0).
</ul>
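
<p> The phrase query restriction listed above can be checked with a short
script. This is a minimal sketch, assuming a Python interpreter whose bundled
SQLite was built with FTS5 and supports the detail option. The helper name
<i>phrase_query_allowed</i> and the table name <i>ft</i> are hypothetical, and
the sketch assumes that an unsupported query surfaces as
<i>sqlite3.OperationalError</i>.

```python
import sqlite3

def phrase_query_allowed(detail):
    """Return True if a two-term phrase query runs on a table created with
    the given detail level, False if SQLite rejects it, or None if FTS5
    (or the detail option) is unavailable in this build."""
    db = sqlite3.connect(":memory:")
    try:
        try:
            db.execute(
                f"CREATE VIRTUAL TABLE ft USING fts5(a, b, detail={detail})"
            )
        except sqlite3.OperationalError:
            return None  # assumption: FTS5 or detail= not supported here
        db.execute("INSERT INTO ft(a, b) VALUES ('hello world', 'x')")
        try:
            # The double quotes make "hello world" a single phrase.
            db.execute(
                "SELECT rowid FROM ft WHERE ft MATCH ?", ('"hello world"',)
            ).fetchall()
            return True
        except sqlite3.OperationalError:
            return False
    finally:
        db.close()

for level in ("full", "column", "none"):
    print(level, phrase_query_allowed(level))
```

<p> Queries consisting of a single term do not need offset information and
are not subject to this restriction.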
  
<p>If the detail option is set to <b>none</b>, then for each term the FTS
index records just the rowid. Both column and offset information are
omitted. As well as the restrictions itemized above for detail=column
mode, this imposes the following extra limitations:

<ul>
  <li> Column filter queries are not available.
  <li> Assuming the table is not also a contentless table, the 
  <a href=#xPhraseFirstColumn>xPhraseFirstColumn</a> and 
  <a href=#xPhraseNextColumn>xPhraseNextColumn</a> APIs are slower than usual. 

  <li> If the table is also a contentless table, the xPhraseFirstColumn and
  xPhraseNextColumn APIs behave as if the current row contains no phrase
  matches at all (i.e. xPhraseFirstColumn() sets the iterator to EOF).
</ul>
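
<p> The column filter restriction can be observed in the same way. The sketch
below assumes a Python build whose SQLite includes FTS5. The helper names
<i>fts5_available</i> and <i>column_filter_rowids</i> are hypothetical, and the
sketch assumes a rejected query is reported as
<i>sqlite3.OperationalError</i>.

```python
import sqlite3

def fts5_available():
    """Probe whether this SQLite build accepts an FTS5 table with detail=."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute("CREATE VIRTUAL TABLE probe USING fts5(a, detail=none)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        db.close()

def column_filter_rowids(detail):
    """Run the column filter query 'a : hello' against a two-row table.

    Returns the matching rowids, or None if the query (or the table
    definition) is rejected."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute(f"CREATE VIRTUAL TABLE ft USING fts5(a, b, detail={detail})")
        db.execute("INSERT INTO ft(rowid, a, b) VALUES (1, 'hello', 'world')")
        db.execute("INSERT INTO ft(rowid, a, b) VALUES (2, 'world', 'hello')")
        rows = db.execute(
            "SELECT rowid FROM ft WHERE ft MATCH ? ORDER BY rowid",
            ("a : hello",),
        ).fetchall()
        return [r[0] for r in rows]
    except sqlite3.OperationalError:
        return None
    finally:
        db.close()

HAVE_FTS5 = fts5_available()
if HAVE_FTS5:
    for level in ("full", "column", "none"):
        print(level, column_filter_rowids(level))
else:
    print("FTS5 is not available in this SQLite build")
```

<p> With detail=full or detail=column the filter restricts the match to column
"a". With detail=none the index no longer records which column a term came
from, so the query cannot be evaluated.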

<p> In one test that indexed a large set of emails (1636 MiB on disk), the FTS
index was 743 MiB on disk with detail=full, 340 MiB with detail=column and 134
MiB with detail=none.
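
<p> To put those figures in proportion, the following snippet computes each
index size relative to the indexed data and relative to the detail=full index.
Only the four sizes come from the test described above; the variable names are
illustrative.

```python
# Sizes in MiB, taken from the email-indexing test described above.
corpus_mib = 1636
index_mib = {"full": 743, "column": 340, "none": 134}

report = {}
for detail, size in index_mib.items():
    pct_of_corpus = 100.0 * size / corpus_mib
    pct_saved_vs_full = 100.0 * (1 - size / index_mib["full"])
    report[detail] = (pct_of_corpus, pct_saved_vs_full)
    print(f"detail={detail}: {pct_of_corpus:.1f}% of corpus size, "
          f"{pct_saved_vs_full:.1f}% smaller than detail=full")
```

<p> For this data set, detail=column roughly halves the index size and
detail=none reduces it by about four fifths. Other data sets may behave
differently.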

<h1 tags="FTS5 auxiliary functions"> Auxiliary Functions </h1>

<p> Auxiliary functions are similar to [corefunc | SQL scalar functions],
except that they may only be used within full-text queries (those that use
the MATCH operator) on an FTS5 table. Their results are calculated based not
only on the arguments passed to them, but also on the current match and