Documentation Source Text

Check-in [89c68b1e44]

Overview
Comment: Fix typos in FTS3 documentation.
SHA1: 89c68b1e44b32e22a1ece2f3a546f5c1b23a5f6a
User & Date: drh 2012-05-15 16:46:36
Context
2012-05-22
02:38 Updates to the website for the 3.7.12.1 patch release. check-in: 80976ba114 user: drh tags: trunk
2012-05-15
16:46 Fix typos in FTS3 documentation. check-in: 89c68b1e44 user: drh tags: trunk
16:41 Typo fix on the homepage. check-in: 743b78dca8 user: drh tags: trunk
Changes

Changes to pages/fts3.in.

Old version (lines 1204-1218, 1236-1250, 1290-1304, 1334-1349, and 1924-1945):
  document, "[search application tips]", contains an example of using the
  matchinfo() function efficiently.

<h1 id=fts4aux tags="fts4aux">Fts4aux - Direct Access to the Full-Text Index</h1>

<p>
  As of version 3.7.6, SQLite includes a new virtual table module called 
  "fts4aux", which can be used to inspect the full-text index of an exiting
  FTS table directly. Despite its name, fts4aux works just as well with FTS3
  tables as it does with FTS4 tables. Fts4aux tables are read-only. The only
  way to modify the contents of an fts4aux table is by modifying the
  contents of the associated FTS table. The fts4aux module is automatically
  included in all [compile fts|builds that include FTS].

<p>
................................................................................

<table striped=1>
  <tr><th>Column Name<th>Column Contents
  <tr><td>term<td> 
    Contains the text of the term for this row.
  <tr><td>col<td> 
    This column may contain either the text value '*' (i.e. a single 
    character, UTF codepoint 42) or an integer between 0 and N-1, where N is
    again the number of user-defined columns in the corresponding FTS table.

  <tr><td>documents<td>
    This column always contains an integer value greater than zero.
    <br><br>
    If the "col" column contains the value '*', then this column
    contains the number of rows of the FTS table that contain at least one
................................................................................
  <i>--</i>
  SELECT term, col, documents, occurrences FROM ft_terms;
</codeblock>

<p>
  In the example, the values in the "term" column are all lower case, 
  even though they were inserted into table "ft" in mixed case. This is because
  an fts3aux table contains the terms as extracted from the document text
  by the [tokenizer]. In this case, since table "ft" uses the 
  [tokenizer|simple tokenizer], this means all terms have been folded to
  lower case. Also, there is (for example) no row with column "term"
  set to "apple" and column "col" set to 1. Since there are no instances
  of the term "apple" in column 1, no row is present in the fts4aux table.

<p>
................................................................................
  <tr><th>Option<th>Interpretation
  <tr><td>compress<td>
    ^The compress option is used to specify the compress function. ^It is an error to
    specify a compress function without also specifying an uncompress
    function. [fts4 compress option|See below] for details.

  <tr><td>content<td>
    ^The content allows the text being indexed to
    stored in a separate table distinct from the FTS4 table, or 
    or even outside of SQLite.

  <tr><td>languageid<td>
    ^The languageid option causes the FTS4 table to have an additional hidden
    integer column that identifies the language of the text contained in
    each row.  The use of the languageid option allows the same FTS4 table
    to hold text in multiple languages or scripts, each with different tokenizer
................................................................................
  tokenizer, "simple", is used. The simple tokenizer extracts tokens from
  a document or basic FTS full-text query according to the following 
  rules:

<ul>
  <li><p> A term is a contiguous sequence of eligible characters, where 
    eligible characters are all alphanumeric characters and all characters with
    UTF codepoints greater than or equal to 128. All other characters are
    discarded when splitting a document into terms. Their only contribution is
    to separate adjacent terms.

  <li><p> All uppercase characters within the ASCII range (UTF codepoints less 
    than 128), are transformed to their lowercase equivalents as part of the
    tokenization process. Thus, full-text queries are case-insensitive when
    using the simple tokenizer.
</ul>

<p>
  For example, when a document containing the text "Right now, they're very
  frustrated.", the terms extracted from the document and added to the 
  full-text index are, in order, "right now they re very frustrated". Such
  a document would match a full-text query such as "MATCH 'Frustrated'", 

New version (lines 1204-1218, 1236-1250, 1290-1304, 1334-1349, and 1924-1946):
  document, "[search application tips]", contains an example of using the
  matchinfo() function efficiently.

<h1 id=fts4aux tags="fts4aux">Fts4aux - Direct Access to the Full-Text Index</h1>

<p>
  As of version 3.7.6, SQLite includes a new virtual table module called 
  "fts4aux", which can be used to inspect the full-text index of an existing
  FTS table directly. Despite its name, fts4aux works just as well with FTS3
  tables as it does with FTS4 tables. Fts4aux tables are read-only. The only
  way to modify the contents of an fts4aux table is by modifying the
  contents of the associated FTS table. The fts4aux module is automatically
  included in all [compile fts|builds that include FTS].

<p>
................................................................................

<table striped=1>
  <tr><th>Column Name<th>Column Contents
  <tr><td>term<td> 
    Contains the text of the term for this row.
  <tr><td>col<td> 
    This column may contain either the text value '*' (i.e. a single 
    character, U+002a) or an integer between 0 and N-1, where N is
    again the number of user-defined columns in the corresponding FTS table.

  <tr><td>documents<td>
    This column always contains an integer value greater than zero.
    <br><br>
    If the "col" column contains the value '*', then this column
    contains the number of rows of the FTS table that contain at least one
................................................................................
  <i>--</i>
  SELECT term, col, documents, occurrences FROM ft_terms;
</codeblock>

<p>
  In the example, the values in the "term" column are all lower case, 
  even though they were inserted into table "ft" in mixed case. This is because
  an fts4aux table contains the terms as extracted from the document text
  by the [tokenizer]. In this case, since table "ft" uses the 
  [tokenizer|simple tokenizer], this means all terms have been folded to
  lower case. Also, there is (for example) no row with column "term"
  set to "apple" and column "col" set to 1. Since there are no instances
  of the term "apple" in column 1, no row is present in the fts4aux table.

<p>
................................................................................
  <tr><th>Option<th>Interpretation
  <tr><td>compress<td>
    ^The compress option is used to specify the compress function. ^It is an error to
    specify a compress function without also specifying an uncompress
    function. [fts4 compress option|See below] for details.

  <tr><td>content<td>
    ^The content allows the text being indexed to be
    stored in a separate table distinct from the FTS4 table,
    or even outside of SQLite.

  <tr><td>languageid<td>
    ^The languageid option causes the FTS4 table to have an additional hidden
    integer column that identifies the language of the text contained in
    each row.  The use of the languageid option allows the same FTS4 table
    to hold text in multiple languages or scripts, each with different tokenizer
................................................................................
  tokenizer, "simple", is used. The simple tokenizer extracts tokens from
  a document or basic FTS full-text query according to the following 
  rules:

<ul>
  <li><p> A term is a contiguous sequence of eligible characters, where 
    eligible characters are all alphanumeric characters and all characters with
    Unicode codepoint values greater than or equal to 128.
    All other characters are
    discarded when splitting a document into terms. Their only contribution is
    to separate adjacent terms.

  <li><p> All uppercase characters within the ASCII range (Unicode codepoints
    less than 128), are transformed to their lowercase equivalents as part
    of the tokenization process. Thus, full-text queries are
    case-insensitive when using the simple tokenizer.
</ul>

<p>
  For example, when a document containing the text "Right now, they're very
  frustrated.", the terms extracted from the document and added to the 
  full-text index are, in order, "right now they re very frustrated". Such
  a document would match a full-text query such as "MATCH 'Frustrated'",