Documentation Source Text: Check-in [b8865a0011]

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Comment:	Updates to the FTS3 documentation to include additional discussion and links to FTS4 concepts.
Downloads:	Tarball \| ZIP archive
Timelines:	family \| ancestors \| descendants \| both \| trunk
Files:	files \| file ages \| folders
SHA1:	b8865a0011a76e624c57d2e61dcd9bbc9abcacbf
User & Date:	drh 2010-11-26 21:06:49.000

Context

2010-11-26
22:46		Update the download page to use a new, consistent file naming scheme. Download files end with a version number of the form WXXYYZZ where W is 3, XX is the major version, YY is the minor version, and ZZ is the patch number. (check-in: 8b77a71a8c user: drh tags: trunk)
21:06		Updates to the FTS3 documentation to include additional discussion and links to FTS4 concepts. (check-in: b8865a0011 user: drh tags: trunk)
18:23		Update documentation for fts3 matchinfo function. (check-in: 4deb70dd76 user: dan tags: trunk)

Changes

Changes to pages/changes.in.

Changes to pages/fts3.in.

Changes to pages/pragma.in.

Changes to pages/vtab.in.

︙			︙
317 318 319 320 321 322 323 324 325 326 327 328 329 330	The name of the table in this CREATE TABLE statement is ignored, as are all constraints. Only the column names and datatypes matter. The CREATE TABLE statement string need not to be held in persistent memory. The string can be deallocated and/or reused as soon as the [sqlite3_declare_vtab()] routine returns. <p>If a column datatype contains the special keyword "HIDDEN" (in any combination of upper and lower case letters) then that keyword it is omitted from the column datatype name and the column is marked as a hidden column internally. A hidden column differs from a normal column in three respects: <p>	> > > > > > > > > > > > > > > > > > > > >	317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351	The name of the table in this CREATE TABLE statement is ignored, as are all constraints. Only the column names and datatypes matter. The CREATE TABLE statement string need not to be held in persistent memory. The string can be deallocated and/or reused as soon as the [sqlite3_declare_vtab()] routine returns. <p>The xCreate method need not initialize the pModule, nRef, and zErrMsg fields of the [sqlite3_vtab] object. The SQLite core will take care of that chore. <p>The xCreate must should return [SQLITE_OK] if it is successful in creating the new virtual table, or [SQLITE_ERROR] if it is not successful. If not successful, the [sqlite3_vtab] structure must not be allocated. An error message may optionally be returned in *pzErr if unsuccessful. Space to hold the error message string must be allocated using an SQLite memory allocation function like [sqlite3_malloc()] or [sqlite3_mprintf()] as the SQLite core will attempt to free the space using [sqlite3_free()] after the error has been reported up to the application. <p>The xCreate method is required for every virtual table implementation, though the xCreate and [xConnect] pointers of the [sqlite3_module] object may point to the same function the virtual table does not need to initialize backing store. <tcl>hd_fragment hiddencol {hidden column}</tcl> <h4>2.1.1 Hidden columns in virtual tables</h4> <p>If a column datatype contains the special keyword "HIDDEN" (in any combination of upper and lower case letters) then that keyword it is omitted from the column datatype name and the column is marked as a hidden column internally. A hidden column differs from a normal column in three respects: <p>
︙			︙
342 343 344 345 346 347 348 ~~349 350 351 352 353 354 355 356 357 358 359 360 361~~ 362 ~~363 364 365 366~~ 367 368 369 370 371 372 373	<blockquote><pre> CREATE TABLE x(a HIDDEN VARCHAR(12), b INTEGER, c INTEGER Hidden); </pre></blockquote> <p>Then the virtual table would be created with two hidden columns, and with datatypes of "VARCHAR(12)" and "INTEGER". <p>The ~~xCreat~~e ~~method n~~ee~~d not~~ in~~itialize~~ the ~~pMod~~u~~le, nRef, and zErrMsg~~ ~~fields of the [sqlite3_v~~tab] o~~bject. T~~he ~~SQLite co~~re ~~wil~~l ta~~ke care of~~ ~~that chore.~~ ~~<p>The xCreate must should return [SQLITE_OK] if it is successful in~~ ~~creating the new~~ virtual table, or ~~[SQLITE_ERROR] if it is not successful~~. ~~If not successful, the [sqlite3_vtab] structure must not be allocated.~~ ~~An error message may optionally be returned in pzErr if unsuccessful.~~ ~~Space to hold the error message string must be allocated using~~ ~~an SQLite memory allocation function like~~ ~~[sqlite3_malloc()] or [sqlite3_mprintf()] as the SQLite core will~~ ~~attempt to free the space using [sqlite3_free()] after the error has~~ ~~been reported up to the application.~~ ~~<p>The xCreate method is required for every virtual table implementation,~~ ~~though the xCreate and [xConnect] pointers of the [sqlite3_module] object~~ ~~may point to the same function the virtual table does not need to initialize~~ ~~backing store.~~ <tcl>############################################################# xConnect hd_fragment xconnect {sqlite3_module.xConnect} {xConnect}</tcl> <h3>2.2 The xConnect Method</h3> <blockquote><pre> int (xConnect)(sqlite3, void pAux,	\| \| < \| < \| < < < < < < < < < < <	363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381	<blockquote><pre> CREATE TABLE x(a HIDDEN VARCHAR(12), b INTEGER, c INTEGER Hidden); </pre></blockquote> <p>Then the virtual table would be created with two hidden columns, and with datatypes of "VARCHAR(12)" and "INTEGER". <p>An example use of hidden columns can be seen in the [FTS3] virtual table implementation, where every FTS virtual table contains an [FTS hidden column] that is used to pass information from the virtual table into [FTS auxiliary functions] and to the [FTS MATCH] operator. <tcl>############################################################# xConnect hd_fragment xconnect {sqlite3_module.xConnect} {xConnect}</tcl> <h3>2.2 The xConnect Method</h3> <blockquote><pre> int (xConnect)(sqlite3, void *pAux,
︙			︙

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 ~~27 28~~ 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43	<tcl>hd_keywords fts3 FTS3 {full-text search}</tcl> ~~<title>SQLite FTS3 Extension</title>~~ <table_of_contents> <h2 style="margin-left:1.0em" notoc> Overview</h2> <p> ~~FTS3 is an SQLite virtual table module that allows users to perform~~ full-text searches on a set of documents. The most common (and effective) way to describe full-text searches is "what Google, Yahoo and Altavista do with documents placed on the World Wide Web". Users input a term, or series of terms, perhaps connected by a binary operator or grouped together into a phrase, and the full-text query system finds the set of documents that best matches those terms considering the operators and groupings the user has ~~specified. This doc~~ument~~ describes the deployment and usage of FTS3.~~ <p> Portions of the original FTS3 code were contributed to the SQLite project by Scott Hess of <a href="http://www.google.com">Google</a>. It is now developed and maintained as part of SQLite. ~~<h1>Introduction to FTS3</h1>~~ <p> ~~The FTS3 extension module allows users to create special tables with a built-in full-text index (hereafter "FTS3 tables"). The full-text index~~ allows the user to efficiently query the database for all rows that contain ~~one or more ~~instances specified word~~ (hereafter a "token", even if the table~~ contains many large documents. <p> For example, if each of the 517430 documents in the "<a href="http://www.cs.cmu.edu/~enron/">Enron E-Mail Dataset</a>" ~~is inserted into both the FTS3 table and the ordinary SQLite table~~ created using the following SQL script: <codeblock> CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT); / FTS3 table / CREATE TABLE enrondata2(content TEXT); / Ordinary table */ </codeblock>	\| \| \| > > \| \| \| \| \|	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45	<tcl>hd_keywords fts3 FTS3 {full-text search}</tcl> <title>SQLite FTS3 and FTS4 Extensions</title> <table_of_contents> <h2 style="margin-left:1.0em" notoc> Overview</h2> <p> FTS3 and FTS4 are an SQLite virtual table modules that allows users to perform full-text searches on a set of documents. The most common (and effective) way to describe full-text searches is "what Google, Yahoo and Altavista do with documents placed on the World Wide Web". Users input a term, or series of terms, perhaps connected by a binary operator or grouped together into a phrase, and the full-text query system finds the set of documents that best matches those terms considering the operators and groupings the user has specified. This article describes the deployment and usage of FTS3 and FTS4. <p> FTS1 and FTS2 are obsolete full-text search modules for SQLite. There are known issues with this older modules and their use should be avoided. Portions of the original FTS3 code were contributed to the SQLite project by Scott Hess of <a href="http://www.google.com">Google</a>. It is now developed and maintained as part of SQLite. <h1>Introduction to FTS3 and FTS4</h1> <p> The FTS3 and FTS4 extension modules allows users to create special tables with a built-in full-text index (hereafter "FTS tables"). The full-text index allows the user to efficiently query the database for all rows that contain one or more words (hereafter "tokens"), even if the table contains many large documents. <p> For example, if each of the 517430 documents in the "<a href="http://www.cs.cmu.edu/~enron/">Enron E-Mail Dataset</a>" is inserted into both the FTS table and the ordinary SQLite table created using the following SQL script: <codeblock> CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT); / FTS3 table / CREATE TABLE enrondata2(content TEXT); / Ordinary table */ </codeblock>
︙			︙
60 61 62 63 64 65 66 67 68 69 70 71 ~~72 73~~ 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 ~~90 91~~ 92 93 94 95 96 97 98 99 100 101 102 103 ~~104~~ 105 106 107 108 109 110 111 112 113 114 115 116 ~~117~~ 118 119 120 ~~121~~ 122 ~~123~~ 124 ~~125~~ 126 127 128 129 130 ~~131~~ 132 133 134 135 ~~136~~ 137 138 139 ~~140~~ 141 142 ~~143~~ 144 145 146 147 148 ~~149~~ 150 ~~151~~ 152 ~~153~~ 154 155 156 157 158 159 160 161 162 163 164 ~~165~~ 166 167 ~~168 169~~ 170 171 172 173 ~~174~~ 175 176 177 178 179 180 181 182 183 184 185 ~~186~~ 187 188 189 190 ~~191~~ 192 193 194 195 196 197 198 ~~199~~ 200 ~~201 202~~ 203 204 205 206 ~~207~~ 208 209 210 ~~211~~ 212 213 214 215 216 ~~217~~ 218 219 220 221 ~~222~~ 223 ~~224~~ 225 226 227 ~~228~~ 229 230 ~~231~~ 232 233 234 235 236 237 ~~238~~ 239 240 241 242 ~~243~~ 244 245 246 247 248 249 250 ~~251~~ 252 253 254 255 256 ~~257~~ 258 259 260 261 262 263 264	selects only those rows that contain "linux" as a discrete token. Both searches are case-insensitive. The FTS3 table consumes around 2006 MB on disk compared to just 1453 MB for the ordinary table. Using the same hardware configuration used to perform the SELECT queries above, the FTS3 table took just under 31 minutes to populate, versus 25 for the ordinary table. ~~<h2>Creating and Destroying FTS3 Tables</h2>~~ <p> ~~Like other virtual table types, new FTS3 tables are created using a~~ [CREATE VIRTUAL TABLE] statement. The module name, which follows ~~the USING keyword, is ~~"fts3~~". The virtual table module arguments may be left empty, in which case an FTS3 table with a single user-defined~~ column named "content" is created. Alternatively, the module arguments may be passed a list of comma separated column names. <p> ~~If column names are explicitly provided for the FTS3 table as part of~~ the CREATE VIRTUAL TABLE statement, then a datatype name may be optionally specified for each column. However, this is pure syntactic sugar, the ~~supplied typenames are not used by FTS3 or the SQLite core for any~~ purpose. The same applies to any constraints specified along with an ~~FTS3 column name - they are parsed but not used or recorded by the system~~ in any way. <codeblock> ~~<i>-- Create an FTS3 table named "data" with one column - "content":</i>~~ CREATE VIRTUAL TABLE data USING fts3(); ~~<i>-- Create an FTS3 table named "pages" with three columns:</i> CREATE VIRTUAL TABLE pages USING fts3(title, keywords, body);~~ ~~<i>-- Create an FTS3 table named "mail" with two columns. Datatypes~~ -- and column constraints are specified along with each column. These ~~-- are completely ignored by FTS3 and SQLite. </i>~~ CREATE VIRTUAL TABLE mail USING fts3( subject VARCHAR(256) NOT NULL, body TEXT CHECK(length(body)<10240) ); </codeblock> <p> As well as a list of columns, the module arguments passed to a CREATE ~~VIRTUAL TABLE statement used to create an FTS3 table may be used to specify~~ a [tokenizer]. This is done by specifying a string of the form "tokenize=<tokenizer name> <tokenizer args>" in place of a column name, where <tokenizer name> is the name of the tokenizer to use and <tokenizer args> is an optional list of whitespace separated qualifiers to pass to the tokenizer implementation. A tokenizer specification may be placed anywhere in the column list, but at most one tokenizer declaration is allowed for each CREATE VIRTUAL TABLE statement. The second and subsequent tokenizer declaration are interpreted as column names. [tokenizer\|See below] for a detailed description of using (and, if necessary, implementing) a tokenizer. <codeblock> ~~<i>-- Create an FTS3 table named "papers" with two columns that uses</i>~~ <i>-- the tokenizer "porter".</i> CREATE VIRTUAL TABLE papers USING fts3(author, document, tokenize=porter); ~~<i>-- Create an FTS3 table with a single column - "content" - that uses</i>~~ <i>-- the "simple" tokenizer.</i> ~~CREATE VIRTUAL TABLE data USING fts3(tokenize=simple);~~ ~~<i>-- Create an FTS3 table with two columns that uses the "icu" tokenizer.</i>~~ <i>-- The qualifier "en_AU" is passed to the tokenizer implementation</i> CREATE VIRTUAL TABLE names USING fts3(a, b, tokenize=icu en_AU); </codeblock> <p> ~~FTS3 tables may be dropped from the database using an ordinary [DROP TABLE]~~ statement. For example: <codeblock> <i>-- Create, then immediately drop, an FTS3 table.</i> ~~CREATE VIRTUAL TABLE data USING fts3();~~ DROP TABLE data; </codeblock> ~~<h2>Populating FTS3 Tables</h2>~~ <p> ~~FTS3 tables are populated using [INSERT], [UPDATE] and [DELETE]~~ statements in the same way as ordinary SQLite tables are. <p> As well as the columns named by the user (or the "content" column if no module arguments where specified as part of the [CREATE VIRTUAL TABLE] ~~statement), each FTS3 table has a "rowid" column. The rowid of an FTS3~~ table behaves in the same way as the rowid column of an ordinary SQLite ~~table, except that the values stored in the rowid column of an FTS3 table~~ remain unchanged if the database is rebuilt using the [VACUUM] command. ~~For FTS3 tables, "docid" is allowed as an alias along with the usual "rowid",~~ "oid" and "_oid_" identifiers. Attempting to insert or update a row with a docid value that already exists in the table is an error, just as it would be with an ordinary SQLite table. <p> There is one other subtle difference between "docid" and the normal SQLite aliases for the rowid column. Normally, if an INSERT or UPDATE statement assigns discrete values to two or more aliases of the rowid column, SQLite writes the rightmost of such values specified in the INSERT or UPDATE statement to the database. However, assigning a non-NULL value to both the "docid" and one or more of the SQLite rowid aliases when inserting or ~~updating an FTS3 table is considered an error. See below for an example.~~ <codeblock> ~~<i>-- Create an FTS3 table</i> CREATE VIRTUAL TABLE pages USING fts3(title, body);~~ <i>-- Insert a row with a specific docid value.</i> INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...'); ~~<i>-- Insert a row and allow FTS3 to assign a docid value using the same algorithm as</i>~~ <i>-- SQLite uses for ordinary tables. In this case the new docid will be 54,</i> <i>-- one greater than the largest docid currently present in the table.</i> INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...'); <i>-- Change the title of the row just inserted.</i> UPDATE pages SET title = 'Download SQLite' WHERE rowid = 54; <i>-- Delete the entire table contents.</i> DELETE FROM pages; <i>-- The following is an error. It is not possible to assign non-NULL values to both</i> ~~<i>-- the rowid and docid columns of an FTS3 table.</i>~~ INSERT INTO pages(rowid, docid, title, body) VALUES(1, 2, 'A title', 'A document body'); </codeblock> <p> ~~To support full-text queries, FTS3 maintains an inverted index that maps~~ from each unique term or word that appears in the dataset to the locations in which it appears within the table contents. For the curious, a complete description of the [segment btree\|data structure] used to store this index within the database file is described below. A feature of this data structure is that at any time the database may contain not one index b-tree, but several different b-trees that are incrementally merged as rows are inserted, updated and deleted. This technique improves ~~performance when writing to an FTS3 table, but causes some overhead for~~ full-text queries that use the index. Executing an SQL statement of the ~~form "INSERT INTO <fts3-table>(<fts3-table>) VALUES('optimize')" causes FTS3 to merge all existing index b-trees into a single large~~ b-tree containing the entire index. This can be an expensive operation, but may speed up future queries. <p> ~~For example, to optimize the full-text index for an FTS3 table named~~ "docs": <codeblock> ~~<i>-- Optimize the internal structure of FTS3 table "docs".</i>~~ INSERT INTO docs(docs) VALUES('optimize'); </codeblock> <p> The statement above may appear syntactically incorrect to some. Refer to ~~the section describing the [simple fts3 queries] for an explanation.~~ <p> There is another, deprecated, method for invoking the optimize operation using a SELECT statement. New code should use statements ~~similar to the INSERT above to optimize FTS3 structures.~~ ~~<h2 tags="simple fts3 queries">Simple FTS3 Queries</h2>~~ <p> As for all other SQLite tables, virtual or otherwise, data is retrieved ~~from FTS3 tables using a [SELECT] statement.~~ <p> ~~FTS3 tables can be queried efficiently using SELECT statements of two~~ different forms: <ul> <li><p> <b>Query by rowid</b>. If the WHERE clause of the SELECT statement contains a sub-clause of the form "rowid = ?", where ? is an SQL expression, ~~FTS3 is able to retrieve the requested row directly using the equivalent~~ of an SQLite [INTEGER PRIMARY KEY] index. <li><p> <b>Full-text query</b>. If the WHERE clause of the SELECT statement contains ~~a sub-clause of the form "<column> MATCH ?", FTS3 is able to use~~ the built-in full-text index to restrict the search to those documents that match the full-text query string specified as the right-hand operand of the MATCH clause. </ul> <p> If neither of the two query strategies enumerated above can be used, all ~~queries on FTS3 tables are implemented using a linear scan of the entire~~ table. If the table contains large amounts of data, this may be an impractically approach (the first example on this page shows that a linear scan of 1.5 GB of data takes around 30 seconds using a modern PC). <codeblock> ~~<i>-- The examples in this block assume the following FTS3 table:</i>~~ CREATE VIRTUAL TABLE mail USING fts3(subject, body); SELECT * FROM mail WHERE rowid = 15; <i>-- Fast. Rowid lookup.</i> SELECT * FROM mail WHERE body MATCH 'sqlite'; <i>-- Fast. Full-text query.</i> SELECT * FROM mail WHERE mail MATCH 'search'; <i>-- Fast. Full-text query.</i> SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20; <i>-- Slow. Linear scan.</i> SELECT * FROM mail WHERE subject = 'database'; <i>-- Slow. Linear scan.</i>	> > > > > > > > > > > > > > > > > > > > > > > > > > > \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293	selects only those rows that contain "linux" as a discrete token. Both searches are case-insensitive. The FTS3 table consumes around 2006 MB on disk compared to just 1453 MB for the ordinary table. Using the same hardware configuration used to perform the SELECT queries above, the FTS3 table took just under 31 minutes to populate, versus 25 for the ordinary table. <tcl>hd_fragment fts4 FTS4</tcl> <h2>Differences between FTS3 and FTS4</h2> <p> FTS3 and FTS4 are nearly identical. They share most of their code in common, and their interfaces are the same. The only difference is that FTS4 stores some additional information about the document collection in two of new [FTS shadow tables]. This additional information allows FTS4 to use certain query performance optimizations that FTS3 cannot use. And the added information permits some additional useful output options in the [matchinfo()] function. <p> FTS4 is an enhancement to FTS3. FTS3 has been available since SQLite [version 3.5.0] in 2007-09-04. The enhancements for FTS4 were added with SQLite [version 3.7.4] on 2010-12-08. <p> Which module, FTS3 or FTS4, should you use in your application? FTS4 can sometimes is significantly faster than FTS3, even orders of magnitude faster, depending on the query, though in the common case the performance of the two modules is similar. FTS4 also offers the enhanced [matchinfo()] outputs which can be useful in ranking the results of a [FTS MATCH\|MATCH] operation. On the other hand, because FTS4 stores additional information, FTS4 requires a little more disk space than FTS3, though only a percent of two in most cases. <p> For newer applications, FTS4 is recommended; though if minimal disk usage or compatibility with older versions of SQLite are important, then FTS3 will usually serve just as well. <h2>Creating and Destroying FTS Tables</h2> <p> Like other virtual table types, new FTS tables are created using a [CREATE VIRTUAL TABLE] statement. The module name, which follows the USING keyword, is either "fts3" or "fts4". The virtual table module arguments may be left empty, in which case an FTS table with a single user-defined column named "content" is created. Alternatively, the module arguments may be passed a list of comma separated column names. <p> If column names are explicitly provided for the FTS table as part of the CREATE VIRTUAL TABLE statement, then a datatype name may be optionally specified for each column. However, this is pure syntactic sugar, the supplied typenames are not used by FTS or the SQLite core for any purpose. The same applies to any constraints specified along with an FTS column name - they are parsed but not used or recorded by the system in any way. <codeblock> <i>-- Create an FTS table named "data" with one column - "content":</i> CREATE VIRTUAL TABLE data USING fts3(); <i>-- Create an FTS table named "pages" with three columns:</i> CREATE VIRTUAL TABLE pages USING fts4(title, keywords, body); <i>-- Create an FTS table named "mail" with two columns. Datatypes -- and column constraints are specified along with each column. These -- are completely ignored by FTS and SQLite. </i> CREATE VIRTUAL TABLE mail USING fts3( subject VARCHAR(256) NOT NULL, body TEXT CHECK(length(body)<10240) ); </codeblock> <p> As well as a list of columns, the module arguments passed to a CREATE VIRTUAL TABLE statement used to create an FTS table may be used to specify a [tokenizer]. This is done by specifying a string of the form "tokenize=<tokenizer name> <tokenizer args>" in place of a column name, where <tokenizer name> is the name of the tokenizer to use and <tokenizer args> is an optional list of whitespace separated qualifiers to pass to the tokenizer implementation. A tokenizer specification may be placed anywhere in the column list, but at most one tokenizer declaration is allowed for each CREATE VIRTUAL TABLE statement. The second and subsequent tokenizer declaration are interpreted as column names. [tokenizer\|See below] for a detailed description of using (and, if necessary, implementing) a tokenizer. <codeblock> <i>-- Create an FTS table named "papers" with two columns that uses</i> <i>-- the tokenizer "porter".</i> CREATE VIRTUAL TABLE papers USING fts3(author, document, tokenize=porter); <i>-- Create an FTS table with a single column - "content" - that uses</i> <i>-- the "simple" tokenizer.</i> CREATE VIRTUAL TABLE data USING fts4(tokenize=simple); <i>-- Create an FTS table with two columns that uses the "icu" tokenizer.</i> <i>-- The qualifier "en_AU" is passed to the tokenizer implementation</i> CREATE VIRTUAL TABLE names USING fts3(a, b, tokenize=icu en_AU); </codeblock> <p> FTS tables may be dropped from the database using an ordinary [DROP TABLE] statement. For example: <codeblock> <i>-- Create, then immediately drop, an FTS3 table.</i> CREATE VIRTUAL TABLE data USING fts4(); DROP TABLE data; </codeblock> <h2>Populating FTS Tables</h2> <p> FTS tables are populated using [INSERT], [UPDATE] and [DELETE] statements in the same way as ordinary SQLite tables are. <p> As well as the columns named by the user (or the "content" column if no module arguments where specified as part of the [CREATE VIRTUAL TABLE] statement), each FTS table has a "rowid" column. The rowid of an FTS table behaves in the same way as the rowid column of an ordinary SQLite table, except that the values stored in the rowid column of an FTS table remain unchanged if the database is rebuilt using the [VACUUM] command. For FTS tables, "docid" is allowed as an alias along with the usual "rowid", "oid" and "_oid_" identifiers. Attempting to insert or update a row with a docid value that already exists in the table is an error, just as it would be with an ordinary SQLite table. <p> There is one other subtle difference between "docid" and the normal SQLite aliases for the rowid column. Normally, if an INSERT or UPDATE statement assigns discrete values to two or more aliases of the rowid column, SQLite writes the rightmost of such values specified in the INSERT or UPDATE statement to the database. However, assigning a non-NULL value to both the "docid" and one or more of the SQLite rowid aliases when inserting or updating an FTS table is considered an error. See below for an example. <codeblock> <i>-- Create an FTS table</i> CREATE VIRTUAL TABLE pages USING fts4(title, body); <i>-- Insert a row with a specific docid value.</i> INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...'); <i>-- Insert a row and allow FTS to assign a docid value using the same algorithm as</i> <i>-- SQLite uses for ordinary tables. In this case the new docid will be 54,</i> <i>-- one greater than the largest docid currently present in the table.</i> INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...'); <i>-- Change the title of the row just inserted.</i> UPDATE pages SET title = 'Download SQLite' WHERE rowid = 54; <i>-- Delete the entire table contents.</i> DELETE FROM pages; <i>-- The following is an error. It is not possible to assign non-NULL values to both</i> <i>-- the rowid and docid columns of an FTS table.</i> INSERT INTO pages(rowid, docid, title, body) VALUES(1, 2, 'A title', 'A document body'); </codeblock> <p> To support full-text queries, FTS maintains an inverted index that maps from each unique term or word that appears in the dataset to the locations in which it appears within the table contents. For the curious, a complete description of the [segment btree\|data structure] used to store this index within the database file is described below. A feature of this data structure is that at any time the database may contain not one index b-tree, but several different b-trees that are incrementally merged as rows are inserted, updated and deleted. This technique improves performance when writing to an FTS table, but causes some overhead for full-text queries that use the index. Executing an SQL statement of the form "INSERT INTO <fts-table>(<fts-table>) VALUES('optimize')" causes FTS to merge all existing index b-trees into a single large b-tree containing the entire index. This can be an expensive operation, but may speed up future queries. <p> For example, to optimize the full-text index for an FTS table named "docs": <codeblock> <i>-- Optimize the internal structure of FTS table "docs".</i> INSERT INTO docs(docs) VALUES('optimize'); </codeblock> <p> The statement above may appear syntactically incorrect to some. Refer to the section describing the [simple fts queries] for an explanation. <p> There is another, deprecated, method for invoking the optimize operation using a SELECT statement. New code should use statements similar to the INSERT above to optimize FTS structures. <h2 tags="simple fts queries">Simple FTS Queries</h2> <p> As for all other SQLite tables, virtual or otherwise, data is retrieved from FTS tables using a [SELECT] statement. <p> FTS tables can be queried efficiently using SELECT statements of two different forms: <ul> <li><p> <b>Query by rowid</b>. If the WHERE clause of the SELECT statement contains a sub-clause of the form "rowid = ?", where ? is an SQL expression, FTS is able to retrieve the requested row directly using the equivalent of an SQLite [INTEGER PRIMARY KEY] index. <li><p> <b>Full-text query</b>. If the WHERE clause of the SELECT statement contains a sub-clause of the form "<column> MATCH ?", FTS is able to use the built-in full-text index to restrict the search to those documents that match the full-text query string specified as the right-hand operand of the MATCH clause. </ul> <p> If neither of the two query strategies enumerated above can be used, all queries on FTS tables are implemented using a linear scan of the entire table. If the table contains large amounts of data, this may be an impractically approach (the first example on this page shows that a linear scan of 1.5 GB of data takes around 30 seconds using a modern PC). <codeblock> <i>-- The examples in this block assume the following FTS table:</i> CREATE VIRTUAL TABLE mail USING fts3(subject, body); SELECT * FROM mail WHERE rowid = 15; <i>-- Fast. Rowid lookup.</i> SELECT * FROM mail WHERE body MATCH 'sqlite'; <i>-- Fast. Full-text query.</i> SELECT * FROM mail WHERE mail MATCH 'search'; <i>-- Fast. Full-text query.</i> SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20; <i>-- Slow. Linear scan.</i> SELECT * FROM mail WHERE subject = 'database'; <i>-- Slow. Linear scan.</i>
︙			︙
272 273 274 275 276 277 278 ~~279~~ 280 281 282 ~~283~~ 284 285 286 287 288 289 ~~290~~ 291 292 ~~293~~ 294 295 ~~296 297~~ 298 299 300 301 302 303 304	instances of the specified word ("sqlite", "search" or "database", depending on which example you look at). Specifying a single term as the right-hand operand of the MATCH operator results in the simplest (and most common) type of full-text query possible. However more complicated queries are possible, including phrase searches, term-prefix searches and searches for documents containing combinations of terms occuring within a defined proximity of each other. The various ways in which the full-text index may be queried are ~~[FTS3 MATCH\|described below].~~ <p> Normally, full-text queries are case-insensitive. However, this is ~~is dependent on the specific [tokenizer] used by the FTS3 table~~ being queried. Refer to the section on [tokenizer\|tokenizers] for details. <p> The paragraph above notes that a MATCH operator with a simple term as the right-hand operand evaluates to true for all documents that contain the specified term. In this context, the "document" may refer to either the ~~data stored in a single column of a row of an FTS3 table, or to the contents~~ of all columns in a single row, depending on the identifier used as the left-hand operand to the MATCH operator. If the identifier specified as ~~the left-hand operand of the MATCH operator is an FTS3 table column name,~~ then the document that the search term must be contained in is the value stored in the specified column. However, if the identifier is the name ~~of the FTS3 <i>table</i> itself, then the MATCH operator evaluates to true for each row of the FTS3 table for which any column contains the search~~ term. The following example demonstrates this: <codeblock> <i>-- Example schema</i> CREATE VIRTUAL TABLE mail USING fts3(subject, body); <i>-- Example table population</i>	\| \| \| \| \| \|	301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333	instances of the specified word ("sqlite", "search" or "database", depending on which example you look at). Specifying a single term as the right-hand operand of the MATCH operator results in the simplest (and most common) type of full-text query possible. However more complicated queries are possible, including phrase searches, term-prefix searches and searches for documents containing combinations of terms occuring within a defined proximity of each other. The various ways in which the full-text index may be queried are [FTS MATCH\|described below]. <p> Normally, full-text queries are case-insensitive. However, this is is dependent on the specific [tokenizer] used by the FTS table being queried. Refer to the section on [tokenizer\|tokenizers] for details. <p> The paragraph above notes that a MATCH operator with a simple term as the right-hand operand evaluates to true for all documents that contain the specified term. In this context, the "document" may refer to either the data stored in a single column of a row of an FTS table, or to the contents of all columns in a single row, depending on the identifier used as the left-hand operand to the MATCH operator. If the identifier specified as the left-hand operand of the MATCH operator is an FTS table column name, then the document that the search term must be contained in is the value stored in the specified column. However, if the identifier is the name of the FTS <i>table</i> itself, then the MATCH operator evaluates to true for each row of the FTS table for which any column contains the search term. The following example demonstrates this: <codeblock> <i>-- Example schema</i> CREATE VIRTUAL TABLE mail USING fts3(subject, body); <i>-- Example table population</i>
︙			︙
312 313 314 315 316 317 318 ~~319~~ 320 321 322 323 ~~324~~ 325 326 327 328 329 330 331 332 333 334 ~~335~~ 336 337 338 339 340 341 342 343 344 345 346 ~~347~~ 348 ~~349~~ 350 ~~351~~ 352 353 354 355 356 ~~357 358 359~~ 360 361 362 ~~363~~ 364 ~~365~~ 366 367 368 ~~369~~ 370 371 372 ~~373~~ 374 375 376 ~~377~~ 378 379 ~~380~~ ~~381~~ 382 ~~383 384 385~~ 386 387 388 389 390 ~~391~~ 392 393 ~~394 395~~ 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 ~~420 421~~ 422 423 ~~424 425 426 427~~ 428 429 430 ~~431 432~~ 433 434 435 436 437 438 439 440 ~~441~~ 442 443 ~~444~~ 445 446 447 ~~448 449~~ 450 451 452 ~~453~~ 454 455 456 ~~457~~ 458 459 460 ~~461 462~~ 463 464 465 466 467 468 469	SELECT * FROM mail WHERE mail MATCH 'software'; <i>-- Selects rows 1, 2 and 3</i> SELECT * FROM mail WHERE mail MATCH 'slow'; <i>-- Selects rows 1 and 3</i> </codeblock> <p> At first glance, the final two full-text queries in the example above seem to be syntactically incorrect, as there is a table name ("mail") used as ~~an SQL expression. The reason this is acceptable is that each FTS3 table~~ actually has a [sqlite3_declare_vtab\|HIDDEN] column with the same name as the table itself (in this case, "mail"). The value stored in this column is not meaningful to the application, but can be used as the left-hand operand to a MATCH operator. This special column may also be ~~passed as an argument to the [snippet()\|FTS3 auxiliary functions].~~ <p> The following example illustrates the above. The expressions "docs", "docs.docs" and "main.docs.docs" all refer to column "docs". However, the expression "main.docs" does not refer to any column. It could be used to refer to a table, but a table name is not allowed in the context in which it is used below. <codeblock> <i>-- Example schema</i> ~~CREATE VIRTUAL TABLE docs USING fts3(content);~~ <i>-- Example queries</i> SELECT * FROM docs WHERE docs MATCH 'sqlite'; <i>-- OK.</i> SELECT * FROM docs WHERE docs.docs MATCH 'sqlite'; <i>-- OK.</i> SELECT * FROM docs WHERE main.docs.docs MATCH 'sqlite'; <i>-- OK.</i> SELECT * FROM docs WHERE main.docs MATCH 'sqlite'; <i>-- Error.</i> </codeblock> <h2>Summary</h2> <p> ~~From the users point of view, FTS3 tables are similar to ordinary SQLite~~ tables in many ways. Data may be added to, modified within and removed ~~from FTS3 tables using the INSERT, UPDATE and DELETE commands just as~~ it may be with ordinary tables. Similarly, the SELECT command may be used ~~to query data. The following list summarizes the differences between FTS3~~ and ordinary tables: <ol> <li><p> As with all virtual table types, it is not possible to create indices or ~~triggers attached to FTS3 tables. Nor is it possible to use the ALTER TABLE command to add extra columns to FTS3 tables (although it is possible to use ALTER TABLE to rename an FTS3 table).~~ <li><p> Data-types specified as part of the "CREATE VIRTUAL TABLE" statement ~~used to create an FTS3 table are ignored completely. Instead of the~~ normal rules for applying type [affinity] to inserted values, all ~~values inserted into FTS3 table columns (except the special rowid~~ column) are converted to type TEXT before being stored. <li><p> ~~FTS3 tables permit the special alias "docid" to be used to refer to the~~ rowid column supported by all [virtual tables]. <li><p> ~~The [FTS3 MATCH] operator is supported for queries based on the built-in~~ full-text index. <li><p> ~~The ~~FTS3~~ auxiliary functions, [snippet~~\|snippet~~() ~~and~~ offsets()], are~~ available to support full-text queries. ~~<li><p>~~ ~~E~~ach~~ FTS3 table has a [~~sqlite3_~~de~~clare_vtab()\|HIDDEN~~ column] with the~~ same name as the table itself. The value contained in each row for the spe~~cial~~ column is only useful ~~when used on~~ the left-hand ~~side~~ of a [FTS3 MATCH\|MATCH] operator, or ~~when specified~~ as an argument to one of the [~~snippet\|FTS3~~ auxiliary functions]. </ol> ~~<h1 tags="compile fts3">Compiling and Enabling FTS3</h1>~~ <p> ~~Although FTS3 is d~~istribut~~ed ~~as part of~~ the SQLite source code, it is not enabled by default. To build SQLite with FTS3 functionality enabled, define~~ the preprocessor macro [SQLITE_ENABLE_FTS3] when compiling. New applications should also define the [SQLITE_ENABLE_FTS3_PARENTHESIS] macro to enable the [enhanced query syntax] (see below). Usually, this is done by adding the following two switches to the compiler command line: <codeblock> -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS </codeblock> <p> If using the amalgamation autoconf based build system, setting the CPPFLAGS environment variable while running the 'configure' script is an easy way to set these macros. For example, the following command: <codeblock> CPPFLAGS="-DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS" ./configure <configure options> </codeblock> <p> where <i><configure options></i> are those options normally passed to the configure script, if any. <p> ~~Because FTS3 ~~is a~~ virtual table, it ~~is in~~compatible ~~with the~~ [SQLITE_OMIT_VIRTUALTABLE] option.~~ <p> If an SQLite ~~build~~ does not include FTS3, then any attempt to prepare an SQL statement to create an FTS3 table or to drop or access an existing FTS3 table in any way will fail. The error message returned will be similar to "no such module: fts3". <p> If the C version of the <a href=http://site.icu-project.org/>ICU library</a> ~~is available, then FTS3 may also be compiled with the SQLITE_ENABLE_ICU pre-processor macro defined. Compiling with this macro enables an FTS3~~ [tokenizer] that uses the ICU library to split a document into terms (words) using the conventions for a specified language and locale. <codeblock> -DSQLITE_ENABLE_ICU </codeblock> ~~<h1 tags="FTS3 MATCH">Full-text Index Queries</h1>~~ <p> ~~The most useful thing about FTS3 tables is the queries that may be~~ performed using the built-in full-text index. Full-text queries are performed by specifying a clause of the form "<column> MATCH <full-text query expression>" to the WHERE ~~clause of a SELECT statement that reads data from an FTS3 table. [simple fts3 queries\|Simple FTS3 queries] that return all documents that~~ contain a given term are described above. In that discussion the right-hand operand of the MATCH operator was assumed to be a string consisting of a single term. This section describes the more complex query types supported ~~by FTS3 tables, and how they may be utilized by specifying a more~~ complex query expression as the right-hand operand of a MATCH operator. <p> ~~FTS3 tables support three basic query types:~~ <ul> <li><p><b>Token or token prefix queries</b>. ~~An FTS3 table may be queried for all documents that contain a specified term (the [simple fts3 queries\|simple case] described above), or for~~ all documents that contain a term with a specified prefix. As we have seen, the query expression for a specific term is simply the term itself. The query expression used to search for a term prefix is the prefix itself with a '*' character appended to it. For example: </ul> <codeblock>	\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| > \| \| \| \| \| \| \| > > > > > \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504	SELECT * FROM mail WHERE mail MATCH 'software'; <i>-- Selects rows 1, 2 and 3</i> SELECT * FROM mail WHERE mail MATCH 'slow'; <i>-- Selects rows 1 and 3</i> </codeblock> <p> At first glance, the final two full-text queries in the example above seem to be syntactically incorrect, as there is a table name ("mail") used as an SQL expression. The reason this is acceptable is that each FTS table actually has a [sqlite3_declare_vtab\|HIDDEN] column with the same name as the table itself (in this case, "mail"). The value stored in this column is not meaningful to the application, but can be used as the left-hand operand to a MATCH operator. This special column may also be passed as an argument to the [snippet()\|FTS auxiliary functions]. <p> The following example illustrates the above. The expressions "docs", "docs.docs" and "main.docs.docs" all refer to column "docs". However, the expression "main.docs" does not refer to any column. It could be used to refer to a table, but a table name is not allowed in the context in which it is used below. <codeblock> <i>-- Example schema</i> CREATE VIRTUAL TABLE docs USING fts4(content); <i>-- Example queries</i> SELECT * FROM docs WHERE docs MATCH 'sqlite'; <i>-- OK.</i> SELECT * FROM docs WHERE docs.docs MATCH 'sqlite'; <i>-- OK.</i> SELECT * FROM docs WHERE main.docs.docs MATCH 'sqlite'; <i>-- OK.</i> SELECT * FROM docs WHERE main.docs MATCH 'sqlite'; <i>-- Error.</i> </codeblock> <h2>Summary</h2> <p> From the users point of view, FTS tables are similar to ordinary SQLite tables in many ways. Data may be added to, modified within and removed from FTS tables using the INSERT, UPDATE and DELETE commands just as it may be with ordinary tables. Similarly, the SELECT command may be used to query data. The following list summarizes the differences between FTS and ordinary tables: <ol> <li><p> As with all virtual table types, it is not possible to create indices or triggers attached to FTS tables. Nor is it possible to use the ALTER TABLE command to add extra columns to FTS tables (although it is possible to use ALTER TABLE to rename an FTS table). <li><p> Data-types specified as part of the "CREATE VIRTUAL TABLE" statement used to create an FTS table are ignored completely. Instead of the normal rules for applying type [affinity] to inserted values, all values inserted into FTS table columns (except the special rowid column) are converted to type TEXT before being stored. <li><p> FTS tables permit the special alias "docid" to be used to refer to the rowid column supported by all [virtual tables]. <li><p> The [FTS MATCH] operator is supported for queries based on the built-in full-text index. <li><p> The [FTS auxiliary functions], [snippet()], [offsets()], and [matchinfo()] are available to support full-text queries. <li><p> <tcl>hd_fragment hiddencol {FTS hidden column}</tcl> Every FTS table has a [hidden column] with the same name as the table itself. The value contained in each row for the hidden column is a blob that is only useful as the left operand of a [FTS MATCH\|MATCH] operator, or as the left-most argument to one of the [FTS auxiliary functions]. </ol> <h1 tags="compile fts">Compiling and Enabling FTS3 and FTS4</h1> <p> Although the FTS3 and FTS4 is included with the SQLite core source code, it is not enabled by default. To build SQLite with FTS functionality enabled, define the preprocessor macro [SQLITE_ENABLE_FTS3] when compiling. New applications should also define the [SQLITE_ENABLE_FTS3_PARENTHESIS] macro to enable the [enhanced query syntax] (see below). Usually, this is done by adding the following two switches to the compiler command line: <codeblock> -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS </codeblock> <p> Note that enabling FTS3 also makes FTS4 available. There is not a separate SQLITE_ENABLE_FTS4 compile-time option. An build of SQLite either supports both FTS3 and FTS4 or it supports neither. <p> If using the amalgamation autoconf based build system, setting the CPPFLAGS environment variable while running the 'configure' script is an easy way to set these macros. For example, the following command: <codeblock> CPPFLAGS="-DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS" ./configure <configure options> </codeblock> <p> where <i><configure options></i> are those options normally passed to the configure script, if any. <p> Because FTS3 and FTS4 are virtual tables, The [SQLITE_ENABLE_FTS3] compile-time option is incompatible with the [SQLITE_OMIT_VIRTUALTABLE] option. <p> If a build of SQLite does not include FTS3, then any attempt to prepare an SQL statement to create an FTS3 or FTS4 table or to drop or access an existing FTS table in any way will fail. The error message returned will be similar to "no such module: ftsN" (where N is either 3 or 4). <p> If the C version of the <a href=http://site.icu-project.org/>ICU library</a> is available, then FTS may also be compiled with the SQLITE_ENABLE_ICU pre-processor macro defined. Compiling with this macro enables an FTS [tokenizer] that uses the ICU library to split a document into terms (words) using the conventions for a specified language and locale. <codeblock> -DSQLITE_ENABLE_ICU </codeblock> <h1 tags="FTS MATCH">Full-text Index Queries</h1> <p> The most useful thing about FTS tables is the queries that may be performed using the built-in full-text index. Full-text queries are performed by specifying a clause of the form "<column> MATCH <full-text query expression>" to the WHERE clause of a SELECT statement that reads data from an FTS table. [simple fts queries\|Simple FTS queries] that return all documents that contain a given term are described above. In that discussion the right-hand operand of the MATCH operator was assumed to be a string consisting of a single term. This section describes the more complex query types supported by FTS tables, and how they may be utilized by specifying a more complex query expression as the right-hand operand of a MATCH operator. <p> FTS tables support three basic query types: <ul> <li><p><b>Token or token prefix queries</b>. An FTS table may be queried for all documents that contain a specified term (the [simple fts queries\|simple case] described above), or for all documents that contain a term with a specified prefix. As we have seen, the query expression for a specific term is simply the term itself. The query expression used to search for a term prefix is the prefix itself with a '*' character appended to it. For example: </ul> <codeblock>
︙			︙
477 478 479 480 481 482 483 ~~484~~ 485 ~~486~~ 487 488 489 490 491 492 493	<i>-- all documents that contain "linux", but also those that contain terms "linear",</i> <i>--"linker", "linguistic" and so on.</i> SELECT * FROM docs WHERE docs MATCH 'lin*'; </codeblock> <ul> <li style="list-style:none"><p> ~~Normally, a token or token prefix query is matched against the FTS3 table~~ column specified as the right-hand side of the MATCH operator. Or, if the ~~special column with the same name as the FTS3 table itself is specified,~~ against all columns. This may be overridden by specifying a column-name followed by a ":" character before a basic term query. There may be space between the ":" and the term to query for, but not between the column-name and the ":" character. For example: </ul> <codeblock>	\| \|	512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528	<i>-- all documents that contain "linux", but also those that contain terms "linear",</i> <i>--"linker", "linguistic" and so on.</i> SELECT * FROM docs WHERE docs MATCH 'lin*'; </codeblock> <ul> <li style="list-style:none"><p> Normally, a token or token prefix query is matched against the FTS table column specified as the right-hand side of the MATCH operator. Or, if the special column with the same name as the FTS table itself is specified, against all columns. This may be overridden by specifying a column-name followed by a ":" character before a basic term query. There may be space between the ":" and the term to query for, but not between the column-name and the ":" character. For example: </ul> <codeblock>
︙			︙
532 533 534 535 536 537 538 ~~539~~ 540 541 542 543 544 545 546	an operator of the form "NEAR/<i><N></i>" may be used, where <i><N></i> is the maximum number of intervening terms allowed. For example: </ul> <codeblock> <i>-- Virtual table declaration.</i> ~~CREATE VIRTUAL TABLE docs USING fts3();~~ <i>-- Virtual table data.</i> INSERT INTO docs VALUES('SQLite is an ACID compliant embedded relational database management system'); <i>-- Search for a document that contains the terms "sqlite" and "database" with</i> <i>-- not more than 10 intervening terms. This matches the only document in</i> <i>-- table docs (since there are only six terms between "SQLite" and "database"</i>	\|	567 568 569 570 571 572 573 574 575 576 577 578 579 580 581	an operator of the form "NEAR/<i><N></i>" may be used, where <i><N></i> is the maximum number of intervening terms allowed. For example: </ul> <codeblock> <i>-- Virtual table declaration.</i> CREATE VIRTUAL TABLE docs USING fts4(); <i>-- Virtual table data.</i> INSERT INTO docs VALUES('SQLite is an ACID compliant embedded relational database management system'); <i>-- Search for a document that contains the terms "sqlite" and "database" with</i> <i>-- not more than 10 intervening terms. This matches the only document in</i> <i>-- table docs (since there are only six terms between "SQLite" and "database"</i>
︙			︙
591 592 593 594 595 596 597 ~~598~~ 599 600 601 602 603 604 605 606 607 608 609 610 611 612 ~~613~~ 614 615 616 617 618 ~~619~~ 620 621 622 623 624 625 ~~626~~ 627 628 629 630 631 632 ~~633~~ 634 635 636 637 638 639 640	<p> Phrase and NEAR queries may not span multiple columns within a row. <p> The three basic query types described above may be used to query the full-text index for the set of documents that match the specified criteria. Using the ~~FTS3 query expression language it is possible to perform various set~~ operations on the results of basic queries. There are currently three supported operations: <ul> <li> The AND operator determines the <b>intersection</b> of two sets of documents. <li> The OR operator calculates the <b>union</b> of two sets of documents. <li> The NOT operator (or, if using the standard syntax, a unary "-" operator) may be used to compute the <b>relative complement</b> of one set of documents with respect to another. </ul> <p> ~~The FTS3 module may be compiled to use one of two slightly different versions~~ of the full-text query syntax, the "standard" query syntax and the "enhanced" query syntax. The basic term, term-prefix, phrase and NEAR queries described above are the same in both versions of the syntax. The way in which set operations are specified is slightly different. The following two sub-sections describe the part of the two query syntaxes that pertains to set operations. ~~Refer to the description of how to [compile fts3] for compilation notes.~~ <h2 tags="enhanced query syntax"> Set Operations Using The Enhanced Query Syntax</h2> <p> The enhanced query syntax supports the AND, OR and NOT binary set operators. ~~Each of the two operands to an operator may be a basic FTS3 query, or the~~ result of another AND, OR or NOT set operation. Operators must be entered using capital letters. Otherwise, they are interpreted as basic term queries instead of set operators. <p> The AND operator may be implicitly specified. If two basic queries appear ~~with no operator separating them in an FTS3 query string, the results are~~ the same as if the two basic queries were separated by an AND operator. For example, the query expression "implicit operator" is a more succinct version of "implicit AND operator". <codeblock> <i>-- Virtual table declaration</i> CREATE VIRTUAL TABLE docs USING fts3();	\| \| \| \| \|	626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675	<p> Phrase and NEAR queries may not span multiple columns within a row. <p> The three basic query types described above may be used to query the full-text index for the set of documents that match the specified criteria. Using the FTS query expression language it is possible to perform various set operations on the results of basic queries. There are currently three supported operations: <ul> <li> The AND operator determines the <b>intersection</b> of two sets of documents. <li> The OR operator calculates the <b>union</b> of two sets of documents. <li> The NOT operator (or, if using the standard syntax, a unary "-" operator) may be used to compute the <b>relative complement</b> of one set of documents with respect to another. </ul> <p> The FTS module may be compiled to use one of two slightly different versions of the full-text query syntax, the "standard" query syntax and the "enhanced" query syntax. The basic term, term-prefix, phrase and NEAR queries described above are the same in both versions of the syntax. The way in which set operations are specified is slightly different. The following two sub-sections describe the part of the two query syntaxes that pertains to set operations. Refer to the description of how to [compile fts] for compilation notes. <h2 tags="enhanced query syntax"> Set Operations Using The Enhanced Query Syntax</h2> <p> The enhanced query syntax supports the AND, OR and NOT binary set operators. Each of the two operands to an operator may be a basic FTS query, or the result of another AND, OR or NOT set operation. Operators must be entered using capital letters. Otherwise, they are interpreted as basic term queries instead of set operators. <p> The AND operator may be implicitly specified. If two basic queries appear with no operator separating them in an FTS query string, the results are the same as if the two basic queries were separated by an AND operator. For example, the query expression "implicit operator" is a more succinct version of "implicit AND operator". <codeblock> <i>-- Virtual table declaration</i> CREATE VIRTUAL TABLE docs USING fts3();
︙			︙
669 670 671 672 673 674 675 ~~676~~ 677 678 679 680 681 682 683	SELECT * FROM docs WHERE docs MATCH 'database and sqlite'; </codeblock> <p> The examples above all use basic full-text term queries as both operands of the set operations demonstrated. Phrase and NEAR queries may also be used, as may the results of other set operations. When more than one set operation ~~is present in an FTS3 query, the precedence of operators is as follows:~~ <table striped=1> <tr><th>Operator<th>Enhanced Query Syntax Precedence <tr><td>NOT <td> Highest precedence (tightest grouping). <tr><td>AND <td> <tr><td>OR <td> Lowest precedence (loosest grouping). </table>	\|	704 705 706 707 708 709 710 711 712 713 714 715 716 717 718	SELECT * FROM docs WHERE docs MATCH 'database and sqlite'; </codeblock> <p> The examples above all use basic full-text term queries as both operands of the set operations demonstrated. Phrase and NEAR queries may also be used, as may the results of other set operations. When more than one set operation is present in an FTS query, the precedence of operators is as follows: <table striped=1> <tr><th>Operator<th>Enhanced Query Syntax Precedence <tr><td>NOT <td> Highest precedence (tightest grouping). <tr><td>AND <td> <tr><td>OR <td> Lowest precedence (loosest grouping). </table>
︙			︙
711 712 713 714 715 716 717 ~~718~~ 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 ~~738~~ 739 740 741 742 743 744 745	); </codeblock> <h2>Set Operations Using The Standard Query Syntax</h2> <p> ~~FTS3 query set operations using the standard query syntax are similar, but~~ not identical, to set operations with the enhanced query syntax. There are four differences, as follows: <ol> <li value=1><p> Only the implicit version of the AND operator is supported. Specifying the string "AND" as part of an standard query syntax query is interpreted as a term query for the set of documents containing the term "and". </ol> <ol> <li value=2><p> Parenthesis are not supported. </ol> <ol> <li value=3><p> The NOT operator is not supported. Instead of the NOT operator, the standard query syntax supports a unary "-" operator that may be applied to basic term and term-prefix queries (but not to phrase or NEAR queries). A term or term-prefix that has a unary "-" operator ~~attached to it may not appear as an operand to an OR operator. An FTS3~~ query may not consist entirely of terms or term-prefix queries with unary "-" operators attached to them. </ol> <codeblock> <i>-- Search for the set of documents that contain the term "sqlite" but do</i> <i>-- not contain the term "database".</i>	\| \|	746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780	); </codeblock> <h2>Set Operations Using The Standard Query Syntax</h2> <p> FTS query set operations using the standard query syntax are similar, but not identical, to set operations with the enhanced query syntax. There are four differences, as follows: <ol> <li value=1><p> Only the implicit version of the AND operator is supported. Specifying the string "AND" as part of an standard query syntax query is interpreted as a term query for the set of documents containing the term "and". </ol> <ol> <li value=2><p> Parenthesis are not supported. </ol> <ol> <li value=3><p> The NOT operator is not supported. Instead of the NOT operator, the standard query syntax supports a unary "-" operator that may be applied to basic term and term-prefix queries (but not to phrase or NEAR queries). A term or term-prefix that has a unary "-" operator attached to it may not appear as an operand to an OR operator. An FTS query may not consist entirely of terms or term-prefix queries with unary "-" operators attached to them. </ol> <codeblock> <i>-- Search for the set of documents that contain the term "sqlite" but do</i> <i>-- not contain the term "database".</i>
︙			︙
769 770 771 772 773 774 775 ~~776~~ 777 778 ~~779~~ 780 781 782 783 784 785 786 787 ~~788 789 790~~ 791 792 793 794 795 796 797 798 799 ~~800~~ 801 802 803 804 805 806 ~~807~~ 808 809 810 811 812 813 ~~814~~ 815 816 817 818 819 ~~820~~ 821 822 823 ~~824 825~~ 826 827 828 ~~829~~ 830 831 832 833 834 835 836 837 838 839 840 841 842 ~~843~~ 844 845 846 847 848 849 850	<i>-- Search for documents that contain at least one of the terms "database"</i> <i>-- and "sqlite", and also contain the term "library". Because of the differences</i> <i>-- in operator precedences, this query would have a different interpretation using</i> <i>-- the enhanced query syntax.</i> SELECT * FROM docs WHERE docs MATCH 'sqlite OR database library'; </codeblock> ~~<h1 ~~tags="snippet, offsets"~~>Auxiliary Functions - Snippet, Offsets and Matchinfo</h1>~~ <p> ~~The FTS3 module provides three special SQL scalar functions that may be useful~~ to the developers of full-text query systems: "snippet", "offsets" and "matchinfo". The purpose of the "snippet" and "offsets" functions is to allow the user to identify the location of queried terms in the returned documents. The "matchinfo" function provides the user with metrics that may be useful for filtering or sorting query results according to relevance. <p> The first argument to all three special SQL scalar functions ~~must be the special hidden column of an~~ FTS3 table that has the same ~~name as the table (see above).~~ For example, given an FTS3 table named ~~"mail":~~ <codeblock> SELECT offsets(mail) FROM mail WHERE mail MATCH <full-text query expression>; SELECT snippet(mail) FROM mail WHERE mail MATCH <full-text query expression>; SELECT matchinfo(mail) FROM mail WHERE mail MATCH <full-text query expression>; </codeblock> <p> The three auxiliary functions are only useful within a SELECT statement that ~~uses the FTS3 table's full-text index. ^If used within a SELECT that uses~~ the "query by rowid" or "linear scan" strategies, then the snippet and offsets both return an empty string, and the matchinfo function returns a blob value zero bytes in size. <p id=matchable> All three auxiliary functions extract a set of "matchable phrases" from ~~the FTS3 query expression to work with. The set of matchable phrases for~~ a given query consists of all phrases (including unquoted tokens and token prefixes) in the expression except those that are prefixed with a unary "-" operator (standard syntax) or are part of a sub-expression that is used as the right-hand operand of a NOT operator. <p> ~~With the following provisos, each series of tokens in the FTS3 table that~~ matches one of the matchable phrases in the query expression is known as a "phrase match": <ol> <li> If a matchable phrase is part of a series of phrases connected by ~~NEAR operators in the FTS3 query expression, then each phrase match~~ must be sufficiently close to other phrase matches of the relevant types to satisfy the NEAR condition. ~~<li> If the matchable phrase in the FTS3 query is restricted to matching data in a specified FTS3 table column, then only phrase matches that~~ occur within that column are considered. </ol> <h2>The Offsets Function</h2> <p> For a SELECT query that uses the full-text index, the offsets() function returns a text value containing a series of space-separated integers. For each term in each <a href=#matchable>phrase match</a> of the current row, there are four integers in the returned list. Each set of four integers is interpreted as follows: <table striped=1> <tr><th>Integer <th>Interpretation <tr><td>0 <td>The column number that the term instance occurs in (0 for the ~~leftmost column of the FTS3 table, 1 for the next leftmost, etc.).~~ <tr><td>1 <td>The term number of the matching term within the full-text query expression. Terms within a query expression are numbered starting from 0 in the order that they occur. <tr><td>2 <td>The byte offset of the matching term within the column. <tr><td>3	> \| \| > > \| \| < \| \| \| \| \| \| \| \|	804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887	<i>-- Search for documents that contain at least one of the terms "database"</i> <i>-- and "sqlite", and also contain the term "library". Because of the differences</i> <i>-- in operator precedences, this query would have a different interpretation using</i> <i>-- the enhanced query syntax.</i> SELECT * FROM docs WHERE docs MATCH 'sqlite OR database library'; </codeblock> <tcl>hd_fragment snippet {FTS auxiliary functions}</tcl> <h1>Auxiliary Functions - Snippet, Offsets and Matchinfo</h1> <p> The FTS3 and FTS4 modules provide three special SQL scalar functions that may be useful to the developers of full-text query systems: "snippet", "offsets" and "matchinfo". The purpose of the "snippet" and "offsets" functions is to allow the user to identify the location of queried terms in the returned documents. The "matchinfo" function provides the user with metrics that may be useful for filtering or sorting query results according to relevance. <p> The first argument to all three special SQL scalar functions must be the [FTS hidden column] of the FTS table that the function is applied to. The [FTS hidden column] is an automatically-generated column found on all FTS tables that has the same name as the FTS table itself. For example, given an FTS table named "mail": <codeblock> SELECT offsets(mail) FROM mail WHERE mail MATCH <full-text query expression>; SELECT snippet(mail) FROM mail WHERE mail MATCH <full-text query expression>; SELECT matchinfo(mail) FROM mail WHERE mail MATCH <full-text query expression>; </codeblock> <p> The three auxiliary functions are only useful within a SELECT statement that uses the FTS table's full-text index. ^If used within a SELECT that uses the "query by rowid" or "linear scan" strategies, then the snippet and offsets both return an empty string, and the matchinfo function returns a blob value zero bytes in size. <p id=matchable> All three auxiliary functions extract a set of "matchable phrases" from the FTS query expression to work with. The set of matchable phrases for a given query consists of all phrases (including unquoted tokens and token prefixes) in the expression except those that are prefixed with a unary "-" operator (standard syntax) or are part of a sub-expression that is used as the right-hand operand of a NOT operator. <p> With the following provisos, each series of tokens in the FTS table that matches one of the matchable phrases in the query expression is known as a "phrase match": <ol> <li> If a matchable phrase is part of a series of phrases connected by NEAR operators in the FTS query expression, then each phrase match must be sufficiently close to other phrase matches of the relevant types to satisfy the NEAR condition. <li> If the matchable phrase in the FTS query is restricted to matching data in a specified FTS table column, then only phrase matches that occur within that column are considered. </ol> <tcl>hd_fragment offsets offsets</tcl> <h2>The Offsets Function</h2> <p> For a SELECT query that uses the full-text index, the offsets() function returns a text value containing a series of space-separated integers. For each term in each <a href=#matchable>phrase match</a> of the current row, there are four integers in the returned list. Each set of four integers is interpreted as follows: <table striped=1> <tr><th>Integer <th>Interpretation <tr><td>0 <td>The column number that the term instance occurs in (0 for the leftmost column of the FTS table, 1 for the next leftmost, etc.). <tr><td>1 <td>The term number of the matching term within the full-text query expression. Terms within a query expression are numbered starting from 0 in the order that they occur. <tr><td>2 <td>The byte offset of the matching term within the column. <tr><td>3
︙			︙
877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 ~~894~~ ~~895 896~~ 897 898 899 900 901 902 903 ~~904~~ 905 906 907 908 909 910 911	<i>-- The following query matches the second row in table "mail". It returns the</i> <i>-- text "1 0 28 7 1 1 36 4". Only those occurrences of terms "serious" and "mail"</i> <i>-- that are part of an instance of the phrase "serious mail" are identified; the</i> <i>-- other occurrences of "serious" and "mail" are ignored.</i> SELECT offsets(mail) FROM mail WHERE mail MATCH '"serious mail"'; </codeblock> <h2>The Snippet Function</h2> <p> The snippet function is used to create formatted fragments of document text for display as part of a full-text query results report. The snippet function may be passed between one and six arguments, as follows: <table striped=1> <tr><th>Argument <th>Default Value <th>Description <tr><td>0 <td>N/A ~~<td> The first argument to the snippet function must always be the spe~~cial~~~~ ~~hidden column of th~~e FTS3 table that takes~~ the same name as th~~e tabl~~e itself.~~ <tr><td>1 <td>"<b>" <td> The "start match" text. <tr><td>2 <td>"</b>" <td> The "end match" text. <tr><td>3 <td>"<b>...</b>" <td> The "ellipses" text. <tr><td>4 <td>-1 ~~<td> The FTS3 table column number to extract the returned fragments of~~ text from. Columns are numbered from left to right starting with zero. A negative value indicates that the text may be extracted from any column. <tr><td>5 <td>-15 <td> The absolute value of this integer argument is used as the (approximate) number of tokens to include in the returned text value. The maximum allowable absolute value is 64. The value of	> \| > \| \| \|	914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950	<i>-- The following query matches the second row in table "mail". It returns the</i> <i>-- text "1 0 28 7 1 1 36 4". Only those occurrences of terms "serious" and "mail"</i> <i>-- that are part of an instance of the phrase "serious mail" are identified; the</i> <i>-- other occurrences of "serious" and "mail" are ignored.</i> SELECT offsets(mail) FROM mail WHERE mail MATCH '"serious mail"'; </codeblock> <tcl>hd_fragment snippet snippet</tcl> <h2>The Snippet Function</h2> <p> The snippet function is used to create formatted fragments of document text for display as part of a full-text query results report. The snippet function may be passed between one and six arguments, as follows: <table striped=1> <tr><th>Argument <th>Default Value <th>Description <tr><td>0 <td>N/A <td> The first argument to the snippet function must always be the [FTS hidden column] of the FTS table being queried and from which the snippet is to be taken. The [FTS hidden column] is an automatically generated column with the same name as the FTS table itself. <tr><td>1 <td>"<b>" <td> The "start match" text. <tr><td>2 <td>"</b>" <td> The "end match" text. <tr><td>3 <td>"<b>...</b>" <td> The "ellipses" text. <tr><td>4 <td>-1 <td> The FTS table column number to extract the returned fragments of text from. Columns are numbered from left to right starting with zero. A negative value indicates that the text may be extracted from any column. <tr><td>5 <td>-15 <td> The absolute value of this integer argument is used as the (approximate) number of tokens to include in the returned text value. The maximum allowable absolute value is 64. The value of
︙			︙
966 967 968 969 970 971 972 ~~973~~ 974 975 976 ~~977 978~~ 979 980 981 982 983 984 985	two and four as described in the paragraphs above, they are joined together in sorted order with the "ellipses" text separating them. The three modifications enumerated earlier are performed on the text before it is returned. <codeblock> <b>Note: In this block of examples, newlines and whitespace characters have ~~been inserted into the document inserted into the FTS3 table, and the expected~~ results described in SQL comments. This is done to enhance readability only, they would not be present in actual SQLite commands or output.</b> ~~<i>-- Create and populate an FTS3 table.</i> CREATE VIRTUAL TABLE text USING fts3();~~ INSERT INTO text VALUES(' During 30 Nov-1 Dec, 2-3oC drops. Cool in the upper portion, minimum temperature 14-16oC and cool elsewhere, minimum temperature 17-20oC. Cold to very cold on mountaintops, minimum temperature 6-12oC. Northeasterly winds 15-30 km/hr. After that, temperature increases. Northeasterly winds 15-30 km/hr. ');	\| \| \|	1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024	two and four as described in the paragraphs above, they are joined together in sorted order with the "ellipses" text separating them. The three modifications enumerated earlier are performed on the text before it is returned. <codeblock> <b>Note: In this block of examples, newlines and whitespace characters have been inserted into the document inserted into the FTS table, and the expected results described in SQL comments. This is done to enhance readability only, they would not be present in actual SQLite commands or output.</b> <i>-- Create and populate an FTS table.</i> CREATE VIRTUAL TABLE text USING fts4(); INSERT INTO text VALUES(' During 30 Nov-1 Dec, 2-3oC drops. Cool in the upper portion, minimum temperature 14-16oC and cool elsewhere, minimum temperature 17-20oC. Cold to very cold on mountaintops, minimum temperature 6-12oC. Northeasterly winds 15-30 km/hr. After that, temperature increases. Northeasterly winds 15-30 km/hr. ');
︙			︙
1007 1008 1009 1010 1011 1012 1013 ~~1014~~ 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 ~~1026~~ 1027 1028 1029 1030 1031 1032 ~~1033 1034 1035~~ 1036 1037 1038 1039 1040 1041 1042 1043 ~~1044 1045~~ 1046 1047 1048 1049 1050 1051 1052	or more 32-bit unsigned integers in machine byte-order. The exact number of integers in the returned array depends on both the query and the value of the second argument (if any) passed to the matchinfo function. <p> The matchinfo function is called with either one or two arguments. As for all auxiliary functions, the first argument must be the special hidden ~~column of an FTS~~3/4~~ table that has the same name as the table itself (see~~ above). The second argument, if it is specified, must be a text value comprised only of the characters 'p', 'c', 'n', 'a', 'l', 's' and 'x'. If no second argument is explicitly supplied, it defaults to "pcx". The second argument is refered to as the "format string" below. <p> Characters in the matchinfo format string are processed from left to right. Each character in the format string causes one or more 32-bit unsigned integer values to be added to the returned array. The "values" column in the following table contains the number of integer values appended to the output buffer for each supported format string character. In the formula ~~given, <i>cols</i> is the number of columns in the FTS~~3/4~~ table, and~~ <i>phrases</i> is the number of <a href=#matchable>matchable phrases</a> in the query. <table striped=1> <tr><th>Character<th>Values<th>Description <tr><td>p <td>1 <td>The number of matchable phrases in the query. ~~<tr><td>c <td>1 <td>The number of user defined columns in the FTS~~3/4~~ table (i.e. not including the docid or the hidden column ~~with the same~~ ~~name as the table).~~~~ <tr><td>x <td style="white-space:nowrap">3 * <i>cols</i> * <i>phrases</i> <td> For each distinct combination of a phrase and table column, the following three values: <ul> <li> In the current row, the number of times the phrase appears in the column. <li> The total number of times the phrase appears in the column in ~~all rows in the FTS~~3/4~~ table. <li> The total number of rows in the FTS~~3/4~~ table for which the~~ column contains at least one instance of the phrase. </ul> The first set of three values corresponds to the left-most column of the table (column 0) and the left-most matchable phrase in the query (phrase 0). If the table has more than one column, the second set of three values in the output array correspond to phrase 0 and column 1. Followed by phrase 0, column 2 and so on for all columns of	\| \| \| \| < \| \|	1046 1047 1048 1049 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089 1090	or more 32-bit unsigned integers in machine byte-order. The exact number of integers in the returned array depends on both the query and the value of the second argument (if any) passed to the matchinfo function. <p> The matchinfo function is called with either one or two arguments. As for all auxiliary functions, the first argument must be the special hidden column of an FTS table that has the same name as the table itself (see above). The second argument, if it is specified, must be a text value comprised only of the characters 'p', 'c', 'n', 'a', 'l', 's' and 'x'. If no second argument is explicitly supplied, it defaults to "pcx". The second argument is refered to as the "format string" below. <p> Characters in the matchinfo format string are processed from left to right. Each character in the format string causes one or more 32-bit unsigned integer values to be added to the returned array. The "values" column in the following table contains the number of integer values appended to the output buffer for each supported format string character. In the formula given, <i>cols</i> is the number of columns in the FTS table, and <i>phrases</i> is the number of <a href=#matchable>matchable phrases</a> in the query. <table striped=1> <tr><th>Character<th>Values<th>Description <tr><td>p <td>1 <td>The number of matchable phrases in the query. <tr><td>c <td>1 <td>The number of user defined columns in the FTS table (i.e. not including the docid or the [FTS hidden column]). <tr><td>x <td style="white-space:nowrap">3 * <i>cols</i> * <i>phrases</i> <td> For each distinct combination of a phrase and table column, the following three values: <ul> <li> In the current row, the number of times the phrase appears in the column. <li> The total number of times the phrase appears in the column in all rows in the FTS table. <li> The total number of rows in the FTS table for which the column contains at least one instance of the phrase. </ul> The first set of three values corresponds to the left-most column of the table (column 0) and the left-most matchable phrase in the query (phrase 0). If the table has more than one column, the second set of three values in the output array correspond to phrase 0 and column 1. Followed by phrase 0, column 2 and so on for all columns of
︙			︙
1146 1147 1148 1149 1150 1151 1152 ~~1153 1154~~ 1155 1156 1157 ~~1158~~ 1159 ~~1160~~ 1161 1162 1163 1164 1165 1166 1167	be used to order results in a full-text search application. Appendix A of this document, "[search application tips]", contains an example of using the matchinfo() function efficiently. <h1 id=tokenizer tags="tokenizer">Tokenizers</h1> <p> ~~An FTS3 tokenizer is a set of rules for extracting terms from a document or basic FTS3 full-text query.~~ <p> Unless a specific tokenizer is specified as part of the CREATE ~~VIRTUAL TABLE statement used to create the FTS3 table, the default~~ tokenizer, "simple", is used. The simple tokenizer extracts tokens from ~~a document or basic FTS3 full-text query according to the following~~ rules: <ul> <li><p> A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the "_" character, and all characters with UTF codepoints greater than or equal to 128. All other characters are discarded when splitting a document into terms.	\| \| \| \|	1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 1199 1200 1201 1202 1203 1204 1205	be used to order results in a full-text search application. Appendix A of this document, "[search application tips]", contains an example of using the matchinfo() function efficiently. <h1 id=tokenizer tags="tokenizer">Tokenizers</h1> <p> An FTS tokenizer is a set of rules for extracting terms from a document or basic FTS full-text query. <p> Unless a specific tokenizer is specified as part of the CREATE VIRTUAL TABLE statement used to create the FTS table, the default tokenizer, "simple", is used. The simple tokenizer extracts tokens from a document or basic FTS full-text query according to the following rules: <ul> <li><p> A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters, the "_" character, and all characters with UTF codepoints greater than or equal to 128. All other characters are discarded when splitting a document into terms.
︙			︙
1178 1179 1180 1181 1182 1183 1184 ~~1185~~ 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 1196 1197 1198 ~~1199~~ 1200 1201 1202 1203 1204 1205 1206	frustrated.", the terms extracted from the document and added to the full-text index are, in order, "right now they re very frustrated". Such a document would match a full-text query such as "MATCH 'Frustrated'", as the simple tokenizer transforms the term in the query to lowercase before searching the full-text index. <p> ~~As well as the "simple" tokenizer, the FTS3 source code features a tokenizer~~ that uses the <a href="http://tartarus.org/~martin/PorterStemmer/">Porter Stemming algorithm</a>. This tokenizer uses the same rules to separate the input document into terms, but as well as folding all terms to lower case it uses the Porter Stemming algorithm to reduce related English language words to a common root. For example, using the same input document as in the paragraph above, the porter tokenizer extracts the following tokens: "right now thei veri frustrat". Even though some of these terms are not even English words, in some cases using them to build the full-text index is more useful than the more intelligible output produced by the simple tokenizer. Using the porter tokenizer, the document not only matches full-text queries such as "MATCH 'Frustrated'", but also queries such as "MATCH 'Frustration'", as the term "Frustration" is reduced by the Porter stemmer algorithm to "frustrat" - just as "Frustrated" is. So, when using the porter tokenizer, ~~FTS3 is able to find not just exact matches for queried terms, but matches~~ against similar English language terms. For more information on the Porter Stemmer algorithm, please refer to the page linked above. <p> Example illustrating the difference between the "simple" and "porter" tokenizers:	\| \|	1216 1217 1218 1219 1220 1221 1222 1223 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244	frustrated.", the terms extracted from the document and added to the full-text index are, in order, "right now they re very frustrated". Such a document would match a full-text query such as "MATCH 'Frustrated'", as the simple tokenizer transforms the term in the query to lowercase before searching the full-text index. <p> As well as the "simple" tokenizer, the FTS source code features a tokenizer that uses the <a href="http://tartarus.org/~martin/PorterStemmer/">Porter Stemming algorithm</a>. This tokenizer uses the same rules to separate the input document into terms, but as well as folding all terms to lower case it uses the Porter Stemming algorithm to reduce related English language words to a common root. For example, using the same input document as in the paragraph above, the porter tokenizer extracts the following tokens: "right now thei veri frustrat". Even though some of these terms are not even English words, in some cases using them to build the full-text index is more useful than the more intelligible output produced by the simple tokenizer. Using the porter tokenizer, the document not only matches full-text queries such as "MATCH 'Frustrated'", but also queries such as "MATCH 'Frustration'", as the term "Frustration" is reduced by the Porter stemmer algorithm to "frustrat" - just as "Frustrated" is. So, when using the porter tokenizer, FTS is able to find not just exact matches for queried terms, but matches against similar English language terms. For more information on the Porter Stemmer algorithm, please refer to the page linked above. <p> Example illustrating the difference between the "simple" and "porter" tokenizers:
︙			︙
1244 1245 1246 1247 1248 1249 1250 ~~1251~~ 1252 1253 1254 1255 ~~1256~~ 1257 1258 1259 1260 1261 1262 1263 ~~1264~~ 1265 ~~1266~~ 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 1283 1284 1285 1286 ~~1287~~ 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 ~~1298~~ 1299 1300 1301 1302 1303 1304 1305	discard punctuation, this can be done by creating a tokenizer implementation that uses the ICU tokenizer as part of its implementation. <h2>Custom (User Implemented) Tokenizers</h2> <p> As well as the built-in "simple", "porter" and (possibly) "icu" tokenizers, ~~FTS3 exports an interface that allows users to implement custom tokenizers~~ using C. The interface used to create a new tokenizer is defined and described in the fts3_tokenizer.h source file. <p> ~~Registering a new FTS3 tokenizer is similar to registering a new~~ virtual table module with SQLite. The user passes a pointer to a structure containing pointers to various callback functions that make up the implementation of the new tokenizer type. For tokenizers, the structure (defined in fts3_tokenizer.h) is called "sqlite3_tokenizer_module". <p> ~~FTS3 does not expose a C-function that users call to register new~~ tokenizer types with a database handle. Instead, the pointer must ~~be encoded as an SQL blob value and passed to FTS3 through the SQL~~ engine by evaluating a special scalar function, "fts3_tokenizer()". The fts3_tokenizer() function may be called with one or two arguments, as follows: <codeblock> SELECT fts3_tokenizer(<tokenizer-name>); SELECT fts3_tokenizer(<tokenizer-name>, <sqlite3_tokenizer_module ptr>); </codeblock> <p> Where <tokenizer-name> is a string identifying the tokenizer and <sqlite3_tokenizer_module ptr> is a pointer to an sqlite3_tokenizer_module structure encoded as an SQL blob. If the second argument is present, it is registered as tokenizer <tokenizer-name> and a copy of it returned. If only one argument is passed, a pointer to the tokenizer implementation currently registered as <tokenizer-name> is returned, encoded as a blob. Or, if no such tokenizer exists, an SQL exception (error) is raised. <p> ~~<b>SECURITY WARNING</b>: If the fts3 extension is used in an environment~~ where potentially malicious users may execute arbitrary SQL, they should be prevented from invoking the fts3_tokenizer() function, possibly using the [sqlite3_set_authorizer()\|authorisation callback]. <p> The following block contains an example of calling the fts3_tokenizer() function from C code: <codeblock> <i>/* ** Register a tokenizer implementation with FTS3. /</i> int registerTokenizer( sqlite3 db, char zName, const sqlite3_tokenizer_module p ){ int rc;	\| \| \| \| \| \|	1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295 1296 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343	discard punctuation, this can be done by creating a tokenizer implementation that uses the ICU tokenizer as part of its implementation. <h2>Custom (User Implemented) Tokenizers</h2> <p> As well as the built-in "simple", "porter" and (possibly) "icu" tokenizers, FTS exports an interface that allows users to implement custom tokenizers using C. The interface used to create a new tokenizer is defined and described in the fts3_tokenizer.h source file. <p> Registering a new FTS tokenizer is similar to registering a new virtual table module with SQLite. The user passes a pointer to a structure containing pointers to various callback functions that make up the implementation of the new tokenizer type. For tokenizers, the structure (defined in fts3_tokenizer.h) is called "sqlite3_tokenizer_module". <p> FTS does not expose a C-function that users call to register new tokenizer types with a database handle. Instead, the pointer must be encoded as an SQL blob value and passed to FTS through the SQL engine by evaluating a special scalar function, "fts3_tokenizer()". The fts3_tokenizer() function may be called with one or two arguments, as follows: <codeblock> SELECT fts3_tokenizer(<tokenizer-name>); SELECT fts3_tokenizer(<tokenizer-name>, <sqlite3_tokenizer_module ptr>); </codeblock> <p> Where <tokenizer-name> is a string identifying the tokenizer and <sqlite3_tokenizer_module ptr> is a pointer to an sqlite3_tokenizer_module structure encoded as an SQL blob. If the second argument is present, it is registered as tokenizer <tokenizer-name> and a copy of it returned. If only one argument is passed, a pointer to the tokenizer implementation currently registered as <tokenizer-name> is returned, encoded as a blob. Or, if no such tokenizer exists, an SQL exception (error) is raised. <p> <b>SECURITY WARNING</b>: If the fts3/4 extension is used in an environment where potentially malicious users may execute arbitrary SQL, they should be prevented from invoking the fts3_tokenizer() function, possibly using the [sqlite3_set_authorizer()\|authorisation callback]. <p> The following block contains an example of calling the fts3_tokenizer() function from C code: <codeblock> <i>/* ** Register a tokenizer implementation with FTS3 or FTS4. /</i> int registerTokenizer( sqlite3 db, char zName, const sqlite3_tokenizer_module p ){ int rc;
︙			︙
1315 1316 1317 1318 1319 1320 1321 ~~1322~~ 1323 1324 1325 1326 1327 1328 1329	sqlite3_bind_blob(pStmt, 2, &p, sizeof(p), SQLITE_STATIC); sqlite3_step(pStmt); return sqlite3_finalize(pStmt); } <i>/* ** Query FTS3 for the tokenizer implementation named zName. /</i> int queryTokenizer( sqlite3 db, char zName, const sqlite3_tokenizer_module *pp ){ int rc;	\|	1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367	sqlite3_bind_blob(pStmt, 2, &p, sizeof(p), SQLITE_STATIC); sqlite3_step(pStmt); return sqlite3_finalize(pStmt); } <i>/* ** Query FTS for the tokenizer implementation named zName. /</i> int queryTokenizer( sqlite3 db, char zName, const sqlite3_tokenizer_module *pp ){ int rc;
︙			︙
1347 1348 1349 1350 1351 1352 1353 ~~1354~~ 1355 ~~1356~~ 1357 ~~1358 1359~~ 1360 1361 ~~1362 1363~~ ~~1364 1365~~ 1366 1367 1368 ~~1369~~ 1370 1371 1372 1373 1374 1375 1376 1377 ~~1378~~ 1379 1380 1381 1382 1383 1384 1385 ~~1386~~ 1387 1388 1389 1390 ~~1391~~ 1392 1393 1394 ~~1395~~ 1396 1397 ~~1398~~ 1399 1400 1401 1402 1403 1404 1405	} </codeblock> <h1 tags="segment btree">Data Structures</h1> <p> ~~This section describes at a high-level the way the FTS3 module stores its~~ index and content in the database. It is <b>not necessary to read or ~~understand the material in this section in order to use FTS3</b> in an~~ application. However, it may be useful to application developers attempting ~~to analyze and understand FTS3 performance characteristics, or to developers contemplating enhancements to the existing FTS3 feature set.~~ <p> ~~For each FTS3 virtual table in a database, three real (non-virtual) tables are created to store the underlying data. The real tables are ~~nam~~ed "~~%_c~~ont~~ent",~~~~ ~~"%_segdir" ~~and~~ "%_segments", where "%" is replaced by the name ~~supplied by~~ ~~the user for~~ the FTS3 virtual table.~~ <p> The leftmost column of the "%_content" table is an INTEGER PRIMARY KEY field ~~named "docid". Following this is one column for each column of the FTS3~~ virtual table as declared by the user, named by prepending the column name supplied by the user with "c<i>N</i>", where <i>N</i> is the index of the column within the table, numbered from left to right starting with 0. Data types supplied as part of the virtual table declaration are not used as part of the %_content table declaration. For example: <codeblock> <i>-- Virtual table declaration</i> ~~CREATE VIRTUAL TABLE abc USING ~~FTS3~~(a NUMBER, b TEXT, c);~~ <i>-- Corresponding %_content table declaration</i> CREATE TABLE abc_content(docid INTEGER PRIMARY KEY, c0a, c1b, c2c); </codeblock> <p> The %_content table contains the unadulterated data inserted by the user ~~into the FTS3 virtual table by the user. If the user does not explicitly~~ supply a "docid" value when inserting records, one is selected automatically by the system. <p> ~~The two ~~remaining~~ tables, %_segments and %_segdir, are used to store the~~ full-text index. Conceptually, this index is a lookup table that maps each term (word) to the set of docid values corresponding to records in the %_content table that contain one or more occurrences of the term. To ~~retrieve all documents that contain a specified term, the FTS3 module~~ queries this index to determine the set of docid values for records that contain the term, then retrieves the required documents from the %_content ~~table. Regardless of the schema of the FTS3 virtual table, the %_segments~~ and %_segdir tables are always created as follows: <codeblock> CREATE TABLE %_segments( blockid INTEGER PRIMARY KEY, <i>-- B-tree node id</i> block blob <i>-- B-tree node data</i> );	\| \| \| \| > \| \| > \| \| \| \| \| \| \| \|	1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445	} </codeblock> <h1 tags="segment btree">Data Structures</h1> <p> This section describes at a high-level the way the FTS module stores its index and content in the database. It is <b>not necessary to read or understand the material in this section in order to use FTS</b> in an application. However, it may be useful to application developers attempting to analyze and understand FTS performance characteristics, or to developers contemplating enhancements to the existing FTS feature set. <tcl>hd_fragment *shadowtab {FTS shadow tables}</tcl> <p> For each FTS virtual table in a database, three to five real (non-virtual) tables are created to store the underlying data. These real tables are called "shadow tables". The real tables are named "%_content", "%_segdir", "%_segments", "%_stat", and "%_docsize", where "%" is replaced by the name of the FTS virtual table. <p> The leftmost column of the "%_content" table is an INTEGER PRIMARY KEY field named "docid". Following this is one column for each column of the FTS virtual table as declared by the user, named by prepending the column name supplied by the user with "c<i>N</i>", where <i>N</i> is the index of the column within the table, numbered from left to right starting with 0. Data types supplied as part of the virtual table declaration are not used as part of the %_content table declaration. For example: <codeblock> <i>-- Virtual table declaration</i> CREATE VIRTUAL TABLE abc USING fts4(a NUMBER, b TEXT, c); <i>-- Corresponding %_content table declaration</i> CREATE TABLE abc_content(docid INTEGER PRIMARY KEY, c0a, c1b, c2c); </codeblock> <p> The %_content table contains the unadulterated data inserted by the user into the FTS virtual table by the user. If the user does not explicitly supply a "docid" value when inserting records, one is selected automatically by the system. <p> The two tables, %_segments and %_segdir, are used to store the full-text index. Conceptually, this index is a lookup table that maps each term (word) to the set of docid values corresponding to records in the %_content table that contain one or more occurrences of the term. To retrieve all documents that contain a specified term, the FTS module queries this index to determine the set of docid values for records that contain the term, then retrieves the required documents from the %_content table. Regardless of the schema of the FTS virtual table, the %_segments and %_segdir tables are always created as follows: <codeblock> CREATE TABLE %_segments( blockid INTEGER PRIMARY KEY, <i>-- B-tree node id</i> block blob <i>-- B-tree node data</i> );
︙			︙
1449 1450 1451 1452 1453 1454 1455 ~~1456 1457~~ 1458 1459 1460 1461 1462 1463 1464 1465 1466 ~~1467~~ 1468 1469 1470 1471 1472 1473 1474 ~~1475~~ 1476 1477 1478 ~~1479~~ 1480 1481 1482 1483 1484 1485 1486 1487 1488 ~~1489~~ 1490 1491 1492 1493 1494 1495 1496	segment b-tree, then it maps to the union of each individual doclist. If, for a single term, the same docid occurs in more than one doclist, then only the doclist that is part of the most recently created segment b-tree is considered valid. <p> Multiple b-tree structures are used instead of a single b-tree to reduce ~~the cost of inserting records into FTS3 tables. When a new record is inserted into an FTS3 table that already contains a lot of data, it is~~ likely that many of the terms in the new record are already present in a large number of existing records. If a single b-tree were used, then large doclist structures would have to be loaded from the database, amended to include the new docid and term-offset list, then written back to the database. Using multiple b-tree tables allows this to be avoided by creating a new b-tree which can be merged with the existing b-tree (or b-trees) later on. Merging of b-tree structures can be performed as a background task, or once a certain number of separate b-tree structures have been accumulated. Of course, this scheme makes queries more expensive ~~(as the FTS3 code may have to look up individual terms in more than one~~ b-tree and merge the results), but it has been found that in practice this overhead is often negligible. <h2>Variable Length Integer (varint) Format</h2> <p> Integer values stored as part of segment b-tree nodes are encoded using the ~~FTS3 varint format. This encoding is similar, but <b>not identical</b>, to the~~ the <a href="fileformat.html#varint_format">SQLite varint format</a>. <p> ~~An encoded FTS3 varint consumes between one and ten bytes of space. The~~ number of bytes required is determined by the sign and magnitude of the integer value encoded. More accurately, the number of bytes used to store the encoded integer depends on the position of the most significant set bit in the 64-bit twos-complement representation of the integer value. Negative values always have the most significant bit set (the sign bit), and so are always stored using the full ten bytes. Positive integer values may be stored using less space. <p> ~~The final byte of an encoded FTS3 varint has its most significant bit~~ cleared. All preceding bytes have the most significant bit set. Data is stored in the remaining seven least significant bits of each byte. The first byte of the encoded representation contains the least significant seven bits of the encoded integer value. The second byte of the encoded representation, if it is present, contains the seven next least significant bits of the integer value, and so on. The following table contains examples of encoded integer values:	\| \| \| \| \| \|	1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 1504 1505 1506 1507 1508 1509 1510 1511 1512 1513 1514 1515 1516 1517 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 1533 1534 1535 1536	segment b-tree, then it maps to the union of each individual doclist. If, for a single term, the same docid occurs in more than one doclist, then only the doclist that is part of the most recently created segment b-tree is considered valid. <p> Multiple b-tree structures are used instead of a single b-tree to reduce the cost of inserting records into FTS tables. When a new record is inserted into an FTS table that already contains a lot of data, it is likely that many of the terms in the new record are already present in a large number of existing records. If a single b-tree were used, then large doclist structures would have to be loaded from the database, amended to include the new docid and term-offset list, then written back to the database. Using multiple b-tree tables allows this to be avoided by creating a new b-tree which can be merged with the existing b-tree (or b-trees) later on. Merging of b-tree structures can be performed as a background task, or once a certain number of separate b-tree structures have been accumulated. Of course, this scheme makes queries more expensive (as the FTS code may have to look up individual terms in more than one b-tree and merge the results), but it has been found that in practice this overhead is often negligible. <h2>Variable Length Integer (varint) Format</h2> <p> Integer values stored as part of segment b-tree nodes are encoded using the FTS varint format. This encoding is similar, but <b>not identical</b>, to the the <a href="fileformat.html#varint_format">SQLite varint format</a>. <p> An encoded FTS varint consumes between one and ten bytes of space. The number of bytes required is determined by the sign and magnitude of the integer value encoded. More accurately, the number of bytes used to store the encoded integer depends on the position of the most significant set bit in the 64-bit twos-complement representation of the integer value. Negative values always have the most significant bit set (the sign bit), and so are always stored using the full ten bytes. Positive integer values may be stored using less space. <p> The final byte of an encoded FTS varint has its most significant bit cleared. All preceding bytes have the most significant bit set. Data is stored in the remaining seven least significant bits of each byte. The first byte of the encoded representation contains the least significant seven bits of the encoded integer value. The second byte of the encoded representation, if it is present, contains the seven next least significant bits of the integer value, and so on. The following table contains examples of encoded integer values:
︙			︙
1582 1583 1584 1585 1586 1587 1588 ~~1589~~ 1590 1591 1592 1593 1594 1595 1596 1597 ~~1598~~ 1599 1600 1601 1602 1603 1604 1605	</center> <h2>Doclist Format</h2> <p> A doclist consists of an array of 64-bit signed integers, serialized using ~~the FTS3 varint format. Each doclist entry is made up of a series of two~~ or more integers, as follows: <ol> <li> The docid value. The first entry in a doclist contains the literal docid value. The first field of each subsequent doclist entry contains the difference between the new docid and the previous one (always a positive number). <li> Zero or more term-offset lists. A term-offset list is present for each ~~column of the FTS3 virtual table that contains the term. A term-offset~~ list consists of the following: <ol> <li> Constant value 1. This field is omitted for any term-offset list associated with column 0. <li> The column number (1 for the second leftmost column, etc.). This field is omitted for any term-offset list associated with column 0. <li> A list of term-offsets, sorted from smallest to largest. Instead	\| \|	1622 1623 1624 1625 1626 1627 1628 1629 1630 1631 1632 1633 1634 1635 1636 1637 1638 1639 1640 1641 1642 1643 1644 1645	</center> <h2>Doclist Format</h2> <p> A doclist consists of an array of 64-bit signed integers, serialized using the FTS varint format. Each doclist entry is made up of a series of two or more integers, as follows: <ol> <li> The docid value. The first entry in a doclist contains the literal docid value. The first field of each subsequent doclist entry contains the difference between the new docid and the previous one (always a positive number). <li> Zero or more term-offset lists. A term-offset list is present for each column of the FTS virtual table that contains the term. A term-offset list consists of the following: <ol> <li> Constant value 1. This field is omitted for any term-offset list associated with column 0. <li> The column number (1 for the second leftmost column, etc.). This field is omitted for any term-offset list associated with column 0. <li> A list of term-offsets, sorted from smallest to largest. Instead
︙			︙