Documentation Source Text

Check-in [0b949f6763]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix some problems with fts3.html.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 0b949f6763debd7dcd72f92b597eeb0aac067b32
User & Date: dan 2009-11-30 11:52:39.000
Context
2009-11-30
13:55
Fix typo on the famous.html page. Other updates prior to publication. (check-in: 5572b2965d user: drh tags: trunk)
11:52
Fix some problems with fts3.html. (check-in: 0b949f6763 user: dan tags: trunk)
08:41
Add text to fts3.html describing the snippet and offsets functions. (check-in: d91a4c7249 user: dan tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fts3.in.
1
2
3
4
5
6
7
8
9















10
11
12
13
14
15
16
<title>SQLite FTS3 Extension</title>
<tcl>

hd_keywords *fts3 FTS3
source [file join $::DOC pages fancyformat.tcl]
fancyformat_document "SQLite FTS3 Extension" {} {

<h2 style="margin-left:1.0em"> Overview</h2>
















[h1 "Introduction to FTS3"]

<p>
  The FTS3 extension module allows users to create special tables with a 
  built-in full-text index (hereafter "FTS3 tables"). The full-text index
  allows the user to efficiently query the database for all rows that contain
  one or more instances specified word (hereafter a "token", even if the table









>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
<title>SQLite FTS3 Extension</title>
<tcl>

hd_keywords *fts3 FTS3
source [file join $::DOC pages fancyformat.tcl]
fancyformat_document "SQLite FTS3 Extension" {} {

<h2 style="margin-left:1.0em"> Overview</h2>

<p>
  FTS3 is an SQLite virtual table module that allows users to perform 
  full-text searches on a set of documents. The most common (and effective) 
  way to describe full-text searches is "what Google, Yahoo and Altavista do
  with documents placed on the World Wide Web". Users input a term, or series 
  of terms, perhaps connected by a binary operator or grouped together into a 
  phrase, and the full-text query system finds the set of documents that best 
  matches those terms considering the operators and groupings the user has 
  specified. This document describes the deployment and usage of FTS3.

<p>
  FTS3 was originally contributed to the SQLite project by Scott Hess and
  <a href="http://www.google.com">Google</a>. It is now developed and maintained
  as part of SQLite.

[h1 "Introduction to FTS3"]

<p>
  The FTS3 extension module allows users to create special tables with a 
  built-in full-text index (hereafter "FTS3 tables"). The full-text index
  allows the user to efficiently query the database for all rows that contain
  one or more instances specified word (hereafter a "token", even if the table
139
140
141
142
143
144
145
146
147
148





149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166




167
168
169
170
171
172
173
    \[INTEGER PRIMARY KEY\], except that values remain unchanged if the
    database is rebuilt using the \[VACUUM\] command. For FTS3 tables, "docid"
    is allowed as an alias along with the usual "rowid", "oid" and "_oid_"
    identifiers. Attempting to insert or update a row with a docid value that 
    already exists in the table is an error, just as it would be with an 
    ordinary SQLite table.

  <p class=todo>
    The problem when a row is inserted or updated and distinct values are 
    specified for the rowid and docid.






[Code {
  <i>-- Create an FTS3 table</i>
  CREATE VIRTUAL TABLE pages USING fts3(title, body);

  <i>-- Insert a row with a specific docid value.</i>
  INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...');

  <i>-- Insert a row and allow FTS3 to assign a docid value using the same algorithm as</i>
  <i>-- SQLite uses for ordinary tables. In this case the new docid will be 54,</i>
  <i>-- one greater than the largest docid currently present in the table.</i>
  INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...');

  <i>-- Change the title of the row just inserted.</i>
  UPDATE pages SET title = 'Download SQLite' WHERE rowid = last_insert_rowid();

  <i>-- Delete the entire table contents.</i>
  DELETE FROM pages;




}]

  <p>
    To support full-text queries, FTS3 maintains an inverted index that maps
    from each unique term or word that appears in the dataset to the locations
    in which it appears within the table contents. For the curious, a 
    complete description of the \[segment btree|data structure\] used to store







|
|
|
>
>
>
>
>


















>
>
>
>







154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
    \[INTEGER PRIMARY KEY\], except that values remain unchanged if the
    database is rebuilt using the \[VACUUM\] command. For FTS3 tables, "docid"
    is allowed as an alias along with the usual "rowid", "oid" and "_oid_"
    identifiers. Attempting to insert or update a row with a docid value that 
    already exists in the table is an error, just as it would be with an 
    ordinary SQLite table.

  <p>
    There is one other subtle difference between "docid" and the normal SQLite
    aliases for the rowid column. Normally, if an INSERT or UPDATE statement 
    assigns discreet values to two or more aliases of the rowid column, SQLite 
    writes the rightmost of of such values specified in the INSERT or UPDATE
    statement to the database. However, assigning a non-NULL value to both
    the "docid" and one or more of the SQLite rowid aliases when inserting or
    updating an FTS3 table is considered an error. See below for an example.

[Code {
  <i>-- Create an FTS3 table</i>
  CREATE VIRTUAL TABLE pages USING fts3(title, body);

  <i>-- Insert a row with a specific docid value.</i>
  INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...');

  <i>-- Insert a row and allow FTS3 to assign a docid value using the same algorithm as</i>
  <i>-- SQLite uses for ordinary tables. In this case the new docid will be 54,</i>
  <i>-- one greater than the largest docid currently present in the table.</i>
  INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...');

  <i>-- Change the title of the row just inserted.</i>
  UPDATE pages SET title = 'Download SQLite' WHERE rowid = last_insert_rowid();

  <i>-- Delete the entire table contents.</i>
  DELETE FROM pages;

  <i>-- The following is an error. It is not possible to assign non-NULL values to both</i>
  <i>-- the rowid and docid columns of an FTS3 table.</i>
  INSERT INTO pages(rowid, docid, title, body) VALUES(1, 2, 'A title', 'A document body');
}]

  <p>
    To support full-text queries, FTS3 maintains an inverted index that maps
    from each unique term or word that appears in the dataset to the locations
    in which it appears within the table contents. For the curious, a 
    complete description of the \[segment btree|data structure\] used to store
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
    The optimize() function returns a text value. If the index was already
    optimized when it was called the text is "Index already optimal". 
    Otherwise if the index was not already optimized, it is made so and
    the text "Index optimized" returned.

[h2 "Querying FTS3 Tables" {} {simple fts3 queries}]

<p class=todo>
  Some note clarifying what is meant by a "term", and how that relates to
  the specific tokenizer used by the FTS3 table in question.

<p>
  As for all other SQLite tables, virtual or otherwise, data is retrieved
  from FTS3 tables using a \[SELECT\] statement.

<p>
  FTS3 tables can be queried efficiently using SELECT statements of two
  different forms:







<
<
<
<







223
224
225
226
227
228
229




230
231
232
233
234
235
236
    The optimize() function returns a text value. If the index was already
    optimized when it was called the text is "Index already optimal". 
    Otherwise if the index was not already optimized, it is made so and
    the text "Index optimized" returned.

[h2 "Querying FTS3 Tables" {} {simple fts3 queries}]





<p>
  As for all other SQLite tables, virtual or otherwise, data is retrieved
  from FTS3 tables using a \[SELECT\] statement.

<p>
  FTS3 tables can be queried efficiently using SELECT statements of two
  different forms:
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265





266
267
268
269
270
271
272
273
  SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20;   <i>-- Slow. Linear scan.</i>
  SELECT * FROM mail WHERE subject = 'database';      <i>-- Slow. Linear scan.</i>
  SELECT * FROM mail WHERE subject MATCH 'database';  <i>-- Fast. Full-text query.</i>
}]

<p>
  In all of the full-text queries above, the right-hand operand of the MATCH
  operator is a string consisting of a single word. In this case, the MATCH
  expression evaluates to true for all documents that contain one or more 
  instances of the specified word ("sqlite", "search" or "database", depending 
  on which example you look at). Specifying a single word as the right-hand
  operand of the MATCH operator results in the simplest (and most common) type 
  of full-text query possible. However more complicated queries are possible,
  including phrase searches, term-prefix searches and searches for documents 
  containing combinations of terms occuring within a defined proximity of each
  other. The various ways in which the full-text index may be queried are
  \[FTS3 MATCH|described below\].

<p>





  The paragraph above notes that a MATCH operator with a simple word as the
  right-hand operand evaluates to true for all documents that contain the
  specified term. In this context, the "document" may refer to either the 
  data stored in a single column of a row of an FTS3 table, or to the contents
  of all columns in a single row, depending on the identifier used as the
  left-hand operand to the MATCH operator. If the identifier specified as
  the left-hand operand of the MATCH operator is an FTS3 table column name,
  then the document that the search term must be contained in is the value







|


|








>
>
>
>
>
|







267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
  SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20;   <i>-- Slow. Linear scan.</i>
  SELECT * FROM mail WHERE subject = 'database';      <i>-- Slow. Linear scan.</i>
  SELECT * FROM mail WHERE subject MATCH 'database';  <i>-- Fast. Full-text query.</i>
}]

<p>
  In all of the full-text queries above, the right-hand operand of the MATCH
  operator is a string consisting of a single term. In this case, the MATCH
  expression evaluates to true for all documents that contain one or more 
  instances of the specified word ("sqlite", "search" or "database", depending 
  on which example you look at). Specifying a single term as the right-hand
  operand of the MATCH operator results in the simplest (and most common) type 
  of full-text query possible. However more complicated queries are possible,
  including phrase searches, term-prefix searches and searches for documents 
  containing combinations of terms occuring within a defined proximity of each
  other. The various ways in which the full-text index may be queried are
  \[FTS3 MATCH|described below\].

<p>
  Normally, full-text queries are case-insensitive. However, this (and other
  details) are depend on the specific \[tokenizer\] used by the FTS3 table
  being queried. Refer to the section on \[tokenizer|tokenizers\] for details.

<p>
  The paragraph above notes that a MATCH operator with a simple term as the
  right-hand operand evaluates to true for all documents that contain the
  specified term. In this context, the "document" may refer to either the 
  data stored in a single column of a row of an FTS3 table, or to the contents
  of all columns in a single row, depending on the identifier used as the
  left-hand operand to the MATCH operator. If the identifier specified as
  the left-hand operand of the MATCH operator is an FTS3 table column name,
  then the document that the search term must be contained in is the value
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451



452
453
454
455
456
457
458
459
460
























461
462
463
464
465
466
467
  \[simple fts3 queries|Simple FTS3 queries\] that return all documents that 
  contain a given term are described above. In that discussion the right-hand
  operand of the MATCH operator was assumed to be a string consisting of a
  single term. This section describes the more complex query types supported 
  by FTS3 tables, and how they may be utilized by specifying more a more
  complex query expression as the right-hand operand of a MATCH operator.

<p class=todo>
  Where should the column selection syntax go?

<p>
  FTS3 tables support three basic query types:

<ul>
  <li><p><b>Token or token prefix queries</b>. 
    An FTS3 table may be queried for all documents that contain a specified
    term (the \[simple fts3 queries|simple case\] described above), or for
    all documents that contain a term with a specified prefix. As we have
    seen, the query expression for a specific term is simply the term itself.
    The query expression used to search for a term prefix is the prefix
    itself with a '*' character appended to it. For example:
</ul>

[Code {



  <i>-- Query for all documents containing the term "linux":</i>
  SELECT * FROM docs WHERE docs MATCH 'linux';

  <i>-- Query for all documents containing a term with the prefix "lin". This will match</i>
  <i>-- all documents that contain "linux", but also those that contain terms "linear",</i>
  <i>--"linker", "linguistic" and so on.</i>
  SELECT * FROM docs WHERE docs MATCH 'lin*';
}]
   
























<ul>
  <li><p><b>Phrase queries</b>.
    A phrase query is a query that retreives all documents that contain a
    nominated set of terms or term prefixes in a specified order with no
    intervening tokens. Phrase queries are specified by enclosing a space
    separated sequence of terms or term prefixes in double quotes (").
    For example:







<
<
<














>
>
>








|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







453
454
455
456
457
458
459



460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
  \[simple fts3 queries|Simple FTS3 queries\] that return all documents that 
  contain a given term are described above. In that discussion the right-hand
  operand of the MATCH operator was assumed to be a string consisting of a
  single term. This section describes the more complex query types supported 
  by FTS3 tables, and how they may be utilized by specifying more a more
  complex query expression as the right-hand operand of a MATCH operator.




<p>
  FTS3 tables support three basic query types:

<ul>
  <li><p><b>Token or token prefix queries</b>. 
    An FTS3 table may be queried for all documents that contain a specified
    term (the \[simple fts3 queries|simple case\] described above), or for
    all documents that contain a term with a specified prefix. As we have
    seen, the query expression for a specific term is simply the term itself.
    The query expression used to search for a term prefix is the prefix
    itself with a '*' character appended to it. For example:
</ul>

[Code {
  <i>-- Virtual table declaration</i>
  CREATE VIRTUAL TABLE docs USING fts3(title, body);

  <i>-- Query for all documents containing the term "linux":</i>
  SELECT * FROM docs WHERE docs MATCH 'linux';

  <i>-- Query for all documents containing a term with the prefix "lin". This will match</i>
  <i>-- all documents that contain "linux", but also those that contain terms "linear",</i>
  <i>--"linker", "linguistic" and so on.</i>
  SELECT * FROM docs WHERE docs MATCH 'lin*';
}]

<ul>
  <li style="list-style:none"><p>
    Normally, a token or token prefix query is matched against the FTS3 table 
    column specified as the right-hand side of the MATCH operator. Or, if the
    special column with the same name as the FTS3 table itself is specified,
    against all columns. This may be overridden by specifying a column-name
    followed by a ":" character before a basic term query. There may be space
    between the ":" and the term to query for, but not between the column-name
    and the ":" character. For example:
</ul>
   
[Code {
  <i>-- Query the database for documents for which the term "linux" appears in</i>
  <i>-- the document title, and the term "problems" appears in either the title</i>
  <i>-- or body of the document.</i>
  SELECT * FROM docs WHERE docs MATCH 'title:linux problems';

  <i>-- Query the database for documents for which the term "linux" appears in</i>
  <i>-- the document title, and the term "driver" appears in the body of the document</i>
  <i>-- ("driver" may also appear in the title, but this alone will not satisfy the</i>.
  <i>-- query criteria).</i>
  SELECT * FROM docs WHERE body MATCH 'title:linux driver';
}]

<ul>
  <li><p><b>Phrase queries</b>.
    A phrase query is a query that retreives all documents that contain a
    nominated set of terms or term prefixes in a specified order with no
    intervening tokens. Phrase queries are specified by enclosing a space
    separated sequence of terms or term prefixes in double quotes (").
    For example: