Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Fix some problems with fts3.html. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
0b949f6763debd7dcd72f92b597eeb0a |
User & Date: | dan 2009-11-30 11:52:39.000 |
Context
2009-11-30
| ||
13:55 | Fix typo on the famous.html page. Other updates prior to publication. (check-in: 5572b2965d user: drh tags: trunk) | |
11:52 | Fix some problems with fts3.html. (check-in: 0b949f6763 user: dan tags: trunk) | |
08:41 | Add text to fts3.html describing the snippet and offsets functions. (check-in: d91a4c7249 user: dan tags: trunk) | |
Changes
Changes to pages/fts3.in.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | <title>SQLite FTS3 Extension</title> <tcl> hd_keywords *fts3 FTS3 source [file join $::DOC pages fancyformat.tcl] fancyformat_document "SQLite FTS3 Extension" {} { <h2 style="margin-left:1.0em"> Overview</h2> [h1 "Introduction to FTS3"] <p> The FTS3 extension module allows users to create special tables with a built-in full-text index (hereafter "FTS3 tables"). The full-text index allows the user to efficiently query the database for all rows that contain one or more instances specified word (hereafter a "token", even if the table | > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | <title>SQLite FTS3 Extension</title> <tcl> hd_keywords *fts3 FTS3 source [file join $::DOC pages fancyformat.tcl] fancyformat_document "SQLite FTS3 Extension" {} { <h2 style="margin-left:1.0em"> Overview</h2> <p> FTS3 is an SQLite virtual table module that allows users to perform full-text searches on a set of documents. The most common (and effective) way to describe full-text searches is "what Google, Yahoo and Altavista do with documents placed on the World Wide Web". Users input a term, or series of terms, perhaps connected by a binary operator or grouped together into a phrase, and the full-text query system finds the set of documents that best matches those terms considering the operators and groupings the user has specified. This document describes the deployment and usage of FTS3. <p> FTS3 was originally contributed to the SQLite project by Scott Hess and <a href="http://www.google.com">Google</a>. It is now developed and maintained as part of SQLite. [h1 "Introduction to FTS3"] <p> The FTS3 extension module allows users to create special tables with a built-in full-text index (hereafter "FTS3 tables"). The full-text index allows the user to efficiently query the database for all rows that contain one or more instances specified word (hereafter a "token", even if the table |
︙ | ︙ | |||
139 140 141 142 143 144 145 | \[INTEGER PRIMARY KEY\], except that values remain unchanged if the database is rebuilt using the \[VACUUM\] command. For FTS3 tables, "docid" is allowed as an alias along with the usual "rowid", "oid" and "_oid_" identifiers. Attempting to insert or update a row with a docid value that already exists in the table is an error, just as it would be with an ordinary SQLite table. | | | | > > > > > > > > > | 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 | \[INTEGER PRIMARY KEY\], except that values remain unchanged if the database is rebuilt using the \[VACUUM\] command. For FTS3 tables, "docid" is allowed as an alias along with the usual "rowid", "oid" and "_oid_" identifiers. Attempting to insert or update a row with a docid value that already exists in the table is an error, just as it would be with an ordinary SQLite table. <p> There is one other subtle difference between "docid" and the normal SQLite aliases for the rowid column. Normally, if an INSERT or UPDATE statement assigns discreet values to two or more aliases of the rowid column, SQLite writes the rightmost of of such values specified in the INSERT or UPDATE statement to the database. However, assigning a non-NULL value to both the "docid" and one or more of the SQLite rowid aliases when inserting or updating an FTS3 table is considered an error. See below for an example. [Code { <i>-- Create an FTS3 table</i> CREATE VIRTUAL TABLE pages USING fts3(title, body); <i>-- Insert a row with a specific docid value.</i> INSERT INTO pages(docid, title, body) VALUES(53, 'Home Page', 'SQLite is a software...'); <i>-- Insert a row and allow FTS3 to assign a docid value using the same algorithm as</i> <i>-- SQLite uses for ordinary tables. In this case the new docid will be 54,</i> <i>-- one greater than the largest docid currently present in the table.</i> INSERT INTO pages(title, body) VALUES('Download', 'All SQLite source code...'); <i>-- Change the title of the row just inserted.</i> UPDATE pages SET title = 'Download SQLite' WHERE rowid = last_insert_rowid(); <i>-- Delete the entire table contents.</i> DELETE FROM pages; <i>-- The following is an error. It is not possible to assign non-NULL values to both</i> <i>-- the rowid and docid columns of an FTS3 table.</i> INSERT INTO pages(rowid, docid, title, body) VALUES(1, 2, 'A title', 'A document body'); }] <p> To support full-text queries, FTS3 maintains an inverted index that maps from each unique term or word that appears in the dataset to the locations in which it appears within the table contents. For the curious, a complete description of the \[segment btree|data structure\] used to store |
︙ | ︙ | |||
199 200 201 202 203 204 205 | The optimize() function returns a text value. If the index was already optimized when it was called the text is "Index already optimal". Otherwise if the index was not already optimized, it is made so and the text "Index optimized" returned. [h2 "Querying FTS3 Tables" {} {simple fts3 queries}] | < < < < | 223 224 225 226 227 228 229 230 231 232 233 234 235 236 | The optimize() function returns a text value. If the index was already optimized when it was called the text is "Index already optimal". Otherwise if the index was not already optimized, it is made so and the text "Index optimized" returned. [h2 "Querying FTS3 Tables" {} {simple fts3 queries}] <p> As for all other SQLite tables, virtual or otherwise, data is retrieved from FTS3 tables using a \[SELECT\] statement. <p> FTS3 tables can be queried efficiently using SELECT statements of two different forms: |
︙ | ︙ | |||
247 248 249 250 251 252 253 | SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20; <i>-- Slow. Linear scan.</i> SELECT * FROM mail WHERE subject = 'database'; <i>-- Slow. Linear scan.</i> SELECT * FROM mail WHERE subject MATCH 'database'; <i>-- Fast. Full-text query.</i> }] <p> In all of the full-text queries above, the right-hand operand of the MATCH | | | > > > > > | | 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 | SELECT * FROM mail WHERE rowid BETWEEN 15 AND 20; <i>-- Slow. Linear scan.</i> SELECT * FROM mail WHERE subject = 'database'; <i>-- Slow. Linear scan.</i> SELECT * FROM mail WHERE subject MATCH 'database'; <i>-- Fast. Full-text query.</i> }] <p> In all of the full-text queries above, the right-hand operand of the MATCH operator is a string consisting of a single term. In this case, the MATCH expression evaluates to true for all documents that contain one or more instances of the specified word ("sqlite", "search" or "database", depending on which example you look at). Specifying a single term as the right-hand operand of the MATCH operator results in the simplest (and most common) type of full-text query possible. However more complicated queries are possible, including phrase searches, term-prefix searches and searches for documents containing combinations of terms occuring within a defined proximity of each other. The various ways in which the full-text index may be queried are \[FTS3 MATCH|described below\]. <p> Normally, full-text queries are case-insensitive. However, this (and other details) are depend on the specific \[tokenizer\] used by the FTS3 table being queried. Refer to the section on \[tokenizer|tokenizers\] for details. <p> The paragraph above notes that a MATCH operator with a simple term as the right-hand operand evaluates to true for all documents that contain the specified term. In this context, the "document" may refer to either the data stored in a single column of a row of an FTS3 table, or to the contents of all columns in a single row, depending on the identifier used as the left-hand operand to the MATCH operator. If the identifier specified as the left-hand operand of the MATCH operator is an FTS3 table column name, then the document that the search term must be contained in is the value |
︙ | ︙ | |||
428 429 430 431 432 433 434 | \[simple fts3 queries|Simple FTS3 queries\] that return all documents that contain a given term are described above. In that discussion the right-hand operand of the MATCH operator was assumed to be a string consisting of a single term. This section describes the more complex query types supported by FTS3 tables, and how they may be utilized by specifying more a more complex query expression as the right-hand operand of a MATCH operator. | < < < > > > | > > > > > > > > > > > > > > > > > > > > > > > > | 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 | \[simple fts3 queries|Simple FTS3 queries\] that return all documents that contain a given term are described above. In that discussion the right-hand operand of the MATCH operator was assumed to be a string consisting of a single term. This section describes the more complex query types supported by FTS3 tables, and how they may be utilized by specifying more a more complex query expression as the right-hand operand of a MATCH operator. <p> FTS3 tables support three basic query types: <ul> <li><p><b>Token or token prefix queries</b>. An FTS3 table may be queried for all documents that contain a specified term (the \[simple fts3 queries|simple case\] described above), or for all documents that contain a term with a specified prefix. As we have seen, the query expression for a specific term is simply the term itself. The query expression used to search for a term prefix is the prefix itself with a '*' character appended to it. For example: </ul> [Code { <i>-- Virtual table declaration</i> CREATE VIRTUAL TABLE docs USING fts3(title, body); <i>-- Query for all documents containing the term "linux":</i> SELECT * FROM docs WHERE docs MATCH 'linux'; <i>-- Query for all documents containing a term with the prefix "lin". This will match</i> <i>-- all documents that contain "linux", but also those that contain terms "linear",</i> <i>--"linker", "linguistic" and so on.</i> SELECT * FROM docs WHERE docs MATCH 'lin*'; }] <ul> <li style="list-style:none"><p> Normally, a token or token prefix query is matched against the FTS3 table column specified as the right-hand side of the MATCH operator. Or, if the special column with the same name as the FTS3 table itself is specified, against all columns. This may be overridden by specifying a column-name followed by a ":" character before a basic term query. There may be space between the ":" and the term to query for, but not between the column-name and the ":" character. For example: </ul> [Code { <i>-- Query the database for documents for which the term "linux" appears in</i> <i>-- the document title, and the term "problems" appears in either the title</i> <i>-- or body of the document.</i> SELECT * FROM docs WHERE docs MATCH 'title:linux problems'; <i>-- Query the database for documents for which the term "linux" appears in</i> <i>-- the document title, and the term "driver" appears in the body of the document</i> <i>-- ("driver" may also appear in the title, but this alone will not satisfy the</i>. <i>-- query criteria).</i> SELECT * FROM docs WHERE body MATCH 'title:linux driver'; }] <ul> <li><p><b>Phrase queries</b>. A phrase query is a query that retreives all documents that contain a nominated set of terms or term prefixes in a specified order with no intervening tokens. Phrase queries are specified by enclosing a space separated sequence of terms or term prefixes in double quotes ("). For example: |
︙ | ︙ |