Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Add text to fts3.html describing the snippet and offsets functions. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
d91a4c72493186fafeee787ef7e14ca3 |
User & Date: | dan 2009-11-30 08:41:45.000 |
Context
2009-11-30
| ||
11:52 | Fix some problems with fts3.html. (check-in: 0b949f6763 user: dan tags: trunk) | |
08:41 | Add text to fts3.html describing the snippet and offsets functions. (check-in: d91a4c7249 user: dan tags: trunk) | |
2009-11-28
| ||
19:17 | Add a description of tokenizers to fts3.in. (check-in: f1940858bf user: dan tags: trunk) | |
Changes
Changes to pages/fts3.in.
︙ | ︙ | |||
106 107 108 109 110 111 112 113 114 115 116 117 118 119 | <i>-- Create an FTS3 table named "papers" with two columns that uses</i> <i>-- the tokenizer "porter".</i> CREATE VIRTUAL TABLE papers USING fts3(author, document, tokenize=porter); <i>-- Create an FTS3 table with a single column - "content" - that uses</i> <i>-- the "simple" tokenizer.</i> CREATE VIRTUAL TABLE data USING fts3(tokenize=simple); }] <p> FTS3 tables may be dropped from the database using an ordinary \[DROP TABLE\] statement. For example: [Code { | > > > > | 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 | <i>-- Create an FTS3 table named "papers" with two columns that uses</i> <i>-- the tokenizer "porter".</i> CREATE VIRTUAL TABLE papers USING fts3(author, document, tokenize=porter); <i>-- Create an FTS3 table with a single column - "content" - that uses</i> <i>-- the "simple" tokenizer.</i> CREATE VIRTUAL TABLE data USING fts3(tokenize=simple); <i>-- Create an FTS3 table with two columns that uses the "icu" tokenizer.</i> <i>-- The qualifier "en_AU" is passed to the tokenizer implementation</i> CREATE VIRTUAL TABLE names USING fts3(a, b, tokenize=icu en_AU); }] <p> FTS3 tables may be dropped from the database using an ordinary \[DROP TABLE\] statement. For example: [Code { |
︙ | ︙ | |||
185 186 187 188 189 190 191 | [Code { <i>-- Optimize the internal structure of FTS3 table "docs".</i> SELECT optimize(docs) FROM docs LIMIT 1; }] <p> The statement above may appear syntacticly incorrect to some. Refer to | | < | 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 | [Code { <i>-- Optimize the internal structure of FTS3 table "docs".</i> SELECT optimize(docs) FROM docs LIMIT 1; }] <p> The statement above may appear syntacticly incorrect to some. Refer to the section describing the \[simple fts3 queries\] for an explanation. <p> The optimize() function returns a text value. If the index was already optimized when it was called the text is "Index already optimal". Otherwise if the index was not already optimized, it is made so and the text "Index optimized" returned. |
︙ | ︙ | |||
293 294 295 296 297 298 299 | At first glance, the final two full-text queries in the example above seem to be syntacticly incorrect, as there is a table name ("mail") used as an SQL expression. The reason this is acceptable is that each FTS3 table actually has a \[sqlite3_declare_vtab|HIDDEN\] column with the same name as the table itself (in this case, "mail"). The value stored in this column is not meaningful to the application, but can be used as the left-hand operand to a MATCH operator. This special column may also be | | | 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 | At first glance, the final two full-text queries in the example above seem to be syntacticly incorrect, as there is a table name ("mail") used as an SQL expression. The reason this is acceptable is that each FTS3 table actually has a \[sqlite3_declare_vtab|HIDDEN\] column with the same name as the table itself (in this case, "mail"). The value stored in this column is not meaningful to the application, but can be used as the left-hand operand to a MATCH operator. This special column may also be passed as an argument to the \[snippet()|FTS3 auxillary functions\]. <p> The following example illustrates the above. The expressions "docs", "docs.docs" and "main.docs.docs" all refer to column "docs". However, the expression "main.docs" does not refer to any column. It could be used to refer to a table, but a table name is not allowed in the context in which it is used below. |
︙ | ︙ | |||
346 347 348 349 350 351 352 | rowid column supported by all \[virtual tables\]. <li><p> The \[FTS3 MATCH\] operator is supported for queries based on the built-in full-text index. <li><p> | | | | < < < | 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 | rowid column supported by all \[virtual tables\]. <li><p> The \[FTS3 MATCH\] operator is supported for queries based on the built-in full-text index. <li><p> The FTS3 auxillary functions, \[snippet|snippet() and offsets()\], are available to support full-text queries. <li><p> Each FTS3 table has a \[sqlite3_declare_vtab()|HIDDEN column\] with the same name as the table itself. The value contained in each row for the special column is only useful when used on the left-hand side of a \[FTS3 MATCH|MATCH\] operator, or when specified as an argument to one of the \[snippet|FTS3 auxillary functions\]. </ol> [h1 "Compiling and Enabling FTS3" {} {compile fts3}] <p> Although FTS3 is distributed as part of the SQLite source code, it is not enabled by default. To build SQLite with FTS3 functionality enabled, define the preprocessor macro \[SQLITE_ENABLE_FTS3\] when compiling. New applications should also define the \[SQLITE_ENABLE_FTS3_PARENTHESIS\] macro to enable the \[enhanced query syntax\] (see below). Usually, this is done by adding the following two switches to the compiler command line: [Code { -DSQLITE_ENABLE_FTS3 -DSQLITE_ENABLE_FTS3_PARENTHESIS }] <p> If using the amalgamation autoconf based build system, setting the CPPFLAGS |
︙ | ︙ | |||
400 401 402 403 404 405 406 407 408 409 410 411 412 413 | \[SQLITE_OMIT_VIRTUALTABLE\] option. <p> If an SQLite build does not include FTS3, then any attempt to prepare an SQL statement to create an FTS3 table or to drop or access an existing FTS3 table in any way will fail. The error message returned will be similar to "no such module: fts3". [h1 "Full-text Index Queries" {} {FTS3 MATCH}] <p> The most useful thing about FTS3 tables is the queries that may be performed using the built-in full-text index. Full-text queries are | > > > > > > > > > > > | 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 | \[SQLITE_OMIT_VIRTUALTABLE\] option. <p> If an SQLite build does not include FTS3, then any attempt to prepare an SQL statement to create an FTS3 table or to drop or access an existing FTS3 table in any way will fail. The error message returned will be similar to "no such module: fts3". <p> If the C version of the <a href=http://site.icu-project.org/>ICU library</a> is available, then FTS3 may also be compiled with the SQLITE_ENABLE_ICU pre-processor macro defined. Compiling with this macro enables an FTS3 \[tokenizer\] that uses the ICU library to split a document into terms (words) using the conventions for a specified language and locale. [Code { -DSQLITE_ENABLE_ICU }] [h1 "Full-text Index Queries" {} {FTS3 MATCH}] <p> The most useful thing about FTS3 tables is the queries that may be performed using the built-in full-text index. Full-text queries are |
︙ | ︙ | |||
557 558 559 560 561 562 563 | of the full-text query syntax, the "standard" query syntax and the "enhanced" query syntax. The basic term, term-prefix, phrase and NEAR queries described above are the same in both versions of the syntax. The way in which set operations are specified is slightly different. The following two sub-sections describe the part of the two query syntaxes that pertains to set operations. Refer to the description of how to \[compile fts3\] for compilation notes. | | | 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 | of the full-text query syntax, the "standard" query syntax and the "enhanced" query syntax. The basic term, term-prefix, phrase and NEAR queries described above are the same in both versions of the syntax. The way in which set operations are specified is slightly different. The following two sub-sections describe the part of the two query syntaxes that pertains to set operations. Refer to the description of how to \[compile fts3\] for compilation notes. [h2 "Set Operations Using The Enhanced Query Syntax" {} {enhanced query syntax}] <p> The enhanced query syntax supports the AND, OR and NOT binary set operators. Each of the two operands to an operator may be a basic FTS3 query, or the result of another AND, OR or NOT set operation. Operators must be entered using capital letters. Otherwise, they are interpreted as basic term queries instead of set operators. |
︙ | ︙ | |||
685 686 687 688 689 690 691 | [Code { <i>-- Search for the set of documents that contain the term "sqlite" but do</i> <i>-- not contain the term "database".</i> SELECT * FROM docs WHERE docs MATCH 'sqlite -database'; }] <ol> | | > | > | 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 | [Code { <i>-- Search for the set of documents that contain the term "sqlite" but do</i> <i>-- not contain the term "database".</i> SELECT * FROM docs WHERE docs MATCH 'sqlite -database'; }] <ol> <li value=4><p> The relative precedence of the set operations is different. In particular, using the standard query syntax the "OR" operator has a higher precedence than "AND". The precedence of operators when using the standard query syntax is: </ol> [Table] [Tr]<th>Operator<th>Standard Query Syntax Precedence [Tr]<td>Unary "-" <td> Highest precedence (tightest grouping). [Tr]<td>OR <td> [Tr]<td>AND <td> Lowest precedence (loosest grouping). |
︙ | ︙ | |||
709 710 711 712 713 714 715 | <i>-- Search for documents that contains at least one of the terms "database"</i> <i>-- and "sqlite", and also contains the term "library". Because of the differences</i> <i>-- in operator precedences, this query would have a different interpretation using</i> <i>-- the enhanced query syntax.</i> SELECT * FROM docs WHERE docs MATCH 'sqlite OR database library'; }] | | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 | <i>-- Search for documents that contains at least one of the terms "database"</i> <i>-- and "sqlite", and also contains the term "library". Because of the differences</i> <i>-- in operator precedences, this query would have a different interpretation using</i> <i>-- the enhanced query syntax.</i> SELECT * FROM docs WHERE docs MATCH 'sqlite OR database library'; }] [h1 "Auxillary functions - Snippets and Offsets" {} snippet offsets] <p> The FTS3 module provides two special SQL scalar functions that may be useful to the developers of full-text query systems, "snippet" and "offsets". The purpose of both functions is to allow the user to identify the location of queried terms in the returned documents. <p> The first argument to both the snippet and offsets SQL scalar functions must be the the special hidden column of an FTS3 table that has the same name as the table (see above). For example, given an FTS3 table named "mail": [Code { SELECT offsets(mail) FROM mail WHERE mail MATCH <full-text query expression>; SELECT snippet(mail) FROM mail WHERE mail MATCH <full-text query expression>; }] <p> The two auxillary functions are only useful within a SELECT statement that uses the FTS3 table's full-text index. If used within a SELECT that uses the "query by rowid" or "linear scan" strategies, both functions return an empty string. <p> For a SELECT query that uses the full-text index, the offsets() function returns a text value containing a series of space-separated integers. For each occurence of a queried term in the document, there are four integers in the returned list. Each set of four integers is interpreted as follows: [Table] [Tr]<th>Integer <th>Interpretation [Tr]<td>0 <td>The column number that the term instance occurs in (0 for the leftmost column of the FTS3 table, 1 for the next leftmost, etc.). [Tr]<td>1 <td>The term number of the matching term within the full-text query expression. Terms within a query expression are numbered starting from 0 in the order that they occur. [Tr]<td>2 <td>The byte offset of the matching term within the column. [Tr]<td>3 <td>The size of the matching term in bytes. </table> <p> The following block contains examples that use the offsets function. [Code { CREATE VIRTUAL TABLE mail USING fts3(subject, body); INSERT INTO mail VALUES('hello world', 'This message is a hello world message.'); INSERT INTO mail VALUES('urgent: serious', 'This mail is seen as a more serious mail'); <i>-- The following query returns a single row (as it matches only the first</i> <i>-- entry in table "mail". The text returned by the offsets function is</i> <i>-- "0 0 6 5 1 0 24 5".</i> <i>--</i> <i>-- The first set of four integers in the result indicate that column 0</i> <i>-- contains an instance of term 0 ("world") at byte offset 6. The term instance</i> <i>-- is 5 bytes in size. The second set of four integers shows that column 1</i> <i>-- of the matched row contains an instance of term 0 ("world") at byte offset</i> <i>-- 24. Again, the term instance is 5 bytes in size.</i> SELECT offsets(mail) FROM mail WHERE mail MATCH 'world'; <i>-- The following query returns also matches only the first row in table "mail".</i> <i>-- In this case the returned text is "1 0 5 7 1 0 30 7".</i> SELECT offsets(mail) FROM mail WHERE mail MATCH 'message'; <i>-- The following query matches the second row in table "mail". It returns the</i> <i>-- text "1 0 28 7 1 1 36 4". Only those occurences of terms "serious" and "mail"</i> <i>-- that are part of an instance of the phrase "serious mail" are identified; the</i> <i>-- other occurences of "serious" and "mail" are ignored.</i> SELECT offsets(mail) FROM mail WHERE mail MATCH '"serious mail"'; }] <p> The snippet function is used to create formatted fragments of document text for display as part of a full-text query results report. The snippet function may be passed between one and four arguments, as follows: [Table] [Tr]<th>Argument <th>Default Value <th>Description [Tr]<td>0 <td>N/A <td> The first argument to the snippet function must always be the special hidden column of the FTS3 table that takes the same name as the table itself. [Tr]<td>1 <td>"<b>" <td> The "start match" text. [Tr]<td>2 <td>"<b>" <td> The "end match" text. [Tr]<td>3 <td>"<b>...</b>" <td> The "ellipses" text. </table> <p> The snippet function returns a fragment of text from the original document surrounding the term identified by the first four integers that would be returned by the offsets function if it were used in a similar context. In most cases, the selected fragment contains 40 or less bytes of text before the identified term, and 40 or more bytes of text following the identified term. Slightly less than 40 bytes of preceding or following text is provided so that the fragment does not contain any partial terms. If the first term (that would be) identified by the offsets function is less than 40 bytes from the beginning or end of the document, then extra text may appear before or after the identified term within the fragment to make up the difference. <p> If the returned fragment of text does not start at the start of the entire document, then the "ellipses" text (see table above) is prepended to the fragment before it is returned. Similarly, if the end of the returned fragment is not also the end of the entire document, the "ellipses" text is appended to it before it is returned. <p> Before it is returned, the "start match" text is inserted into the fragment immediately before any terms within the fragment that would have been identified by the offsets function (not just the first one) were it invoked in the same context. The "end match" is inserted immediately following all such terms. [Code { <b>Note: In this block of examples, newlines and whitespace characters have been inserted into the document inserted into the FTS3 table, and the expected results described in SQL comments. This is done to enhance readability only, they would not be present in actual SQLite commands or output.</b> <i>-- Create and populate an FTS3 table.</i> CREATE VIRTUAL TABLE text USING fts3(); INSERT INTO text VALUES(' During 30 Nov-1 Dec, 2-3oC drops. Cool in the upper portion, minimum temperature 14-16oC and cool elsewhere, minimum temperature 17-20oC. Cold to very cold on mountaintops, minimum temperature 6-12oC. Northeasterly winds 15-30 km/hr. After that, temperature increases. Northeasterly winds 15-30 km/hr. '); <i>-- The following query returns the text value:</i> <i>--</i> <i>-- "<b>...</b> elsewhere, minimum temperature 17-20oC. <b>Cold</b> to very <b>cold</b> on</i> <i>-- mountaintops, minimum <b>...</b>".</i> <i>--</i> SELECT snippet(text) FROM text WHERE text MATCH 'cold'; <i>-- The following query returns the text value:</i> <i>--</i> <i>-- "... 2-3oC drops. Cool in the upper portion, [minimum] [temperature] 14-16oC and cool</i> <i>-- elsewhere, [minimum] ..."</i> <i>--</i> SELECT snippet(text, '[ ']', '...') FROM text WHERE text MATCH '"min* tem*"' }] [h1 "Tokenizers" tokenizer {tokenizer}] <p> An FTS3 tokenizer is a set of rules for extracting terms from a document or basic FTS3 full-text query. |
︙ | ︙ |