Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Update fts5 docs to reflect support for a user column on the LHS of a MATCH operator. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
ab263df2e645d99e9c6d01c263bd0c11 |
User & Date: | dan 2017-04-13 09:59:26.118 |
Context
2017-04-17
| ||
18:13 | Update the documentation on the foreign_key_check pragma to explain that the second output column is NULL for WITHOUT ROWID child tables. (check-in: af6ceab74d user: drh tags: trunk) | |
2017-04-13
| ||
09:59 | Update fts5 docs to reflect support for a user column on the LHS of a MATCH operator. (check-in: ab263df2e6 user: dan tags: trunk) | |
2017-04-12
| ||
20:02 | Attempt to clarify the operation of the page_size pragma. (check-in: 6d2df6128e user: drh tags: trunk) | |
Changes
Changes to pages/fts5.in.
︙ | ︙ | |||
42 43 44 45 46 47 48 | <ul><li> Using a MATCH operator in the WHERE clause of a SELECT statement, or <li> Using an equals ("=") operator in the WHERE clause of a SELECT statement, or <li> using the [table-valued function] syntax. </ul> <p>If using the MATCH or = operators, the expression to the left of the MATCH | > | | 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 | <ul><li> Using a MATCH operator in the WHERE clause of a SELECT statement, or <li> Using an equals ("=") operator in the WHERE clause of a SELECT statement, or <li> using the [table-valued function] syntax. </ul> <p>If using the MATCH or = operators, the expression to the left of the MATCH operator is usually the name of the FTS5 table (the exception is when [FTS5 column filters | specifying a column-filter]). The expression on the right must be a text value specifying the term to search for. For the table-valued function syntax, the term to search for is specified as the first table argument. For example: <codeblock> <i>-- Query for all rows that contain at least once instance of the term</i> <i>-- "fts5" (in any column). The following three queries are equivalent.</i> |
︙ | ︙ | |||
178 179 180 181 182 183 184 185 186 187 188 189 190 191 | <query> := <query> AND <query> <query> := <query> OR <query> <query> := <query> NOT <query> <colspec> := colname <colspec> := { colname1 colname2 ... } </codeblock> <p> Within an FTS expression a <b>string</b> may be specified in one of two ways: <ul> <li> <p>By enclosing it in double quotes ("). Within a string, any embedded double quote characters may be escaped SQL-style - by adding a second double-quote character. | > | 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 | <query> := <query> AND <query> <query> := <query> OR <query> <query> := <query> NOT <query> <colspec> := colname <colspec> := { colname1 colname2 ... } </codeblock> <h2 tags="FTS5 Strings">FTS5 Strings</h2> <p> Within an FTS expression a <b>string</b> may be specified in one of two ways: <ul> <li> <p>By enclosing it in double quotes ("). Within a string, any embedded double quote characters may be escaped SQL-style - by adding a second double-quote character. |
︙ | ︙ | |||
208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 | do not currently serve any special purpose in FTS5 query expressions may at some point in the future be allowed in barewords or used to implement new query functionality. This means that queries that are currently syntax errors because they include such a character outside of a quoted string may be interpreted differently by some future version of FTS5. </ul> <p> FTS queries are made up of <b>phrases</b>. A phrase is an ordered list of one or more tokens. A string is transformed into a phrase by passing it to the FTS table tokenizer. Two phrases can be concatenated into a single large phrase using the "+" operator. For example, assuming the tokenizer module being used tokenizes the input "one.two.three" to three separate tokens, the following three queries all specify the same phrase: <codeblock> ... MATCH '"one two three"' ... MATCH 'one + two + three' ... MATCH '"one two" + three' ... MATCH 'one.two.three' </codeblock> <p> A phrase matches a document if the document contains at least one sub-sequence of tokens that matches the sequence of tokens that make up the phrase. <p> If a "*" character follows a string within an FTS expression, then the final token extracted from the string is marked as a <b>prefix token</b>. As you might expect, a prefix token matches any document token of which it is a prefix. For example, the first two queries in the following block will match any document that contains the token "one" immediately followed by the token "two" and then any token that begins with "thr". <codeblock> ... MATCH '"one two thr" * ' ... MATCH 'one + two + thr*' ... MATCH '"one two thr*"' <b>-- May not work as expected!</b> </codeblock> <p>The final query in the block above may not work as expected. Because the "*" character is inside the double-quotes, it will be passed to the tokenizer, which will likely discard it (or perhaps, depending on the specific tokenizer in use, include it as part of the final token) instead of recognizing it as a special FTS character. <p>Two or more phrases may be grouped into a <b>NEAR group</b>. A NEAR group is specified by the token "NEAR" (case sensitive) followed by an open parenthesis character, followed by two or more whitespace separated phrases, optionally followed by a comma and the numeric parameter <i>N</i>, followed by a close parenthesis. For example: <codeblock> | > > > > | 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 | do not currently serve any special purpose in FTS5 query expressions may at some point in the future be allowed in barewords or used to implement new query functionality. This means that queries that are currently syntax errors because they include such a character outside of a quoted string may be interpreted differently by some future version of FTS5. </ul> <h2 tags="FTS5 Phrases">FTS5 Phrases</h2> <p> FTS queries are made up of <b>phrases</b>. A phrase is an ordered list of one or more tokens. A string is transformed into a phrase by passing it to the FTS table tokenizer. Two phrases can be concatenated into a single large phrase using the "+" operator. For example, assuming the tokenizer module being used tokenizes the input "one.two.three" to three separate tokens, the following three queries all specify the same phrase: <codeblock> ... MATCH '"one two three"' ... MATCH 'one + two + three' ... MATCH '"one two" + three' ... MATCH 'one.two.three' </codeblock> <p> A phrase matches a document if the document contains at least one sub-sequence of tokens that matches the sequence of tokens that make up the phrase. <h2 tags="FTS5 prefix queries">FTS5 Prefix Queries</h2> <p> If a "*" character follows a string within an FTS expression, then the final token extracted from the string is marked as a <b>prefix token</b>. As you might expect, a prefix token matches any document token of which it is a prefix. For example, the first two queries in the following block will match any document that contains the token "one" immediately followed by the token "two" and then any token that begins with "thr". <codeblock> ... MATCH '"one two thr" * ' ... MATCH 'one + two + thr*' ... MATCH '"one two thr*"' <b>-- May not work as expected!</b> </codeblock> <p>The final query in the block above may not work as expected. Because the "*" character is inside the double-quotes, it will be passed to the tokenizer, which will likely discard it (or perhaps, depending on the specific tokenizer in use, include it as part of the final token) instead of recognizing it as a special FTS character. <h2 tags="FTS5 NEAR queries">FTS5 NEAR Queries</h2> <p>Two or more phrases may be grouped into a <b>NEAR group</b>. A NEAR group is specified by the token "NEAR" (case sensitive) followed by an open parenthesis character, followed by two or more whitespace separated phrases, optionally followed by a comma and the numeric parameter <i>N</i>, followed by a close parenthesis. For example: <codeblock> |
︙ | ︙ | |||
287 288 289 290 291 292 293 294 295 296 297 298 299 300 | ... MATCH 'NEAR(a d e, 5)'; <i>-- Does not match!</i> ... MATCH 'NEAR("a b c d" "b c" "e f", 4)'; <i>-- Matches!</i> ... MATCH 'NEAR("a b c d" "b c" "e f", 3)'; <i>-- Does not match!</i> </codeblock> <p> A single phrase or NEAR group may be restricted to matching text within a specified column of the FTS table by prefixing it with the column name followed by a colon character. Or to a set of columns by prefixing it with a whitespace separated list of column names enclosed in parenthesis ("curly brackets") followed by a colon character. Column names may be specified | > | 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 | ... MATCH 'NEAR(a d e, 5)'; <i>-- Does not match!</i> ... MATCH 'NEAR("a b c d" "b c" "e f", 4)'; <i>-- Matches!</i> ... MATCH 'NEAR("a b c d" "b c" "e f", 3)'; <i>-- Does not match!</i> </codeblock> <h2 tags="FTS5 column filters">FTS5 Column Filters</h2> <p> A single phrase or NEAR group may be restricted to matching text within a specified column of the FTS table by prefixing it with the column name followed by a colon character. Or to a set of columns by prefixing it with a whitespace separated list of column names enclosed in parenthesis ("curly brackets") followed by a colon character. Column names may be specified |
︙ | ︙ | |||
332 333 334 335 336 337 338 339 340 341 342 343 344 345 | <codeblock> <i>-- The following are equivalent:</i> ... MATCH '{a b} : ( {b c} : "hello" AND "world" )' ... MATCH '(b : "hello") AND ({a b} : "world")' </codeblock> <p> Phrases and NEAR groups may be arranged into expressions using <b>boolean operators</b>. In order of precedence, from highest (tightest grouping) to lowest (loosest grouping), the operators are: <table striped=1> <tr><th>Operator <th>Function | > > > > > > > > > > > > > > > > > > > | 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 | <codeblock> <i>-- The following are equivalent:</i> ... MATCH '{a b} : ( {b c} : "hello" AND "world" )' ... MATCH '(b : "hello") AND ({a b} : "world")' </codeblock> <p> Finally, a column filter for a single column may be specified by using the column name as the LHS of a MATCH operator (instead of the usual table name). For example: <codeblock> <i>-- Given the following table</i> CREATE VIRTUAL TABLE ft USING fts5(a, b, c); <i>-- The following are equivalent</i> SELECT * FROM ft WHERE b MATCH 'uvw AND xyz'; SELECT * FROM ft WHERE ft MATCH 'b : (uvw AND xyz)'; <i>-- This query cannot match any rows (since all columns are filtered out): </i> SELECT * FROM ft WHERE b MATCH 'a : xyz'; </codeblock> <h2 tags="FTS5 boolean operators">FTS5 Boolean Operators</h2> <p> Phrases and NEAR groups may be arranged into expressions using <b>boolean operators</b>. In order of precedence, from highest (tightest grouping) to lowest (loosest grouping), the operators are: <table striped=1> <tr><th>Operator <th>Function |
︙ | ︙ | |||
1695 1696 1697 1698 1699 1700 1701 | <h3 nonumber> Changes to SELECT statements </h3> <ol> <li> <p>The "docid" alias does not exist. Applications must use "rowid" instead. | | > > > > > > > > | > < < < < < < < < | 1721 1722 1723 1724 1725 1726 1727 1728 1729 1730 1731 1732 1733 1734 1735 1736 1737 1738 1739 1740 1741 1742 1743 1744 1745 1746 1747 1748 1749 1750 1751 1752 1753 1754 | <h3 nonumber> Changes to SELECT statements </h3> <ol> <li> <p>The "docid" alias does not exist. Applications must use "rowid" instead. <li> <p>The behaviour of queries when a column-filter is specified both as part of the FTS query and by using a column as the LHS of a MATCH operator is slightly different. For a table with columns "a" and "b" and a query similar to: <codeblock> ... a MATCH 'b: string' </codeblock> <p>FTS3/4 searches for matches in column "b". However, FTS5 always returns zero rows, as results are first filtered for column "b", then for column "a", leaving no results. In other words, in FTS3/4 the inner filter overrides the outer, in FTS5 both filters are applied. <li> <p>The FTS query syntax (right hand side of the MATCH operator) has changed in some ways. The FTS5 syntax is quite close to the FTS4 "enhanced syntax". The main difference is that FTS5 is fussier about unrecognized punctuation characters and similar within query strings. Most queries that work with FTS3/4 should also work with FTS5, and those that do not should return parse errors. </ol> <h3 nonumber> Auxiliary Function Changes </h3> <p> FTS5 has no matchinfo() or offsets() function, and the snippet() function is not as fully-featured as in FTS3/4. However, since FTS5 does provide an API allowing applications to create [custom auxiliary functions], any required functionality may be implemented within the application code. |
︙ | ︙ |