Documentation Source Text

Check-in [ab263df2e6]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update fts5 docs to reflect support for a user column on the LHS of a MATCH operator.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: ab263df2e645d99e9c6d01c263bd0c117c8c2f068a6341160e45aec9adf0e406
User & Date: dan 2017-04-13 09:59:26.118
Context
2017-04-17
18:13
Update the documentation on the foreign_key_check pragma to explain that the second output column is NULL for WITHOUT ROWID child tables. (check-in: af6ceab74d user: drh tags: trunk)
2017-04-13
09:59
Update fts5 docs to reflect support for a user column on the LHS of a MATCH operator. (check-in: ab263df2e6 user: dan tags: trunk)
2017-04-12
20:02
Attempt to clarify the operation of the page_size pragma. (check-in: 6d2df6128e user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fts5.in.
42
43
44
45
46
47
48

49
50
51
52
53
54
55
56

<ul><li> Using a MATCH operator in the WHERE clause of a SELECT statement, or
    <li> Using an equals ("=") operator in the WHERE clause of a SELECT statement, or
    <li> using the [table-valued function] syntax.
</ul>

<p>If using the MATCH or = operators, the expression to the left of the MATCH

   operator must be the name of the FTS5 table. The expression on the right
   must be a text value specifying the term to search for. For the table-valued
   function syntax, the term to search for is specified as the first table argument.
   For example:

<codeblock>
  <i>-- Query for all rows that contain at least once instance of the term</i>
  <i>-- "fts5" (in any column). The following three queries are equivalent.</i>







>
|







42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57

<ul><li> Using a MATCH operator in the WHERE clause of a SELECT statement, or
    <li> Using an equals ("=") operator in the WHERE clause of a SELECT statement, or
    <li> using the [table-valued function] syntax.
</ul>

<p>If using the MATCH or = operators, the expression to the left of the MATCH
   operator is usually the name of the FTS5 table (the exception is when 
   [FTS5 column filters | specifying a column-filter]). The expression on the right
   must be a text value specifying the term to search for. For the table-valued
   function syntax, the term to search for is specified as the first table argument.
   For example:

<codeblock>
  <i>-- Query for all rows that contain at least once instance of the term</i>
  <i>-- "fts5" (in any column). The following three queries are equivalent.</i>
178
179
180
181
182
183
184

185
186
187
188
189
190
191
&lt;query&gt;     := &lt;query&gt; AND &lt;query&gt;
&lt;query&gt;     := &lt;query&gt; OR &lt;query&gt;
&lt;query&gt;     := &lt;query&gt; NOT &lt;query&gt;
&lt;colspec&gt;   := colname
&lt;colspec&gt;   := { colname1 colname2 ... }
</codeblock>


<p>
Within an FTS expression a <b>string</b> may be specified in one of two ways:

<ul>
  <li> <p>By enclosing it in double quotes ("). Within a string, any embedded
       double quote characters may be escaped SQL-style - by adding a second
       double-quote character.







>







179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
&lt;query&gt;     := &lt;query&gt; AND &lt;query&gt;
&lt;query&gt;     := &lt;query&gt; OR &lt;query&gt;
&lt;query&gt;     := &lt;query&gt; NOT &lt;query&gt;
&lt;colspec&gt;   := colname
&lt;colspec&gt;   := { colname1 colname2 ... }
</codeblock>

<h2 tags="FTS5 Strings">FTS5 Strings</h2>
<p>
Within an FTS expression a <b>string</b> may be specified in one of two ways:

<ul>
  <li> <p>By enclosing it in double quotes ("). Within a string, any embedded
       double quote characters may be escaped SQL-style - by adding a second
       double-quote character.
208
209
210
211
212
213
214

215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233

234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252


253
254
255
256
257
258
259
       do not currently serve any special purpose in FTS5 query expressions may
       at some point in the future be allowed in barewords or used to implement
       new query functionality. This means that queries that are currently
       syntax errors because they include such a character outside of a quoted
       string may be interpreted differently by some future version of FTS5.
</ul>


<p>
FTS queries are made up of <b>phrases</b>. A phrase is an ordered list of 
one or more tokens. A string is transformed into a phrase by passing it to
the FTS table tokenizer. Two phrases can be concatenated into a single 
large phrase using the "+" operator. For example, assuming the tokenizer
module being used tokenizes the input "one.two.three" to three separate
tokens, the following three queries all specify the same phrase:

<codeblock>
  ... MATCH '"one two three"'
  ... MATCH 'one + two + three'
  ... MATCH '"one two" + three'
  ... MATCH 'one.two.three'
</codeblock>

<p>
A phrase matches a document if the document contains at least one sub-sequence
of tokens that matches the sequence of tokens that make up the phrase.


<p>
If a "*" character follows a string within an FTS expression, then the final
token extracted from the string is marked as a <b>prefix token</b>. As you
might expect, a prefix token matches any document token of which it is a 
prefix. For example, the first two queries in the following block will match
any document that contains the token "one" immediately followed by the token
"two" and then any token that begins with "thr".

<codeblock>
  ... MATCH '"one two thr" * '
  ... MATCH 'one + two + thr*'
  ... MATCH '"one two thr*"'      <b>-- May not work as expected!</b>
</codeblock>

<p>The final query in the block above may not work as expected. Because the
"*" character is inside the double-quotes, it will be passed to the tokenizer,
which will likely discard it (or perhaps, depending on the specific tokenizer
in use, include it as part of the final token) instead of recognizing it as
a special FTS character.



<p>Two or more phrases may be grouped into a <b>NEAR group</b>. A NEAR group
is specified by the token "NEAR" (case sensitive) followed by an open
parenthesis character, followed by two or more whitespace separated phrases, optionally followed by a comma and the numeric parameter <i>N</i>, followed by
a close parenthesis. For example:

<codeblock>







>



















>



















>
>







210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
       do not currently serve any special purpose in FTS5 query expressions may
       at some point in the future be allowed in barewords or used to implement
       new query functionality. This means that queries that are currently
       syntax errors because they include such a character outside of a quoted
       string may be interpreted differently by some future version of FTS5.
</ul>

<h2 tags="FTS5 Phrases">FTS5 Phrases</h2>
<p>
FTS queries are made up of <b>phrases</b>. A phrase is an ordered list of 
one or more tokens. A string is transformed into a phrase by passing it to
the FTS table tokenizer. Two phrases can be concatenated into a single 
large phrase using the "+" operator. For example, assuming the tokenizer
module being used tokenizes the input "one.two.three" to three separate
tokens, the following three queries all specify the same phrase:

<codeblock>
  ... MATCH '"one two three"'
  ... MATCH 'one + two + three'
  ... MATCH '"one two" + three'
  ... MATCH 'one.two.three'
</codeblock>

<p>
A phrase matches a document if the document contains at least one sub-sequence
of tokens that matches the sequence of tokens that make up the phrase.

<h2 tags="FTS5 prefix queries">FTS5 Prefix Queries</h2>
<p>
If a "*" character follows a string within an FTS expression, then the final
token extracted from the string is marked as a <b>prefix token</b>. As you
might expect, a prefix token matches any document token of which it is a 
prefix. For example, the first two queries in the following block will match
any document that contains the token "one" immediately followed by the token
"two" and then any token that begins with "thr".

<codeblock>
  ... MATCH '"one two thr" * '
  ... MATCH 'one + two + thr*'
  ... MATCH '"one two thr*"'      <b>-- May not work as expected!</b>
</codeblock>

<p>The final query in the block above may not work as expected. Because the
"*" character is inside the double-quotes, it will be passed to the tokenizer,
which will likely discard it (or perhaps, depending on the specific tokenizer
in use, include it as part of the final token) instead of recognizing it as
a special FTS character.

<h2 tags="FTS5 NEAR queries">FTS5 NEAR Queries</h2>

<p>Two or more phrases may be grouped into a <b>NEAR group</b>. A NEAR group
is specified by the token "NEAR" (case sensitive) followed by an open
parenthesis character, followed by two or more whitespace separated phrases, optionally followed by a comma and the numeric parameter <i>N</i>, followed by
a close parenthesis. For example:

<codeblock>
287
288
289
290
291
292
293

294
295
296
297
298
299
300
  ... MATCH 'NEAR(a d e, 5)';                    <i>-- Does not match!</i>

  ... MATCH 'NEAR("a b c d" "b c" "e f", 4)';    <i>-- Matches!</i>
  ... MATCH 'NEAR("a b c d" "b c" "e f", 3)';    <i>-- Does not match!</i>

</codeblock>



<p>
A single phrase or NEAR group may be restricted to matching text within a
specified column of the FTS table by prefixing it with the column name 
followed by a colon character. Or to a set of columns by prefixing it
with a whitespace separated list of column names enclosed in parenthesis
("curly brackets") followed by a colon character. Column names may be specified







>







293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
  ... MATCH 'NEAR(a d e, 5)';                    <i>-- Does not match!</i>

  ... MATCH 'NEAR("a b c d" "b c" "e f", 4)';    <i>-- Matches!</i>
  ... MATCH 'NEAR("a b c d" "b c" "e f", 3)';    <i>-- Does not match!</i>

</codeblock>

<h2 tags="FTS5 column filters">FTS5 Column Filters</h2>

<p>
A single phrase or NEAR group may be restricted to matching text within a
specified column of the FTS table by prefixing it with the column name 
followed by a colon character. Or to a set of columns by prefixing it
with a whitespace separated list of column names enclosed in parenthesis
("curly brackets") followed by a colon character. Column names may be specified
332
333
334
335
336
337
338



















339
340
341
342
343
344
345

<codeblock>
  <i>-- The following are equivalent:</i>
  ... MATCH '{a b} : ( {b c} : "hello" AND "world" )'
  ... MATCH '(b : "hello") AND ({a b} : "world")'
</codeblock>




















<p>
Phrases and NEAR groups may be arranged into expressions using <b>boolean
operators</b>. In order of precedence, from highest (tightest grouping) to
lowest (loosest grouping), the operators are:

<table striped=1>
  <tr><th>Operator <th>Function







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371

<codeblock>
  <i>-- The following are equivalent:</i>
  ... MATCH '{a b} : ( {b c} : "hello" AND "world" )'
  ... MATCH '(b : "hello") AND ({a b} : "world")'
</codeblock>

<p>
Finally, a column filter for a single column may be specified by using
the column name as the LHS of a MATCH operator (instead of the usual
table name). For example:

<codeblock>
  <i>-- Given the following table</i>
  CREATE VIRTUAL TABLE ft USING fts5(a, b, c);

  <i>-- The following are equivalent</i>
  SELECT * FROM ft WHERE b MATCH 'uvw AND xyz';
  SELECT * FROM ft WHERE ft MATCH 'b : (uvw AND xyz)';

  <i>-- This query cannot match any rows (since all columns are filtered out): </i>
  SELECT * FROM ft WHERE b MATCH 'a : xyz';
</codeblock>

<h2 tags="FTS5 boolean operators">FTS5 Boolean Operators</h2>

<p>
Phrases and NEAR groups may be arranged into expressions using <b>boolean
operators</b>. In order of precedence, from highest (tightest grouping) to
lowest (loosest grouping), the operators are:

<table striped=1>
  <tr><th>Operator <th>Function
1695
1696
1697
1698
1699
1700
1701
1702








1703

1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727

<h3 nonumber> Changes to SELECT statements </h3>

<ol>
  <li> <p>The "docid" alias does not exist. Applications must use "rowid"
          instead.

  <li> <p>The left hand side of the MATCH operator in a full-text query must be








          the table name - not any column name as in FTS4.


  <li> <p>The FTS query syntax (right hand side of the MATCH operator) has
          changed in some ways. The FTS5 syntax is quite close to the FTS4
          "enhanced syntax". The main difference is that FTS5 is fussier 
          about unrecognized punctuation characters and similar within query
          strings. Most queries that work with FTS3/4 should also work with
          FTS5, and those that do not should return parse errors.
</ol>

<codeblock>
  <i>-- FTS3/4 query </i>
  SELECT docid FROM t1 WHERE text MATCH 'token';

  <i>-- FTS5 equivalent </i>
  SELECT rowid FROM t1 WHERE t1 MATCH 'text:token';
</codeblock>

<h3 nonumber> Auxiliary Function Changes </h3>

<p> FTS5 has no matchinfo() or offsets() function, and the snippet() function
is not as fully-featured as in FTS3/4. However, since FTS5 does provide 
an API allowing applications to create [custom auxiliary functions], any
required functionality may be implemented within the application code.








|
>
>
>
>
>
>
>
>
|
>









<
<
<
<
<
<
<
<







1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747








1748
1749
1750
1751
1752
1753
1754

<h3 nonumber> Changes to SELECT statements </h3>

<ol>
  <li> <p>The "docid" alias does not exist. Applications must use "rowid"
          instead.

  <li> <p>The behaviour of queries when a column-filter is specified both as
          part of the FTS query and by using a column as the LHS of a MATCH
          operator is slightly different. For a table with columns "a" and "b"
          and a query similar to:
<codeblock>
   ... a MATCH 'b: string'
</codeblock>
       <p>FTS3/4 searches for matches in column "b". However, FTS5 always
          returns zero rows, as results are first filtered for column "b", then
          for column "a", leaving no results. In other words, in FTS3/4 the
          inner filter overrides the outer, in FTS5 both filters are applied.

  <li> <p>The FTS query syntax (right hand side of the MATCH operator) has
          changed in some ways. The FTS5 syntax is quite close to the FTS4
          "enhanced syntax". The main difference is that FTS5 is fussier 
          about unrecognized punctuation characters and similar within query
          strings. Most queries that work with FTS3/4 should also work with
          FTS5, and those that do not should return parse errors.
</ol>









<h3 nonumber> Auxiliary Function Changes </h3>

<p> FTS5 has no matchinfo() or offsets() function, and the snippet() function
is not as fully-featured as in FTS3/4. However, since FTS5 does provide 
an API allowing applications to create [custom auxiliary functions], any
required functionality may be implemented within the application code.