/ Check-in [ed81ad5a]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add the file ext/fts3/README.syntax, containing documentation describing the two query syntaxes now supported by fts3. (CVS 6042)
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: ed81ad5a5d22304a4d96e778e8e9094f74c461c0
User & Date: danielk1977 2008-12-19 11:37:39
Context
2008-12-19
16:31
Add some crash-tests for savepoint. Fix a bug revealed by these tests. (CVS 6043) check-in: 6f36c166 user: danielk1977 tags: trunk
11:37
Add the file ext/fts3/README.syntax, containing documentation describing the two query syntaxes now supported by fts3. (CVS 6042) check-in: ed81ad5a user: danielk1977 tags: trunk
2008-12-18
22:25
Fix the Oracle and MS-Sql command-line terminator logic in the CLI. Ticket #3544. (CVS 6041) check-in: dcc8935f user: drh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Added ext/fts3/README.syntax.

            1  +
            2  +1. OVERVIEW
            3  +
            4  +  This README file describes the syntax of the arguments that may be passed to
            5  +  the FTS3 MATCH operator used for full-text queries. For example, if table 
            6  +  "t1" is an Fts3 virtual table, the following SQL query:
            7  +
            8  +    SELECT * FROM t1 WHERE <col> MATCH <full-text query>
            9  +
           10  +  may be used to retrieve all rows that match a specified for full-text query. 
           11  +  The text "<col>" should be replaced by either the name of the fts3 table 
           12  +  (in this case "t1"), or by the name of one of the columns of the fts3 
           13  +  table. <full-text-query> should be replaced by an SQL expression that 
           14  +  computes to a string containing an Fts3 query.
           15  +
           16  +  If the left-hand-side of the MATCH operator is set to the name of the
           17  +  fts3 table, then by default the query may be matched against any column
           18  +  of the table. If it is set to a column name, then by default the query
           19  +  may only match the specified column. In both cases this may be overriden
           20  +  as part of the query text (see sections 2 and 3 below).
           21  +
           22  +  As of SQLite version 3.6.8, Fts3 supports two slightly different query 
           23  +  formats; the standard syntax, which is used by default, and the enhanced
           24  +  query syntax which can be selected by compiling with the pre-processor
           25  +  symbol SQLITE_FTS3_ENABLE_PARENTHESIS defined.
           26  +
           27  +    -DSQLITE_FTS3_ENABLE_PARENTHESIS
           28  +
           29  +2. STANDARD QUERY SYNTAX
           30  +
           31  +  When using the standard Fts3 query syntax, a query usually consists of a 
           32  +  list of terms (words) separated by white-space characters. To match a
           33  +  query, a row (or column) of an Fts3 table must contain each of the specified
           34  +  terms. For example, the following query:
           35  +
           36  +    <col> MATCH 'hello world'
           37  +
           38  +  matches rows (or columns, if <col> is the name of a column name) that 
           39  +  contain at least one instance of the token "hello", and at least one 
           40  +  instance of the token "world". Tokens may be grouped into phrases using
           41  +  quotation marks. In this case, a matching row or column must contain each
           42  +  of the tokens in the phrase in the order specified, with no intervening
           43  +  tokens. For example, the query:
           44  +
           45  +    <col> MATCH '"hello world" joe"
           46  +
           47  +  matches the first of the following two documents, but not the second or
           48  +  third:
           49  +
           50  +    "'Hello world', said Joe."
           51  +    "One should always greet the world with a cheery hello, thought Joe."
           52  +    "How many hello world programs could their be?"
           53  +
           54  +  As well as grouping tokens together by phrase, the binary NEAR operator 
           55  +  may be used to search for rows that contain two or more specified tokens 
           56  +  or phrases within a specified proximity of each other. The NEAR operator
           57  +  must always be specified in upper case. The word "near" in lower or mixed
           58  +  case is treated as an ordinary token. For example, the following query:
           59  +
           60  +    <col> MATCH 'engineering NEAR consultancy'
           61  +
           62  +  matches rows that contain both the "engineering" and "consultancy" tokens
           63  +  in the same column with not more than 10 other words between them. It does
           64  +  not matter which of the two terms occurs first in the document, only that
           65  +  they be seperated by only 10 tokens or less. The user may also specify
           66  +  a different required proximity by adding "/N" immediately after the NEAR
           67  +  operator, where N is an integer. For example:
           68  +
           69  +    <col> MATCH 'engineering NEAR/5 consultancy'
           70  +
           71  +  searches for a row containing an instance of each specified token seperated
           72  +  by not more than 5 other tokens. More than one NEAR operator can be used
           73  +  in as sequence. For example this query:
           74  +
           75  +    <col> MATCH 'reliable NEAR/2 engineering NEAR/5 consultancy'
           76  +
           77  +  searches for a row that contains an instance of the token "reliable" 
           78  +  seperated by not more than two tokens from an instance of "engineering",
           79  +  which is in turn separated by not more than 5 other tokens from an
           80  +  instance of the term "consultancy". Phrases enclosed in quotes may
           81  +  also be used as arguments to the NEAR operator.
           82  +
           83  +  Similar to the NEAR operator, one or more tokens or phrases may be 
           84  +  separated by OR operators. In this case, only one of the specified tokens
           85  +  or phrases must appear in the document. For example, the query:
           86  +
           87  +    <col> MATCH 'hello OR world'
           88  +
           89  +  matches rows that contain either the term "hello", or the term "world",
           90  +  or both. Note that unlike in many programming languages, the OR operator
           91  +  has a higher precedence than the AND operators implied between white-space
           92  +  separated tokens. The following query matches documents that contain the
           93  +  term 'sqlite' and at least one of the terms 'fantastic' or 'impressive',
           94  +  not those that contain both 'sqlite' and 'fantastic' or 'impressive':
           95  +
           96  +    <col> MATCH 'sqlite fantastic OR impressive'
           97  +
           98  +  Any token that is part of an Fts3 query expression, whether or not it is
           99  +  part of a phrase enclosed in quotes, may have a '*' character appended to
          100  +  it. In this case, the token matches all terms that begin with the characters
          101  +  of the token, not just those that exactly match it. For example, the 
          102  +  following query:
          103  +
          104  +    <col> MATCH 'sql*'
          105  +
          106  +  matches all rows that contain the term "SQLite", as well as those that
          107  +  contain "SQL".
          108  +
          109  +  A token that is not part of a quoted phrase may be preceded by a '-'
          110  +  character, which indicates that matching rows must not contain the 
          111  +  specified term. For example, the following:
          112  +
          113  +    <col> MATCH '"database engine" -sqlite'
          114  +
          115  +  matches rows that contain the phrase "database engine" but do not contain
          116  +  the term "sqlite". If the '-' character occurs inside a quoted phrase,
          117  +  it is ignored. It is possible to use both the '-' prefix and the '*' postfix
          118  +  on a single term. At this time, all Fts3 queries must contain at least
          119  +  one term or phrase that is not preceded by the '-' prefix.
          120  +
          121  +  Regardless of whether or not a table name or column name is used on the 
          122  +  left hand side of the MATCH operator, a specific column of the fts3 table
          123  +  may be associated with each token in a query by preceding a token with
          124  +  a column name followed by a ':' character. For example, regardless of what
          125  +  is specified for <col>, the following query requires that column "col1"
          126  +  of the table contains the term "hello", and that column "col2" of the
          127  +  table contains the term "world". If the table does not contain columns
          128  +  named "col1" and "col2", then an error is returned and the query is
          129  +  not run.
          130  +
          131  +    <col> MATCH 'col1:hello col2:world'
          132  +
          133  +  It is not possible to associate a specific table column with a quoted 
          134  +  phrase or a term preceded by a '-' operator. A '*' character may be
          135  +  appended to a term associated with a specific column for prefix matching.
          136  +
          137  +3. ENHANCED QUERY SYNTAX
          138  +
          139  +  The enhanced query syntax is quite similar to the standard query syntax,
          140  +  with the following four differences:
          141  +
          142  +  1) Parenthesis are supported. When using the enhanced query syntax,
          143  +     parenthesis may be used to overcome the built-in precedence of the
          144  +     supplied binary operators. For example, the following query:
          145  +
          146  +       <col> MATCH '(hello world) OR (simple example)'
          147  +
          148  +     matches documents that contain both "hello" and "world", and documents
          149  +     that contain both "simple" and "example". It is not possible to forumlate
          150  +     such a query using the standard syntax.
          151  +
          152  +  2) Instead of separating tokens and phrases by whitespace, an AND operator
          153  +     may be explicitly specified. This does not change query processing at
          154  +     all, but may be used to improve readability. For example, the following
          155  +     query is handled identically to the one above:
          156  +
          157  +       <col> MATCH '(hello AND world) OR (simple AND example)'
          158  +
          159  +     As with the OR and NEAR operators, the AND operator must be specified
          160  +     in upper case. The word "and" specified in lower or mixed case is 
          161  +     handled as a regular token.
          162  +
          163  +  3) The '-' token prefix is not supported. Instead, a new binary operator,
          164  +     NOT, is included. The NOT operator requires that the query specified
          165  +     as its left-hand operator matches, but that the query specified as the
          166  +     right-hand operator does not. For example, to query for all rows that
          167  +     contain the term "example" but not the term "simple", the following
          168  +     query could be used:
          169  +
          170  +       <col> MATCH 'example NOT simple'
          171  +
          172  +     As for all other operators, the NOT operator must be specified in
          173  +     upper case. Otherwise it will be treated as a regular token.
          174  +
          175  +  4) Unlike in the standard syntax, where the OR operator has a higher
          176  +     precedence than the implicit AND operator, when using the enhanced
          177  +     syntax implicit and explict AND operators have a higher precedence
          178  +     than OR operators. Using the enhanced syntax, the following two
          179  +     queries are equivalent:
          180  +
          181  +       <col> MATCH 'sqlite fantastic OR impressive'
          182  +       <col> MATCH '(sqlite AND fantastic) OR impressive'
          183  +
          184  +     however, when using the standard syntax, the query:
          185  +
          186  +       <col> MATCH 'sqlite fantastic OR impressive'
          187  +
          188  +     is equivalent to the enhanced syntax query:
          189  +
          190  +       <col> MATCH 'sqlite AND (fantastic OR impressive)'
          191  +
          192  +     The precedence of all enhanced syntax operators, in order from highest
          193  +     to lowest, is:
          194  +
          195  +       NEAR       (highest precedence, tightest grouping)
          196  +       NOT
          197  +       AND
          198  +       OR         (lowest precedence, loosest grouping)
          199  +
          200  +  Using the advanced syntax, it is possible to specify expressions enclosed
          201  +  in parenthesis as operands to the NOT, AND and OR operators. However both
          202  +  the left and right hand side operands of NEAR operators must be either
          203  +  tokens or phrases. Attempting the following query will return an error:
          204  +
          205  +    <col> MATCH 'sqlite NEAR (fantastic OR impressive)'
          206  +
          207  +  Queries of this form must be re-written as:
          208  +
          209  +    <col> MATCH 'sqlite NEAR fantastic OR sqlite NEAR impressive'
          210  +  
          211  +
          212  +