Documentation Source Text

Check-in [aee1b746ba]

Overview
Comment: Fix typos and add clarification in the fts3tokenize documentation.
SHA1: aee1b746ba2db0b8767dae922bee3fe514c6fecc
User & Date: drh 2013-04-26 14:47:01.002
Context
2013-04-26
15:21 Update the change log for 3.7.17 after reviewing the timeline. (check-in: 7e7024c429 user: drh tags: trunk)
14:47 Fix typos and add clarification in the fts3tokenize documentation. (check-in: aee1b746ba user: drh tags: trunk)
14:38 Add documentation for the fts3tokenize table. (check-in: a6e655aa62 user: drh tags: trunk)
Changes
Changes to pages/fts3.in.
Old version (lines 2145-2173):
   tokenizer.  The following SQL demonstrates how to create an instance 
   of the fts3tokenize virtual table:

<codeblock>
CREATE VIRTUAL TABLE tok1 USING fts3tokenize('porter');
</codeblock>

<p>The name of the desired tokenizer should be substitued in place of
   'porter' in the example, of course.  Once the virtual table is created,
   it can be queried as follows:

<codeblock>
SELECT token, start, end, position 
  FROM tok1
 WHERE input='This is a test sentence.';
</codeblock>

<p>The virtual table will return one row of output for each token in the
   input string.  The "token" column is the text of the token.  The "start"
   and "end" columns are the byte offset to the beginning and end of the
   token in the original input string.  The "pos" column is the sequence number
   of the token in the original input string.  The example above generates
   the following output:

<codeblock>
thi|0|4|0
is|5|7|1
a|8|9|2
test|10|14|3







New version (lines 2145-2178):
   tokenizer.  The following SQL demonstrates how to create an instance 
   of the fts3tokenize virtual table:

<codeblock>
CREATE VIRTUAL TABLE tok1 USING fts3tokenize('porter');
</codeblock>

<p>The name of the desired tokenizer should be substituted in place of
   'porter' in the example, of course.  Once the virtual table is created,
   it can be queried as follows:

<codeblock>
SELECT token, start, end, position 
  FROM tok1
 WHERE input='This is a test sentence.';
</codeblock>

<p>The virtual table will return one row of output for each token in the
   input string.  The "token" column is the text of the token.  The "start"
   and "end" columns are the byte offset to the beginning and end of the
   token in the original input string.  
   The "position" column is the sequence number
   of the token in the original input string.  There is also an "input"
   column which is simply a copy of the input string that is specified in
   the WHERE clause.  Note that a constraint of the form "input=?" must
   appear in the WHERE clause or else the virtual table will have no input
   to tokenize and will return no rows.  The example above generates
   the following output:

<codeblock>
thi|0|4|0
is|5|7|1
a|8|9|2
test|10|14|3