Overview
Comment: Fix a bug in the description of the 'simple' FTS tokenizer. Underscores (codepoint 95) are divider characters, not token characters.
SHA1: 73a0dac5840af9808ae908dfa6029f12
User & Date: dan 2012-02-27 07:06:24.152
Context
2012-03-05 19:14: Documentation for the content= and languageid= options for FTS4. (check-in: 16b58c9eb4 user: drh tags: trunk)
2012-02-27 07:06: Fix a bug in the description of the 'simple' FTS tokenizer. Underscores (codepoint 95) are divider characters, not token characters. (check-in: 73a0dac584 user: dan tags: trunk)
2012-02-23 14:40: Documentation of the SQLITE_FCNTL_PRAGMA file-control. Point out that disabling compound SELECT statements also disables multi-value INSERT. (check-in: cf86dcee73 user: drh tags: trunk)
Changes
Changes to pages/fts3.in.
Lines 1492-1509 of the new version (lines 1492-1498 of the old version):

VIRTUAL TABLE statement used to create the FTS table, the default tokenizer, "simple", is used. The simple tokenizer extracts tokens from a document or basic FTS full-text query according to the following rules:

<ul>
<li><p> A term is a contiguous sequence of eligible characters, where eligible characters are all alphanumeric characters and all characters with UTF codepoints greater than or equal to 128. All other characters are discarded when splitting a document into terms. Their only contribution is to separate adjacent terms.
<li><p> All uppercase characters within the ASCII range (UTF codepoints less than 128) are transformed to their lowercase equivalents as part of the tokenization process. Thus, full-text queries are case-insensitive when using the simple tokenizer.
</ul>
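The two rules of the simple tokenizer can be illustrated with a short Python sketch. This is not SQLite's implementation (the real tokenizer is C code that works on UTF-8 bytes); `simple_tokenize` is a hypothetical name used only for illustration. The sketch assumes "alphanumeric" means ASCII letters and digits, since every codepoint of 128 or above is already eligible. Note that `_` (codepoint 95) acts only as a separator, which is the point of this check-in.

```python
from typing import List


def simple_tokenize(text: str) -> List[str]:
    """Hypothetical sketch of the documented 'simple' tokenizer rules."""
    terms: List[str] = []
    current: List[str] = []
    for ch in text:
        cp = ord(ch)
        # Eligible characters: ASCII alphanumerics, or any codepoint >= 128.
        if cp >= 128 or (ch.isascii() and ch.isalnum()):
            # Only ASCII uppercase letters are folded to lowercase;
            # characters with codepoints >= 128 are kept unchanged.
            if "A" <= ch <= "Z":
                ch = chr(cp + 32)
            current.append(ch)
        else:
            # Any other character (space, punctuation, '_') is discarded;
            # its only effect is to end the current term.
            if current:
                terms.append("".join(current))
                current = []
    if current:
        terms.append("".join(current))
    return terms
```

For example, under these assumptions `simple_tokenize("Hello, snake_case WORLD")` yields the terms `hello`, `snake`, `case` and `world`, while `simple_tokenize("Ärger über")` keeps `Ärger` unchanged because `Ä` lies outside the ASCII range and is not case-folded.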