Overview
Comment: | Update fts5.html to note that the unicode61 tokenizer is compatible with the fts3 tokenizer of the same name. |
---|---|
SHA1: | 3cc1ff72a41ca47fb4324676c8ab5f3a |
User & Date: | dan 2016-01-15 08:33:20.421 |
Context
2016-01-25
13:55 | Merge 3.10.2 changes. Add the SQLITE_EXTRA_DURABLE=1 compile-time option. (check-in: d39c6c7cfc user: drh tags: trunk) |
2016-01-15
08:33 | Update fts5.html to note that the unicode61 tokenizer is compatible with the fts3 tokenizer of the same name. (check-in: 3cc1ff72a4 user: dan tags: trunk) |
2016-01-14
18:26 | Merge updates from the 3.10 branch. (check-in: 885e891911 user: drh tags: trunk) |
Changes
Changes to pages/fts5.in.
︙
  <i>-- Create an FTS5 table that does not remove diacritics from Latin
  -- script characters, and that considers hyphens and underscore characters
  -- to be part of tokens. </i>
  CREATE VIRTUAL TABLE ft USING fts5(a, b,
  tokenize = "unicode61 remove_diacritics 0 tokenchars '-_'"
  );
  </codeblock>

+ <p> The fts5 unicode61 tokenizer is byte-for-byte compatible with the fts3/4
+ unicode61 tokenizer.
+
  <h3>Ascii Tokenizer</h3>

  <p> The Ascii tokenizer is similar to the Unicode61 tokenizer, except that:

  <ul>
  <li> All non-ASCII characters (those with codepoints greater than 127) are
︙
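
The tokenizer options in the documented example can be tried with a short script. This is only a sketch: it assumes Python's standard `sqlite3` module is linked against an SQLite build that includes FTS5, and the sample row, the queries, and the `tokenizer_demo` helper are made up for illustration. It reuses the `CREATE VIRTUAL TABLE` statement from the diff and counts matches for a few queries.

```python
import sqlite3


def tokenizer_demo():
    """Return match counts for a few queries, or None if FTS5 is unavailable."""
    conn = sqlite3.connect(":memory:")
    try:
        # Same statement as in the documentation snippet above.
        conn.execute(
            """
            CREATE VIRTUAL TABLE ft USING fts5(a, b,
                tokenize = "unicode61 remove_diacritics 0 tokenchars '-_'"
            )
            """
        )
    except sqlite3.OperationalError:
        # Assumption: this error means the SQLite build lacks the fts5 module.
        conn.close()
        return None

    # Illustrative row: a hyphenated word, a word with a diacritic, and an
    # identifier containing an underscore.
    conn.execute(
        "INSERT INTO ft(a, b) VALUES (?, ?)",
        ("well-known café", "snake_case"),
    )

    def hits(query):
        row = conn.execute(
            "SELECT count(*) FROM ft WHERE ft MATCH ?", (query,)
        ).fetchone()
        return row[0]

    counts = {
        "well": hits("well"),                  # fragment of a hyphenated token
        "well_known": hits('"well-known"'),    # whole hyphenated token
        "snake": hits("snake"),                # fragment of an underscore token
        "snake_case": hits('"snake_case"'),    # whole underscore token
        "cafe": hits("cafe"),                  # diacritic stripped in the query only
        "cafe_accent": hits('"café"'),         # diacritic kept
    }
    conn.close()
    return counts


if __name__ == "__main__":
    print(tokenizer_demo())
```

With `tokenchars '-_'` the hyphenated and underscored strings are indexed as single tokens, so the fragments should not match while the quoted full strings should. With `remove_diacritics 0` the unaccented query should not match the accented text.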