Overview
Comment: Add docs for fts5 unicode61 tokenizer option "categories".
SHA3-256: e32e87711917c5f124548bc46d956e59
User & Date: dan 2018-07-13 20:31:10.656
Context
2018-07-23
10:51 Changes to information on support packages. (check-in: 7e685f86a9 user: drh tags: trunk)
2018-07-13
20:31 Add docs for fts5 unicode61 tokenizer option "categories". (check-in: e32e877119 user: dan tags: trunk)
2018-07-11
11:08 Update the keyword list with all of the new keywords added for UPSERT and window functions. (check-in: 824b38d28e user: drh tags: trunk)
Changes
Changes to pages/fts5.in.
︙ | |||
  <p> It is also possible to create custom tokenizers for FTS5. The API for doing so is [custom tokenizers | described here].

  <h3>Unicode61 Tokenizer</h3>

  <p> The unicode tokenizer classifies all unicode characters as either "separator" or "token" characters. By default all space and punctuation characters, as defined by Unicode 6.1, are considered separators, and all
︙ | |||
  -- script characters, and that considers hyphens and underscore characters
  -- to be part of tokens. </i>
  CREATE VIRTUAL TABLE ft USING fts5(a, b,
      tokenize = "unicode61 remove_diacritics 0 tokenchars '-_'"
  );
  </codeblock>

+ <p> or:
+
+ <codeblock>
+ <i>-- Create an FTS5 table that, as well as the default token character classes,</i>
+ <i>-- considers characters in class "Mn" to be token characters.</i>
+ CREATE VIRTUAL TABLE ft USING fts5(a, b,
+     tokenize = "unicode61 categories 'L* N* Co Mn'"
+ );
+ </codeblock>

  <p> The fts5 unicode61 tokenizer is byte-for-byte compatible with the fts3/4 unicode61 tokenizer.

  <h3>Ascii Tokenizer</h3>

  <p> The Ascii tokenizer is similar to the Unicode61 tokenizer, except that:
︙ |
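The tokenizer configurations shown in the diff can be tried from a script. The following is a minimal Python sketch and is not part of the check-in. The helper `token_matches` and the sample text are hypothetical. It assumes the `sqlite3` module is linked against an SQLite build that includes FTS5, and the `categories` option additionally needs a release that contains this change. If SQLite rejects a statement, the helper returns `None` rather than raising.

```python
import sqlite3


def token_matches(tokenize_spec, text, query):
    """Report whether `query` matches `text` in an FTS5 table that uses
    `tokenize_spec` as its tokenize option.

    Returns True or False, or None if SQLite rejects any statement
    (for example because FTS5 or the requested option is unavailable).
    """
    con = sqlite3.connect(":memory:")
    try:
        # tokenize_spec goes inside double quotes, as in the documentation
        # examples; it must not itself contain a double quote.
        con.execute(
            f'CREATE VIRTUAL TABLE ft USING fts5(a, tokenize = "{tokenize_spec}")'
        )
        con.execute("INSERT INTO ft(a) VALUES (?)", (text,))
        (count,) = con.execute(
            "SELECT count(*) FROM ft WHERE ft MATCH ?", (query,)
        ).fetchone()
        return count > 0
    except sqlite3.OperationalError:
        return None
    finally:
        con.close()


if __name__ == "__main__":
    text = "well-known fact"
    for spec in (
        "unicode61",
        "unicode61 remove_diacritics 0 tokenchars '-_'",
        "unicode61 categories 'L* N* Co Mn'",
    ):
        print(spec, "->", token_matches(spec, text, "well"))
```

When FTS5 is available, the default tokenizer splits "well-known" at the hyphen, so the query `well` matches. With `tokenchars '-_'` the hyphen becomes part of the token and the same query no longer matches, while the quoted query `"well-known"` does.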