|Comment:||Fix the description of the case folding performed by the unicode61 tokenizer in FTS3.|
|User & Date:||drh 2016-03-21 20:02:27|
|23:12||Update TH3 license information. Leaf check-in: 84f9b8afc2 user: drh tags: branch-3.11|
|17:23||Merge fixes off of the 3.11 branch. check-in: ad0172e592 user: drh tags: trunk|
|20:02||Fix the description of the case folding performed by the unicode61 tokenizer in FTS3. check-in: 37a01760c6 user: drh tags: branch-3.11|
|14:56||Update the wal.html document at the bigwal anchor to reflect improvements to large transactions in WAL mode added to version 3.11.0. check-in: e273b7fdae user: drh tags: branch-3.11|
Changes to pages/fts3.in.
  processing is required, for example to implement stemming or
  discard punctuation, this can be done by creating a tokenizer
  implementation that uses the ICU tokenizer as part of its implementation.

  <tcl>hd_fragment unicode61 unicode61</tcl>
  <p>
  The "unicode61" tokenizer is available beginning with SQLite [version 3.7.13].
- Unicode61 works very much like "simple" except that it does full unicode
+ Unicode61 works very much like "simple" except that it does simple unicode
  case folding according to rules in Unicode Version 6.1 and it recognizes
  unicode space and punctuation characters and uses those to separate tokens.
  The simple tokenizer only does case folding of ASCII characters and only
  recognizes ASCII space and punctuation characters as token separators.

  <p>
  By default, "unicode61" also removes all diacritics from Latin script
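The one-word change from "full" to "simple" matters because the two kinds of Unicode case folding can produce different tokens. The sketch below is an illustration in Python, not SQLite's implementation. The function names are hypothetical, `fold_one_to_one` only approximates simple case folding by using `str.lower()` on characters that lowercase to a single code point, and Python uses its own Unicode version rather than 6.1. Diacritic removal, which "unicode61" also performs by default, is deliberately left out.

```python
def fold_ascii(text: str) -> str:
    """Fold only A-Z, as the "simple" tokenizer is described as doing."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def fold_one_to_one(text: str) -> str:
    """Approximate simple (one-to-one) Unicode case folding.

    Assumption: a character is folded only when its lowercase form is a
    single code point, so the string never changes length.
    """
    out = []
    for c in text:
        lowered = c.lower()
        out.append(lowered if len(lowered) == 1 else c)
    return "".join(out)


def fold_full(text: str) -> str:
    """Full Unicode case folding, which may expand one character into several."""
    return text.casefold()


word = "ÉCOLE STRAßE"
print(fold_ascii(word))       # only the ASCII letters change case
print(fold_one_to_one(word))  # non-ASCII letters fold too, length is preserved
print(fold_full(word))        # the sharp s expands to "ss"
```

Under full case folding, "STRAßE" and "STRASSE" would fold to the same token. Under simple case folding they would not, which is the behavior the corrected sentence describes.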