Documentation Source Text

Check-in [37a01760c6]

Overview
Comment: Fix the description of the case folding performed by the unicode61 tokenizer in FTS3.
SHA1: 37a01760c60712bd4441ba0d92896e04621ea3ee
User & Date: drh 2016-03-21 20:02:27.373
Context
2016-03-26 23:12 Update TH3 license information. (Leaf check-in: 84f9b8afc2 user: drh tags: branch-3.11)
2016-03-22 17:23 Merge fixes off of the 3.11 branch. (check-in: ad0172e592 user: drh tags: trunk)
2016-03-21 20:02 Fix the description of the case folding performed by the unicode61 tokenizer in FTS3. (check-in: 37a01760c6 user: drh tags: branch-3.11)
2016-03-18 14:56 Update the wal.html document at the bigwal anchor to reflect improvements to large transactions in WAL mode added to version 3.11.0. (check-in: e273b7fdae user: drh tags: branch-3.11)
Changes
Changes to pages/fts3.in.
Old version, lines 2200-2214:
  processing is required, for example to implement stemming or
  discard punctuation, this can be done by creating a tokenizer
  implementation that uses the ICU tokenizer as part of its implementation.

<tcl>hd_fragment unicode61 unicode61</tcl>
<p>
  The "unicode61" tokenizer is available beginning with SQLite [version 3.7.13].
  Unicode61 works very much like "simple" except that it does full unicode
  case folding according to rules in Unicode Version 6.1 and it recognizes
  unicode space and punctuation characters and uses those to separate tokens.
  The simple tokenizer only does case folding of ASCII characters and only
  recognizes ASCII space and punctuation characters as token separators.

<p>
  By default, "unicode61" also removes all diacritics from Latin script

New version, lines 2200-2214:
  processing is required, for example to implement stemming or
  discard punctuation, this can be done by creating a tokenizer
  implementation that uses the ICU tokenizer as part of its implementation.

<tcl>hd_fragment unicode61 unicode61</tcl>
<p>
  The "unicode61" tokenizer is available beginning with SQLite [version 3.7.13].
  Unicode61 works very much like "simple" except that it does simple unicode
  case folding according to rules in Unicode Version 6.1 and it recognizes
  unicode space and punctuation characters and uses those to separate tokens.
  The simple tokenizer only does case folding of ASCII characters and only
  recognizes ASCII space and punctuation characters as token separators.
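
The one-word change in this check-in ("full" to "simple") matters because the two kinds of Unicode case folding can give different match results. The sketch below illustrates that difference in Python. It does not call SQLite and is not the tokenizer's code: `fold_full` and `fold_simple_approx` are hypothetical helper names, and the simple-folding function is only an approximation that keeps one-to-one lowercase mappings, on the assumption that simple folding never changes the number of characters.

```python
def fold_full(text: str) -> str:
    """Full case folding: one character may expand to several."""
    return text.casefold()


def fold_simple_approx(text: str) -> str:
    """Approximate simple case folding (assumption: one-to-one mappings only)."""
    out = []
    for ch in text:
        lowered = ch.lower()
        # Keep the original character when lowering would expand it.
        out.append(lowered if len(lowered) == 1 else ch)
    return "".join(out)


word = "Stra\u00dfe"   # "Strasse" written with a sharp s (U+00DF)
query = "STRASSE"

# Full folding maps the sharp s to "ss", so the two spellings compare equal.
print(fold_full(word) == fold_full(query))
# The one-to-one approximation leaves the sharp s alone, so they differ.
print(fold_simple_approx(word) == fold_simple_approx(query))
# Non-ASCII letters with a one-to-one mapping still fold together.
print(fold_simple_approx("\u00c9COLE") == fold_simple_approx("\u00e9cole"))
```

Under these assumptions, a tokenizer that does simple folding would still treat the upper-case and lower-case forms of an accented letter as the same token, but would not equate spellings that only match after a one-to-many expansion.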

<p>
  By default, "unicode61" also removes all diacritics from Latin script