Documentation Source Text

Check-in [7e55864b0a]

Overview
Comment: Document the byte-order-mark limitation of fts3/4.
SHA3-256: 7e55864b0a74d16f4d0c86bee88a185695c7bc502837465b0bb203e7c056d6ab
User & Date: dan 2020-01-29 20:19:32.773
Context
2020-04-14 14:08 Fix a typo in limits.in. (check-in: 2664eaab37 user: dan tags: trunk)
2020-01-29 21:29 Add the SQLITE_OMIT_AUTOINIT compile-time option to the set of recommended compile-time options. (check-in: f250d55692 user: drh tags: trunk)
2020-01-29 20:19 Document the byte-order-mark limitation of fts3/4. (check-in: 7e55864b0a user: dan tags: trunk)
2020-01-27 20:02 Version 3.31.1 (check-in: 2ab23690d8 user: drh tags: trunk, release, version-3.31.1)
Changes
Changes to pages/fts3.in.

<p>
  For doclists for which the term appears in more than one column of the FTS
  virtual table, term-offset lists within the doclist are stored in column 
  number order. This ensures that the term-offset list associated with 
  column 0 (if any) is always first, allowing the first two fields of the
  term-offset list to be omitted in this case.

<h1 tags="bugs">Limitations</h1>

<h2> UTF-16 byte-order-mark problem </h2>

For UTF-16 databases that use the "simple" tokenizer, malformed Unicode
strings can cause the integrity-check to falsely report corruption, or
cause auxiliary functions to return incorrect results. More specifically,
the bug can be triggered by any of the following:

<ul>
  <li><p>A UTF-16 byte-order-mark is embedded at the beginning of an SQL string
       literal value inserted into an FTS3 table. For example:

<codeblock>
    INSERT INTO fts_table(col) VALUES('<b>{BOM}</b>text...');
</codeblock>
      <p>where {BOM} is a UTF-16 byte-order-mark, the 16-bit value 0xFEFF
      stored in either big-endian or little-endian byte order (the byte
      sequence FE FF or FF FE).

  <li><p>Malformed UTF-8 that SQLite converts to a UTF-16 byte-order-mark is
       embedded at the beginning of an SQL string literal value inserted 
       into an FTS3 table.

  <li><p>A text value created by casting a blob that begins with the two
       bytes 0xFF and 0xFE, in either possible order, is inserted into an
       FTS3 table. For example:
       
<codeblock>
    INSERT INTO fts_table(col) VALUES(CAST(X'FEFF' AS TEXT));
</codeblock>
</ul>

No problems occur if all Unicode strings used with FTS3/4 are well-formed.
UTF-16 byte-order-marks may be safely used at the start of strings passed
to [sqlite3_bind_text16()], [sqlite3_prepare16()] and other similar APIs.
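
As an illustration of the third case above, an application that builds text
values from blobs could check for the offending byte pair before the value
reaches an FTS3/4 table. The following is a minimal sketch in C, not code
from SQLite: the function name utf16_bom_length is hypothetical, and it only
inspects the first two bytes, so it does not address malformed UTF-8 that
SQLite converts to a byte-order-mark.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the SQLite API.
 *
 * Return the number of leading bytes in buf (0 or 2) that form a UTF-16
 * byte-order-mark, that is, the byte pair FF FE or FE FF.
 */
size_t utf16_bom_length(const unsigned char *buf, size_t len)
{
    /* A NULL buffer is only acceptable when it is also empty. */
    assert(buf != NULL || len == 0);

    if (len >= 2 &&
        ((buf[0] == 0xFF && buf[1] == 0xFE) ||
         (buf[0] == 0xFE && buf[1] == 0xFF))) {
        return 2;
    }
    return 0;
}
```

A caller could skip that many bytes, or reject the value outright, before
the blob is cast to text; which of the two is appropriate depends on the
application.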


<h1 id=appendix_a nonumber tags="search application tips">
  Appendix A: Search Application Tips
</h1>

<p>
  FTS is primarily designed to support Boolean full-text queries - queries


  return;

<i>  /* Jump here if the wrong number of arguments are passed to this function */</i>
wrong_number_args:
  sqlite3_result_error(pCtx, "wrong number of arguments to function rank()", -1);
}
</codeblock>