Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Enhancements to the FTS3/4 documentation about the limitations of BOMs in string literals with the simple tokenizer in a UTF16 database. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
4d11f30804f6b8ae8cd364732ce5a435 |
User & Date: | drh 2020-01-29 23:04:30.305 |
Context
2020-01-31
| ||
20:15 | Attempt to clarify the CREATE TABLE documentation. In the generated column documentation, make it clear that SQLite deliberately ignores the collating sequence of the AS expression when determining the collating sequence of the generated column. (check-in: 91fa93da86 user: drh tags: trunk) | |
2020-01-29
| ||
23:04 | Enhancements to the FTS3/4 documentation about the limitations of BOMs in string literals with the simple tokenizer in a UTF16 database. (check-in: 4d11f30804 user: drh tags: trunk) | |
21:29 | Add the SQLITE_OMIT_AUTOINIT compile-time option to the set of recommended compile-time options. (check-in: f250d55692 user: drh tags: trunk) | |
Changes
Changes to pages/fts3.in.
︙ | ︙ | |||
2796 2797 2798 2799 2800 2801 2802 | term-offset list to be omitted in this case. <h1 tags="bugs">Limitations</h1> <h2> UTF-16 byte-order-mark problem </h2> For UTF-16 databases, when using the "simple" tokenizer, it is possible to use | | > | | | | < < | > > | | | > > > > > > > > > > > | 2796 2797 2798 2799 2800 2801 2802 2803 2804 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 2825 2826 2827 2828 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 | term-offset list to be omitted in this case. <h1 tags="bugs">Limitations</h1> <h2> UTF-16 byte-order-mark problem </h2> For UTF-16 databases, when using the "simple" tokenizer, it is possible to use malformed unicode strings to cause the <a href="#integcheck">integrity-check special command</a> to falsely report corruption, or for <a href="#snippet">auxiliary functions</a> to return incorrect results. More specifically, the bug can be triggered by any of the following: <ul> <li><p>A UTF-16 byte-order-mark (BOM) is embedded at the beginning of an SQL string literal value inserted into an FTS3 table. For example: <codeblock> INSERT INTO fts_table(col) VALUES(char(0xfeff)||'text...'); </codeblock> <li><p>Malformed UTF-8 that SQLite converts to a UTF-16 byte-order-mark is embedded at the beginning of an SQL string literal value inserted into an FTS3 table. <li><p>A text value created by casting a blob that begins with the two bytes 0xFF and 0xFE, in either possible order, is inserted into an FTS3 table. For example: <codeblock> INSERT INTO fts_table(col) VALUES(CAST(X'FEFF' AS TEXT)); </codeblock> </ul> Everything works correctly if any of the following are true: <ul> <li> The [encoding|database encoding] is UTF-8. <li> All text strings are insert using one of the [sqlite3_bind_text()] family of functions. <li> Literal strings contain no byte-order-marks. <li> A tokenizer is used that recognizes byte-order-marks as whitespace. (The default "simple" tokenizer for FTS3/4 does not think that BOMs are whitespace, but the unicode tokenizer does.) </ul> All of the above conditions must be false in order for problems to occur. And even if all of the conditiona above are false, most things will still operator correctly. Only the <a href="#integcheck">integrity-check</a> command and the <a href="#snippet">auxiliary functions</a> might given unexpected results. <h1 id=appendix_a nonumber tags="search application tips"> Appendix A: Search Application Tips </h1> <p> FTS is primarily designed to support Boolean full-text queries - queries |
︙ | ︙ | |||
3097 3098 3099 3100 3101 3102 3103 | return; <i> /* Jump here if the wrong number of arguments are passed to this function */</i> wrong_number_args: sqlite3_result_error(pCtx, "wrong number of arguments to function rank()", -1); } </codeblock> | < < | 3109 3110 3111 3112 3113 3114 3115 | return; <i> /* Jump here if the wrong number of arguments are passed to this function */</i> wrong_number_args: sqlite3_result_error(pCtx, "wrong number of arguments to function rank()", -1); } </codeblock> |