Documentation Source Text

Check-in [1483127f32]
Overview
Comment: Updated documentation related to fts3_tokenizer() and the new SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER.
SHA1: 1483127f3230b4c6d32c6fdfd16911577824f941
User & Date: drh 2016-02-26 15:54:27.531
Context
2016-02-27 14:20: Include "OR ROLLBACK" among the examples of how to omit statement journals. (check-in: 237bf9365d user: drh tags: trunk)
2016-02-26 15:54: Updated documentation related to fts3_tokenizer() and the new SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER. (check-in: 1483127f32 user: drh tags: trunk)
2016-02-23 00:26: New entries in the change log. (check-in: 38697e2bb0 user: drh tags: trunk)
Changes
Changes to pages/changes.in.
chng {2016-04-00 (3.12.0)} {
<p><b>Performance enhancements:</b>
<li>Enhancements to the [https://www.sqlite.org/src/doc/trunk/doc/lemon.html|Lemon]
    parser generator so that it creates a smaller and faster SQL parser.
<li>Only create [master journal] files if two or more attached databases are
(1) modified, (2) do not have [PRAGMA synchronous] set to OFF, and
(3) do not have the [journal_mode] set to OFF, MEMORY, or WAL.






<p><b>Bug fixes:</b>
<li>Make sure the [sqlite3_set_auxdata()] values from multiple triggers
    within a single statement do not interfere with one another.
    Fix for ticket [https://www.sqlite.org/src/info/dc9b1c91|dc9b1c91].
}

chng {2016-02-15 (3.11.0)} {







chng {2016-04-00 (3.12.0)} {
<p><b>Performance enhancements:</b>
<li>Enhancements to the [https://www.sqlite.org/src/doc/trunk/doc/lemon.html|Lemon]
    parser generator so that it creates a smaller and faster SQL parser.
<li>Only create [master journal] files if two or more attached databases are
(1) modified, (2) do not have [PRAGMA synchronous] set to OFF, and
(3) do not have the [journal_mode] set to OFF, MEMORY, or WAL.
<p><b>New Features:</b>
<li>Added the [SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER] option to [sqlite3_db_config()]
    which allows the two-argument version of the [fts3_tokenizer()] SQL function to
    be enabled or disabled at run-time.
<li>The [PRAGMA defer_foreign_keys=ON] statement now also disables
    [foreign key actions|RESTRICT actions] on foreign keys.
<p><b>Bug fixes:</b>
<li>Make sure the [sqlite3_set_auxdata()] values from multiple triggers
    within a single statement do not interfere with one another.
    Fix for ticket [https://www.sqlite.org/src/info/dc9b1c91|dc9b1c91].
}

chng {2016-02-15 (3.11.0)} {
Changes to pages/compile.in.

COMPILE_OPTION {SQLITE_ENABLE_FTS3_TOKENIZER} {
  This option enables the two-argument version of the [fts3_tokenizer()]
  interface.  The second argument to fts3_tokenizer() is supposed to be a
  pointer to a function (encoded as a BLOB) that implements an
  application defined tokenizer.  If hostile actors are able to run
  the two-argument version of fts3_tokenizer() with an arbitrary second
  argument, they could crash or take control of the process.  Because
  of ongoing security concerns about this (seldom-used) feature, it is
  disabled beginning with [Version 3.11.0] unless this compile-time
  option is used.




}

COMPILE_OPTION {SQLITE_ENABLE_FTS4} {
  When this option is defined in the [amalgamation], versions 3 and 4
  of the full-text search engine are added to the build automatically.
}









COMPILE_OPTION {SQLITE_ENABLE_FTS3_TOKENIZER} {
  This option enables the two-argument version of the [fts3_tokenizer()]
  interface.  The second argument to fts3_tokenizer() is supposed to be a
  pointer to a function (encoded as a BLOB) that implements an
  application defined tokenizer.  If hostile actors are able to run
  the two-argument version of fts3_tokenizer() with an arbitrary second
  argument, they could crash or take control of the process.
  <p>
  Because of security concerns, the two-argument fts3_tokenizer() feature 
  was disabled beginning with [Version 3.11.0] unless this compile-time
  option is used.
  [Version 3.12.0] added the 
  [sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER],1,0) interface
  that activates the two-argument version of [fts3_tokenizer()]
  for a specific [database connection] at run-time.
}

COMPILE_OPTION {SQLITE_ENABLE_FTS4} {
  When this option is defined in the [amalgamation], versions 3 and 4
  of the full-text search engine are added to the build automatically.
}

Changes to pages/fts3.in.

<p>
  The arguments passed to the "tokenchars=" or "separators=" options are 
  case-sensitive. In the example above, specifying that "X" is a separator
  character does not affect the way "x" is handled.

<tcl>hd_fragment f3tknzr {fts3_tokenizer}</tcl>
<h2>Custom (User Implemented) Tokenizers</h2>

<p>
  As well as the built-in "simple", "porter" and (possibly) "icu" and
  "unicode61" tokenizers, if the library is compiled with the following 
  compiler option:

<codeblock>
  -DSQLITE_ENABLE_FTS3_TOKENIZER
</codeblock>

<p>
  then FTS exports an interface that allows users to implement custom
  tokenizers using C. The interface used to create a new tokenizer is defined
  and described in the fts3_tokenizer.h source file.

<p>
  Registering a new FTS tokenizer is similar to registering a new
  virtual table module with SQLite. The user passes a pointer to a
  structure containing pointers to various callback functions that
  make up the implementation of the new tokenizer type. For tokenizers,








<p>
  The arguments passed to the "tokenchars=" or "separators=" options are 
  case-sensitive. In the example above, specifying that "X" is a separator
  character does not affect the way "x" is handled.

<tcl>hd_fragment f3tknzr {fts3_tokenizer}</tcl>
<h2>Custom (Application Defined) Tokenizers</h2>

<p>
  In addition to providing built-in "simple", "porter" and (possibly) "icu" and
  "unicode61" tokenizers,
  FTS provides an interface for applications to implement and register custom
  tokenizers written in C.  The interface used to create a new tokenizer is defined
  and described in the fts3_tokenizer.h source file.

<p>
  Registering a new FTS tokenizer is similar to registering a new
  virtual table module with SQLite. The user passes a pointer to a
  structure containing pointers to various callback functions that
  make up the implementation of the new tokenizer type. For tokenizers,

  it is registered as tokenizer &lt;tokenizer-name&gt; and a copy of it
  returned. If only one argument is passed, a pointer to the tokenizer
  implementation currently registered as &lt;tokenizer-name&gt; is returned,
  encoded as a blob. Or, if no such tokenizer exists, an SQL exception
  (error) is raised.

<p>

  As of SQLite version 3.11.0, the second form of the fts3_tokenizer() function
  is only available if the library is compiled with the
  -DSQLITE_ENABLE_FTS3_TOKENIZER compiler switch. In earlier versions it was
  always available.



<p>
  <b>SECURITY WARNING</b>: 
  If a version of the fts3/4 extension that supports the second form of
  fts3_tokenizer() is deployed in an environment where potentially malicious
  users may execute arbitrary SQL, they should be prevented from invoking the
  fts3_tokenizer() function, possibly using the 
  [sqlite3_set_authorizer()|authorization callback].

<p>
  <b>SECURITY UPDATE</b> for [Version 3.11.0]:
  Because of continuing concern, the two-argument version of fts3_tokenizer()
  is disabled unless SQLite is compiled with [SQLITE_ENABLE_FTS3_TOKENIZER].

<p>
  The following block contains an example of calling the fts3_tokenizer()
  function from C code:

<codeblock>
  <i>/*







  it is registered as tokenizer &lt;tokenizer-name&gt; and a copy of it
  returned. If only one argument is passed, a pointer to the tokenizer
  implementation currently registered as &lt;tokenizer-name&gt; is returned,
  encoded as a blob. Or, if no such tokenizer exists, an SQL exception
  (error) is raised.

<p>
  Because of security concerns, SQLite version 3.11.0 enabled the
  second form of the fts3_tokenizer() function only when the library was compiled
  with the [SQLITE_ENABLE_FTS3_TOKENIZER | -DSQLITE_ENABLE_FTS3_TOKENIZER]
  option. In earlier versions it was
  always available.  Beginning with SQLite version 3.12.0, the second form of
  fts3_tokenizer() can also be activated at run-time by calling
  [sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER],1,0).
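
<p>
  As a rough illustration of this run-time switch, the sketch below uses
  Python's standard sqlite3 module rather than the C API. It assumes
  Python 3.12 or later, where Connection.setconfig() and Connection.getconfig()
  wrap sqlite3_db_config(). The helper name set_two_arg_tokenizer is
  hypothetical, and on interpreters that do not expose the option it
  returns None instead of changing anything.

```python
import sqlite3


def set_two_arg_tokenizer(conn, enable):
    """Toggle the two-argument fts3_tokenizer() for one connection.

    Returns the resulting setting, or None when this Python build does
    not expose the option (assumption: Python 3.12+ is required).
    """
    op = getattr(sqlite3, "SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER", None)
    if op is None or not hasattr(conn, "setconfig"):
        return None
    # Counterpart of sqlite3_db_config(db, SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER, enable, 0)
    conn.setconfig(op, enable)
    return conn.getconfig(op)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Switch the two-argument form off for this connection only.
    print(set_two_arg_tokenizer(conn, False))
```

<p>
  The setting applies to a single database connection, so an application
  that opens several connections would need to repeat the call on each one.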

<p>
  <b>SECURITY WARNING</b>: 
  If a version of the fts3/4 extension that supports the two-argument form of
  fts3_tokenizer() is deployed in an environment where malicious users can
  run arbitrary SQL, then those users should be prevented from invoking the 
  two-argument fts3_tokenizer() function.
  This can be done using the [sqlite3_set_authorizer()|authorization callback], 
  or by disabling the two-argument fts3_tokenizer() interface using a call to
  [sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_FTS3_TOKENIZER],0,0).
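
<p>
  The authorizer approach can be sketched as follows, using
  Python's standard sqlite3 module, whose Connection.set_authorizer() wraps
  sqlite3_set_authorizer(). The names deny_fts3_tokenizer and open_restricted
  are hypothetical. Note one simplification: the authorizer only sees the
  function name, so this sketch refuses every call to fts3_tokenizer(),
  including the one-argument form.

```python
import sqlite3


def deny_fts3_tokenizer(action, arg1, arg2, db_name, source):
    """Authorizer callback that refuses any SQL call to fts3_tokenizer()."""
    # For SQLITE_FUNCTION actions, SQLite passes the function name in arg2.
    if action == sqlite3.SQLITE_FUNCTION and (arg2 or "").lower() == "fts3_tokenizer":
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


def open_restricted(path=":memory:"):
    """Open a connection with the authorizer above installed."""
    conn = sqlite3.connect(path)
    conn.set_authorizer(deny_fts3_tokenizer)
    return conn


if __name__ == "__main__":
    conn = open_restricted()
    # Ordinary functions are still permitted.
    print(conn.execute("SELECT upper('ok')").fetchone()[0])
    try:
        conn.execute("SELECT fts3_tokenizer('simple')")
    except sqlite3.Error as exc:
        # Raised when the statement is prepared, before anything runs.
        print("blocked:", exc)
```

<p>
  Because the check happens when a statement is prepared, a denied call
  fails before any tokenizer pointer is read or registered.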

<p>
  The following block contains an example of calling the fts3_tokenizer()
  function from C code:

<codeblock>
  <i>/*