Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Update fts3.html with the new "order" and "prefix" options. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
074255f9eb46a364d5cac4281731bda6 |
User & Date: | dan 2011-06-14 19:38:18.048 |
Context
2011-06-15
| ||
12:23 | Update the limits.html document for the new upper bound on SQLITE_MAX_ATTACHED and to describe the maximum database size and the maximum number of rows in a table. (check-in: b91558ccbf user: drh tags: trunk) | |
2011-06-14
| ||
19:38 | Update fts3.html with the new "order" and "prefix" options. (check-in: 074255f9eb user: dan tags: trunk) | |
2011-06-08
| ||
15:11 | Fix a typo in the how-to-corrupt document. (check-in: a1408d26c6 user: drh tags: trunk) | |
Changes
Changes to pages/fts3.in.
︙ | ︙ | |||
180 181 182 183 184 185 186 | <codeblock> <i>-- Create, then immediately drop, an FTS4 table.</i> CREATE VIRTUAL TABLE data USING fts4(); DROP TABLE data; </codeblock> | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 180 181 182 183 184 185 186 187 188 189 190 191 192 193 | <codeblock> <i>-- Create, then immediately drop, an FTS4 table.</i> CREATE VIRTUAL TABLE data USING fts4(); DROP TABLE data; </codeblock> <h2>Populating FTS Tables</h2> <p> FTS tables are populated using [INSERT], [UPDATE] and [DELETE] statements in the same way as ordinary SQLite tables are. <p> |
︙ | ︙ | |||
1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 | During a transaction, some of the data written to an FTS table may be cached in memory and written to the database only when the transaction is committed. However the implementation of the fts4aux module is only able to read data from the database. In practice this means that if an fts4aux table is queried from within a transaction in which the associated FTS table has been modified, the results of the query are likely to reflect only a (possibly empty) subset of the changes made. <h1 id=tokenizer tags="tokenizer">Tokenizers</h1> <p> An FTS tokenizer is a set of rules for extracting terms from a document or basic FTS full-text query. | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 1437 1438 1439 1440 1441 1442 1443 1444 1445 1446 1447 1448 1449 1450 1451 1452 1453 1454 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 | During a transaction, some of the data written to an FTS table may be cached in memory and written to the database only when the transaction is committed. However the implementation of the fts4aux module is only able to read data from the database. In practice this means that if an fts4aux table is queried from within a transaction in which the associated FTS table has been modified, the results of the query are likely to reflect only a (possibly empty) subset of the changes made. <h1 id=fts4_options tags="FTS4 options">FTS4 Options</h1> <p> <tcl>hd_fragment *matchinfo_fts3 {matchinfo_fts3}</tcl> If the "CREATE VIRTUAL TABLE" statement specifies module FTS4 (not FTS3), then special directives - FTS4 options - similar to the "tokenize=*" option may also appear in place of column names. An FTS4 option consists of the option name, followed by an "=" character, followed by the option value. The option value may optionally be enclosed in single or double quotes, with embedded quote characters escaped in the same way as for SQL literals. There may not be whitespace on either side of the "=" character. For example, to create an FTS4 table with the value of option "matchinfo" set to "fts3": <codeblock> <i>-- Create a reduced-footprint FTS4 table.</i> CREATE VIRTUAL TABLE papers USING fts4(author, document, matchinfo=fts3); </codeblock> <p> FTS4 currently supports the following three options: <table striped=1> <tr><th>Option<th>Interpretation <tr><td>matchinfo<td> This option may only be set to the value "fts3". Attempting to set it otherwise is an error. If this option is specified, then some of the extra information stored by FTS4 is omitted. This reduces the amount of disk space consumed by an FTS4 table until it is almost the same as the amount that would be used by the equivalent FTS3 table, but also means that the data accessed by passing the 'l' flag to the [matchinfo()] function is not available. <tr><td>compress<td> This option is used to specify the compress function. It is an error to specify a compress function without also specifying an uncompress function. [fts4 compress option|See below] for details. <tr><td>uncompress<td> This option is used to specify the uncompress function. It is an error to specify an uncompress function without also specifying a compress function. [fts4 compress option|See below] for details. <tr><td>order<td> The "order" option may be set to either "DESC" or "ASC" (in upper or lower case). If it is set to "DESC", then FTS4 stores its data in such a way as to optimize returning results in descending order by docid. If it is set to "ASC" (the default), then the data structures are optimized for returning results in ascending order by docid. In other words, if many of the queries run against the FTS4 table use "ORDER BY docid DESC", then it may improve performance to add the "order=desc" option to the CREATE VIRTUAL TABLE statement. <tr><td>prefix<td> This option may be set to a comma-separated list of positive non-zero integers. For each integer N in the list, a separate index is created in the database file to optimize [FTS MATCH|term prefix] queries where the query term is N bytes in length, not including the '*' character, when encoded using UTF-8. [fts4 prefix option|See below] for details. </table> <p> When using FTS4, specifying a column name that contains an "=" character and is not either a "tokenize=*" specification or a recognized FTS4 option is an error. With FTS3, the first token in the unrecognized directive is interpreted as a column name. Similarly, specifying multiple "tokenize=*" directives in a single table declaration is an error when using FTS4, whereas the second and subsequent "tokenize=*" directives are interpreted as column names by FTS3. For example: <codeblock> <i>-- An error. FTS4 does not recognize the directive "xyz=abc".</i> CREATE VIRTUAL TABLE papers USING fts4(author, document, xyz=abc); <i>-- Create an FTS3 table with three columns - "author", "document"</i> <i>-- and "xyz".</i> CREATE VIRTUAL TABLE papers USING fts3(author, document, xyz=abc); <i>-- An error. FTS4 does not allow multiple tokenize=* directives</i> CREATE VIRTUAL TABLE papers USING fts4(tokenize=porter, tokenize=simple); <i>-- Create an FTS3 table with a single column named "tokenize". The</i> <i>-- table uses the "porter" tokenizer.</i> CREATE VIRTUAL TABLE papers USING fts3(tokenize=porter, tokenize=simple); <i>-- An error. Cannot create a table with two columns named "tokenize".</i> CREATE VIRTUAL TABLE papers USING fts3(tokenize=porter, tokenize=simple, tokenize=icu); </codeblock> <tcl>hd_fragment *fts4compression {compressed FTS4 content}</tcl> <h2 tags="fts4 compress option"> Compress/Uncompress </h2> <p> The compress and uncompress options allow FTS4 content to be stored in the database in a compressed form. Both options should be set to the name of an SQL scalar function registered using [sqlite3_create_function()] that accepts a single argument. <p> The compress function should return a compressed version of the value passed to it as an argument. Each time data is written to the FTS4 table, each column value is passed to the compress function and the result value stored in the database. The compress function may return any type of SQLite value (blob, text, real, integer or null). <p> The uncompress function should uncompress data previously compressed by the compress function. In other words, for all SQLite values X, it should be true that uncompress(compress(X)) equals X. When data that has been compressed by the compress function is read from the database by FTS4, it is passed to the uncompress function before it is used. <p> If the specified compress or uncompress functions do not exist, the table may still be created. An error is not returned until the FTS4 table is read (if the uncompress function does not exist) or written (if it is the compress function that does not exist). <codeblock> <i>-- Create an FTS4 table that stores data in compressed form. This</i> <i>-- assumes that the scalar functions zip() and unzip() have been (or</i> <i>-- will be) added to the database handle.</i> CREATE VIRTUAL TABLE papers USING fts4(author, document, compress=zip, uncompress=unzip); </codeblock> <h2 tags="fts4 prefix option">Prefix</h2> <p> The FTS4 prefix option causes FTS to index term prefixes of specified lengths in the same way that it always indexes complete terms. The prefix option must be set to a comma separated list of positive non-zero integers. For each value N in the list, prefixes of length N bytes (when encoded using UTF-8) are indexed. FTS4 uses term prefix indexes to speed up term prefix queries. The cost, of course, is that indexing term prefixes as well as complete terms increases the database size and slows down write operations on the FTS4 table. <p> Prefix indexes may be used to optimize term prefix queries in two cases. If the query is for a prefix of N bytes, then a prefix index created with "prefix=N" provides the best optimization. Or, if no "prefix=N" index is available, a "prefix=N+1" index imay be used instead. Using a "prefix=N+1" index is less efficient than a "prefix=N" index, but is better than no prefix index at all. <codeblock> <i>-- Create an FTS4 table with indexes to optimize 2 and 4 byte term prefix queries.</i> CREATE VIRTUAL TABLE t1 USING fts4(c1, c2, prefix="2,4"); <i>-- The following two queries are both optimized using the prefix indexes.</i> SELECT * FROM t1 WHERE t1 MATCH 'ab*'; SELECT * FROM t1 WHERE t1 MATCH 'abcd*'; <i>-- The following two queries are both partially optimized using the prefix</i> <i>-- indexes. The optimization is not as pronounced as it is for the queries</i> <i>-- above, but still an improvement over no prefix indexes at all.</i> SELECT * FROM t1 WHERE t1 MATCH 'a*'; SELECT * FROM t1 WHERE t1 MATCH 'abc*'; </codeblock> <h1 id=tokenizer tags="tokenizer">Tokenizers</h1> <p> An FTS tokenizer is a set of rules for extracting terms from a document or basic FTS full-text query. |
︙ | ︙ |