Small. Fast. Reliable.
Choose any three.

This information is obsolete. You are looking at the CVSTrac source management system display for SQLite that was replaced by Fossil on 2009-08-11. The information shown here has not been updated since that cut-over. These pages are retained for historical reference only.

This page documents the SQL-level interface to the full-text-search module. fts3 usage is documented, fts2 and fts1 are mostly identical.

Status: DRAFT. I consider this information correct, but haven't done a final review.

Creating and Destroying Tables

    sql-command ::= CREATE VIRTUAL TABLE [ database-name .] table-name USING fts3 [( [ argument [, argument ]*] )]

    argument ::= name | TOKENIZE tokenizer

    tokenizer ::= SIMPLE | PORTER | user-defined

If no tokenizer is requested, then the SIMPLE tokenizer is used. Additional user-defined tokenizers may be built into a particular SQLite library, SIMPLE and PORTER are the only tokenizers provided by default. If no column names are requested, then the single column content is provided. Thus, the following statements all create a table with a single column content using tokenizer SIMPLE:

  CREATE VIRTUAL TABLE t USING fts3;
  CREATE VIRTUAL TABLE t USING fts3();
  CREATE VIRTUAL TABLE t USING fts3(content);
  CREATE VIRTUAL TABLE t USING fts3(TOKENIZE simple);
  CREATE VIRTUAL TABLE t USING fts3(content, TOKENIZE simple);

The column names docid and tokenize are reserved and should not be used in fts3 create statements. Multiple column names can be provided:

  CREATE VIRTUAL TABLE t USING fts3(name, address);

Only the first word of a column definition is considered in creating the table - all extraneous text is ignored. So the following two statements are identical:

  CREATE VIRTUAL TABLE t USING fts3(name);
  CREATE VIRTUAL TABLE t USING fts3(name NOT NULL UNIQUE);

Any additional information, including constraints and type information, is ignored.

Destroying an fts table uses the standard SQLite DROP TABLE syntax.

Accessing fts tables

fts tables are accessed using SELECT, INSERT, UPDATE, and DELETE statements exactly as regular SQLite tables. The set of columns which can be referenced are exactly those enumerated in the create statement, all of which will appear to be of type TEXT. The table has the implicit unique index on rowid, plus the full-text index (see next section), but otherwise allows for no indices. A query such as:

  SELECT rowid FROM t WHERE content LIKE 'This%';

will perform a full table scan over all rows.

In addition to the explicitly-enumerated columns, the following columns are implicitly present:

  rowid - a unique 64-bit integer per row, as for a regular table.
  docid - the preferred alias for rowid.
  <table-name> - a "magic" column with the same name as the table, used for snippet generation.

These columns should not be referenced in the create statement, and will not appear to exist except when explicitly referenced by name in SELECT or INSERT statements.

When inserting into an fts table, if no docid is provided, then one is generated in the same way as values are generated for INTEGER PRIMARY KEY columns in regular SQLite tables. This docid can be accessed using LAST_INSERT_ROWID():

  INSERT INTO t VALUES ('This is a test');
  SELECT LAST_INSERT_ROWID();

The rowid or docid can be provided to INSERT statements, but providing both is an error:

  INSERT INTO t (rowid, content) VALUES (10, 'This is a test');
  INSERT INTO t (docid, content) VALUES (11, 'This is a test');
  -- Generates a new docid
  INSERT INTO t (docid, content) VALUES (null, 'This is a test');
  -- Throws an error
  INSERT INTO t (docid, rowid, content) VALUES (10, 11, 'This is a test');

fts tables do not allow UPDATE on docid, the following statements throw an error:

  UPDATE t SET docid = docid + 1 WHERE content = 'This is a test';
  UPDATE t SET rowid = rowid + 1 WHERE content = 'This is a test';

Using the Fulltext Index

fts tokenizes its input data and builds an index over the resulting terms. To do queries using this index, a new MATCH operator is introduced. An example fts table:

  CREATE VIRTUAL TABLE recipe USING fts3(name, ingredients);
  INSERT INTO recipe VALUES (
    'broccoli stew', 'broccoli peppers cheese tomatoes'
  );
  INSERT INTO recipe VALUES (
    'pumpkin stew', 'pumpkin onions garlic celery'
  );
  INSERT INTO recipe VALUES (
    'broccoli pie', 'broccoli cheese onions flour'
  );
  INSERT INTO recipe VALUES (
    'pumpkin pie', 'pumpkin sugar flour butter'
  );

To find recipes which are made with onions and cheese:

  SELECT docid FROM recipe WHERE ingredients MATCH 'onions cheese';

The string to the right of the MATCH is tokenized, each token is found in the index, and the set of documents where both tokens are present is returned. Since the MATCH has column ingredients on the left, this only finds recipes where the tokens are present in that column. A search over all columns can be done using the column with the same name as the table:

  SELECT docid FROM recipe WHERE recipe MATCH 'stew';

The query language for the right-hand-side of the MATCH operator has a few variants, here described by example:

  -- Recipes with 'onions' and 'cheese' each in any column of the row.
  SELECT * FROM recipe WHERE recipe MATCH 'onions cheese';

  -- Recipes with either 'onions' or 'cheese' each in any column of
  -- the row.  OR must be upper-case.
  SELECT * FROM recipe WHERE recipe MATCH 'onions OR cheese';

  -- Recipes with 'stew' in the name column and 'onions' in the
  -- ingredients column.
  SELECT * FROM recipe WHERE recipe MATCH 'name:stew ingredients:onions';

  -- Recipes with 'green' and 'onions' next to each other in that
  -- order, in any field.  Known as "phrase search".
  SELECT * FROM recipe WHERE recipe MATCH '"green onions"';

  -- Recipes which use onions but not cheese.
  SELECT * FROM recipe WHERE ingredients MATCH 'onions -cheese';

  -- Recipes which have 'onions' in any field and 'cheese' in none.
  SELECT * FROM recipe WHERE recipe MATCH 'onions -cheese';

  -- Prefix search, finds recipes with words that start with the
  -- characters 'bu' (such as "butter").
  SELECT * FROM recipe WHERE recipe MATCH 'bu*';

NOTE that the table-named column will not participate if you alias a table in a statement. So this statement will not work:

  SELECT docid FROM recipe AS ra WHERE ra MATCH 'stew';

it must be written as:

  SELECT docid FROM recipe AS ra WHERE recipe MATCH 'stew';

NOTE that MATCH cannot be used multiple times against the same table. So the following query, while valid syntax, will cause an error:

  SELECT docid FROM recipe WHERE recipe MATCH 'onions' AND recipe MATCH 'cheese';

It must be written as:

  SELECT docid FROM recipe WHERE recipe MATCH 'onions cheese';

Attachments:

  • snippetsize.c 4872 bytes added by anonymous on 2009-May-09 22:32:54 UTC.
    Here is a modification to FTS3 that allows the snippet text around size to be user defined:

    sqlite> select name, snippet(poem, '[', ']', '%%', 200) from poem where text match 'land';