Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Corrections to the spellfix1 documentation. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
aa19d1fdb59e88acdfcafd2a01617ed9 |
User & Date: | drh 2013-12-12 13:23:46.200 |
Context
2013-12-12
| ||
14:10 | Fix missing "is" in the query planner document. (check-in: 200941601a user: drh tags: trunk) | |
13:23 | Corrections to the spellfix1 documentation. (check-in: aa19d1fdb5 user: drh tags: trunk) | |
2013-12-06
| ||
15:50 | Version 3.8.2 (check-in: e795bba129 user: drh tags: trunk, release, version-3.8.2) | |
Changes
Changes to pages/spellfix1.in.
1 2 3 4 5 6 7 8 9 10 11 12 | <title>The spellfix1 virtual table</title> <tcl> hd_keywords {spellfix1}</tcl> <h1 align='center'>The Spellfix1 Virtual Table</h1> <p>This spellfix1 [virtual table] can be used to search a large vocabulary for close matches. For example, spellfix1 can be used to suggest corrections to misspelled words. Or, it could be used with [FTS4] to do full-text search using potentially misspelled words. <p>The implementation for the spellfix1 virtual table is held in the | | > > | | < < < < | | | | | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | <title>The spellfix1 virtual table</title> <tcl> hd_keywords {spellfix1}</tcl> <h1 align='center'>The Spellfix1 Virtual Table</h1> <p>This spellfix1 [virtual table] can be used to search a large vocabulary for close matches. For example, spellfix1 can be used to suggest corrections to misspelled words. Or, it could be used with [FTS4] to do full-text search using potentially misspelled words. <p>The implementation for the spellfix1 virtual table is held in the SQLite source tree in the miscellaneous extensions folder and in particular in the file [http://www.sqlite.org/src/finfo?name=ext/misc/spellfix.c|ext/misc/spellfix1.c]. The spellfix1 virtual table is not included in the SQLite [amalgamation] and is not a part of any standard SQLite build. It is a [loadable extension]. <p>Once the spellfix1 extension is loaded, an instance of the spellfix1 virtual table is created like this: <blockquote><pre> CREATE VIRTUAL TABLE demo USING spellfix1; </pre></blockquote> <p>The "spellfix1" term is the name of the spellfix module and must be entered as shown. The "demo" term is the name of the virtual table you will be creating and can be altered to suit the needs of your application. The virtual table is initially empty. In order for the virtual table to be useful, you will need to populate it with your vocabulary. Suppose you have a list of words in a table named "big_vocabulary". Then do this: <blockquote><pre> INSERT INTO demo(word) SELECT word FROM big_vocabulary; </pre></blockquote> <p>If you intend to use this virtual table in cooperation with an [FTS4] table (for spelling correctly of search terms) then you might extract the vocabulary using an [fts4aux] table: <blockquote><pre> INSERT INTO demo(word) SELECT term FROM search_aux WHERE col='*'; </pre></blockquote> <p>You can also provide the virtual table with a "rank" for each word. The "rank" is an estimate of how common the word is. Larger numbers |
︙ | ︙ | |||
136 137 138 139 140 141 142 | the search will be against language 0 (English in the example above.) All spellfix1 searches are against a single language id. There is no way to search all languages at once. <h2>Virtual Table Details</h2> | > | | | 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 | the search will be against language 0 (English in the example above.) All spellfix1 searches are against a single language id. There is no way to search all languages at once. <h2>Virtual Table Details</h2> <p>Each row in the spellfix1 virtual table has a unique rowid with seven columns plus five extra hidden columns. The columns are as follows: <blockquote><dl> <dt><p><b>rowid</b><dd> A unique integer number associated with each vocabulary item in the table. This can be used as a foreign key on other tables in the database. |
︙ | ︙ | |||
220 221 222 223 224 225 226 | SPELLINGS section below for details. <dt><p><b>command</b><dd> (HIDDEN) The value of the "command" column is always NULL. However, applications can insert special strings into the "command" column in order to provoke certain behaviors in the spellfix1 virtual table. For example, inserting the string 'reset' into the "command" column | | | 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 | SPELLINGS section below for details. <dt><p><b>command</b><dd> (HIDDEN) The value of the "command" column is always NULL. However, applications can insert special strings into the "command" column in order to provoke certain behaviors in the spellfix1 virtual table. For example, inserting the string 'reset' into the "command" column will cause the virtual table to reread its edit distance weights (if there are any). </dl></blockquote> <h2>Algorithm</h2> <p>The spellfix1 virtual table creates a single shadow table named "%_vocab" (where the % is replaced by the name of |
︙ | ︙ | |||
275 276 277 278 279 280 281 | Levenshtein distance between a pattern and a word. This function is exposed as spellfix1_editdist(X,Y). The edit distance function returns the "cost" of converting X into Y. Some transformations cost more than others. Changing one vowel into a different vowel, for example is relatively cheap, as is doubling a constant, or omitting the second character of a double-constant. Other transformations or more expensive. The idea is that the edit distance function returns | | | 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 | Levenshtein distance between a pattern and a word. This function is exposed as spellfix1_editdist(X,Y). The edit distance function returns the "cost" of converting X into Y. Some transformations cost more than others. Changing one vowel into a different vowel, for example is relatively cheap, as is doubling a constant, or omitting the second character of a double-constant. Other transformations or more expensive. The idea is that the edit distance function returns a low cost for words that are similar and a higher cost for words that are further apart. In this implementation, the maximum cost of any single-character edit (delete, insert, or substitute) is 100, with lower costs for some edits (such as transforming vowels). <p>The "score" for a comparison is the edit distance between the pattern and the word, adjusted down by the base-2 logarithm of the word rank. For example, a match with distance 100 but rank 1000 would have a |
︙ | ︙ | |||
422 423 424 425 426 427 428 | <p>The source code module that implements the spellfix1 virtual table also implements several SQL functions that might be useful to applications that employ spellfix1 or for testing or diagnostic work while developing applications that use spellfix1. The following auxiliary functions are available: <blockquote><dl> | | | 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 | <p>The source code module that implements the spellfix1 virtual table also implements several SQL functions that might be useful to applications that employ spellfix1 or for testing or diagnostic work while developing applications that use spellfix1. The following auxiliary functions are available: <blockquote><dl> <dt><p><b>editdist3(P,W)<br>editdist3(P,W,L)<br>editdist3(T)</b><dd> These routines provide direct access to the version of the Wagner edit-distance function that allows for application-defined weights on edit operations. The first two forms of this function compare pattern P against word W and return the edit distance. In the first function, the langid is assumed to be 0 and in the second, the langid is given by the L parameter. The third form of this function reloads edit distance coefficients from the table named by T. |
︙ | ︙ |