Documentation Source Text

Check-in [aa19d1fdb5]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Corrections to the spellfix1 documentation.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: aa19d1fdb59e88acdfcafd2a01617ed9690913a6
User & Date: drh 2013-12-12 13:23:46.200
Context
2013-12-12
14:10
Fix missing "is" in the query planner document. (check-in: 200941601a user: drh tags: trunk)
13:23
Corrections to the spellfix1 documentation. (check-in: aa19d1fdb5 user: drh tags: trunk)
2013-12-06
15:50
Version 3.8.2 (check-in: e795bba129 user: drh tags: trunk, release, version-3.8.2)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/spellfix1.in.
1
2
3
4
5
6
7
8
9
10
11
12
13


14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
<title>The spellfix1 virtual table</title>
<tcl>
hd_keywords {spellfix1}</tcl>
<h1 align='center'>The Spellfix1 Virtual Table</h1>

<p>This spellfix1 [virtual table] can be used to search
a large vocabulary for close matches.  For example, spellfix1
can be used to suggest corrections to misspelled words.  Or,
it could be used with [FTS4] to do full-text search using potentially
misspelled words.

<p>The implementation for the spellfix1 virtual table is held in the
canonical SQLite source tree in the file src/test_spellfix1.c.  The


spellfix1 virtual table is not included in the SQLite [amalgamation]
and is not a part of any standard SQLite build.  Applications that
want to make use of spellfix1 should obtain a copy of the src/test_spellfix1.c
source file and compile it as a shared library or DLL.  Then use the
[sqlite3_load_extension()] interface at run-time to load this extension
into the application.

<p>Once the extension is loaded, an instance of the spellfix1 virtual table
is created like this:

<blockquote><pre>
CREATE VIRTUAL TABLE demo USING spellfix1;
</pre></blockquote>

<p>The "spellfix1" term is the name of this module and must be entered as
shown.  The "demo" term is the
name of the virtual table you will be creating and can be altered
to suit the needs of your application.  The virtual table is initially
empty.  In order for the virtual table to be useful, you will need to
populate it with your vocabulary.  Suppose you
have a list of words in a table named "big_vocabulary".  Then do this:

<blockquote><pre>
INSERT INTO demo(word) SELECT word FROM big_vocabulary;
</pre></blockquote>

<p>If you intend to use this virtual table in cooperation with an FTS4
table (for spelling correctly of search terms) then you might extract
the vocabulary using an fts3aux table:

<blockquote><pre>
INSERT INTO demo(word) SELECT term FROM search_aux WHERE col='*';
</pre></blockquote>

<p>You can also provide the virtual table with a "rank" for each word.
The "rank" is an estimate of how common the word is.  Larger numbers












|
>
>
|
|
<
<
<
<

|
|





|
|










|

|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17




18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
<title>The spellfix1 virtual table</title>
<tcl>
hd_keywords {spellfix1}</tcl>
<h1 align='center'>The Spellfix1 Virtual Table</h1>

<p>This spellfix1 [virtual table] can be used to search
a large vocabulary for close matches.  For example, spellfix1
can be used to suggest corrections to misspelled words.  Or,
it could be used with [FTS4] to do full-text search using potentially
misspelled words.

<p>The implementation for the spellfix1 virtual table is held in the
SQLite source tree in the miscellaneous extensions folder and in
particular in the file 
[http://www.sqlite.org/src/finfo?name=ext/misc/spellfix.c|ext/misc/spellfix1.c].
The spellfix1 virtual table is not included in the SQLite [amalgamation]
and is not a part of any standard SQLite build.  It is a [loadable extension].





<p>Once the spellfix1 extension is loaded, an instance of the spellfix1 
virtual table is created like this:

<blockquote><pre>
CREATE VIRTUAL TABLE demo USING spellfix1;
</pre></blockquote>

<p>The "spellfix1" term is the name of the spellfix module and must be 
entered as shown.  The "demo" term is the
name of the virtual table you will be creating and can be altered
to suit the needs of your application.  The virtual table is initially
empty.  In order for the virtual table to be useful, you will need to
populate it with your vocabulary.  Suppose you
have a list of words in a table named "big_vocabulary".  Then do this:

<blockquote><pre>
INSERT INTO demo(word) SELECT word FROM big_vocabulary;
</pre></blockquote>

<p>If you intend to use this virtual table in cooperation with an [FTS4]
table (for spelling correctly of search terms) then you might extract
the vocabulary using an [fts4aux] table:

<blockquote><pre>
INSERT INTO demo(word) SELECT term FROM search_aux WHERE col='*';
</pre></blockquote>

<p>You can also provide the virtual table with a "rank" for each word.
The "rank" is an estimate of how common the word is.  Larger numbers
136
137
138
139
140
141
142

143
144
145
146
147
148
149
150
151
the search will be against language 0 (English in the example above.)
All spellfix1 searches are against a single language id.  There is no
way to search all languages at once.
 

<h2>Virtual Table Details</h2>


<p>The virtual table actually has a unique rowid with seven columns plus five
extra hidden columns.  The columns are as follows:

<blockquote><dl>
<dt><p><b>rowid</b><dd>
A unique integer number associated with each
vocabulary item in the table.  This can be used
as a foreign key on other tables in the database.








>
|
|







134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
the search will be against language 0 (English in the example above.)
All spellfix1 searches are against a single language id.  There is no
way to search all languages at once.
 

<h2>Virtual Table Details</h2>

<p>Each row in the spellfix1 virtual table has a unique rowid 
with seven columns plus five extra hidden columns.
The columns are as follows:

<blockquote><dl>
<dt><p><b>rowid</b><dd>
A unique integer number associated with each
vocabulary item in the table.  This can be used
as a foreign key on other tables in the database.

220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
SPELLINGS section below for details.

<dt><p><b>command</b><dd>
(HIDDEN)  The value of the "command" column is always NULL.  However,
applications can insert special strings into the "command" column in order
to provoke certain behaviors in the spellfix1 virtual table.
For example, inserting the string 'reset' into the "command" column
will cause the virtual table will reread its edit distance weights
(if there are any).
</dl></blockquote>

<h2>Algorithm</h2>

<p>The spellfix1 virtual table creates a single
shadow table named "%_vocab" (where the % is replaced by the name of







|







219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
SPELLINGS section below for details.

<dt><p><b>command</b><dd>
(HIDDEN)  The value of the "command" column is always NULL.  However,
applications can insert special strings into the "command" column in order
to provoke certain behaviors in the spellfix1 virtual table.
For example, inserting the string 'reset' into the "command" column
will cause the virtual table to reread its edit distance weights
(if there are any).
</dl></blockquote>

<h2>Algorithm</h2>

<p>The spellfix1 virtual table creates a single
shadow table named "%_vocab" (where the % is replaced by the name of
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
Levenshtein distance between a pattern and a word.  This function
is exposed as spellfix1_editdist(X,Y).  The edit distance function
returns the "cost" of converting X into Y.  Some transformations
cost more than others.  Changing one vowel into a different vowel,
for example is relatively cheap, as is doubling a constant, or
omitting the second character of a double-constant.  Other transformations
or more expensive.  The idea is that the edit distance function returns
a low cost of words that are similar and a higher cost for words
that are further apart.  In this implementation, the maximum cost
of any single-character edit (delete, insert, or substitute) is 100,
with lower costs for some edits (such as transforming vowels).

<p>The "score" for a comparison is the edit distance between the pattern
and the word, adjusted down by the base-2 logarithm of the word rank.
For example, a match with distance 100 but rank 1000 would have a







|







274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
Levenshtein distance between a pattern and a word.  This function
is exposed as spellfix1_editdist(X,Y).  The edit distance function
returns the "cost" of converting X into Y.  Some transformations
cost more than others.  Changing one vowel into a different vowel,
for example is relatively cheap, as is doubling a constant, or
omitting the second character of a double-constant.  Other transformations
or more expensive.  The idea is that the edit distance function returns
a low cost for words that are similar and a higher cost for words
that are further apart.  In this implementation, the maximum cost
of any single-character edit (delete, insert, or substitute) is 100,
with lower costs for some edits (such as transforming vowels).

<p>The "score" for a comparison is the edit distance between the pattern
and the word, adjusted down by the base-2 logarithm of the word rank.
For example, a match with distance 100 but rank 1000 would have a
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
<p>The source code module that implements the spellfix1 virtual table also
implements several SQL functions that might be useful to applications
that employ spellfix1 or for testing or diagnostic work while developing
applications that use spellfix1.  The following auxiliary functions are
available:

<blockquote><dl>
<dt><p><b>editdist3(P,W)<br>editdist2(P,W,L)<br>editdist3(T)</b><dd>
These routines provide direct access to the version of the Wagner
edit-distance function that allows for application-defined weights
on edit operations.  The first two forms of this function compare
pattern P against word W and return the edit distance.  In the first
function, the langid is assumed to be 0 and in the second, the
langid is given by the L parameter.  The third form of this function
reloads edit distance coefficients from the table named by T.







|







421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
<p>The source code module that implements the spellfix1 virtual table also
implements several SQL functions that might be useful to applications
that employ spellfix1 or for testing or diagnostic work while developing
applications that use spellfix1.  The following auxiliary functions are
available:

<blockquote><dl>
<dt><p><b>editdist3(P,W)<br>editdist3(P,W,L)<br>editdist3(T)</b><dd>
These routines provide direct access to the version of the Wagner
edit-distance function that allows for application-defined weights
on edit operations.  The first two forms of this function compare
pattern P against word W and return the edit distance.  In the first
function, the langid is assumed to be 0 and in the second, the
langid is given by the L parameter.  The third form of this function
reloads edit distance coefficients from the table named by T.