Documentation Source Text

Check-in [92cb32fb50]

Overview
Comment: Update the spellfix1 documentation regarding the k1 column of the %_vocab table.
SHA3-256: 92cb32fb5007ee121f2704a5f2de327b6d47116d3224c81790d47562fbe789b3
User & Date: drh 2018-02-14 14:14:05.588
Context
2018-02-14
18:59 Avoid extra / characters in the redirect to /index.html from / in althttpd.c. (check-in: 4f48a846f6 user: drh tags: trunk)
14:14 Update the spellfix1 documentation regarding the k1 column of the %_vocab table. (check-in: 92cb32fb50 user: drh tags: trunk)
2018-02-13
22:16 Fix an issue with not-found processing in althttpd.c. (check-in: 72c8b8c6ff user: drh tags: trunk)
Changes
Changes to pages/spellfix1.in.
Before (lines 255-279):
characters into ASCII.  Examples: "æ" -> "ae",
"þ" -> "th", "ß" -> "ss", "á" -> "a", ...  The
accessory function spellfix1_translit(X) will do
the non-ASCII to ASCII mapping.  The built-in lower(X)
function will convert to lower-case.  Thus:
k1 = lower(spellfix1_translit(word)).







<dt><p><b>k2</b><dd>
This field holds a phonetic code derived from k1.  Letters
that have similar sounds are mapped into the same symbol.
For example, all vowels and vowel clusters become the
single symbol "A".  And the letters "p", "b", "f", and
"v" all become "B".  All nasal sounds are represented
as "N".  And so forth.  The mapping is base on
ideas found in Soundex, Metaphone, and other
long-standing phonetic matching systems.  This key can
be generated by the function spellfix1_phonehash(X).  
Hence: k2 = spellfix1_phonehash(k1)
</dl>

<p>There is also a function for computing the Wagner edit distance or the
Levenshtein distance between a pattern and a word.  This function
is exposed as spellfix1_editdist(X,Y).  The edit distance function
returns the "cost" of converting X into Y.  Some transformations
cost more than others.  Changing one vowel into a different vowel,







After (lines 255-285):
characters into ASCII.  Examples: "æ" -> "ae",
"þ" -> "th", "ß" -> "ss", "á" -> "a", ...  The
accessory function spellfix1_translit(X) will do
the non-ASCII to ASCII mapping.  The built-in lower(X)
function will convert to lower-case.  Thus:
k1 = lower(spellfix1_translit(word)).

If the word is already all lower-case ASCII, then the k1 column
will contain a NULL.  This reduces the storage requirements for
the %_vocab table and helps spellfix to run a little faster.
Therefore, it is advantageous to populate as much of the spellfix
table as possible using lower-case ASCII vocabulary.

<dt><p><b>k2</b><dd>
This field holds a phonetic code derived from coalesce(k1,word).
Letters that have similar sounds are mapped into the same symbol.
For example, all vowels and vowel clusters become the
single symbol "A".  And the letters "p", "b", "f", and
"v" all become "B".  All nasal sounds are represented
as "N".  And so forth.  The mapping is based on
ideas found in Soundex, Metaphone, and other
long-standing phonetic matching systems.  This key can
be generated by the function spellfix1_phonehash(X).  
Hence: k2 = spellfix1_phonehash(coalesce(k1,word))
</dl>
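
The k1 rule and the coalesce(k1,word) input to k2 described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `translit_sketch`, `k1_for`, and `k2_input` are hypothetical names, the mapping table covers just the four examples quoted in the text rather than the full behavior of spellfix1_translit(X), and Python's `str.lower()` stands in for the built-in lower(X). The phonetic hash itself is not reproduced.

```python
# Hypothetical stand-in for spellfix1_translit(X): it knows only the four
# example mappings quoted in the documentation text, nothing more.
_TRANSLIT_EXAMPLES = {"æ": "ae", "þ": "th", "ß": "ss", "á": "a"}


def translit_sketch(word):
    """Map the few known non-ASCII characters to ASCII; pass others through."""
    return "".join(_TRANSLIT_EXAMPLES.get(ch, ch) for ch in word)


def k1_for(word):
    """Return the k1 value as described: None (NULL) when the word is already
    all lower-case ASCII, otherwise lower(translit(word))."""
    if word.isascii() and word == word.lower():
        return None
    # Assumption: str.lower() approximates the built-in lower(X) here.
    return translit_sketch(word).lower()


def k2_input(word):
    """Return coalesce(k1, word), the value the phonetic hash is derived from."""
    k1 = k1_for(word)
    return k1 if k1 is not None else word
```

With this sketch, a lower-case ASCII word such as "kennasaw" yields a k1 of None and is passed to the phonetic hash unchanged, while "Straße" is first folded to "strasse".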

<p>There is also a function for computing the Wagner edit distance or the
Levenshtein distance between a pattern and a word.  This function
is exposed as spellfix1_editdist(X,Y).  The edit distance function
returns the "cost" of converting X into Y.  Some transformations
cost more than others.  Changing one vowel into a different vowel,