Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.
|Comment:||Remove spellfix virtual table documentation from the source tree. Reference the separate documentation on the website instead.|
|Downloads:||Tarball | ZIP archive | SQL archive|
|Timelines:||family | ancestors | descendants | both | trunk|
|Files:||files | file ages | folders|
|User & Date:||drh 2013-04-27 18:06:40|
|07:01||Update 'fuzzerfault' test for its new module loading command. Fix several test names in 'io.test' and make sure the database gets closed between tests. check-in: e81e9ca1 user: mistachkin tags: trunk|
|01:44||Initial work on reforming usage of OSTRACE by the Windows VFS. check-in: b347b5d1 user: mistachkin tags: winOsTrace|
|18:06||Remove spellfix virtual table documentation from the source tree. Reference the separate documentation on the website instead. check-in: adcf7890 user: drh tags: trunk|
|12:13||Untested fix for building on VxWorks. check-in: f14d55cf user: drh tags: trunk|
1 -<title>The editdist3 algorithm</title> 2 - 3 -The editdist3 algorithm is a function that computes the minimum edit distance 4 -(a.k.a. the Levenshtein distance) between two input strings. Features of 5 -editdist3 include: 6 - 7 - * It works with unicode (UTF8) text. 8 - 9 - * A table of insertion, deletion, and substitution costs can be 10 - provided by the application. 11 - 12 - * Multi-character insertsions, deletions, and substitutions can be 13 - enumerated in the cost table. 14 - 15 -<h2>The COST table</h2> 16 - 17 -To program the costs of editdist3, create a table such as the following: 18 - 19 -<blockquote><pre> 20 -CREATE TABLE editcost( 21 - iLang INT, -- The language ID 22 - cFrom TEXT, -- Convert text from this 23 - cTo TEXT, -- Convert text into this 24 - iCost INT -- The cost of doing the conversionnn 25 -); 26 -</pre></blockquote> 27 - 28 -The cost table can be named anything you want - it does not have to be called 29 -"editcost". And the table can contain additional columns. However, it the 30 -table must contain the four columns show above, with exactly the names shown. 31 - 32 -The iLang column is a non-negative integer that identifies a set of costs 33 -appropriate for a particular language. The editdist3 function will only use 34 -a single iLang value for any given edit-distance computation. The default 35 -value is 0. It is recommended that applications that only need to use a 36 -single langauge always use iLang==0 for all entries. 37 - 38 -The iCost column is the numeric cost of transforming cFrom into cTo. This 39 -value should be a non-negative integer, and should probably be less than 100. 40 -The default single-character insertion and deletion costs are 100 and the 41 -default single-character to single-character substitution cost is 150. A 42 -cost of 10000 or more is considered "infinite" and causes the rule to be 43 -ignored. 44 - 45 -The cFrom and cTo columns show edit transformation strings. Either or both 46 -columns may contain more than one character. Or either column (but not both) 47 -may hold an empty string. When cFrom is empty, that is the cost of inserting 48 -cTo. When cTo is empty, that is the cost of deleting cFrom. 49 - 50 -In the spellfix1 algorithm, cFrom is the text as the user entered it and 51 -cTo is the correctly spelled text as it exists in the database. The goal 52 -of the editdist3 algorithm is to determine how close the user-entered text is 53 -to the dictionary text. 54 - 55 -There are three special-case entries in the cost table: 56 - 57 -<table border=1> 58 -<tr><th>cFrom</th><th>cTo</th><th>Meaning</th></tr> 59 -<tr><td>''</td><td>'?'</td><td>The default insertion cost</td></tr> 60 -<tr><td>'?'</td><td>''</td><td>The default deletion cost</td></tr> 61 -<tr><td>'?'</td><td>'?'</td><td>The default substitution cost</td></tr> 62 -</table> 63 - 64 -If any of the special-case entries shows above are omitted, then the 65 -value of 100 is used for insertion and deletion and 150 is used for 66 -substitution. To disable the default insertion, deletion, and/or substitution 67 -set their respective cost to 10000 or more. 68 - 69 -Other entries in the cost table specific transforms for particular characters. 70 -The cost of specific transforms should be less than the default costs, or else 71 -the default costs will take precedence and the specific transforms will never 72 -be used. 73 - 74 -Some example, cost table entries: 75 - 76 -<blockquote><pre> 77 -INSERT INTO editcost(iLang, cFrom, cTo, iCost) 78 -VALUES(0, 'a', 'ä', 5); 79 -</pre></blockquote> 80 - 81 -The rule above says that the letter "a" in user input can be matched against 82 -the letter "ä" in the dictionary with a penalty of 5. 83 - 84 -<blockquote><pre> 85 -INSERT INTO editcost(iLang, cFrom, cTo, iCost) 86 -VALUES(0, 'ss', 'ß', 8); 87 -</pre></blockquote> 88 - 89 -The number of characters in cFrom and cTo do not need to be the same. The 90 -rule above says that "ss" on user input will match "ß" with a penalty of 8. 91 - 92 -<h2>Experimenting with the editcost3() function</h2> 93 - 94 -The [./spellfix1.wiki | spellfix1 virtual table] 95 -uses editdist3 if the "edit_cost_table=TABLE" option 96 -is specified as an argument when the spellfix1 virtual table is created. 97 -But editdist3 can also be tested directly using the built-in "editdist3()" 98 -SQL function. The editdist3() SQL function has 3 forms: 99 - 100 - 1. editdist3('TABLENAME'); 101 - 2. editdist3('string1', 'string2'); 102 - 3. editdist3('string1', 'string2', langid); 103 - 104 -The first form loads the edit distance coefficients from a table called 105 -'TABLENAME'. Any prior coefficients are discarded. So when experimenting 106 -with weights and the weight table changes, simply rerun the single-argument 107 -form of editdist3() to reload revised coefficients. Note that the 108 -edit distance 109 -weights used by the editdist3() SQL function are independent from the 110 -weights used by the spellfix1 virtual table. 111 - 112 -The second and third forms return the computed edit distance between strings 113 -'string1' and "string2'. In the second form, an language id of 0 is used. 114 -The language id is specified in the third form.
Changes to ext/misc/spellfix.c.
8 8 ** May you find forgiveness for yourself and forgive others. 9 9 ** May you share freely, never taking more than you give. 10 10 ** 11 11 ************************************************************************* 12 12 ** 13 13 ** This module implements the spellfix1 VIRTUAL TABLE that can be used 14 14 ** to search a large vocabulary for close matches. See separate 15 -** documentation files (spellfix1.wiki and editdist3.wiki) for details. 15 +** documentation (http://www.sqlite.org/spellfix1.html) for details. 16 16 */ 17 17 #include "sqlite3ext.h" 18 18 SQLITE_EXTENSION_INIT1 19 19 20 20 #ifndef SQLITE_AMALGAMATION 21 21 # include <string.h> 22 22 # include <stdio.h>
1 -<title>The Spellfix1 Virtual Table</title> 2 - 3 -This spellfix1 virtual table is used to search 4 -a large vocabulary for close matches. For example, spellfix1 5 -can be used to suggest corrections to misspelled words. Or, 6 -it could be used with FTS4 to do full-text search using potentially 7 -misspelled words. 8 - 9 -Create an instance of the spellfix1 virtual table like this: 10 - 11 -<blockquote><pre> 12 -CREATE VIRTUAL TABLE demo USING spellfix1; 13 -</pre></blockquote> 14 - 15 -The "spellfix1" term is the name of this module and must be entered as 16 -shown. The "demo" term is the 17 -name of the virtual table you will be creating and can be altered 18 -to suit the needs of your application. The virtual table is initially 19 -empty. In order for the virtual table to be useful, you will need to 20 -populate it with your vocabulary. Suppose you 21 -have a list of words in a table named "big_vocabulary". Then do this: 22 - 23 -<blockquote><pre> 24 -INSERT INTO demo(word) SELECT word FROM big_vocabulary; 25 -</pre></blockquote> 26 - 27 -If you intend to use this virtual table in cooperation with an FTS4 28 -table (for spelling correctly of search terms) then you might extract 29 -the vocabulary using an fts3aux table: 30 - 31 -<blockquote><pre> 32 -INSERT INTO demo(word) SELECT term FROM search_aux WHERE col='*'; 33 -</pre></blockquote> 34 - 35 -You can also provide the virtual table with a "rank" for each word. 36 -The "rank" is an estimate of how common the word is. Larger numbers 37 -mean the word is more common. If you omit the rank when populating 38 -the table, then a rank of 1 is assumed. But if you have rank 39 -information, you can supply it and the virtual table will show a 40 -slight preference for selecting more commonly used terms. To 41 -populate the rank from an fts4aux table "search_aux" do something 42 -like this: 43 - 44 -<blockquote><pre> 45 -INSERT INTO demo(word,rank) 46 - SELECT term, documents FROM search_aux WHERE col='*'; 47 -</pre></blockquote> 48 - 49 -To query the virtual table, include a MATCH operator in the WHERE 50 -clause. For example: 51 - 52 -<blockquote><pre> 53 -SELECT word FROM demo WHERE word MATCH 'kennasaw'; 54 -</pre></blockquote> 55 - 56 -Using a dataset of American place names (derived from 57 -[http://geonames.usgs.gov/domestic/download_data.htm]) the query above 58 -returns 20 results beginning with: 59 - 60 -<blockquote><pre> 61 -kennesaw 62 -kenosha 63 -kenesaw 64 -kenaga 65 -keanak 66 -</pre></blockquote> 67 - 68 -If you append the character '*' to the end of the pattern, then 69 -a prefix search is performed. For example: 70 - 71 -<blockquote><pre> 72 -SELECT word FROM demo WHERE word MATCH 'kennes*'; 73 -</pre></blockquote> 74 - 75 -Yields 20 results beginning with: 76 - 77 -<blockquote><pre> 78 -kennesaw 79 -kennestone 80 -kenneson 81 -kenneys 82 -keanes 83 -keenes 84 -</pre></blockquote> 85 - 86 -<h2>Search Refinements</h2> 87 - 88 -By default, the spellfix1 table returns no more than 20 results. 89 -(It might return less than 20 if there were fewer good matches.) 90 -You can change the upper bound on the number of returned rows by 91 -adding a "top=N" term to the WHERE clause of your query, where N 92 -is the new maximum. For example, to see the 5 best matches: 93 - 94 -<blockquote><pre> 95 -SELECT word FROM demo WHERE word MATCH 'kennes*' AND top=5; 96 -</pre></blockquote> 97 - 98 -Each entry in the spellfix1 virtual table is associated with a 99 -a particular language, identified by the integer "langid" column. 100 -The default langid is 0 and if no other actions are taken, the 101 -entire vocabulary is a part of the 0 language. But if your application 102 -needs to operate in multiple languages, then you can specify different 103 -vocabulary items for each language by specifying the langid field 104 -when populating the table. For example: 105 - 106 -<blockquote><pre> 107 -INSERT INTO demo(word,langid) SELECT word, 0 FROM en_vocabulary; 108 -INSERT INTO demo(word,langid) SELECT word, 1 FROM de_vocabulary; 109 -INSERT INTO demo(word,langid) SELECT word, 2 FROM fr_vocabulary; 110 -INSERT INTO demo(word,langid) SELECT word, 3 FROM ru_vocabulary; 111 -INSERT INTO demo(word,langid) SELECT word, 4 FROM cn_vocabulary; 112 -</pre></blockquote> 113 - 114 -After the virtual table has been populated with items from multiple 115 -languages, specify the language of interest using a "langid=N" term 116 -in the WHERE clause of the query: 117 - 118 -<blockquote><pre> 119 -SELECT word FROM demo WHERE word MATCH 'hildes*' AND langid=1; 120 -</pre></blockquote> 121 - 122 -Note that if you do not include the "langid=N" term in the WHERE clause, 123 -the search will be against language 0 (English in the example above.) 124 -All spellfix1 searches are against a single language id. There is no 125 -way to search all languages at once. 126 - 127 - 128 -<h2>Virtual Table Details</h2> 129 - 130 -The virtual table actually has a unique rowid with seven columns plus five 131 -extra hidden columns. The columns are as follows: 132 - 133 -<blockquote><dl> 134 -<dt><b>rowid</b><dd> 135 -A unique integer number associated with each 136 -vocabulary item in the table. This can be used 137 -as a foreign key on other tables in the database. 138 - 139 -<dt><b>word</b><dd> 140 -The text of the word that matches the pattern. 141 -Both word and pattern can contains unicode characters 142 -and can be mixed case. 143 - 144 -<dt><b>rank</b><dd> 145 -This is the rank of the word, as specified in the 146 -original INSERT statement. 147 - 148 - 149 -<dt><b>distance</b><dd> 150 -This is an edit distance or Levensthein distance going 151 -from the pattern to the word. 152 - 153 -<dt><b>langid</b><dd> 154 -This is the language-id of the word. All queries are 155 -against a single language-id, which defaults to 0. 156 -For any given query this value is the same on all rows. 157 - 158 -<dt><b>score</b><dd> 159 -The score is a combination of rank and distance. The 160 -idea is that a lower score is better. The virtual table 161 -attempts to find words with the lowest score and 162 -by default (unless overridden by ORDER BY) returns 163 -results in order of increasing score. 164 - 165 -<dt><b>matchlen</b><dd> 166 -In a prefix search, the matchlen is the number of characters in 167 -the string that match against the prefix. For a non-prefix search, 168 -this is the same as length(word). 169 - 170 -<dt><b>phonehash</b><dd> 171 -This column shows the phonetic hash prefix that was used to restrict 172 -the search. For any given query, this column should be the same for 173 -every row. This information is available for diagnostic purposes and 174 -is not normally considered useful in real applications. 175 - 176 -<dt><b>top</b><dd> 177 -(HIDDEN) For any query, this value is the same on all 178 -rows. It is an integer which is the maximum number of 179 -rows that will be output. The actually number of rows 180 -output might be less than this number, but it will never 181 -be greater. The default value for top is 20, but that 182 -can be changed for each query by including a term of 183 -the form "top=N" in the WHERE clause of the query. 184 - 185 -<dt><b>scope</b><dd> 186 -(HIDDEN) For any query, this value is the same on all 187 -rows. The scope is a measure of how widely the virtual 188 -table looks for matching words. Smaller values of 189 -scope cause a broader search. The scope is normally 190 -choosen automatically and is capped at 4. Applications 191 -can change the scope by including a term of the form 192 -"scope=N" in the WHERE clause of the query. Increasing 193 -the scope will make the query run faster, but will reduce 194 -the possible corrections. 195 - 196 -<dt><b>srchcnt</b><dd> 197 -(HIDDEN) For any query, this value is the same on all 198 -rows. This value is an integer which is the number of 199 -of words examined using the edit-distance algorithm to 200 -find the top matches that are ultimately displayed. This 201 -value is for diagnostic use only. 202 - 203 -<dt><b>soundslike</b><dd> 204 -(HIDDEN) When inserting vocabulary entries, this field 205 -can be set to an spelling that matches what the word 206 -sounds like. See the DEALING WITH UNUSUAL AND DIFFICULT 207 -SPELLINGS section below for details. 208 - 209 -<dt><b>command</b><dd> 210 -(HIDDEN) The value of the "command" column is always NULL. However, 211 -applications can insert special strings into the "command" column in order 212 -to provoke certain behaviors in the spellfix1 virtual table. 213 -For example, inserting the string 'reset' into the "command" column 214 -will cause the virtual table will reread its edit distance weights 215 -(if there are any). 216 -</dl></blockquote> 217 - 218 -<h2>Algorithm</h2> 219 - 220 -The spellfix1 virtual table creates a single 221 -shadow table named "%_vocab" (where the % is replaced by the name of 222 -the virtual table; Ex: "demo_vocab" for the "demo" virtual table). 223 -the shadow table contains the following columns: 224 - 225 -<blockquote><dl> 226 -<dt><b>id</b><dd> 227 -The unique id (INTEGER PRIMARY KEY) 228 - 229 -<dt><b>rank</b><dd> 230 -The rank of word. 231 - 232 -<dt><b>langid</b><dd> 233 -The language id for this entry. 234 - 235 -<dt><b>word</b><dd> 236 -The original UTF8 text of the vocabulary word 237 - 238 -<dt><b>k1</b><dd> 239 -The word transliterated into lower-case ASCII. 240 -There is a standard table of mappings from non-ASCII 241 -characters into ASCII. Examples: "æ" -> "ae", 242 -"þ" -> "th", "ß" -> "ss", "á" -> "a", ... The 243 -accessory function spellfix1_translit(X) will do 244 -the non-ASCII to ASCII mapping. The built-in lower(X) 245 -function will convert to lower-case. Thus: 246 -k1 = lower(spellfix1_translit(word)). 247 - 248 -<dt><b>k2</b><dd> 249 -This field holds a phonetic code derived from k1. Letters 250 -that have similar sounds are mapped into the same symbol. 251 -For example, all vowels and vowel clusters become the 252 -single symbol "A". And the letters "p", "b", "f", and 253 -"v" all become "B". All nasal sounds are represented 254 -as "N". And so forth. The mapping is base on 255 -ideas found in Soundex, Metaphone, and other 256 -long-standing phonetic matching systems. This key can 257 -be generated by the function spellfix1_phonehash(X). 258 -Hence: k2 = spellfix1_phonehash(k1) 259 -</dl></blockquote> 260 - 261 -There is also a function for computing the Wagner edit distance or the 262 -Levenshtein distance between a pattern and a word. This function 263 -is exposed as spellfix1_editdist(X,Y). The edit distance function 264 -returns the "cost" of converting X into Y. Some transformations 265 -cost more than others. Changing one vowel into a different vowel, 266 -for example is relatively cheap, as is doubling a constant, or 267 -omitting the second character of a double-constant. Other transformations 268 -or more expensive. The idea is that the edit distance function returns 269 -a low cost of words that are similar and a higher cost for words 270 -that are futher apart. In this implementation, the maximum cost 271 -of any single-character edit (delete, insert, or substitute) is 100, 272 -with lower costs for some edits (such as transforming vowels). 273 - 274 -The "score" for a comparison is the edit distance between the pattern 275 -and the word, adjusted down by the base-2 logorithm of the word rank. 276 -For example, a match with distance 100 but rank 1000 would have a 277 -score of 122 (= 100 - log2(1000) + 32) where as a match with distance 278 -100 with a rank of 1 would have a score of 131 (100 - log2(1) + 32). 279 -(NB: The constant 32 is added to each score to keep it from going 280 -negative in case the edit distance is zero.) In this way, frequently 281 -used words get a slightly lower cost which tends to move them toward 282 -the top of the list of alternative spellings. 283 - 284 -A straightforward implementation of a spelling corrector would be 285 -to compare the search term against every word in the vocabulary 286 -and select the 20 with the lowest scores. However, there will 287 -typically be hundreds of thousands or millions of words in the 288 -vocabulary, and so this approach is not fast enough. 289 - 290 -Suppose the term that is being spell-corrected is X. To limit 291 -the search space, X is converted to a k2-like key using the 292 -equivalent of: 293 - 294 -<blockquote><pre> 295 - key = spellfix1_phonehash(lower(spellfix1_translit(X))) 296 -</pre></blockquote> 297 - 298 -This key is then limited to "scope" characters. The default scope 299 -value is 4, but an alternative scope can be specified using the 300 -"scope=N" term in the WHERE clause. After the key has been truncated, 301 -the edit distance is run against every term in the vocabulary that 302 -has a k2 value that begins with the abbreviated key. 303 - 304 -For example, suppose the input word is "Paskagula". The phonetic 305 -key is "BACACALA" which is then truncated to 4 characters "BACA". 306 -The edit distance is then run on the 4980 entries (out of 307 -272,597 entries total) of the vocabulary whose k2 values begin with 308 -BACA, yielding "Pascagoula" as the best match. 309 - 310 -Only terms of the vocabulary with a matching langid are searched. 311 -Hence, the same table can contain entries from multiple languages 312 -and only the requested language will be used. The default langid 313 -is 0. 314 - 315 -<h2>Configurable Edit Distance</h2> 316 - 317 -The built-in Wagner edit-distance function with fixed weights can be 318 -replaced by the [./editdist3.wiki | editdist3()] edit-distance function 319 -with application-defined weights and support for unicode, by specifying 320 -the "edit_cost_table=<i>TABLENAME</i>" parameter to the spellfix1 module 321 -when the virtual table is created. 322 -For example: 323 - 324 -<blockquote><pre> 325 -CREATE VIRTUAL TABLE demo2 USING spellfix1(edit_cost_table=APPCOST); 326 -</pre></blockquote> 327 - 328 -In the example above, the APPCOST table would be interrogated to find 329 -the edit distance coefficients. It is the presence of the "edit_cost_table=" 330 -parameter to the spellfix1 module name that causes editdist3() to be used 331 -in place of the built-in edit distance function. 332 - 333 -The edit distance coefficients are normally read from the APPCOST table 334 -once and there after stored in memory. Hence, run-time changes to the 335 -APPCOST table will not normally effect the edit distance results. 336 -However, inserting the special string 'reset' into the "command" column of the 337 -virtual table causes the edit distance coefficients to be reread the 338 -APPCOST table. Hence, applications should run a SQL statement similar 339 -to the following when changes to the APPCOST table occur: 340 - 341 -<blockquote> 342 -INSERT INTO demo2(command) VALUES('reset'); 343 -</blockquote> 344 - 345 -The tables used for edit distance costs can be changed using a command 346 -like the following: 347 - 348 -<blockquote> 349 -INSERT INTO demo2(command) VALUES('edit_cost_table=APPCOST2'); 350 -</blockquote> 351 - 352 -In the example above, any prior edit distance costs would be discarded and 353 -all future queries would use the costs found in the APPCOST2 table. If the 354 -name of the table specified by the "edit_cost_table" command is "NULL", then 355 -theh built-in Wagner edit-distance function will be used instead of the 356 -editdist3() function in all future queries. 357 - 358 -<h2>Dealing With Unusual And Difficult Spellings</h2> 359 - 360 -The algorithm above works quite well for most cases, but there are 361 -exceptions. These exceptions can be dealt with by making additional 362 -entries in the virtual table using the "soundslike" column. 363 - 364 -For example, many words of Greek origin begin with letters "ps" where 365 -the "p" is silent. Ex: psalm, pseudonym, psoriasis, psyche. In 366 -another example, many Scottish surnames can be spelled with an 367 -initial "Mac" or "Mc". Thus, "MacKay" and "McKay" are both pronounced 368 -the same. 369 - 370 -Accommodation can be made for words that are not spelled as they 371 -sound by making additional entries into the virtual table for the 372 -same word, but adding an alternative spelling in the "soundslike" 373 -column. For example, the canonical entry for "psalm" would be this: 374 - 375 -<blockquote><pre> 376 - INSERT INTO demo(word) VALUES('psalm'); 377 -</pre></blockquote> 378 - 379 -To enhance the ability to correct the spelling of "salm" into 380 -"psalm", make an addition entry like this: 381 - 382 -<blockquote><pre> 383 - INSERT INTO demo(word,soundslike) VALUES('psalm','salm'); 384 -</pre></blockquote> 385 - 386 -It is ok to make multiple entries for the same word as long as 387 -each entry has a different soundslike value. Note that if no 388 -soundslike value is specified, the soundslike defaults to the word 389 -itself. 390 - 391 -Listed below are some cases where it might make sense to add additional 392 -soundslike entries. The specific entries will depend on the application 393 -and the target language. 394 - 395 - * Silent "p" in words beginning with "ps": psalm, psyche 396 - 397 - * Silent "p" in words beginning with "pn": pneumonia, pneumatic 398 - 399 - * Silent "p" in words beginning with "pt": pterodactyl, ptolemaic 400 - 401 - * Silent "d" in words beginning with "dj": djinn, Djikarta 402 - 403 - * Silent "k" in words beginning with "kn": knight, Knuthson 404 - 405 - * Silent "g" in words beginning with "gn": gnarly, gnome, gnat 406 - 407 - * "Mac" versus "Mc" beginning Scottish surnames 408 - 409 - * "Tch" sounds in Slavic words: Tchaikovsky vs. Chaykovsky 410 - 411 - * The letter "j" pronounced like "h" in Spanish: LaJolla 412 - 413 - * Words beginning with "wr" versus "r": write vs. rite 414 - 415 - * Miscellanous problem words such as "debt", "tsetse", 416 - "Nguyen", "Van Nuyes". 417 - 418 -<h2>Auxiliary Functions</h2> 419 - 420 -The source code module that implements the spellfix1 virtual table also 421 -implements several SQL functions that might be useful to applications 422 -that employ spellfix1 or for testing or diagnostic work while developing 423 -applications that use spellfix1. The following auxiliary functions are 424 -available: 425 - 426 -<blockquote><dl> 427 -<dt><b>editdist3(P,W)<br>editdist2(P,W,L)<br>editdist3(T)</b><dd> 428 -These routines provide direct access to the version of the Wagner 429 -edit-distance function that allows for application-defined weights 430 -on edit operations. The first two forms of this function compare 431 -pattern P against word W and return the edit distance. In the first 432 -function, the langid is assumed to be 0 and in the second, the 433 -langid is given by the L parameter. The third form of this function 434 -reloads edit distance coefficience from the table named by T. 435 - 436 -<dt><b>spellfix1_editdist(P,W)</b><dd> 437 -This routine provides access to the built-in Wagner edit-distance 438 -function that uses default, fixed costs. The value returned is 439 -the edit distance needed to transform W into P. 440 - 441 -<dt><b>spellfix1_phonehash(X)</b><dd> 442 -This routine constructs a phonetic hash of the pure ascii input word X 443 -and returns that hash. This routine is used internally by spellfix1 in 444 -order to transform the K1 column of the shadow table into the K2 445 -column. 446 - 447 -<dt><b>spellfix1_scriptcode(X)</b><dd> 448 -Given an input string X, this routine attempts to determin the dominant 449 -script of that input and returns the ISO-15924 numeric code for that 450 -script. The current implementation understands the following scripts: 451 -<ul> 452 -<li> 215 - Latin 453 -<li> 220 - Cyrillic 454 -<li> 200 - Greek 455 -</ul> 456 -Additional language codes might be added in future releases. 457 - 458 -<dt><b>spellfix1_translit(X)</b><dd> 459 -This routine transliterates unicode text into pure ascii, returning 460 -the pure ascii representation of the input text X. This is the function 461 -that is used internally to transform vocabulary words into the K1 462 -column of the shadow table. 463 - 464 -</dl></blockquote>