Check-in [6954fef0]

Overview
Comment: Update the spellfix virtual table to the latest development code.
SHA1: 6954fef006431d153de6e63e362b8d260ebeb1c6
User & Date: drh 2012-08-14 17:29:27
Context
2012-08-14
18:43
Add an assert() to the btree rebalancer in order to silence a clang/scan-build warning. check-in: 6730579c user: drh tags: trunk
17:29
Update the spellfix virtual table to the latest development code. check-in: 6954fef0 user: drh tags: trunk
01:45
Refer to the file mapping Win32 API functions only when absolutely necessary. check-in: 1de2237d user: mistachkin tags: trunk
Changes

Changes to src/test_spellfix.c.

     6      6   **
     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12     12   **
    13         -** This module implements a VIRTUAL TABLE that can be used to search
    14         -** a large vocabulary for close matches.  For example, this virtual
    15         -** table can be used to suggest corrections to misspelled words.  Or,
    16         -** it could be used with FTS4 to do full-text search using potentially
    17         -** misspelled words.
    18         -**
    19         -** Create an instance of the virtual table this way:
    20         -**
    21         -**    CREATE VIRTUAL TABLE demo USING spellfix1;
    22         -**
    23         -** The "spellfix1" term is the name of this module.  The "demo" is the
    24         -** name of the virtual table you will be creating.  The table is initially
    25         -** empty.  You have to populate it with your vocabulary.  Suppose you
    26         -** have a list of words in a table named "big_vocabulary".  Then do this:
    27         -**
    28         -**    INSERT INTO demo(word) SELECT word FROM big_vocabulary;
    29         -**
    30         -** If you intend to use this virtual table in cooperation with an FTS4
    31         -** table (for spelling correctly of search terms) then you can extract
    32         -** the vocabulary using an fts3aux table:
    33         -**
    34         -**    INSERT INTO demo(word) SELECT term FROM search_aux WHERE col='*';
    35         -**
    36         -** You can also provide the virtual table with a "rank" for each word.
    37         -** The "rank" is an estimate of how common the word is.  Larger numbers
    38         -** mean the word is more common.  If you omit the rank when populating
    39         -** the table, then a rank of 1 is assumed.  But if you have rank 
    40         -** information, you can supply it and the virtual table will show a
    41         -** slight preference for selecting more commonly used terms.  To
    42         -** populate the rank from an fts4aux table "search_aux" do something
    43         -** like this:
    44         -**
    45         -**    INSERT INTO demo(word,rank)
    46         -**        SELECT term, documents FROM search_aux WHERE col='*';
    47         -**
    48         -** To query the virtual table, include a MATCH operator in the WHERE
    49         -** clause.  For example:
    50         -**
    51         -**    SELECT word FROM demo WHERE word MATCH 'kennasaw';
    52         -**
    53         -** Using a dataset of American place names (derived from
    54         -** http://geonames.usgs.gov/domestic/download_data.htm) the query above
    55         -** returns 20 results beginning with:
    56         -**
    57         -**    kennesaw
    58         -**    kenosha
    59         -**    kenesaw
    60         -**    kenaga
    61         -**    keanak
    62         -**
    63         -** If you append the character '*' to the end of the pattern, then
    64         -** a prefix search is performed.  For example:
    65         -**
    66         -**    SELECT word FROM demo WHERE word MATCH 'kennes*';
    67         -**
    68         -** Yields 20 results beginning with:
    69         -**
    70         -**    kennesaw
    71         -**    kennestone
    72         -**    kenneson
    73         -**    kenneys
    74         -**    keanes
    75         -**    keenes
    76         -**
    77         -** The virtual table actually has a unique rowid with five columns plus three
    78         -** extra hidden columns.  The columns are as follows:
    79         -**
    80         -**    rowid         A unique integer number associated with each
    81         -**                  vocabulary item in the table.  This can be used
    82         -**                  as a foreign key on other tables in the database.
    83         -**
    84         -**    word          The text of the word that matches the pattern.
    85         -**                  Both word and pattern can contains unicode characters
    86         -**                  and can be mixed case.
    87         -**
    88         -**    rank          This is the rank of the word, as specified in the
    89         -**                  original INSERT statement.
    90         -**
    91         -**    distance      This is an edit distance or Levensthein distance going
    92         -**                  from the pattern to the word.
    93         -**
    94         -**    langid        This is the language-id of the word.  All queries are
    95         -**                  against a single language-id, which defaults to 0.
    96         -**                  For any given query this value is the same on all rows.
    97         -**
    98         -**    score         The score is a combination of rank and distance.  The
    99         -**                  idea is that a lower score is better.  The virtual table
   100         -**                  attempts to find words with the lowest score and 
   101         -**                  by default (unless overridden by ORDER BY) returns
   102         -**                  results in order of increasing score.
   103         -**
   104         -**    matchlen      For prefix queries, the number of characters in the prefix
   105         -**                  of the returned value (word) that matched the query term.
   106         -**                  For non-prefix queries, the number of characters in the 
   107         -**                  returned value.
   108         -**
   109         -**    top           (HIDDEN)  For any query, this value is the same on all
   110         -**                  rows.  It is an integer which is the maximum number of
   111         -**                  rows that will be output.  The actually number of rows
   112         -**                  output might be less than this number, but it will never
   113         -**                  be greater.  The default value for top is 20, but that
   114         -**                  can be changed for each query by including a term of
   115         -**                  the form "top=N" in the WHERE clause of the query.
   116         -**
   117         -**    scope         (HIDDEN)  For any query, this value is the same on all
   118         -**                  rows.  The scope is a measure of how widely the virtual
   119         -**                  table looks for matching words.  Smaller values of
   120         -**                  scope cause a broader search.  The scope is normally
   121         -**                  choosen automatically and is capped at 4.  Applications
   122         -**                  can change the scope by including a term of the form
   123         -**                  "scope=N" in the WHERE clause of the query.  Increasing
   124         -**                  the scope will make the query run faster, but will reduce
   125         -**                  the possible corrections.
   126         -**
   127         -**    srchcnt       (HIDDEN)  For any query, this value is the same on all
   128         -**                  rows.  This value is an integer which is the number of
   129         -**                  of words examined using the edit-distance algorithm to
   130         -**                  find the top matches that are ultimately displayed.  This
   131         -**                  value is for diagnostic use only.
   132         -**
   133         -**    soundslike    (HIDDEN)  When inserting vocabulary entries, this field
   134         -**                  can be set to an spelling that matches what the word
   135         -**                  sounds like.  See the DEALING WITH UNUSUAL AND DIFFICULT
   136         -**                  SPELLINGS section below for details.
   137         -**
   138         -** When inserting into or updating the virtual table, only the rowid, word,
   139         -** rank, and langid may be changes.  Any attempt to set or modify the values
   140         -** of distance, score, top, scope, or srchcnt is silently ignored.
   141         -**
   142         -** ALGORITHM
   143         -**
   144         -** A shadow table named "%_vocab" (where the % is replaced by the name of
   145         -** the virtual table; Ex: "demo_vocab" for the "demo" virtual table) is
   146         -** constructed with these columns:
   147         -**
   148         -**    id            The unique id (INTEGER PRIMARY KEY)
   149         -**
   150         -**    rank          The rank of word.
   151         -**
   152         -**    langid        The language id for this entry.
   153         -**
   154         -**    word          The original UTF8 text of the vocabulary word
   155         -**
   156         -**    k1            The word transliterated into lower-case ASCII.  
   157         -**                  There is a standard table of mappings from non-ASCII
   158         -**                  characters into ASCII.  Examples: "æ" -> "ae",
   159         -**                  "þ" -> "th", "ß" -> "ss", "á" -> "a", ...  The
   160         -**                  accessory function spellfix1_translit(X) will do
   161         -**                  the non-ASCII to ASCII mapping.  The built-in lower(X)
   162         -**                  function will convert to lower-case.  Thus:
   163         -**                  k1 = lower(spellfix1_translit(word)).
   164         -**
   165         -**    k2            This field holds a phonetic code derived from k1.  Letters
   166         -**                  that have similar sounds are mapped into the same symbol.
   167         -**                  For example, all vowels and vowel clusters become the
   168         -**                  single symbol "A".  And the letters "p", "b", "f", and
   169         -**                  "v" all become "B".  All nasal sounds are represented
   170         -**                  as "N".  And so forth.  The mapping is base on
   171         -**                  ideas found in Soundex, Metaphone, and other
   172         -**                  long-standing phonetic matching systems.  This key can
   173         -**                  be generated by the function spellfix1_phonehash(X).  
   174         -**                  Hence: k2 = spellfix1_phonehash(k1)
   175         -**
   176         -** There is also a function for computing the Wagner edit distance or the
   177         -** Levenshtein distance between a pattern and a word.  This function
   178         -** is exposed as spellfix1_editdist(X,Y).  The edit distance function
   179         -** returns the "cost" of converting X into Y.  Some transformations
   180         -** cost more than others.  Changing one vowel into a different vowel,
   181         -** for example is relatively cheap, as is doubling a constant, or
   182         -** omitting the second character of a double-constant.  Other transformations
   183         -** or more expensive.  The idea is that the edit distance function returns
   184         -** a low cost of words that are similar and a higher cost for words
   185         -** that are futher apart.  In this implementation, the maximum cost
   186         -** of any single-character edit (delete, insert, or substitute) is 100,
   187         -** with lower costs for some edits (such as transforming vowels).
   188         -**
   189         -** The "score" for a comparison is the edit distance between the pattern
   190         -** and the word, adjusted down by the base-2 logorithm of the word rank.
   191         -** For example, a match with distance 100 but rank 1000 would have a
   192         -** score of 122 (= 100 - log2(1000) + 32) where as a match with distance
   193         -** 100 with a rank of 1 would have a score of 131 (100 - log2(1) + 32).
   194         -** (NB:  The constant 32 is added to each score to keep it from going
   195         -** negative in case the edit distance is zero.)  In this way, frequently
   196         -** used words get a slightly lower cost which tends to move them toward
   197         -** the top of the list of alternative spellings.
   198         -**
   199         -** A straightforward implementation of a spelling corrector would be
   200         -** to compare the search term against every word in the vocabulary
   201         -** and select the 20 with the lowest scores.  However, there will 
   202         -** typically be hundreds of thousands or millions of words in the
   203         -** vocabulary, and so this approach is not fast enough.
   204         -**
   205         -** Suppose the term that is being spell-corrected is X.  To limit
   206         -** the search space, X is converted to a k2-like key using the
   207         -** equivalent of:
   208         -**
   209         -**    key = spellfix1_phonehash(lower(spellfix1_translit(X)))
   210         -**
   211         -** This key is then limited to "scope" characters.  The default scope
   212         -** value is 4, but an alternative scope can be specified using the
   213         -** "scope=N" term in the WHERE clause.  After the key has been truncated,
   214         -** the edit distance is run against every term in the vocabulary that
   215         -** has a k2 value that begins with the abbreviated key.
   216         -**
   217         -** For example, suppose the input word is "Paskagula".  The phonetic 
   218         -** key is "BACACALA" which is then truncated to 4 characters "BACA".
   219         -** The edit distance is then run on the 4980 entries (out of
   220         -** 272,597 entries total) of the vocabulary whose k2 values begin with
   221         -** BACA, yielding "Pascagoula" as the best match.
   222         -** 
   223         -** Only terms of the vocabulary with a matching langid are searched.
   224         -** Hence, the same table can contain entries from multiple languages
   225         -** and only the requested language will be used.  The default langid
   226         -** is 0.
   227         -**
   228         -** DEALING WITH UNUSUAL AND DIFFICULT SPELLINGS
   229         -**
   230         -** The algorithm above works quite well for most cases, but there are
   231         -** exceptions.  These exceptions can be dealt with by making additional
   232         -** entries in the virtual table using the "soundslike" column.
   233         -**
   234         -** For example, many words of Greek origin begin with letters "ps" where
   235         -** the "p" is silent.  Ex:  psalm, pseudonym, psoriasis, psyche.  In
   236         -** another example, many Scottish surnames can be spelled with an
   237         -** initial "Mac" or "Mc".  Thus, "MacKay" and "McKay" are both pronounced
   238         -** the same.
   239         -**
   240         -** Accommodation can be made for words that are not spelled as they
   241         -** sound by making additional entries into the virtual table for the
   242         -** same word, but adding an alternative spelling in the "soundslike"
   243         -** column.  For example, the canonical entry for "psalm" would be this:
   244         -**
   245         -**   INSERT INTO demo(word) VALUES('psalm');
   246         -**
   247         -** To enhance the ability to correct the spelling of "salm" into
   248         -** "psalm", make an addition entry like this:
   249         -**
   250         -**   INSERT INTO demo(word,soundslike) VALUES('psalm','salm');
   251         -**
   252         -** It is ok to make multiple entries for the same word as long as
   253         -** each entry has a different soundslike value.  Note that if no
   254         -** soundslike value is specified, the soundslike defaults to the word
   255         -** itself.
   256         -**
   257         -** Listed below are some cases where it might make sense to add additional
   258         -** soundslike entries.  The specific entries will depend on the application
   259         -** and the target language.
   260         -**
   261         -**   *   Silent "p" in words beginning with "ps":  psalm, psyche
   262         -**
   263         -**   *   Silent "p" in words beginning with "pn":  pneumonia, pneumatic
   264         -**
   265         -**   *   Silent "p" in words beginning with "pt":  pterodactyl, ptolemaic
   266         -**
   267         -**   *   Silent "d" in words beginning with "dj":  djinn, Djikarta
   268         -**
   269         -**   *   Silent "k" in words beginning with "kn":  knight, Knuthson
   270         -**
   271         -**   *   Silent "g" in words beginning with "gn":  gnarly, gnome, gnat
   272         -**
   273         -**   *   "Mac" versus "Mc" beginning Scottish surnames
   274         -**
   275         -**   *   "Tch" sounds in Slavic words:  Tchaikovsky vs. Chaykovsky
   276         -**
   277         -**   *   The letter "j" pronounced like "h" in Spanish:  LaJolla
   278         -**
   279         -**   *   Words beginning with "wr" versus "r":  write vs. rite
   280         -**
   281         -**   *   Miscellanous problem words such as "debt", "tsetse",
   282         -**       "Nguyen", "Van Nuyes".
           13  +** This module implements the spellfix1 VIRTUAL TABLE that can be used
           14  +** to search a large vocabulary for close matches.  See separate
           15  +** documentation files (spellfix1.wiki and editdist3.wiki) for details.
   283     16   */
   284     17   #if SQLITE_CORE
   285     18   # include "sqliteInt.h"
   286     19   #else
   287     20   # include <string.h>
   288     21   # include <stdio.h>
   289     22   # include <stdlib.h>
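
Note on the removed header comment: it describes the score as the edit distance, minus the base-2 logarithm of the rank, plus 32, and quotes 122 for distance 100 with rank 1000 and 131 for distance 100 with rank 1. An exact log2(1) is 0, which would give 132, so the quoted figures suggest an integer approximation. The sketch below is one approximation that reproduces both numbers by counting the significant bits of the rank. It is an assumption for illustration, and `sketch_score` is a hypothetical name, not a function from test_spellfix.c.

```c
#include <assert.h>

/* Hypothetical helper, not taken from test_spellfix.c.
** Approximates log2(iRank) by counting the significant bits in iRank,
** then applies the "distance - log2(rank) + 32" rule from the comment. */
static int sketch_score(int iDistance, int iRank){
  int nBit = 0;
  assert( iRank>=0 );
  while( iRank>0 ){
    nBit++;
    iRank >>= 1;
  }
  return iDistance + 32 - nBit;
}
```

With this approximation, a rank of 1000 occupies 10 bits and a rank of 1 occupies 1 bit, which yields 122 and 131 respectively.
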
................................................................................
   302     35   **   4   'D'       Alveolar stops:  D T
   303     36   **   5   'H'       Letter H at the beginning of a word
   304     37   **   6   'L'       Glide:  L
   305     38   **   7   'R'       Semivowel:  R
   306     39   **   8   'M'       Nasals:  M N
   307     40   **   9   'W'       Letter W at the beginning of a word
   308     41   **   10  'Y'       Letter Y at the beginning of a word.
   309         -**   11  '9'       A digit: 0 1 2 3 4 5 6 7 8 9
           42  +**   11  '9'       Digits: 0 1 2 3 4 5 6 7 8 9
   310     43   **   12  ' '       White space
   311     44   **   13  '?'       Other.
   312     45   */
   313     46   #define CCLASS_SILENT         0
   314     47   #define CCLASS_VOWEL          1
   315     48   #define CCLASS_B              2
   316     49   #define CCLASS_C              3
................................................................................
   462    195         case 'g': 
   463    196         case 'k': {
   464    197           if( zIn[1]=='n' ){ zIn++; nIn--; }
   465    198           break;
   466    199         }
   467    200       }
   468    201     }
   469         -  if( zIn[0]=='k' && zIn[1]=='n' ){ zIn++, nIn--; }
   470    202     for(i=0; i<nIn; i++){
   471    203       unsigned char c = zIn[i];
   472    204       if( i+1<nIn ){
   473    205         if( c=='w' && zIn[i+1]=='r' ) continue;
   474    206         if( c=='d' && (zIn[i+1]=='j' || zIn[i+1]=='g') ) continue;
   475    207         if( i+2<nIn ){
   476    208           if( c=='t' && zIn[i+1]=='c' && zIn[i+2]=='h' ) continue;
................................................................................
   582    314       /* differ only in case */
   583    315       return 0;
   584    316     }
   585    317     classFrom = characterClass(cPrev, cFrom);
   586    318     classTo = characterClass(cPrev, cTo);
   587    319     if( classFrom==classTo ){
   588    320       /* Same character class */
   589         -    return classFrom=='A' ? 25 : 40;
          321  +    return 40;
   590    322     }
   591    323     if( classFrom>=CCLASS_B && classFrom<=CCLASS_Y
   592    324         && classTo>=CCLASS_B && classTo<=CCLASS_Y ){
   593    325       /* Convert from one consonant to another, but in a different class */
   594    326       return 75;
   595    327     }
   596    328     /* Any other subsitution */
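
Note on this hunk: classFrom is range-checked against CCLASS_B and CCLASS_Y, so it appears to hold a small numeric class code rather than a letter. If so, the old comparison against the character 'A' would never have been true, and returning 40 unconditionally leaves behavior unchanged. The sketch below illustrates the cost tiers visible in the hunk with a deliberately reduced classifier. The class table, the helper names, and the final fallback cost of 100 are assumptions; the fallback is taken from the removed header comment, which states that the maximum single-character edit cost is 100.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Simplified, hypothetical class codes (the real table has more classes). */
enum { K_VOWEL=1, K_B=2, K_D=4, K_M=8, K_Y=10, K_OTHER=13 };

static int sketch_class(char c){
  c = (char)tolower((unsigned char)c);
  if( c==0 ) return K_OTHER;
  if( strchr("aeiou", c) ) return K_VOWEL;
  if( strchr("bfpv", c) ) return K_B;
  if( strchr("dt", c) ) return K_D;
  if( strchr("mn", c) ) return K_M;
  return K_OTHER;
}

/* Cost of substituting cTo for cFrom, following the tiers in the hunk. */
static int sketch_subst_cost(char cFrom, char cTo){
  int classFrom, classTo;
  if( tolower((unsigned char)cFrom)==tolower((unsigned char)cTo) ){
    return 0;                      /* identical, or differ only in case */
  }
  classFrom = sketch_class(cFrom);
  classTo = sketch_class(cTo);
  if( classFrom==classTo ) return 40;          /* same character class */
  if( classFrom>=K_B && classFrom<=K_Y
   && classTo>=K_B && classTo<=K_Y ){
    return 75;                     /* consonant to consonant, other class */
  }
  return 100;                      /* assumed cost of any other substitution */
}
```
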
................................................................................
   612    344   **
   613    345   ** If pnMatch is not NULL, then *pnMatch is set to the number of bytes
   614    346   ** of zB that matched the pattern in zA. If zA does not end with a '*',
   615    347   ** then this value is always the number of bytes in zB (i.e. strlen(zB)).
   616    348   ** If zA does end in a '*', then it is the number of bytes in the prefix
   617    349   ** of zB that was deemed to match zA.
   618    350   */
   619         -static int editdist1(const char *zA, const char *zB, int iLangId, int *pnMatch){
          351  +static int editdist1(const char *zA, const char *zB, int *pnMatch){
   620    352     int nA, nB;            /* Number of characters in zA[] and zB[] */
   621    353     int xA, xB;            /* Loop counters for zA[] and zB[] */
   622    354     char cA, cB;           /* Current character of zA and zB */
   623    355     char cAprev, cBprev;   /* Previous character of zA and zB */
   624    356     char cAnext, cBnext;   /* Next character in zA and zB */
   625    357     int d;                 /* North-west cost value */
   626    358     int dc = 0;            /* North-west character value */
................................................................................
   641    373   
   642    374   #if 0
   643    375     printf("A=\"%s\" B=\"%s\" dc=%c\n", zA, zB, dc?dc:' ');
   644    376   #endif
   645    377   
   646    378     /* Verify input strings and measure their lengths */
   647    379     for(nA=0; zA[nA]; nA++){
   648         -    if( zA[nA]>127 ) return -2;
          380  +    if( zA[nA]&0x80 ) return -2;
   649    381     }
   650    382     for(nB=0; zB[nB]; nB++){
   651         -    if( zB[nB]>127 ) return -2;
          383  +    if( zB[nB]&0x80 ) return -2;
   652    384     }
   653    385   
   654    386     /* Special processing if either string is empty */
   655    387     if( nA==0 ){
   656    388       cBprev = dc;
   657    389       for(xB=res=0; (cB = zB[xB])!=0; xB++){
   658    390         res += insertOrDeleteCost(cBprev, cB, zB[xB+1])/FINAL_INS_COST_DIV;
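
Note on the `&0x80` change: zA and zB are declared `const char*`, and on platforms where plain char is signed a byte such as 0xC3 reads as a negative value, so a test of the form `z[i]>127` can never succeed there. Testing the high bit works whether char is signed or unsigned. A minimal sketch of the check in isolation follows; `sketch_has_non_ascii` is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Return 1 if z contains any byte with the high bit set, else 0.
** Works the same whether plain char is signed or unsigned. */
static int sketch_has_non_ascii(const char *z){
  int i;
  assert( z!=0 );
  for(i=0; z[i]; i++){
    if( z[i] & 0x80 ) return 1;
  }
  return 0;
}
```
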
................................................................................
   752    484         if( m[xB]<res ){
   753    485           res = m[xB];
   754    486           if( pnMatch ) *pnMatch = xB+nMatch;
   755    487         }
   756    488       }
   757    489     }else{
   758    490       res = m[nB];
   759         -    if( pnMatch ) *pnMatch = -1;
          491  +    /* In the current implementation, pnMatch is always NULL if zA does
          492  +    ** not end in "*" */
          493  +    assert( pnMatch==0 );
   760    494     }
   761    495     sqlite3_free(toFree);
   762    496     return res;
   763    497   }
   764    498   
   765    499   /*
   766    500   ** Function:    editdist(A,B)
   767         -**              editdist(A,B,langid)
   768    501   **
   769    502   ** Return the cost of transforming string A into string B.  Both strings
   770    503   ** must be pure ASCII text.  If A ends with '*' then it is assumed to be
   771    504   ** a prefix of B and extra characters on the end of B have minimal additional
   772    505   ** cost.
   773    506   */
   774    507   static void editdistSqlFunc(
   775    508     sqlite3_context *context,
   776    509     int argc,
   777    510     sqlite3_value **argv
   778    511   ){
   779         -  int langid = argc==2 ? 0 : sqlite3_value_int(argv[2]);
   780    512     int res = editdist1(
   781    513                       (const char*)sqlite3_value_text(argv[0]),
   782    514                       (const char*)sqlite3_value_text(argv[1]),
   783         -                    langid, 0);
          515  +                    0);
   784    516     if( res<0 ){
   785    517       if( res==(-3) ){
   786    518         sqlite3_result_error_nomem(context);
   787    519       }else if( res==(-2) ){
   788    520         sqlite3_result_error(context, "non-ASCII input to editdist()", -1);
   789    521       }else{
   790    522         sqlite3_result_error(context, "NULL input to editdist()", -1);
................................................................................
   920    652   */
   921    653   static int editDist3ConfigLoad(
   922    654     EditDist3Config *p,      /* The edit distance configuration to load */
   923    655     sqlite3 *db,            /* Load from this database */
   924    656     const char *zTable      /* Name of the table from which to load */
   925    657   ){
   926    658     sqlite3_stmt *pStmt;
   927         -  int rc;
          659  +  int rc, rc2;
   928    660     char *zSql;
   929    661     int iLangPrev = -9999;
   930    662     EditDist3Lang *pLang;
   931    663   
   932    664     zSql = sqlite3_mprintf("SELECT iLang, cFrom, cTo, iCost"
   933    665                            " FROM \"%w\" WHERE iLang>=0 ORDER BY iLang", zTable);
   934    666     if( zSql==0 ) return SQLITE_NOMEM;
................................................................................
   935    667     rc = sqlite3_prepare(db, zSql, -1, &pStmt, 0);
   936    668     sqlite3_free(zSql);
   937    669     if( rc ) return rc;
   938    670     editDist3ConfigClear(p);
   939    671     while( sqlite3_step(pStmt)==SQLITE_ROW ){
   940    672       int iLang = sqlite3_column_int(pStmt, 0);
   941    673       const char *zFrom = (const char*)sqlite3_column_text(pStmt, 1);
   942         -    int nFrom = sqlite3_column_bytes(pStmt, 1);
          674  +    int nFrom = zFrom ? sqlite3_column_bytes(pStmt, 1) : 0;
   943    675       const char *zTo = (const char*)sqlite3_column_text(pStmt, 2);
   944         -    int nTo = sqlite3_column_bytes(pStmt, 2);
          676  +    int nTo = zTo ? sqlite3_column_bytes(pStmt, 2) : 0;
   945    677       int iCost = sqlite3_column_int(pStmt, 3);
   946    678   
   947         -    if( nFrom>100 || nFrom<0 || nTo>100 || nTo<0 ) continue;
          679  +    assert( zFrom!=0 || nFrom==0 );
          680  +    assert( zTo!=0 || nTo==0 );
          681  +    if( nFrom>100 || nTo>100 ) continue;
   948    682       if( iCost<0 ) continue;
   949    683       if( iLang!=iLangPrev ){
   950    684         EditDist3Lang *pNew;
   951         -      p->nLang++;
   952         -      pNew = sqlite3_realloc(p->a, p->nLang*sizeof(p->a[0]));
          685  +      pNew = sqlite3_realloc(p->a, (p->nLang+1)*sizeof(p->a[0]));
   953    686         if( pNew==0 ){ rc = SQLITE_NOMEM; break; }
   954    687         p->a = pNew;
   955         -      pLang = &p->a[p->nLang-1];
          688  +      pLang = &p->a[p->nLang];
          689  +      p->nLang++;
   956    690         pLang->iLang = iLang;
   957    691         pLang->iInsCost = 100;
   958    692         pLang->iDelCost = 100;
   959         -      pLang->iSubCost = 200;
          693  +      pLang->iSubCost = 150;
   960    694         pLang->pCost = 0;
   961    695         iLangPrev = iLang;
   962    696       }
   963    697       if( nFrom==1 && zFrom[0]=='?' && nTo==0 ){
   964    698         pLang->iDelCost = iCost;
   965    699       }else if( nFrom==0 && nTo==1 && zTo[0]=='?' ){
   966    700         pLang->iInsCost = iCost;
................................................................................
   977    711         pCost->iCost = iCost;
   978    712         memcpy(pCost->a, zFrom, nFrom);
   979    713         memcpy(pCost->a + nFrom, zTo, nTo);
   980    714         pCost->pNext = pLang->pCost;
   981    715         pLang->pCost = pCost; 
   982    716       }
   983    717     }
   984         -  sqlite3_finalize(pStmt);
          718  +  rc2 = sqlite3_finalize(pStmt);
          719  +  if( rc==SQLITE_OK ) rc = rc2;
   985    720     return rc;
   986    721   }
   987    722   
   988    723   /*
   989    724   ** Return the length (in bytes) of a utf-8 character.  Or return a maximum
   990    725   ** of N.
   991    726   */
................................................................................
  1015    750   }
  1016    751   
  1017    752   /*
  1018    753   ** Return TRUE (non-zero) of the To side of the given cost matches
  1019    754   ** the given string.
  1020    755   */
  1021    756   static int matchFrom(EditDist3Cost *p, const char *z, int n){
  1022         -  if( p->nFrom>n ) return 0;
          757  +  assert( p->nFrom<=n );
  1023    758     if( memcmp(p->a, z, p->nFrom)!=0 ) return 0;
  1024    759     return 1;
  1025    760   }
  1026    761   
  1027    762   /*
  1028    763   ** Return TRUE (non-zero) of the next FROM character and the next TO
  1029    764   ** character are the same.
................................................................................
  1062    797     const char *z,
  1063    798     int n
  1064    799   ){
  1065    800     EditDist3FromString *pStr;
  1066    801     EditDist3Cost *p;
  1067    802     int i;
  1068    803   
          804  +  if( z==0 ) return 0;
  1069    805     if( n<0 ) n = (int)strlen(z);
  1070    806     pStr = sqlite3_malloc( sizeof(*pStr) + sizeof(pStr->a[0])*n + n + 1 );
  1071    807     if( pStr==0 ) return 0;
  1072    808     pStr->a = (EditDist3From*)&pStr[1];
          809  +  memset(pStr->a, 0, sizeof(pStr->a[0])*n);
  1073    810     pStr->n = n;
  1074    811     pStr->z = (char*)&pStr->a[n];
  1075    812     memcpy(pStr->z, z, n+1);
  1076    813     if( n && z[n-1]=='*' ){
  1077    814       pStr->isPrefix = 1;
  1078    815       n--;
  1079    816       pStr->n--;
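
Note on this hunk: the function packs the header, the per-character array a[], and a copy of the text into one allocation, and the change adds a NULL-input guard and zeroes a[] before use. The sketch below shows that layout with the standard allocator. The types are reduced, hypothetical versions of the originals, the length parameter is dropped in favor of strlen(), and writing a terminator over the trailing '*' is an assumption about code not shown in the hunk.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct SketchFrom { int nSubst; } SketchFrom;  /* hypothetical record */
typedef struct SketchFromString {
  char *z;          /* Copy of the input text */
  int n;            /* Length in bytes, excluding any trailing '*' */
  int isPrefix;     /* True if the input ended with '*' */
  SketchFrom *a;    /* One record per input byte */
} SketchFromString;

/* Build the object in a single allocation: header, then a[], then text.
** Return NULL for NULL input or on allocation failure. */
static SketchFromString *sketch_from_string(const char *z){
  SketchFromString *p;
  size_t n;
  if( z==0 ) return 0;
  n = strlen(z);
  p = malloc( sizeof(*p) + sizeof(p->a[0])*n + n + 1 );
  if( p==0 ) return 0;
  p->a = (SketchFrom*)&p[1];
  memset(p->a, 0, sizeof(p->a[0])*n);   /* start from a known state */
  p->n = (int)n;
  p->isPrefix = 0;
  p->z = (char*)&p->a[n];
  memcpy(p->z, z, n+1);
  if( n && z[n-1]=='*' ){
    p->isPrefix = 1;
    p->n--;
    p->z[p->n] = 0;                     /* assumed: drop the trailing '*' */
  }
  return p;
}
```

Because everything lives in one block, a single free() releases the whole object.
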
................................................................................
  1109    846         pStr = 0;
  1110    847         break;
  1111    848       }
  1112    849     }
  1113    850     return pStr;
  1114    851   }
  1115    852   
  1116         -#if 0 /* No longer used */
  1117         -/*
  1118         -** Return the number of bytes in the common prefix of two UTF8 strings.
  1119         -** Only complete characters are considered.
  1120         -*/
  1121         -static int editDist3PrefixLen(const char *z1, const char *z2){
  1122         -  int n = 0;
  1123         -  while( z1[n] && z1[n]==z2[n] ){ n++; }
  1124         -  while( n && (z1[n]&0xc0)==0x80 ){ n--; }
  1125         -  return n;
  1126         -}
  1127         -
  1128         -/*
  1129         -** Return the number of bytes in the common suffix of two UTF8 strings.
  1130         -** Only complete characters are considered.
  1131         -*/
  1132         -static int editDist3SuffixLen(const char *z1, int n1, const char *z2, int n2){
  1133         -  int origN1 = n1;
  1134         -  while( n1>0 && n2>0 && z1[n1-1]==z2[n2-1] ){ n1--; n2--; }
  1135         -  while( n1<origN1 && (z1[n1]&0xc0)==0x80 ){ n1++; n2++; }
  1136         -  return origN1 - n1;
  1137         -}
  1138         -#endif /* 0 */
  1139         -
  1140    853   /*
  1141    854   ** Update entry m[i] such that it is the minimum of its current value
  1142    855   ** and m[j]+iCost.
  1143    856   **
  1144    857   ** If the iCost is 1,000,000 or greater, then consider the cost to be
  1145    858   ** infinite and skip the update.
  1146    859   */
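The comment above describes a relaxation step on the Wagner matrix. The body of the function is elided from this diff, so the following is only a sketch of what such a step might look like. The name `update_cost` and the constant `COST_INFINITE` are assumptions chosen for illustration.

```c
#include <assert.h>

/* Hypothetical threshold: costs at or above this count as infinite. */
#define COST_INFINITE 1000000

/* Set m[i] to the smaller of m[i] and m[j]+iCost.
** An iCost of COST_INFINITE or more leaves m[i] unchanged. */
static void update_cost(unsigned int *m, int i, int j, int iCost){
  unsigned int b;
  assert( iCost>=0 );
  if( iCost>=COST_INFINITE ) return;   /* Infinite cost: skip the update */
  b = m[j] + (unsigned int)iCost;
  if( b<m[i] ) m[i] = b;
}
```

Treating large costs as infinite gives a cost table a way to forbid a particular edit outright, rather than merely making it expensive.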
................................................................................
  1181    894     EditDist3FromString f = *pFrom;
  1182    895     EditDist3To *a2;
  1183    896     unsigned int *m;
  1184    897     int szRow;
  1185    898     EditDist3Cost *p;
  1186    899     int res;
  1187    900   
  1188         -#if 0
  1189         -  /* Remove comment prefix and suffix */
  1190         -  n = editDist3PrefixLen(f.z, z2);
  1191         -  if( f.n==n2 && n2==n ) return 0;  /* Identical strings */
  1192         -  f.n -= n;
  1193         -  f.z += n;
  1194         -  f.a += n;
  1195         -  n2 -= n;
  1196         -  z2 += n;
  1197         -  if( f.isPrefix==0 ){
  1198         -    n = editDist3SuffixLen(f.z, f.n, z2, n2);
  1199         -    f.n -= n;
  1200         -    n2 -= n;
  1201         -  }
  1202         -#endif
  1203         -
  1204    901     /* allocate the Wagner matrix and the aTo[] array for the TO string */
  1205    902     n = (f.n+1)*(n2+1);
  1206    903     n = (n+1)&~1;
  1207    904     m = sqlite3_malloc( n*sizeof(m[0]) + sizeof(a2[0])*n2 );
  1208    905     if( m==0 ) return -1;            /* Out of memory */
  1209    906     a2 = (EditDist3To*)&m[n];
  1210    907     memset(a2, 0, sizeof(a2[0])*n2);
................................................................................
  1280    977           if( matchTo(p, z2+i2, n2-i2) ){
  1281    978             updateCost(m, cxd+p->nFrom+szRow*p->nTo, cxd, p->iCost);
  1282    979           }
  1283    980         }
  1284    981       }
  1285    982     }
  1286    983   
  1287         -#if 0
          984  +#if 0  /* Enable for debugging */
  1288    985     printf("         ^");
  1289    986     for(i1=0; i1<f.n; i1++) printf(" %c-%2x", f.z[i1], f.z[i1]&0xff);
  1290    987     printf("\n   ^:");
  1291    988     for(i1=0; i1<szRow; i1++){
  1292    989       int v = m[i1];
  1293    990       if( v>9999 ) printf(" ****");
  1294    991       else         printf(" %4d", v);
................................................................................
  1379   1076       pFrom = editDist3FromStringNew(pLang, zA, nA);
  1380   1077       if( pFrom==0 ){
  1381   1078         sqlite3_result_error_nomem(context);
  1382   1079         return;
  1383   1080       }
  1384   1081       dist = editDist3Core(pFrom, zB, nB, pLang, 0);
  1385   1082       editDist3FromStringDelete(pFrom);
  1386         -    sqlite3_result_int(context, dist);
         1083  +    if( dist==(-1) ){
         1084  +      sqlite3_result_error_nomem(context);
         1085  +    }else{
         1086  +      sqlite3_result_int(context, dist);
         1087  +    }
  1387   1088     } 
  1388   1089   }
  1389   1090   
  1390   1091   /*
  1391   1092   ** Register the editDist3 function with SQLite
  1392   1093   */
  1393   1094   static int editDist3Install(sqlite3 *db){
................................................................................
  1435   1136   
  1436   1137   /*
  1437   1138   ** Return the value of the first UTF-8 character in the string.
  1438   1139   */
  1439   1140   static int utf8Read(const unsigned char *z, int n, int *pSize){
  1440   1141     int c, i;
  1441   1142   
  1442         -  if( n==0 ){
         1143  +  /* All callers to this routine (in the current implementation)
         1144  +  ** always have n>0. */
         1145  +  if( NEVER(n==0) ){
  1443   1146       c = i = 0;
  1444   1147     }else{
  1445   1148       c = z[0];
  1446   1149       i = 1;
  1447   1150       if( c>=0xc0 ){
  1448   1151         c = sqlite3Utf8Trans1[c-0xc0];
  1449   1152         while( i<n && (z[i] & 0xc0)==0x80 ){
................................................................................
  1876   1579   ** The returned string might contain more characters than the input.
  1877   1580   **
  1878   1581   ** Space to hold the returned string comes from sqlite3_malloc() and
  1879   1582   ** should be freed by the caller.
  1880   1583   */
  1881   1584   static unsigned char *transliterate(const unsigned char *zIn, int nIn){
  1882   1585     unsigned char *zOut = sqlite3_malloc( nIn*4 + 1 );
  1883         -  int i, c, sz, nOut;
         1586  +  int c, sz, nOut;
  1884   1587     if( zOut==0 ) return 0;
  1885         -  i = nOut = 0;
  1886         -  while( i<nIn ){
         1588  +  nOut = 0;
         1589  +  while( nIn>0 ){
  1887   1590       c = utf8Read(zIn, nIn, &sz);
  1888   1591       zIn += sz;
  1889   1592       nIn -= sz;
  1890   1593       if( c<=127 ){
  1891   1594         zOut[nOut++] = c;
  1892   1595       }else{
  1893   1596         int xTop, xBtm, x;
................................................................................
  2031   1734     }
  2032   1735     sqlite3_result_int(context, res);
  2033   1736   }
  2034   1737   
  2035   1738   /* End transliterate
  2036   1739   ******************************************************************************
  2037   1740   ******************************************************************************
  2038         -** Begin Polloc & Zamora SPEEDCOP style keying functions.
  2039         -*/
  2040         -/*
  2041         -** The Pollock & Zamora skeleton function.  Move all consonants to the
  2042         -** front and all vowels to the end, removing duplicates.  Except if the
  2043         -** first letter is a vowel then it remains as the first letter.
  2044         -*/
  2045         -static void pollockSkeletonKey(const char *zIn, char *zOut){
  2046         -  int i, j;
  2047         -  unsigned char c;
  2048         -  char seen[26];
  2049         -  static const unsigned char isVowel[] = { 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
  2050         -    0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 };
  2051         -  memset(seen, 0, sizeof(seen));
  2052         -  for(i=j=0; (c = (unsigned char)zIn[i])!=0; i++){
  2053         -    if( c<'a' || c>'z' ) continue;
  2054         -    if( j>0 || isVowel[c-'a'] ) continue;
  2055         -    if( seen[c-'a'] ) continue;
  2056         -    seen[c-'a'] = 1;
  2057         -    zOut[j++] = c;
  2058         -  }
  2059         -  for(i=0; (c = (unsigned char)zIn[i])!=0; i++){
  2060         -    if( c<'a' || c>'z' ) continue;
  2061         -    if( seen[c-'a'] ) continue;
  2062         -    if( !isVowel[c-'a'] ) continue;
  2063         -    seen[c-'a'] = 1;
  2064         -    zOut[j++] = c;
  2065         -  }
  2066         -  zOut[j] = 0;
  2067         -}
  2068         -
  2069         -/*
  2070         -** Function:    pollock_skeleton(X)
  2071         -**
  2072         -** Return the Pollock and Zamora skeleton key for a string X of all
  2073         -** lower-case letters.
  2074         -*/
  2075         -static void pollockSkeletonSqlFunc(
  2076         -  sqlite3_context *context,
  2077         -  int argc,
  2078         -  sqlite3_value **argv
  2079         -){
  2080         -  const char *zIn = (const char*)sqlite3_value_text(argv[0]);
  2081         -  int nIn = sqlite3_value_bytes(argv[0]);
  2082         -  char *zOut;
  2083         -  if( zIn ){
  2084         -    zOut = sqlite3_malloc( nIn + 1 );
  2085         -    if( zOut==0 ){
  2086         -      sqlite3_result_error_nomem(context);
  2087         -    }else{
  2088         -      pollockSkeletonKey(zIn, zOut);
  2089         -      sqlite3_result_text(context, (char*)zOut, -1, sqlite3_free);
  2090         -    }
  2091         -  }
  2092         -}  
  2093         -
  2094         -/*
  2095         -** The Pollock & Zamora omission key.
  2096         -**
  2097         -** The key consists of unique consonants in the following order:
  2098         -**
  2099         -**         jkqxzvwybfmgpdhclntsr
  2100         -**
  2101         -** These are followed by unique vowels in input order.
  2102         -*/
  2103         -static void pollockOmissionKey(const char *zIn, char *zOut){
  2104         -  int i, j;
  2105         -  unsigned char c;
  2106         -  char seen[26];
  2107         -  static const unsigned char isVowel[] = { 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
  2108         -    0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0 };
  2109         -  static const unsigned char constOrder[] = "jkqxzvwybfmgpdhclntsr";
  2110         -
  2111         -  memset(seen, 0, sizeof(seen));
  2112         -  for(i=j=0; (c = (unsigned char)zIn[i])!=0; i++){
  2113         -    if( c<'a' || c>'z' ) continue;
  2114         -    if( isVowel[c-'a'] ) continue;
  2115         -    if( seen[c-'a'] ) continue;
  2116         -    seen[c-'a'] = 1;
  2117         -  }
  2118         -  for(i=0; (c = constOrder[i])!=0; i++){
  2119         -    if( seen[c-'a'] ) zOut[j++] = c;
  2120         -  }
  2121         -  for(i=0; (c = (unsigned char)zIn[i])!=0; i++){
  2122         -    if( c<'a' || c>'z' ) continue;
  2123         -    if( seen[c-'a'] ) continue;
  2124         -    if( !isVowel[c-'a'] ) continue;
  2125         -    seen[c-'a'] = 1;
  2126         -    zOut[j++] = c;
  2127         -  }
  2128         -  zOut[j] = 0;
  2129         -}
  2130         -
  2131         -/*
  2132         -** Function:    pollock_omission(X)
  2133         -**
  2134         -** Return the Pollock and Zamora omission key for a string X of all
  2135         -** lower-case letters.
  2136         -*/
  2137         -static void pollockOmissionSqlFunc(
  2138         -  sqlite3_context *context,
  2139         -  int argc,
  2140         -  sqlite3_value **argv
  2141         -){
  2142         -  const char *zIn = (const char*)sqlite3_value_text(argv[0]);
  2143         -  int nIn = sqlite3_value_bytes(argv[0]);
  2144         -  char *zOut;
  2145         -  if( zIn ){
  2146         -    zOut = sqlite3_malloc( nIn + 1 );
  2147         -    if( zOut==0 ){
  2148         -      sqlite3_result_error_nomem(context);
  2149         -    }else{
  2150         -      pollockOmissionKey(zIn, zOut);
  2151         -      sqlite3_result_text(context, (char*)zOut, -1, sqlite3_free);
  2152         -    }
  2153         -  }
  2154         -}  
  2155         -
  2156         -
  2157         -/* End SPEEDCOP keying functions
  2158         -******************************************************************************
  2159         -******************************************************************************
  2160   1741   ** Begin spellfix1 virtual table.
  2161   1742   */
  2162   1743   
  2163   1744   /* Maximum length of a phonehash used for querying the shadow table */
  2164   1745   #define SPELLFIX_MX_HASH  8
  2165   1746   
  2166   1747   /* Maximum number of hash strings to examine per query */
  2167         -#define SPELLFIX_MX_RUN   8
         1748  +#define SPELLFIX_MX_RUN   1
  2168   1749   
  2169   1750   typedef struct spellfix1_vtab spellfix1_vtab;
  2170   1751   typedef struct spellfix1_cursor spellfix1_cursor;
  2171   1752   
  2172   1753   /* Fuzzy-search virtual table object */
  2173   1754   struct spellfix1_vtab {
  2174   1755     sqlite3_vtab base;         /* Base class - must be first */
................................................................................
  2183   1764   struct spellfix1_cursor {
  2184   1765     sqlite3_vtab_cursor base;    /* Base class - must be first */
  2185   1766     spellfix1_vtab *pVTab;       /* The table to which this cursor belongs */
  2186   1767     char *zPattern;              /* rhs of MATCH clause */
  2187   1768     int nRow;                    /* Number of rows of content */
  2188   1769     int nAlloc;                  /* Number of allocated rows */
  2189   1770     int iRow;                    /* Current row of content */
  2190         -  int iLang;                   /* Value of the lang= constraint */
         1771  +  int iLang;                   /* Value of the langid= constraint */
  2191   1772     int iTop;                    /* Value of the top= constraint */
  2192   1773     int iScope;                  /* Value of the scope= constraint */
  2193   1774     int nSearch;                 /* Number of vocabulary items checked */
         1775  +  sqlite3_stmt *pFullScan;     /* Shadow query for a full table scan */
  2194   1776     struct spellfix1_row {       /* For each row of content */
  2195   1777       sqlite3_int64 iRowid;         /* Rowid for this row */
  2196   1778       char *zWord;                  /* Text for this row */
  2197   1779       int iRank;                    /* Rank for this row */
  2198   1780       int iDistance;                /* Distance from pattern for this row */
  2199   1781       int iScore;                   /* Score for sorting */
  2200   1782       int iMatchlen;                /* Value of matchlen column (or -1) */
................................................................................
  2263   1845     char *zOut;
  2264   1846     int i, j;
  2265   1847     char c;
  2266   1848     while( isspace(zIn[0]) ) zIn++;
  2267   1849     zOut = sqlite3_mprintf("%s", zIn);
  2268   1850     if( zOut==0 ) return 0;
  2269   1851     i = (int)strlen(zOut);
         1852  +#if 0  /* The parser will never leave spaces at the end */
  2270   1853     while( i>0 && isspace(zOut[i-1]) ){ i--; }
         1854  +#endif
  2271   1855     zOut[i] = 0;
  2272   1856     c = zOut[0];
  2273   1857     if( c=='\'' || c=='"' ){
  2274         -    for(i=1, j=0; zOut[i]; i++){
         1858  +    for(i=1, j=0; ALWAYS(zOut[i]); i++){
  2275   1859         zOut[j++] = zOut[i];
  2276   1860         if( zOut[i]==c ){
  2277   1861           if( zOut[i+1]==c ){
  2278   1862             i++;
  2279   1863           }else{
  2280   1864             zOut[j-1] = 0;
  2281   1865             break;
................................................................................
  2307   1891     const char *zModule = argv[0];
  2308   1892     const char *zDbName = argv[1];
  2309   1893     const char *zTableName = argv[2];
  2310   1894     int nDbName;
  2311   1895     int rc = SQLITE_OK;
  2312   1896     int i;
  2313   1897   
  2314         -  if( argc<3 ){
  2315         -    *pzErr = sqlite3_mprintf(
  2316         -        "%s: wrong number of CREATE VIRTUAL TABLE arguments", argv[0]
  2317         -    );
  2318         -    rc = SQLITE_ERROR;
         1898  +  nDbName = strlen(zDbName);
         1899  +  pNew = sqlite3_malloc( sizeof(*pNew) + nDbName + 1);
         1900  +  if( pNew==0 ){
         1901  +    rc = SQLITE_NOMEM;
  2319   1902     }else{
  2320         -    nDbName = strlen(zDbName);
  2321         -    pNew = sqlite3_malloc( sizeof(*pNew) + nDbName + 1);
  2322         -    if( pNew==0 ){
         1903  +    memset(pNew, 0, sizeof(*pNew));
         1904  +    pNew->zDbName = (char*)&pNew[1];
         1905  +    memcpy(pNew->zDbName, zDbName, nDbName+1);
         1906  +    pNew->zTableName = sqlite3_mprintf("%s", zTableName);
         1907  +    pNew->db = db;
         1908  +    if( pNew->zTableName==0 ){
  2323   1909         rc = SQLITE_NOMEM;
  2324   1910       }else{
  2325         -      memset(pNew, 0, sizeof(*pNew));
  2326         -      pNew->zDbName = (char*)&pNew[1];
  2327         -      memcpy(pNew->zDbName, zDbName, nDbName+1);
  2328         -      pNew->zTableName = sqlite3_mprintf("%s", zTableName);
  2329         -      pNew->db = db;
  2330         -      if( pNew->zTableName==0 ){
  2331         -        rc = SQLITE_NOMEM;
  2332         -      }else{
  2333         -        rc = sqlite3_declare_vtab(db, 
  2334         -             "CREATE TABLE x(word,rank,distance,langid, "
  2335         -             "score, matchlen, phonehash, "
  2336         -             "top HIDDEN, scope HIDDEN, srchcnt HIDDEN, "
  2337         -             "soundslike HIDDEN, command HIDDEN)"
  2338         -        );
         1911  +      rc = sqlite3_declare_vtab(db, 
         1912  +           "CREATE TABLE x(word,rank,distance,langid, "
         1913  +           "score, matchlen, phonehash HIDDEN, "
         1914  +           "top HIDDEN, scope HIDDEN, srchcnt HIDDEN, "
         1915  +           "soundslike HIDDEN, command HIDDEN)"
         1916  +      );
  2339   1917   #define SPELLFIX_COL_WORD            0
  2340   1918   #define SPELLFIX_COL_RANK            1
  2341   1919   #define SPELLFIX_COL_DISTANCE        2
  2342   1920   #define SPELLFIX_COL_LANGID          3
  2343   1921   #define SPELLFIX_COL_SCORE           4
  2344   1922   #define SPELLFIX_COL_MATCHLEN        5
  2345   1923   #define SPELLFIX_COL_PHONEHASH       6
  2346   1924   #define SPELLFIX_COL_TOP             7
  2347   1925   #define SPELLFIX_COL_SCOPE           8
  2348   1926   #define SPELLFIX_COL_SRCHCNT         9
  2349   1927   #define SPELLFIX_COL_SOUNDSLIKE     10
  2350   1928   #define SPELLFIX_COL_COMMAND        11
  2351         -      }
  2352         -      if( rc==SQLITE_OK && isCreate ){
  2353         -        sqlite3_uint64 r;
  2354         -        spellfix1DbExec(&rc, db,
  2355         -           "CREATE TABLE IF NOT EXISTS \"%w\".\"%w_vocab\"(\n"
  2356         -           "  id INTEGER PRIMARY KEY,\n"
  2357         -           "  rank INT,\n"
  2358         -           "  langid INT,\n"
  2359         -           "  word TEXT,\n"
  2360         -           "  k1 TEXT,\n"
  2361         -           "  k2 TEXT\n"
  2362         -           ");\n",
  2363         -           zDbName, zTableName
  2364         -        );
  2365         -        sqlite3_randomness(sizeof(r), &r);
  2366         -        spellfix1DbExec(&rc, db,
  2367         -           "CREATE INDEX IF NOT EXISTS \"%w\".\"%w_index_%llx\" "
  2368         -              "ON \"%w_vocab\"(langid,k2);",
  2369         -           zDbName, zModule, r, zTableName
  2370         -        );
         1929  +    }
         1930  +    if( rc==SQLITE_OK && isCreate ){
         1931  +      sqlite3_uint64 r;
         1932  +      spellfix1DbExec(&rc, db,
         1933  +         "CREATE TABLE IF NOT EXISTS \"%w\".\"%w_vocab\"(\n"
         1934  +         "  id INTEGER PRIMARY KEY,\n"
         1935  +         "  rank INT,\n"
         1936  +         "  langid INT,\n"
         1937  +         "  word TEXT,\n"
         1938  +         "  k1 TEXT,\n"
         1939  +         "  k2 TEXT\n"
         1940  +         ");\n",
         1941  +         zDbName, zTableName
         1942  +      );
         1943  +      sqlite3_randomness(sizeof(r), &r);
         1944  +      spellfix1DbExec(&rc, db,
         1945  +         "CREATE INDEX IF NOT EXISTS \"%w\".\"%w_index_%llx\" "
         1946  +            "ON \"%w_vocab\"(langid,k2);",
         1947  +         zDbName, zModule, r, zTableName
         1948  +      );
         1949  +    }
         1950  +    for(i=3; rc==SQLITE_OK && i<argc; i++){
         1951  +      if( memcmp(argv[i],"edit_cost_table=",16)==0 && pNew->zCostTable==0 ){
         1952  +        pNew->zCostTable = spellfix1Dequote(&argv[i][16]);
         1953  +        if( pNew->zCostTable==0 ) rc = SQLITE_NOMEM;
         1954  +        continue;
  2371   1955         }
  2372         -      for(i=3; rc==SQLITE_OK && i<argc; i++){
  2373         -        if( memcmp(argv[i],"edit_cost_table=",16)==0 && pNew->zCostTable==0 ){
  2374         -          pNew->zCostTable = spellfix1Dequote(&argv[i][16]);
  2375         -          if( pNew->zCostTable==0 ) rc = SQLITE_NOMEM;
  2376         -          continue;
  2377         -        }
  2378         -        rc = SQLITE_ERROR; 
  2379         -      }
         1956  +      *pzErr = sqlite3_mprintf("bad argument to spellfix1(): \"%s\"", argv[i]);
         1957  +      rc = SQLITE_ERROR; 
  2380   1958       }
  2381   1959     }
  2382   1960   
  2383         -  *ppVTab = (sqlite3_vtab *)pNew;
         1961  +  if( rc && pNew ){
         1962  +    *ppVTab = 0;
         1963  +    spellfix1Uninit(0, &pNew->base);
         1964  +  }else{
         1965  +    *ppVTab = (sqlite3_vtab *)pNew;
         1966  +  }
  2384   1967     return rc;
  2385   1968   }
  2386   1969   
  2387   1970   /*
  2388   1971   ** The xConnect and xCreate methods
  2389   1972   */
  2390   1973   static int spellfix1Connect(
................................................................................
  2413   1996     int i;
  2414   1997     for(i=0; i<pCur->nRow; i++){
  2415   1998       sqlite3_free(pCur->a[i].zWord);
  2416   1999     }
  2417   2000     pCur->nRow = 0;
  2418   2001     pCur->iRow = 0;
  2419   2002     pCur->nSearch = 0;
         2003  +  if( pCur->pFullScan ){
         2004  +    sqlite3_finalize(pCur->pFullScan);
         2005  +    pCur->pFullScan = 0;
         2006  +  }
  2420   2007   }
  2421   2008   
  2422   2009   /*
  2423   2010   ** Resize the cursor to hold up to N rows of content
  2424   2011   */
  2425   2012   static void spellfix1ResizeCursor(spellfix1_cursor *pCur, int N){
  2426   2013     struct spellfix1_row *aNew;
................................................................................
  2531   2118         iDistTerm = i;
  2532   2119       }
  2533   2120     }
  2534   2121     if( iPlan&1 ){
  2535   2122       int idx = 2;
  2536   2123       pIdxInfo->idxNum = iPlan;
  2537   2124       if( pIdxInfo->nOrderBy==1
  2538         -     && pIdxInfo->aOrderBy[0].iColumn==4
         2125  +     && pIdxInfo->aOrderBy[0].iColumn==SPELLFIX_COL_SCORE
  2539   2126        && pIdxInfo->aOrderBy[0].desc==0
  2540   2127       ){
  2541   2128         pIdxInfo->orderByConsumed = 1;  /* Default order by iScore */
  2542   2129       }
  2543   2130       if( iPlan&2 ){
  2544   2131         pIdxInfo->aConstraintUsage[iLangTerm].argvIndex = idx++;
  2545   2132         pIdxInfo->aConstraintUsage[iLangTerm].omit = 1;
................................................................................
  2636   2223     int iScope = p->iScope;
  2637   2224     spellfix1_cursor *pCur = p->pCur;
  2638   2225     sqlite3_stmt *pStmt = p->pStmt;
  2639   2226     char zHash1[SPELLFIX_MX_HASH];
  2640   2227     char zHash2[SPELLFIX_MX_HASH];
  2641   2228     char *zClass;
  2642   2229     int nClass;
         2230  +  int rc;
  2643   2231   
  2644   2232     if( pCur->a==0 || p->rc ) return;   /* Prior memory allocation failure */
  2645         -  if( p->nRun>=SPELLFIX_MX_RUN ) return;
  2646   2233     zClass = (char*)phoneticHash((unsigned char*)zQuery, nQuery);
  2647   2234     if( zClass==0 ){
  2648   2235       p->rc = SQLITE_NOMEM;
  2649   2236       return;
  2650   2237     }
  2651   2238     nClass = strlen(zClass);
  2652   2239     if( nClass>SPELLFIX_MX_HASH-2 ){
................................................................................
  2662   2249     }
  2663   2250     memcpy(zHash1, zClass, iScope);
  2664   2251     sqlite3_free(zClass);
  2665   2252     zHash1[iScope] = 0;
  2666   2253     memcpy(zHash2, zHash1, iScope);
  2667   2254     zHash2[iScope] = 'Z';
  2668   2255     zHash2[iScope+1] = 0;
         2256  +#if SPELLFIX_MX_RUN>1
  2669   2257     for(i=0; i<p->nRun; i++){
  2670   2258       if( strcmp(p->azPrior[i], zHash1)==0 ) return;
  2671   2259     }
         2260  +#endif
         2261  +  assert( p->nRun<SPELLFIX_MX_RUN );
  2672   2262     memcpy(p->azPrior[p->nRun++], zHash1, iScope+1);
  2673         -  sqlite3_bind_text(pStmt, 1, zHash1, -1, SQLITE_STATIC);
  2674         -  sqlite3_bind_text(pStmt, 2, zHash2, -1, SQLITE_STATIC);
         2263  +  if( sqlite3_bind_text(pStmt, 1, zHash1, -1, SQLITE_STATIC)==SQLITE_NOMEM
         2264  +   || sqlite3_bind_text(pStmt, 2, zHash2, -1, SQLITE_STATIC)==SQLITE_NOMEM
         2265  +  ){
         2266  +    p->rc = SQLITE_NOMEM;
         2267  +    return;
         2268  +  }
         2269  +#if SPELLFIX_MX_RUN>1
  2675   2270     for(i=0; i<pCur->nRow; i++){
  2676   2271       if( pCur->a[i].iScore>iWorst ){
  2677   2272         iWorst = pCur->a[i].iScore;
  2678   2273         idxWorst = i;
  2679   2274       }
  2680   2275     }
         2276  +#endif
  2681   2277     while( sqlite3_step(pStmt)==SQLITE_ROW ){
  2682   2278       int iMatchlen = -1;
  2683   2279       iRank = sqlite3_column_int(pStmt, 2);
  2684   2280       if( p->pMatchStr3 ){
  2685   2281         int nWord = sqlite3_column_bytes(pStmt, 1);
  2686   2282         zWord = (const char*)sqlite3_column_text(pStmt, 1);
  2687   2283         iDist = editDist3Core(p->pMatchStr3, zWord, nWord, p->pLang, &iMatchlen);
  2688   2284       }else{
  2689   2285         zK1 = (const char*)sqlite3_column_text(pStmt, 3);
  2690   2286         if( zK1==0 ) continue;
  2691         -      iDist = editdist1(p->zPattern, zK1, pCur->iLang, 0);
         2287  +      iDist = editdist1(p->zPattern, zK1, 0);
         2288  +    }
         2289  +    if( iDist<0 ){
         2290  +      p->rc = SQLITE_NOMEM;
         2291  +      break;
  2692   2292       }
  2693   2293       pCur->nSearch++;
  2694   2294       iScore = spellfix1Score(iDist,iRank);
  2695   2295       if( p->iMaxDist>=0 ){
  2696   2296         if( iDist>p->iMaxDist ) continue;
  2697   2297         if( pCur->nRow>=pCur->nAlloc-1 ){
  2698   2298           spellfix1ResizeCursor(pCur, pCur->nAlloc*2 + 10);
................................................................................
  2704   2304       }else if( iScore<iWorst ){
  2705   2305         idx = idxWorst;
  2706   2306         sqlite3_free(pCur->a[idx].zWord);
  2707   2307       }else{
  2708   2308         continue;
  2709   2309       }
  2710   2310       pCur->a[idx].zWord = sqlite3_mprintf("%s", sqlite3_column_text(pStmt, 1));
         2311  +    if( pCur->a[idx].zWord==0 ){
         2312  +      p->rc = SQLITE_NOMEM;
         2313  +      break;
         2314  +    }
  2711   2315       pCur->a[idx].iRowid = sqlite3_column_int64(pStmt, 0);
  2712   2316       pCur->a[idx].iRank = iRank;
  2713   2317       pCur->a[idx].iDistance = iDist;
  2714   2318       pCur->a[idx].iScore = iScore;
  2715   2319       pCur->a[idx].iMatchlen = iMatchlen;
  2716   2320       memcpy(pCur->a[idx].zHash, zHash1, iScope+1);
  2717   2321       if( pCur->nRow<pCur->nAlloc ) pCur->nRow++;
................................................................................
  2723   2327           if( iWorst<iScore ){
  2724   2328             iWorst = iScore;
  2725   2329             idxWorst = i;
  2726   2330           }
  2727   2331         }
  2728   2332       }
  2729   2333     }
  2730         -  sqlite3_reset(pStmt);
         2334  +  rc = sqlite3_reset(pStmt);
         2335  +  if( rc ) p->rc = rc;
  2731   2336   }
  2732   2337   
  2733   2338   /*
  2734   2339   ** This version of the xFilter method works if the MATCH term is present
  2735   2340   ** and we are doing a scan.
  2736   2341   */
  2737   2342   static int spellfix1FilterForMatch(
................................................................................
  2744   2349     EditDist3FromString *pMatchStr3 = 0; /* zMatchThis as an editdist string */
  2745   2350     char *zPattern;                    /* Transliteration of zMatchThis */
  2746   2351     int nPattern;                      /* Length of zPattern */
  2747   2352     int iLimit = 20;                   /* Max number of rows of output */
  2748   2353     int iScope = 3;                    /* Use this many characters of zClass */
  2749   2354     int iLang = 0;                     /* Language code */
  2750   2355     char *zSql;                        /* SQL of shadow table query */
  2751         -  sqlite3_stmt *pStmt;               /* Shadow table query */
         2356  +  sqlite3_stmt *pStmt = 0;           /* Shadow table query */
  2752   2357     int rc;                            /* Result code */
  2753   2358     int idx = 1;                       /* Next available filter parameter */
  2754   2359     spellfix1_vtab *p = pCur->pVTab;   /* The virtual table that owns pCur */
  2755   2360     MatchQuery x;                      /* For passing info to RunQuery() */
  2756   2361   
  2757   2362     /* Load the cost table if we have not already done so */
  2758   2363     if( p->zCostTable!=0 && p->pConfig3==0 ){
................................................................................
  2786   2391     spellfix1ResetCursor(pCur);
  2787   2392     spellfix1ResizeCursor(pCur, iLimit);
  2788   2393     zMatchThis = sqlite3_value_text(argv[0]);
  2789   2394     if( zMatchThis==0 ) return SQLITE_OK;
  2790   2395     if( p->pConfig3 ){
  2791   2396       x.pLang = editDist3FindLang(p->pConfig3, iLang);
  2792   2397       pMatchStr3 = editDist3FromStringNew(x.pLang, (const char*)zMatchThis, -1);
         2398  +    if( pMatchStr3==0 ){
         2399  +      x.rc = SQLITE_NOMEM;
         2400  +      goto filter_exit;
         2401  +    }
  2793   2402     }else{
  2794   2403       x.pLang = 0;
  2795   2404     }
  2796   2405     zPattern = (char*)transliterate(zMatchThis, sqlite3_value_bytes(argv[0]));
  2797   2406     sqlite3_free(pCur->zPattern);
  2798   2407     pCur->zPattern = zPattern;
  2799         -  if( zPattern==0 ) return SQLITE_NOMEM;
         2408  +  if( zPattern==0 ){
         2409  +    x.rc = SQLITE_NOMEM;
         2410  +    goto filter_exit;
         2411  +  }
  2800   2412     nPattern = strlen(zPattern);
  2801   2413     if( zPattern[nPattern-1]=='*' ) nPattern--;
  2802   2414     zSql = sqlite3_mprintf(
  2803   2415        "SELECT id, word, rank, k1"
  2804   2416        "  FROM \"%w\".\"%w_vocab\""
  2805   2417        " WHERE langid=%d AND k2>=?1 AND k2<?2",
  2806   2418        p->zDbName, p->zTableName, iLang
  2807   2419     );
         2420  +  if( zSql==0 ){
         2421  +    x.rc = SQLITE_NOMEM;
         2422  +    pStmt = 0;
         2423  +    goto filter_exit;
         2424  +  }
  2808   2425     rc = sqlite3_prepare_v2(p->db, zSql, -1, &pStmt, 0);
  2809   2426     sqlite3_free(zSql);
  2810   2427     pCur->iLang = iLang;
  2811   2428     x.pCur = pCur;
  2812   2429     x.pStmt = pStmt;
  2813   2430     x.zPattern = zPattern;
  2814   2431     x.nPattern = nPattern;
................................................................................
  2816   2433     x.iLang = iLang;
  2817   2434     x.rc = rc;
  2818   2435     x.pConfig3 = p->pConfig3;
  2819   2436     if( x.rc==SQLITE_OK ){
  2820   2437       spellfix1RunQuery(&x, zPattern, nPattern);
  2821   2438     }
  2822   2439   
  2823         -#if 0
  2824         -  /* Convert "ght" to "t" in the original pattern and try again */
  2825         -  if( x.rc==SQLITE_OK ){
  2826         -    int i, j;                         /* Loop counters */
  2827         -    char zQuery[50];                  /* Space for alternative query string */
  2828         -    for(i=j=0; i<nPattern && i<sizeof(zQuery)-1; i++){
  2829         -      char c = zPattern[i];
  2830         -      if( c=='g' && i<nPattern-2 && zPattern[i+1]=='h' && zPattern[i+2]=='t' ){
  2831         -        i += 2;
  2832         -        c= 't';
  2833         -      }
  2834         -      zQuery[j++] = c;
  2835         -    }
  2836         -    zQuery[j] = 0;
  2837         -    if( j<i ){
  2838         -      spellfix1RunQuery(&x, zQuery, j);
  2839         -    }
  2840         -  }
  2841         -#endif
  2842         -
  2843   2440     if( pCur->a ){
  2844   2441       qsort(pCur->a, pCur->nRow, sizeof(pCur->a[0]), spellfix1RowCompare);
  2845   2442       pCur->iTop = iLimit;
  2846   2443       pCur->iScope = iScope;
         2444  +  }else{
         2445  +    x.rc = SQLITE_NOMEM;
  2847   2446     }
         2447  +
         2448  +filter_exit:
  2848   2449     sqlite3_finalize(pStmt);
  2849   2450     editDist3FromStringDelete(pMatchStr3);
  2850         -  return pCur->a ? x.rc : SQLITE_NOMEM;
         2451  +  return x.rc;
  2851   2452   }
  2852   2453   
  2853   2454   /*
  2854   2455   ** This version of xFilter handles a full-table scan case
  2855   2456   */
  2856   2457   static int spellfix1FilterForFullScan(
  2857   2458     spellfix1_cursor *pCur,
  2858   2459     int idxNum,
  2859   2460     int argc,
  2860   2461     sqlite3_value **argv
  2861   2462   ){
         2463  +  int rc;
         2464  +  char *zSql;
         2465  +  spellfix1_vtab *pVTab = pCur->pVTab;
  2862   2466     spellfix1ResetCursor(pCur);
  2863         -  spellfix1ResizeCursor(pCur, 0);
  2864         -  return SQLITE_OK;
         2467  +  zSql = sqlite3_mprintf(
         2468  +     "SELECT word, rank, NULL, langid, id FROM \"%w\".\"%w_vocab\"",
         2469  +     pVTab->zDbName, pVTab->zTableName);
         2470  +  if( zSql==0 ) return SQLITE_NOMEM;
         2471  +  rc = sqlite3_prepare_v2(pVTab->db, zSql, -1, &pCur->pFullScan, 0);
         2472  +  sqlite3_free(zSql);
         2473  +  pCur->nRow = pCur->iRow = 0;
         2474  +  if( rc==SQLITE_OK ){
         2475  +    rc = sqlite3_step(pCur->pFullScan);
         2476  +    if( rc==SQLITE_ROW ){ pCur->iRow = -1; rc = SQLITE_OK; }
         2477  +    if( rc==SQLITE_DONE ){ rc = SQLITE_OK; }
         2478  +  }else{
         2479  +    pCur->iRow = 0;
         2480  +  }
         2481  +  return rc;
  2865   2482   }
  2866   2483   
  2867   2484   
  2868   2485   /*
  2869   2486   ** Called to "rewind" a cursor back to the beginning so that
  2870   2487   ** it starts its output over again.  Always called at least once
  2871   2488   ** prior to any spellfix1Column, spellfix1Rowid, or spellfix1Eof call.
................................................................................
  2887   2504   
  2888   2505   
  2889   2506   /*
  2890   2507   ** Advance a cursor to its next row of output
  2891   2508   */
  2892   2509   static int spellfix1Next(sqlite3_vtab_cursor *cur){
  2893   2510     spellfix1_cursor *pCur = (spellfix1_cursor *)cur;
  2894         -  if( pCur->iRow < pCur->nRow ) pCur->iRow++;
         2511  +  if( pCur->iRow < pCur->nRow ){
         2512  +    if( pCur->pFullScan ){
         2513  +      int rc = sqlite3_step(pCur->pFullScan);
         2514  +      if( rc!=SQLITE_ROW ) pCur->iRow = pCur->nRow;
         2515  +    }else{
         2516  +      pCur->iRow++;
         2517  +    }
         2518  +  }
  2895   2519     return SQLITE_OK;
  2896   2520   }
  2897   2521   
  2898   2522   /*
  2899   2523   ** Return TRUE if we are at the end-of-file
  2900   2524   */
  2901   2525   static int spellfix1Eof(sqlite3_vtab_cursor *cur){
................................................................................
  2902   2526     spellfix1_cursor *pCur = (spellfix1_cursor *)cur;
  2903   2527     return pCur->iRow>=pCur->nRow;
  2904   2528   }
  2905   2529   
  2906   2530   /*
  2907   2531   ** Return columns from the current row.
  2908   2532   */
  2909         -static int spellfix1Column(sqlite3_vtab_cursor *cur, sqlite3_context *ctx, int i){
         2533  +static int spellfix1Column(
         2534  +  sqlite3_vtab_cursor *cur,
         2535  +  sqlite3_context *ctx,
         2536  +  int i
         2537  +){
  2910   2538     spellfix1_cursor *pCur = (spellfix1_cursor*)cur;
         2539  +  if( pCur->pFullScan ){
         2540  +    if( i<=SPELLFIX_COL_LANGID ){
         2541  +      sqlite3_result_value(ctx, sqlite3_column_value(pCur->pFullScan, i));
         2542  +    }else{
         2543  +      sqlite3_result_null(ctx);
         2544  +    }
         2545  +    return SQLITE_OK;
         2546  +  }
  2911   2547     switch( i ){
  2912   2548       case SPELLFIX_COL_WORD: {
  2913   2549         sqlite3_result_text(ctx, pCur->a[pCur->iRow].zWord, -1, SQLITE_STATIC);
  2914   2550         break;
  2915   2551       }
  2916   2552       case SPELLFIX_COL_RANK: {
  2917   2553         sqlite3_result_int(ctx, pCur->a[pCur->iRow].iRank);
................................................................................
  2937   2573           int nWord = strlen(zWord);
  2938   2574   
  2939   2575           if( nPattern>0 && pCur->zPattern[nPattern-1]=='*' ){
  2940   2576             char *zTranslit;
  2941   2577             int res;
  2942   2578             zTranslit = (char *)transliterate((unsigned char *)zWord, nWord);
  2943   2579             if( !zTranslit ) return SQLITE_NOMEM;
  2944         -          res = editdist1(pCur->zPattern, zTranslit, pCur->iLang, &iMatchlen);
         2580  +          res = editdist1(pCur->zPattern, zTranslit, &iMatchlen);
  2945   2581             sqlite3_free(zTranslit);
  2946   2582             if( res<0 ) return SQLITE_NOMEM;
  2947   2583             iMatchlen = translen_to_charlen(zWord, nWord, iMatchlen);
  2948   2584           }else{
  2949   2585             iMatchlen = utf8Charlen(zWord, nWord);
  2950   2586           }
  2951   2587         }
................................................................................
  2978   2614   }
  2979   2615   
  2980   2616   /*
  2981   2617   ** The rowid.
  2982   2618   */
  2983   2619   static int spellfix1Rowid(sqlite3_vtab_cursor *cur, sqlite_int64 *pRowid){
  2984   2620     spellfix1_cursor *pCur = (spellfix1_cursor*)cur;
  2985         -  *pRowid = pCur->a[pCur->iRow].iRowid;
         2621  +  if( pCur->pFullScan ){
         2622  +    *pRowid = sqlite3_column_int64(pCur->pFullScan, 4);
         2623  +  }else{
         2624  +    *pRowid = pCur->a[pCur->iRow].iRowid;
         2625  +  }
  2986   2626     return SQLITE_OK;
  2987   2627   }
  2988   2628   
  2989   2629   /*
  2990   2630   ** The xUpdate() method.
  2991   2631   */
  2992   2632   static int spellfix1Update(
................................................................................
  3062   2702                iRank, iLang, zWord, zK1, zK2
  3063   2703         );
  3064   2704         *pRowid = sqlite3_last_insert_rowid(db);
  3065   2705       }else{
  3066   2706         rowid = sqlite3_value_int64(argv[0]);
  3067   2707         newRowid = *pRowid = sqlite3_value_int64(argv[1]);
  3068   2708         spellfix1DbExec(&rc, db,
  3069         -             "UPDATE \"%w\".\"%w_vocab\" SET id=%lld, rank=%d, lang=%d,"
  3070         -             " word=%Q, rank=%d, k1=%Q, k2=%Q WHERE id=%lld",
         2709  +             "UPDATE \"%w\".\"%w_vocab\" SET id=%lld, rank=%d, langid=%d,"
         2710  +             " word=%Q, k1=%Q, k2=%Q WHERE id=%lld",
  3071   2711                p->zDbName, p->zTableName, newRowid, iRank, iLang,
  3072   2712                zWord, zK1, zK2, rowid
  3073   2713         );
  3074   2714       }
  3075   2715       sqlite3_free(zK1);
  3076   2716       sqlite3_free(zK2);
  3077   2717     }
................................................................................
  3092   2732     spellfix1DbExec(&rc, db, 
  3093   2733        "ALTER TABLE \"%w\".\"%w_vocab\" RENAME TO \"%w_vocab\"",
  3094   2734        p->zDbName, p->zTableName, zNewName
  3095   2735     );
  3096   2736     if( rc==SQLITE_OK ){
  3097   2737       sqlite3_free(p->zTableName);
  3098   2738       p->zTableName = zNewName;
         2739  +  }else{
         2740  +    sqlite3_free(zNewName);
  3099   2741     }
  3100   2742     return rc;
  3101   2743   }
  3102   2744   
  3103   2745   
  3104   2746   /*
  3105   2747   ** A virtual table module that provides fuzzy search.
................................................................................
  3133   2775   static int spellfix1Register(sqlite3 *db){
  3134   2776     int nErr = 0;
  3135   2777     int i;
  3136   2778     nErr += sqlite3_create_function(db, "spellfix1_translit", 1, SQLITE_UTF8, 0,
  3137   2779                                     transliterateSqlFunc, 0, 0);
  3138   2780     nErr += sqlite3_create_function(db, "spellfix1_editdist", 2, SQLITE_UTF8, 0,
  3139   2781                                     editdistSqlFunc, 0, 0);
  3140         -  nErr += sqlite3_create_function(db, "spellfix1_editdist", 3, SQLITE_UTF8, 0,
  3141         -                                  editdistSqlFunc, 0, 0);
  3142   2782     nErr += sqlite3_create_function(db, "spellfix1_phonehash", 1, SQLITE_UTF8, 0,
  3143   2783                                     phoneticHashSqlFunc, 0, 0);
  3144   2784     nErr += sqlite3_create_function(db, "spellfix1_scriptcode", 1, SQLITE_UTF8, 0,
  3145   2785                                     scriptCodeSqlFunc, 0, 0);
  3146         -  nErr += sqlite3_create_function(db, "pollock_skeleton", 1, SQLITE_UTF8, 0,
  3147         -                                  pollockSkeletonSqlFunc, 0, 0);
  3148         -  nErr += sqlite3_create_function(db, "pollock_omission", 1, SQLITE_UTF8, 0,
  3149         -                                  pollockOmissionSqlFunc, 0, 0);
  3150   2786     nErr += sqlite3_create_module(db, "spellfix1", &spellfix1Module, 0);
  3151   2787     nErr += editDist3Install(db);
  3152   2788   
  3153   2789     /* Verify sanity of the translit[] table */
  3154   2790     for(i=0; i<sizeof(translit)/sizeof(translit[0])-1; i++){
  3155   2791       assert( translit[i].cFrom<translit[i+1].cFrom );
  3156   2792     }
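
Note on the full-scan cursor in the diff above: spellfix1FilterForFullScan() leaves nRow at 0 and sets iRow to -1 when the first sqlite3_step() yields a row, so the unchanged spellfix1Eof() test (iRow>=nRow) reports "not at EOF" until spellfix1Next() runs out of rows and sets iRow = nRow. The sketch below models only that state machine. MockStmt, mockStep(), Cursor, and countRows() are hypothetical stand-ins invented for illustration and are not part of test_spellfix.c; the real code steps an sqlite3_stmt instead.

```c
#include <assert.h>

/* Hypothetical stand-in for a prepared statement with nLeft rows remaining. */
typedef struct MockStmt { int nLeft; } MockStmt;

/* Return 1 if stepping lands on a row (like SQLITE_ROW), 0 when the
** statement is exhausted (like SQLITE_DONE). */
static int mockStep(MockStmt *p){
  if( p->nLeft>0 ){ p->nLeft--; return 1; }
  return 0;
}

/* Only the cursor fields that matter for the full-scan case. */
typedef struct Cursor {
  int iRow;             /* -1 while a full-scan row is available */
  int nRow;             /* Stays 0 during a full scan */
  MockStmt *pFullScan;  /* Non-NULL when doing a full scan */
} Cursor;

/* Model of the xFilter logic: step once, use iRow=-1 to mean "row ready". */
static void cursorFilter(Cursor *c, MockStmt *s){
  c->pFullScan = s;
  c->nRow = c->iRow = 0;
  if( mockStep(s) ) c->iRow = -1;
}

/* Same EOF test as the diff: true once iRow has caught up with nRow. */
static int cursorEof(const Cursor *c){
  return c->iRow>=c->nRow;
}

/* Model of the xNext logic: step the statement, or force EOF when done. */
static void cursorNext(Cursor *c){
  if( c->iRow < c->nRow ){
    if( c->pFullScan ){
      if( !mockStep(c->pFullScan) ) c->iRow = c->nRow;
    }else{
      c->iRow++;
    }
  }
}

/* Drive the cursor the way a virtual-table scan would, counting rows. */
static int countRows(int nAvailable){
  MockStmt s;
  Cursor c;
  int n = 0;
  s.nLeft = nAvailable;
  for(cursorFilter(&c, &s); !cursorEof(&c); cursorNext(&c)) n++;
  return n;
}
```

Calling countRows(3) visits three rows and countRows(0) visits none, which is the behavior the iRow = -1 sentinel is meant to produce without changing the EOF test used by the non-full-scan path.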

Changes to test/spellfix.test.

   133    133     foreach w $vocab {
   134    134       execsql { INSERT INTO t3(word) VALUES($w) }
   135    135     }
   136    136   } {}
   137    137   
   138    138   breakpoint
   139    139   foreach {tn word res} {
   140         -  1   kos*     {kosher 3 kiosk 4 kudo 2 kappa 1 keypad 1}
   141         -  2   kellj*   {killjoy 5 killed 4 killingly 4 kill 4 killer 4}
          140  +  1   kos*     {kosher 3 kiosk 4 kudo 2 kiss 3 kissed 3}
          141  +  2   kellj*   {killjoy 5 kill 4 killed 4 killer 4 killers 4}
   142    142     3   kellj    {kill 4 kills 5 killjoy 7 keel 4 killed 6}
   143    143   } {
   144    144     do_execsql_test 1.2.$tn {
   145         -    SELECT word, matchlen FROM t3 WHERE word MATCH $word LIMIT 5
          145  +    SELECT word, matchlen FROM t3 WHERE word MATCH $word
          146  +     ORDER BY score, word LIMIT 5
   146    147     } $res
   147    148   } 
   148    149   
   149    150   finish_test
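
Note on the test change above: adding "ORDER BY score, word" ahead of "LIMIT 5" gives rows with equal score a fixed order, so the five rows returned no longer depend on the order in which matches happen to be found. A minimal C sketch of the same two-key ordering with qsort() follows. The Row type, rowCompare(), sampleWordAt(), and the scores are all invented for illustration; this is not spellfix1RowCompare() and the scores are not real spellfix1 output.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result row: a word and its (made-up) score. */
typedef struct Row {
  const char *zWord;
  int iScore;
} Row;

/* Order by score ascending, then by word, so ties are deterministic. */
static int rowCompare(const void *a, const void *b){
  const Row *pA = (const Row*)a;
  const Row *pB = (const Row*)b;
  if( pA->iScore!=pB->iScore ) return pA->iScore<pB->iScore ? -1 : 1;
  return strcmp(pA->zWord, pB->zWord);
}

/* Sort a fixed sample and return the word at position i (0..3). */
static const char *sampleWordAt(int i){
  Row a[] = {
    { "killer",  20 },   /* scores here are invented */
    { "killjoy", 10 },
    { "kill",    20 },
    { "killed",  20 },
  };
  qsort(a, sizeof(a)/sizeof(a[0]), sizeof(a[0]), rowCompare);
  return a[i].zWord;     /* points at a string literal, safe to return */
}
```

With these sample values the sorted order is killjoy, kill, killed, killer: the lowest score comes first and the three rows tied on score fall back to alphabetical order, regardless of their order in the input array.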