/ Check-in [3e4a0082]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix up obsolete comments in FTS3 to conform to the latest nomenclature. Add new comments to better explain FTS3 operation.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 3e4a0082170155b5b779afd075a3ee650530ca68
User & Date: drh 2010-03-23 15:46:41
Context
2010-03-23
18:24
More commenting and documentation enhancements in FTS3. check-in: 892e2867 user: drh tags: trunk
15:46
Fix up obsolete comments in FTS3 to conform to the latest nomenclature. Add new comments to better explain FTS3 operation. check-in: 3e4a0082 user: drh tags: trunk
15:29
Close the auxiliary database db2 at the end of the crash8.test script. check-in: 0fbdc431 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to ext/fts3/fts3.c.

44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109


110
111
112
113
114
115
116
117
118
119
120
121
...
309
310
311
312
313
314
315














316
317
318
319
320
321
322
...
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
...
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
...
543
544
545
546
547
548
549




550
551
552
553
554
555
556
...
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
...
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
...
889
890
891
892
893
894
895





896
897
898
899
900
901
902
...
915
916
917
918
919
920
921











922
923
924
925
926
927
928
....
1050
1051
1052
1053
1054
1055
1056





1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077



1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088

















1089
1090
1091
1092
1093


1094



1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115

1116
1117
1118
1119
1120






1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136

1137
1138
1139
1140
1141
1142
1143
....
1147
1148
1149
1150
1151
1152
1153
1154




1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185

1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
....
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
** 14 bits - BA
** 21 bits - BBA
** and so on.
**
** This is similar in concept to how sqlite encodes "varints" but
** the encoding is not the same.  SQLite varints are big-endian
** are are limited to 9 bytes in length whereas FTS3 varints are
** little-endian and can be upt to 10 bytes in length (in theory).
**
** Example encodings:
**
**     1:    0x01
**   127:    0x7f
**   128:    0x81 0x00
**
**
**** Document lists ****
** A doclist (document list) holds a docid-sorted list of hits for a
** given term.  Doclists hold docids, and can optionally associate
** token positions and offsets with docids.  A position is the index
** of a word within the document.  The first word of the document has
** a position of 0.
**
** FTS3 used to optionally store character offsets using a compile-time
** option.  But that functionality is no longer supported.
**
** A DL_POSITIONS_OFFSETS doclist is stored like this:
**
** array {
**   varint docid;
**   array {                (position list for column 0)
**     varint position;     (delta from previous position plus POS_BASE)
**   }
**   array {
**     varint POS_COLUMN;   (marks start of position list for new column)
**     varint column;       (index of new column)
**     array {
**       varint position;   (delta from previous position plus POS_BASE)
**     }
**   }
**   varint POS_END;        (marks end of positions for this document.
** }
**
** Here, array { X } means zero or more occurrences of X, adjacent in
** memory.  A "position" is an index of a token in the token stream
** generated by the tokenizer. Note that POS_END and POS_COLUMN occur 
** in the same logical place as the position element, and act as sentinals
** ending a position list array.  POS_END is 0.  POS_COLUMN is 1.
** The positions numbers are not stored literally but rather as two more
** the difference from the prior position, or the just the position plus
** 2 for the first position.  Example:
**
**   label:       A B C D E  F  G H   I  J K
**   value:     123 5 9 1 1 14 35 0 234 72 0
**
** The 123 value is the first docid.  For column zero in this document
** there are two matches at positions 3 and 10 (5-2 and 9-2+3).  The 1
** at D signals the start of a new column; the 1 at E indicates that the
** new column is column number 1.  There are two positions at 12 and 45
** (14-2 and 35-2+12).  The 0 at H indicate the end-of-document.  The
** 234 at I is the next docid.  It has one position 72 (72-2) and then
** terminates with the 0 at K.
**
** A DL_POSITIONS doclist omits the startOffset and endOffset
** information.  A DL_DOCIDS doclist omits both the position and
** offset information, becoming an array of varint-encoded docids.


**
** On-disk data is stored as type DL_DEFAULT, so we don't serialize
** the type.  Due to how deletion is implemented in the segmentation
** system, on-disk doclists MUST store at least positions.
**
**
**** Segment leaf nodes ****
** Segment leaf nodes store terms and doclists, ordered by term.  Leaf
** nodes are written using LeafWriter, and read using LeafReader (to
** iterate through a single leaf node's data) and LeavesReader (to
** iterate through a segment's entire leaf layer).  Leaf nodes have
** the format:
................................................................................

#include "fts3.h"
#ifndef SQLITE_CORE 
# include "sqlite3ext.h"
  SQLITE_EXTENSION_INIT1
#endif















/* 
** Write a 64-bit variable-length integer to memory starting at p[0].
** The length of data written will be between 1 and FTS3_VARINT_MAX bytes.
** The number of bytes written is returned.
*/
int sqlite3Fts3PutVarint(char *p, sqlite_int64 v){
  unsigned char *q = (unsigned char *) p;
................................................................................
 sqlite_int64 i;
 int ret = sqlite3Fts3GetVarint(p, &i);
 *pi = (int) i;
 return ret;
}

/*
** Return the number of bytes required to store the value passed as the
** first argument in varint form.
*/
int sqlite3Fts3VarintLen(sqlite3_uint64 v){
  int i = 0;
  do{
    i++;
    v >>= 7;
  }while( v!=0 );
................................................................................
    }
    z[iOut] = '\0';
  }
}

/*
** Read a single varint from the doclist at *pp and advance *pp to point
** to the next element of the varlist.  Add the value of the varint
** to *pVal.
*/
static void fts3GetDeltaVarint(char **pp, sqlite3_int64 *pVal){
  sqlite3_int64 iVal;
  *pp += sqlite3Fts3GetVarint(*pp, &iVal);
  *pVal += iVal;
}
................................................................................
  return rc;
}

/*
** Create the backing store tables (%_content, %_segments and %_segdir)
** required by the FTS3 table passed as the only argument. This is done
** as part of the vtab xCreate() method.




*/
static int fts3CreateTables(Fts3Table *p){
  int rc = SQLITE_OK;             /* Return code */
  int i;                          /* Iterator variable */
  char *zContentCols;             /* Columns of %_content table */
  sqlite3 *db = p->db;            /* The database connection */

................................................................................

/*
** This function is the implementation of both the xConnect and xCreate
** methods of the FTS3 virtual table.
**
** The argv[] array contains the following:
**
**   argv[0]   -> module name
**   argv[1]   -> database name
**   argv[2]   -> table name
**   argv[...] -> "column name" and other module argument fields.
*/
static int fts3InitVtab(
  int isCreate,                   /* True for xCreate, false for xConnect */
  sqlite3 *db,                    /* The SQLite database connection */
................................................................................
  char **pzErr                    /* Write any error message here */
){
  Fts3Hash *pHash = (Fts3Hash *)pAux;
  Fts3Table *p;                   /* Pointer to allocated vtab */
  int rc;                         /* Return code */
  int i;                          /* Iterator variable */
  int nByte;                      /* Size of allocation used for *p */
  int iCol;
  int nString = 0;
  int nCol = 0;
  char *zCsr;
  int nDb;
  int nName;

  const char *zTokenizer = 0;               /* Name of tokenizer to use */
  sqlite3_tokenizer *pTokenizer = 0;        /* Tokenizer for this table */

  nDb = (int)strlen(argv[1]) + 1;
  nName = (int)strlen(argv[2]) + 1;
  for(i=3; i<argc; i++){
................................................................................
  sqlite3Fts3ExprFree(pCsr->pExpr);
  sqlite3_free(pCsr->aDoclist);
  sqlite3_free(pCsr->aMatchinfo);
  sqlite3_free(pCsr);
  return SQLITE_OK;
}






static int fts3CursorSeek(sqlite3_context *pContext, Fts3Cursor *pCsr){
  if( pCsr->isRequireSeek ){
    pCsr->isRequireSeek = 0;
    sqlite3_bind_int64(pCsr->pStmt, 1, pCsr->iPrevId);
    if( SQLITE_ROW==sqlite3_step(pCsr->pStmt) ){
      return SQLITE_OK;
    }else{
................................................................................
      return rc;
    }
  }else{
    return SQLITE_OK;
  }
}












static int fts3NextMethod(sqlite3_vtab_cursor *pCursor){
  int rc = SQLITE_OK;             /* Return code */
  Fts3Cursor *pCsr = (Fts3Cursor *)pCursor;

  if( pCsr->aDoclist==0 ){
    if( SQLITE_ROW!=sqlite3_step(pCsr->pStmt) ){
      pCsr->isEof = 1;
................................................................................
  *piPrev = iVal;
}

/*
** When this function is called, *ppPoslist is assumed to point to the 
** start of a position-list. After it returns, *ppPoslist points to the
** first byte after the position-list.





**
** If pp is not NULL, then the contents of the position list are copied
** to *pp. *pp is set to point to the first byte past the last byte copied
** before this function returns.
*/
static void fts3PoslistCopy(char **pp, char **ppPoslist){
  char *pEnd = *ppPoslist;
  char c = 0;

  /* The end of a position list is marked by a zero encoded as an FTS3 
  ** varint. A single 0x00 byte. Except, if the 0x00 byte is preceded by
  ** a byte with the 0x80 bit set, then it is not a varint 0, but the tail
  ** of some other, multi-byte, value.
  **
  ** The following block moves pEnd to point to the first byte that is not 
  ** immediately preceded by a byte with the 0x80 bit set. Then increments
  ** pEnd once more so that it points to the byte immediately following the
  ** last byte in the position-list.
  */
  while( *pEnd | c ) c = *pEnd++ & 0x80;
  pEnd++;




  if( pp ){
    int n = (int)(pEnd - *ppPoslist);
    char *p = *pp;
    memcpy(p, *ppPoslist, n);
    p += n;
    *pp = p;
  }
  *ppPoslist = pEnd;
}


















static void fts3ColumnlistCopy(char **pp, char **ppPoslist){
  char *pEnd = *ppPoslist;
  char c = 0;

  /* A column-list is terminated by either a 0x01 or 0x00. */


  while( 0xFE & (*pEnd | c) ) c = *pEnd++ & 0x80;



  if( pp ){
    int n = (int)(pEnd - *ppPoslist);
    char *p = *pp;
    memcpy(p, *ppPoslist, n);
    p += n;
    *pp = p;
  }
  *ppPoslist = pEnd;
}

/*
** Value used to signify the end of an offset-list. This is safe because
** it is not possible to have a document with 2^31 terms.
*/
#define OFFSET_LIST_END 0x7fffffff

/*
** This function is used to help parse offset-lists. When this function is
** called, *pp may point to the start of the next varint in the offset-list
** being parsed, or it may point to 1 byte past the end of the offset-list
** (in which case **pp will be 0x00 or 0x01).

**
** If *pp points past the end of the current offset list, set *pi to 
** OFFSET_LIST_END and return. Otherwise, read the next varint from *pp,
** increment the current value of *pi by the value read, and set *pp to
** point to the next value before returning.






*/
static void fts3ReadNextPos(
  char **pp,                      /* IN/OUT: Pointer into offset-list buffer */
  sqlite3_int64 *pi               /* IN/OUT: Value read from offset-list */
){
  if( **pp&0xFE ){
    fts3GetDeltaVarint(pp, pi);
    *pi -= 2;
  }else{
    *pi = OFFSET_LIST_END;
  }
}

/*
** If parameter iCol is not 0, write an 0x01 byte followed by the value of
** iCol encoded as a varint to *pp. 

**
** Set *pp to point to the byte just after the last byte written before 
** returning (do not modify it if iCol==0). Return the total number of bytes
** written (0 if iCol==0).
*/
static int fts3PutColNumber(char **pp, int iCol){
  int n = 0;                      /* Number of bytes written */
................................................................................
    *p = 0x01;
    *pp = &p[n];
  }
  return n;
}

/*
**




*/
static void fts3PoslistMerge(
  char **pp,                      /* Output buffer */
  char **pp1,                     /* Left input list */
  char **pp2                      /* Right input list */
){
  char *p = *pp;
  char *p1 = *pp1;
  char *p2 = *pp2;

  while( *p1 || *p2 ){
    int iCol1;
    int iCol2;

    if( *p1==0x01 ) sqlite3Fts3GetVarint32(&p1[1], &iCol1);
    else if( *p1==0x00 ) iCol1 = OFFSET_LIST_END;
    else iCol1 = 0;

    if( *p2==0x01 ) sqlite3Fts3GetVarint32(&p2[1], &iCol2);
    else if( *p2==0x00 ) iCol2 = OFFSET_LIST_END;
    else iCol2 = 0;

    if( iCol1==iCol2 ){
      sqlite3_int64 i1 = 0;
      sqlite3_int64 i2 = 0;
      sqlite3_int64 iPrev = 0;
      int n = fts3PutColNumber(&p, iCol1);
      p1 += n;
      p2 += n;

      /* At this point, both p1 and p2 point to the start of offset-lists.

      ** An offset-list is a list of non-negative delta-encoded varints, each 
      ** incremented by 2 before being stored. Each list is terminated by a 0 
      ** or 1 value (0x00 or 0x01). The following block merges the two lists
      ** and writes the results to buffer p. p is left pointing to the byte
      ** after the list written. No terminator (0x00 or 0x01) is written to
      ** the output.
      */
      fts3GetDeltaVarint(&p1, &i1);
      fts3GetDeltaVarint(&p2, &i2);
      do {
        fts3PutDeltaVarint(&p, &iPrev, (i1<i2) ? i1 : i2); 
        iPrev -= 2;
        if( i1==i2 ){
................................................................................
          fts3ReadNextPos(&p1, &i1);
          fts3ReadNextPos(&p2, &i2);
        }else if( i1<i2 ){
          fts3ReadNextPos(&p1, &i1);
        }else{
          fts3ReadNextPos(&p2, &i2);
        }
      }while( i1!=OFFSET_LIST_END || i2!=OFFSET_LIST_END );
    }else if( iCol1<iCol2 ){
      p1 += fts3PutColNumber(&p, iCol1);
      fts3ColumnlistCopy(&p, &p1);
    }else{
      p2 += fts3PutColNumber(&p, iCol2);
      fts3ColumnlistCopy(&p, &p2);
    }
  }

  *p++ = '\0';
  *pp = p;
  *pp1 = p1 + 1;
  *pp2 = p2 + 1;
}

/*
** nToken==1 searches for adjacent positions.







|










|
|
|
|




|




|





|











|













|
|
|
>
>

<
<
|
|







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|
<







 







|







 







>
>
>
>







 







|







 







|
|
|
|
|
|







 







>
>
>
>
>







 







>
>
>
>
>
>
>
>
>
>
>







 







>
>
>
>
>










|



|




|
|
>
>
>











>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>




|
>
>
|
>
>
>











|


|


|
|
|
|
>

|
|


>
>
>
>
>
>


|
|

|



|




|
|
>







 







|
>
>
>
>











|
|

|
|


|
|



|
|





|
>
|
|
|

|
|







 







|









|







44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112


113
114
115
116
117
118
119
120
121
...
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
...
369
370
371
372
373
374
375
376

377
378
379
380
381
382
383
...
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
...
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
...
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
...
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
...
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
...
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
....
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
....
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
....
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
** 14 bits - BA
** 21 bits - BBA
** and so on.
**
** This is similar in concept to how sqlite encodes "varints" but
** the encoding is not the same.  SQLite varints are big-endian
** are are limited to 9 bytes in length whereas FTS3 varints are
** little-endian and can be up to 10 bytes in length (in theory).
**
** Example encodings:
**
**     1:    0x01
**   127:    0x7f
**   128:    0x81 0x00
**
**
**** Document lists ****
** A doclist (document list) holds a docid-sorted list of hits for a
** given term.  Doclists hold docids and associated token positions.
** A docid is the unique integer identifier for a single document.
** A position is the index of a word within the document.  The first 
** word of the document has a position of 0.
**
** FTS3 used to optionally store character offsets using a compile-time
** option.  But that functionality is no longer supported.
**
** A doclist is stored like this:
**
** array {
**   varint docid;
**   array {                (position list for column 0)
**     varint position;     (2 more than the delta from previous position)
**   }
**   array {
**     varint POS_COLUMN;   (marks start of position list for new column)
**     varint column;       (index of new column)
**     array {
**       varint position;   (2 more than the delta from previous position)
**     }
**   }
**   varint POS_END;        (marks end of positions for this document.
** }
**
** Here, array { X } means zero or more occurrences of X, adjacent in
** memory.  A "position" is an index of a token in the token stream
** generated by the tokenizer. Note that POS_END and POS_COLUMN occur 
** in the same logical place as the position element, and act as sentinals
** ending a position list array.  POS_END is 0.  POS_COLUMN is 1.
** The positions numbers are not stored literally but rather as two more
** than the difference from the prior position, or the just the position plus
** 2 for the first position.  Example:
**
**   label:       A B C D E  F  G H   I  J K
**   value:     123 5 9 1 1 14 35 0 234 72 0
**
** The 123 value is the first docid.  For column zero in this document
** there are two matches at positions 3 and 10 (5-2 and 9-2+3).  The 1
** at D signals the start of a new column; the 1 at E indicates that the
** new column is column number 1.  There are two positions at 12 and 45
** (14-2 and 35-2+12).  The 0 at H indicate the end-of-document.  The
** 234 at I is the next docid.  It has one position 72 (72-2) and then
** terminates with the 0 at K.
**
** A "position-list" is the list of positions for multiple columns for
** a single docid.  A "column-list" is the set of positions for a single
** column.  Hence, a position-list consists of one or more column-lists,
** a document record consists of a docid followed by a position-list and
** a doclist consists of one or more document records.
**


** A bare doclist omits the position information, becoming an 
** array of varint-encoded docids.
**
**** Segment leaf nodes ****
** Segment leaf nodes store terms and doclists, ordered by term.  Leaf
** nodes are written using LeafWriter, and read using LeafReader (to
** iterate through a single leaf node's data) and LeavesReader (to
** iterate through a segment's entire leaf layer).  Leaf nodes have
** the format:
................................................................................

#include "fts3.h"
#ifndef SQLITE_CORE 
# include "sqlite3ext.h"
  SQLITE_EXTENSION_INIT1
#endif

/*
** The testcase() macro is only used by the amalgamation.  If undefined,
** make it a no-op.
*/
#ifndef testcase
# define testcase(X)
#endif

/*
** Terminator values for position-lists and column-lists.
*/
#define POS_COLUMN  (1)     /* Column-list terminator */
#define POS_END     (0)     /* Position-list terminator */ 

/* 
** Write a 64-bit variable-length integer to memory starting at p[0].
** The length of data written will be between 1 and FTS3_VARINT_MAX bytes.
** The number of bytes written is returned.
*/
int sqlite3Fts3PutVarint(char *p, sqlite_int64 v){
  unsigned char *q = (unsigned char *) p;
................................................................................
 sqlite_int64 i;
 int ret = sqlite3Fts3GetVarint(p, &i);
 *pi = (int) i;
 return ret;
}

/*
** Return the number of bytes required to encode v as a varint

*/
int sqlite3Fts3VarintLen(sqlite3_uint64 v){
  int i = 0;
  do{
    i++;
    v >>= 7;
  }while( v!=0 );
................................................................................
    }
    z[iOut] = '\0';
  }
}

/*
** Read a single varint from the doclist at *pp and advance *pp to point
** to the first byte past the end of the varint.  Add the value of the varint
** to *pVal.
*/
static void fts3GetDeltaVarint(char **pp, sqlite3_int64 *pVal){
  sqlite3_int64 iVal;
  *pp += sqlite3Fts3GetVarint(*pp, &iVal);
  *pVal += iVal;
}
................................................................................
  return rc;
}

/*
** Create the backing store tables (%_content, %_segments and %_segdir)
** required by the FTS3 table passed as the only argument. This is done
** as part of the vtab xCreate() method.
**
** If the p->bHasDocsize boolean is true (indicating that this is an
** FTS4 table, not an FTS3 table) then also create the %_docsize and
** %_stat tables required by FTS4.
*/
static int fts3CreateTables(Fts3Table *p){
  int rc = SQLITE_OK;             /* Return code */
  int i;                          /* Iterator variable */
  char *zContentCols;             /* Columns of %_content table */
  sqlite3 *db = p->db;            /* The database connection */

................................................................................

/*
** This function is the implementation of both the xConnect and xCreate
** methods of the FTS3 virtual table.
**
** The argv[] array contains the following:
**
**   argv[0]   -> module name  ("fts3" or "fts4")
**   argv[1]   -> database name
**   argv[2]   -> table name
**   argv[...] -> "column name" and other module argument fields.
*/
static int fts3InitVtab(
  int isCreate,                   /* True for xCreate, false for xConnect */
  sqlite3 *db,                    /* The SQLite database connection */
................................................................................
  char **pzErr                    /* Write any error message here */
){
  Fts3Hash *pHash = (Fts3Hash *)pAux;
  Fts3Table *p;                   /* Pointer to allocated vtab */
  int rc;                         /* Return code */
  int i;                          /* Iterator variable */
  int nByte;                      /* Size of allocation used for *p */
  int iCol;                       /* Column index */
  int nString = 0;                /* Bytes required to hold all column names */
  int nCol = 0;                   /* Number of columns in the FTS table */
  char *zCsr;                     /* Space for holding column names */
  int nDb;                        /* Bytes required to hold database name */
  int nName;                      /* Bytes required to hold table name */

  const char *zTokenizer = 0;               /* Name of tokenizer to use */
  sqlite3_tokenizer *pTokenizer = 0;        /* Tokenizer for this table */

  nDb = (int)strlen(argv[1]) + 1;
  nName = (int)strlen(argv[2]) + 1;
  for(i=3; i<argc; i++){
................................................................................
  sqlite3Fts3ExprFree(pCsr->pExpr);
  sqlite3_free(pCsr->aDoclist);
  sqlite3_free(pCsr->aMatchinfo);
  sqlite3_free(pCsr);
  return SQLITE_OK;
}

/*
** Position the pCsr->pStmt statement so that it is on the row
** of the %_content table that contains the last match.  Return
** SQLITE_OK on success.  
*/
static int fts3CursorSeek(sqlite3_context *pContext, Fts3Cursor *pCsr){
  if( pCsr->isRequireSeek ){
    pCsr->isRequireSeek = 0;
    sqlite3_bind_int64(pCsr->pStmt, 1, pCsr->iPrevId);
    if( SQLITE_ROW==sqlite3_step(pCsr->pStmt) ){
      return SQLITE_OK;
    }else{
................................................................................
      return rc;
    }
  }else{
    return SQLITE_OK;
  }
}

/*
** Advance the cursor to the next row in the %_content table that
** matches the search criteria.  For a MATCH search, this will be
** the next row that matches.  For a full-table scan, this will be
** simply the next row in the %_content table.  For a docid lookup,
** this routine simply sets the EOF flag.
**
** Return SQLITE_OK if nothing goes wrong.  SQLITE_OK is returned
** even if we reach end-of-file.  The fts3EofMethod() will be called
** subsequently to determine whether or not an EOF was hit.
*/
static int fts3NextMethod(sqlite3_vtab_cursor *pCursor){
  int rc = SQLITE_OK;             /* Return code */
  Fts3Cursor *pCsr = (Fts3Cursor *)pCursor;

  if( pCsr->aDoclist==0 ){
    if( SQLITE_ROW!=sqlite3_step(pCsr->pStmt) ){
      pCsr->isEof = 1;
................................................................................
  *piPrev = iVal;
}

/*
** When this function is called, *ppPoslist is assumed to point to the 
** start of a position-list. After it returns, *ppPoslist points to the
** first byte after the position-list.
**
** A position list is list of positions (delta encoded) and columns for 
** a single document record of a doclist.  So, in other words, this
** routine advances *ppPoslist so that it points to the next docid in
** the doclist, or to the first byte past the end of the doclist.
**
** If pp is not NULL, then the contents of the position list are copied
** to *pp. *pp is set to point to the first byte past the last byte copied
** before this function returns.
*/
static void fts3PoslistCopy(char **pp, char **ppPoslist){
  char *pEnd = *ppPoslist;
  char c = 0;

  /* The end of a position list is marked by a zero encoded as an FTS3 
  ** varint. A single POS_END (0) byte. Except, if the 0 byte is preceded by
  ** a byte with the 0x80 bit set, then it is not a varint 0, but the tail
  ** of some other, multi-byte, value.
  **
  ** The following while-loop moves pEnd to point to the first byte that is not 
  ** immediately preceded by a byte with the 0x80 bit set. Then increments
  ** pEnd once more so that it points to the byte immediately following the
  ** last byte in the position-list.
  */
  while( *pEnd | c ){
    c = *pEnd++ & 0x80;
    testcase( c!=0 && (*pEnd)==0 );
  }
  pEnd++;  /* Advance past the POS_END terminator byte */

  if( pp ){
    int n = (int)(pEnd - *ppPoslist);
    char *p = *pp;
    memcpy(p, *ppPoslist, n);
    p += n;
    *pp = p;
  }
  *ppPoslist = pEnd;
}

/*
** When this function is called, *ppPoslist is assumed to point to the 
** start of a column-list. After it returns, *ppPoslist points to the
** to the terminator (POS_COLUMN or POS_END) byte of the column-list.
**
** A column-list is list of delta-encoded positions for a single column
** within a single document within a doclist.
**
** The column-list is terminated either by a POS_COLUMN varint (1) or
** a POS_END varint (0).  This routine leaves *ppPoslist pointing to
** the POS_COLUMN or POS_END that terminates the column-list.
**
** If pp is not NULL, then the contents of the column-list are copied
** to *pp. *pp is set to point to the first byte past the last byte copied
** before this function returns.  The POS_COLUMN or POS_END terminator
** is not copied into *pp.
*/
static void fts3ColumnlistCopy(char **pp, char **ppPoslist){
  char *pEnd = *ppPoslist;
  char c = 0;

  /* A column-list is terminated by either a 0x01 or 0x00 byte that is
  ** not part of a multi-byte varint.
  */
  while( 0xFE & (*pEnd | c) ){
    c = *pEnd++ & 0x80;
    testcase( c!=0 && ((*pEnd)&0xfe)==0 );
  }
  if( pp ){
    int n = (int)(pEnd - *ppPoslist);
    char *p = *pp;
    memcpy(p, *ppPoslist, n);
    p += n;
    *pp = p;
  }
  *ppPoslist = pEnd;
}

/*
** Value used to signify the end of an position-list. This is safe because
** it is not possible to have a document with 2^31 terms.
*/
#define POSITION_LIST_END 0x7fffffff

/*
** This function is used to help parse position-lists. When this function is
** called, *pp may point to the start of the next varint in the position-list
** being parsed, or it may point to 1 byte past the end of the position-list
** (in which case **pp will be a terminator bytes POS_END (0) or
** (1)).
**
** If *pp points past the end of the current position-list, set *pi to 
** POSITION_LIST_END and return. Otherwise, read the next varint from *pp,
** increment the current value of *pi by the value read, and set *pp to
** point to the next value before returning.
**
** Before calling this routine *pi must be initialized to the value of
** the previous position, or zero if we are reading the first position
** in the position-list.  Because positions are delta-encoded, the value
** of the previous position is needed in order to compute the value of
** the next position.
*/
static void fts3ReadNextPos(
  char **pp,                    /* IN/OUT: Pointer into position-list buffer */
  sqlite3_int64 *pi             /* IN/OUT: Value read from position-list */
){
  if( (**pp)&0xFE ){
    fts3GetDeltaVarint(pp, pi);
    *pi -= 2;
  }else{
    *pi = POSITION_LIST_END;
  }
}

/*
** If parameter iCol is not 0, write an POS_COLUMN (1) byte followed by
** the value of iCol encoded as a varint to *pp.   This will start a new
** column list.
**
** Set *pp to point to the byte just after the last byte written before 
** returning (do not modify it if iCol==0). Return the total number of bytes
** written (0 if iCol==0).
*/
static int fts3PutColNumber(char **pp, int iCol){
  int n = 0;                      /* Number of bytes written */
................................................................................
    *p = 0x01;
    *pp = &p[n];
  }
  return n;
}

/*
** Compute the union of two position lists.  The output written
** into *pp contains all positions of both *pp1 and *pp2 in sorted
** order and with any duplicates removed.  All pointers are
** updated appropriately.   The caller is responsible for insuring
** that there is enough space in *pp to hold the complete output.
*/
static void fts3PoslistMerge(
  char **pp,                      /* Output buffer */
  char **pp1,                     /* Left input list */
  char **pp2                      /* Right input list */
){
  char *p = *pp;
  char *p1 = *pp1;
  char *p2 = *pp2;

  while( *p1 || *p2 ){
    int iCol1;         /* The current column index in pp1 */
    int iCol2;         /* The current column index in pp2 */

    if( *p1==POS_COLUMN ) sqlite3Fts3GetVarint32(&p1[1], &iCol1);
    else if( *p1==POS_END ) iCol1 = POSITION_LIST_END;
    else iCol1 = 0;

    if( *p2==POS_COLUMN ) sqlite3Fts3GetVarint32(&p2[1], &iCol2);
    else if( *p2==POS_END ) iCol2 = POSITION_LIST_END;
    else iCol2 = 0;

    if( iCol1==iCol2 ){
      sqlite3_int64 i1 = 0;       /* Last position from pp1 */
      sqlite3_int64 i2 = 0;       /* Last position from pp2 */
      sqlite3_int64 iPrev = 0;
      int n = fts3PutColNumber(&p, iCol1);
      p1 += n;
      p2 += n;

      /* At this point, both p1 and p2 point to the start of column-lists
      ** for the same column (the column with index iCol1 and iCol2).
      ** A column-list is a list of non-negative delta-encoded varints, each 
      ** incremented by 2 before being stored. Each list is terminated by a
      ** POS_END (0) or POS_COLUMN (1). The following block merges the two lists
      ** and writes the results to buffer p. p is left pointing to the byte
      ** after the list written. No terminator (POS_END or POS_COLUMN) is
      ** written to the output.
      */
      fts3GetDeltaVarint(&p1, &i1);
      fts3GetDeltaVarint(&p2, &i2);
      do {
        fts3PutDeltaVarint(&p, &iPrev, (i1<i2) ? i1 : i2); 
        iPrev -= 2;
        if( i1==i2 ){
................................................................................
          fts3ReadNextPos(&p1, &i1);
          fts3ReadNextPos(&p2, &i2);
        }else if( i1<i2 ){
          fts3ReadNextPos(&p1, &i1);
        }else{
          fts3ReadNextPos(&p2, &i2);
        }
      }while( i1!=POSITION_LIST_END || i2!=POSITION_LIST_END );
    }else if( iCol1<iCol2 ){
      p1 += fts3PutColNumber(&p, iCol1);
      fts3ColumnlistCopy(&p, &p1);
    }else{
      p2 += fts3PutColNumber(&p, iCol2);
      fts3ColumnlistCopy(&p, &p2);
    }
  }

  *p++ = POS_END;
  *pp = p;
  *pp1 = p1 + 1;
  *pp2 = p2 + 1;
}

/*
** nToken==1 searches for adjacent positions.