/ Check-in [64b7e340]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Require a minimum fanout for interior nodes. This prevents cases where excessively large terms keep the tree from finding a single root. A downside is that this could result in large interior nodes in the presence of large terms, which may be prone to fragmentation, though if the nodes were smaller that would translate into more levels in the tree, which would also have that problem. (CVS 3510)
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 64b7e3406134ac4891113b9bb432ad97504268bb
User & Date: shess 2006-11-13 21:00:55
Context
2006-11-13
21:09
Delta-encode docids. This is good for around 22% reduction in index size with DL_POSITIONS. It improves performance about 5%-6%. (CVS 3511) check-in: 9b6d413d user: shess tags: trunk
21:00
Require a minimum fanout for interior nodes. This prevents cases where excessively large terms keep the tree from finding a single root. A downside is that this could result in large interior nodes in the presence of large terms, which may be prone to fragmentation, though if the nodes were smaller that would translate into more levels in the tree, which would also have that problem. (CVS 3510) check-in: 64b7e340 user: shess tags: trunk
20:15
Allow backing tables to be missing on dropping fts table. Fixes http://www.sqlite.org/cvstrac/tktview?tn=1992,35 . (CVS 3509) check-in: 9628a61a user: shess tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to ext/fts2/fts2.c.

   155    155   ** is encoded.  The subtree at iBlockid will contain all terms less
   156    156   ** than the first term encoded (or all terms if no term is encoded).
   157    157   ** Otherwise, for terms greater than or equal to pTerm[i] but less
   158    158   ** than pTerm[i+1], the subtree for that term will be rooted at
   159    159   ** iBlockid+i.
   160    160   **
   161    161   ** New data is spilled to a new interior node at the same height when
   162         -** the current node exceeds INTERIOR_MAX bytes (default 2048).  The
   163         -** interior nodes at a given height are naturally tracked by interior
   164         -** nodes at height+1, and so on.
          162  +** the current node exceeds INTERIOR_MAX bytes (default 2048).
          163  +** INTERIOR_MIN_TERMS (default 7) keeps large terms from monopolizing
          164  +** interior nodes and making the tree too skinny.  The interior nodes
          165  +** at a given height are naturally tracked by interior nodes at
          166  +** height+1, and so on.
   165    167   **
   166    168   **
   167    169   **** Segment directory ****
   168    170   ** The segment directory in table %_segdir stores meta-information for
   169    171   ** merging and deleting segments, and also the root node of the
   170    172   ** segment's tree.
   171    173   **
................................................................................
  3593   3595   /* InteriorWriter is used to collect terms and block references into
  3594   3596   ** interior nodes in %_segments.  See commentary at top of file for
  3595   3597   ** format.
  3596   3598   */
  3597   3599   
  3598   3600   /* How large interior nodes can grow. */
  3599   3601   #define INTERIOR_MAX 2048
         3602  +
         3603  +/* Minimum number of terms per interior node (except the root). This
         3604  +** prevents large terms from making the tree too skinny - must be >0
         3605  +** so that the tree always makes progress.  Note that the min tree
         3606  +** fanout will be INTERIOR_MIN_TERMS+1.
         3607  +*/
         3608  +#define INTERIOR_MIN_TERMS 7
         3609  +#if INTERIOR_MIN_TERMS<1
         3610  +# error INTERIOR_MIN_TERMS must be greater than 0.
         3611  +#endif
  3600   3612   
  3601   3613   /* ROOT_MAX controls how much data is stored inline in the segment
  3602   3614   ** directory.
  3603   3615   */
  3604   3616   /* TODO(shess) Push ROOT_MAX down to whoever is writing things.  It's
  3605   3617   ** only here so that interiorWriterRootInfo() and leafWriterRootInfo()
  3606   3618   ** can both see it, but if the caller passed it in, we wouldn't even
................................................................................
  3638   3650   }
  3639   3651   
  3640   3652   typedef struct InteriorWriter {
  3641   3653     int iHeight;                   /* from 0 at leaves. */
  3642   3654     InteriorBlock *first, *last;
  3643   3655     struct InteriorWriter *parentWriter;
  3644   3656   
         3657  +  sqlite_int64 iOpeningChildBlock; /* First child block in block "last". */
  3645   3658   #ifndef NDEBUG
  3646   3659     sqlite_int64 iLastChildBlock;  /* for consistency checks. */
  3647   3660   #endif
  3648   3661   } InteriorWriter;
  3649   3662   
  3650   3663   /* Initialize an interior node where pTerm[nTerm] marks the leftmost
  3651   3664   ** term in the tree.  iChildBlock is the leftmost child block at the
................................................................................
  3655   3668                                  sqlite_int64 iChildBlock,
  3656   3669                                  InteriorWriter *pWriter){
  3657   3670     InteriorBlock *block;
  3658   3671     assert( iHeight>0 );
  3659   3672     CLEAR(pWriter);
  3660   3673   
  3661   3674     pWriter->iHeight = iHeight;
         3675  +  pWriter->iOpeningChildBlock = iChildBlock;
  3662   3676   #ifndef NDEBUG
  3663   3677     pWriter->iLastChildBlock = iChildBlock;
  3664   3678   #endif
  3665   3679     block = interiorBlockNew(iHeight, iChildBlock, pTerm, nTerm);
  3666   3680     pWriter->last = pWriter->first = block;
  3667   3681   }
  3668   3682   
................................................................................
  3676   3690     int n = putVarint(c, nTerm);
  3677   3691   
  3678   3692   #ifndef NDEBUG
  3679   3693     pWriter->iLastChildBlock++;
  3680   3694   #endif
  3681   3695     assert( pWriter->iLastChildBlock==iChildBlock );
  3682   3696   
  3683         -  if( pWriter->last->data.nData+n+nTerm>INTERIOR_MAX ){
  3684         -    /* Overflow to a new block. */
         3697  +  /* Overflow to a new block if the new term makes the current block
         3698  +  ** too big, and the current block already has enough terms.
         3699  +  */
         3700  +  if( pWriter->last->data.nData+n+nTerm>INTERIOR_MAX &&
         3701  +      iChildBlock-pWriter->iOpeningChildBlock>INTERIOR_MIN_TERMS ){
  3685   3702       pWriter->last->next = interiorBlockNew(pWriter->iHeight, iChildBlock,
  3686   3703                                              pTerm, nTerm);
  3687   3704       pWriter->last = pWriter->last->next;
         3705  +    pWriter->iOpeningChildBlock = iChildBlock;
  3688   3706     }else{
  3689   3707       dataBufferAppend2(&pWriter->last->data, c, n, pTerm, nTerm);
  3690   3708     }
  3691   3709   }
  3692   3710   
  3693   3711   /* Free the space used by pWriter, including the linked-list of
  3694   3712   ** InteriorBlocks, and parentWriter, if present.