Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Delta-encode terms in interior nodes. While experiments have shown that this is of marginal utility when encoding terms resulting from regular English text, it turns out to be very useful when encoding inputs with very large terms. (CVS 3520) |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
c8151a998ec2423b417566823dc9957c |
User & Date: | shess 2006-11-29 01:02:03.000 |
Context
2006-11-29
| ||
05:17 |
http://www.sqlite.org/cvstrac/tktview?tn=2046
The virtual table interface allows for a cursor to field multiple xFilter() calls. For instance, if a join is done with a virtual table, there could be a call for each row which potentially matches. Unfortunately, fulltextFilter() assumes that it has a fresh cursor, and overwrites a prepared statement and a malloc'ed pointer, resulting in unfinalized statements and a memory leak. This change hacks the code to manually clean up offending items in fulltextFilter(), emphasis on "hacks", since it's a fragile fix insofar as future additions to fulltext_cursor could continue to have the problem. (CVS 3521) (check-in: 18142fdb6d user: shess tags: trunk) | |
01:02 | Delta-encode terms in interior nodes. While experiments have shown that this is of marginal utility when encoding terms resulting from regular English text, it turns out to be very useful when encoding inputs with very large terms. (CVS 3520) (check-in: c8151a998e user: shess tags: trunk) | |
2006-11-23
| ||
21:09 | Improvements to the speed tests recently added to the test suite. (CVS 3519) (check-in: 272c1a6e61 user: drh tags: trunk) | |
Changes
Changes to ext/fts2/fts2.c.
︙ | ︙ | |||
134 135 136 137 138 139 140 | ** InteriorReader. InteriorWriters are created as needed when ** SegmentWriter creates new leaf nodes, or when an interior node ** itself grows too big and must be split. The format of interior ** nodes: ** ** varint iHeight; (height from leaf level, always >0) ** varint iBlockid; (block id of node's leftmost subtree) | | | | > > > > > > > | < | 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 | ** InteriorReader. InteriorWriters are created as needed when ** SegmentWriter creates new leaf nodes, or when an interior node ** itself grows too big and must be split. The format of interior ** nodes: ** ** varint iHeight; (height from leaf level, always >0) ** varint iBlockid; (block id of node's leftmost subtree) ** optional { ** varint nTerm; (length of first term) ** char pTerm[nTerm]; (content of first term) ** array { ** (further terms are delta-encoded) ** varint nPrefix; (length of shared prefix with previous term) ** varint nSuffix; (length of unshared suffix) ** char pTermSuffix[nSuffix]; (unshared suffix of next term) ** } ** } ** ** Here, optional { X } means an optional element, while array { X } ** means zero or more occurrences of X, adjacent in memory. ** ** An interior node encodes n terms separating n+1 subtrees. The ** subtree blocks are contiguous, so only the first subtree's blockid ** is encoded. The subtree at iBlockid will contain all terms less ** than the first term encoded (or all terms if no term is encoded). ** Otherwise, for terms greater than or equal to pTerm[i] but less ** than pTerm[i+1], the subtree for that term will be rooted at |
︙ | ︙ | |||
3686 3687 3688 3689 3690 3691 3692 | n = getVarint(pData, &iBlockid); assert( n>0 ); assert( n<=nData ); pData += n; nData -= n; /* Zero or more terms of positive length */ | | > > > > > > > > > > > > > > > > > > > > > > | 3692 3693 3694 3695 3696 3697 3698 3699 3700 3701 3702 3703 3704 3705 3706 3707 3708 3709 3710 3711 3712 3713 3714 3715 3716 3717 3718 3719 3720 3721 3722 3723 3724 3725 3726 3727 3728 3729 3730 3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 3747 | n = getVarint(pData, &iBlockid); assert( n>0 ); assert( n<=nData ); pData += n; nData -= n; /* Zero or more terms of positive length */ if( nData!=0 ){ /* First term is not delta-encoded. */ n = getVarint32(pData, &iDummy); assert( n>0 ); assert( iDummy>0 ); assert( n+iDummy>0); assert( n+iDummy<=nData ); pData += n+iDummy; nData -= n+iDummy; /* Following terms delta-encoded. */ while( nData!=0 ){ /* Length of shared prefix. */ n = getVarint32(pData, &iDummy); assert( n>0 ); assert( iDummy>=0 ); assert( n<nData ); pData += n; nData -= n; /* Length and data of distinct suffix. */ n = getVarint32(pData, &iDummy); assert( n>0 ); assert( iDummy>0 ); assert( n+iDummy>0); assert( n+iDummy<=nData ); pData += n+iDummy; nData -= n+iDummy; } } } #define ASSERT_VALID_INTERIOR_BLOCK(x) interiorBlockValidate(x) #else #define ASSERT_VALID_INTERIOR_BLOCK(x) assert( 1 ) #endif typedef struct InteriorWriter { int iHeight; /* from 0 at leaves. */ InteriorBlock *first, *last; struct InteriorWriter *parentWriter; DataBuffer term; /* Last term written to block "last". */ sqlite_int64 iOpeningChildBlock; /* First child block in block "last". */ #ifndef NDEBUG sqlite_int64 iLastChildBlock; /* for consistency checks. */ #endif } InteriorWriter; /* Initialize an interior node where pTerm[nTerm] marks the leftmost |
︙ | ︙ | |||
3731 3732 3733 3734 3735 3736 3737 3738 3739 3740 3741 3742 3743 3744 3745 3746 | pWriter->iOpeningChildBlock = iChildBlock; #ifndef NDEBUG pWriter->iLastChildBlock = iChildBlock; #endif block = interiorBlockNew(iHeight, iChildBlock, pTerm, nTerm); pWriter->last = pWriter->first = block; ASSERT_VALID_INTERIOR_BLOCK(pWriter->last); } /* Append the child node rooted at iChildBlock to the interior node, ** with pTerm[nTerm] as the leftmost term in iChildBlock's subtree. */ static void interiorWriterAppend(InteriorWriter *pWriter, const char *pTerm, int nTerm, sqlite_int64 iChildBlock){ char c[VARINT_MAX+VARINT_MAX]; | > | > > > > > > > > > > > > > > > > > > | > | > > | 3759 3760 3761 3762 3763 3764 3765 3766 3767 3768 3769 3770 3771 3772 3773 3774 3775 3776 3777 3778 3779 3780 3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 3795 3796 3797 3798 3799 3800 3801 3802 3803 3804 3805 3806 3807 3808 3809 3810 3811 3812 3813 3814 3815 3816 3817 3818 3819 3820 3821 3822 3823 | pWriter->iOpeningChildBlock = iChildBlock; #ifndef NDEBUG pWriter->iLastChildBlock = iChildBlock; #endif block = interiorBlockNew(iHeight, iChildBlock, pTerm, nTerm); pWriter->last = pWriter->first = block; ASSERT_VALID_INTERIOR_BLOCK(pWriter->last); dataBufferInit(&pWriter->term, 0); } /* Append the child node rooted at iChildBlock to the interior node, ** with pTerm[nTerm] as the leftmost term in iChildBlock's subtree. */ static void interiorWriterAppend(InteriorWriter *pWriter, const char *pTerm, int nTerm, sqlite_int64 iChildBlock){ char c[VARINT_MAX+VARINT_MAX]; int n, nPrefix = 0; ASSERT_VALID_INTERIOR_BLOCK(pWriter->last); /* The first term written into an interior node is actually ** associated with the second child added (the first child was added ** in interiorWriterInit, or in the if clause at the bottom of this ** function). That term gets encoded straight up, with nPrefix left ** at 0. */ if( pWriter->term.nData==0 ){ n = putVarint(c, nTerm); }else{ while( nPrefix<pWriter->term.nData && pTerm[nPrefix]==pWriter->term.pData[nPrefix] ){ nPrefix++; } n = putVarint(c, nPrefix); n += putVarint(c+n, nTerm-nPrefix); } #ifndef NDEBUG pWriter->iLastChildBlock++; #endif assert( pWriter->iLastChildBlock==iChildBlock ); /* Overflow to a new block if the new term makes the current block ** too big, and the current block already has enough terms. */ if( pWriter->last->data.nData+n+nTerm-nPrefix>INTERIOR_MAX && iChildBlock-pWriter->iOpeningChildBlock>INTERIOR_MIN_TERMS ){ pWriter->last->next = interiorBlockNew(pWriter->iHeight, iChildBlock, pTerm, nTerm); pWriter->last = pWriter->last->next; pWriter->iOpeningChildBlock = iChildBlock; dataBufferReset(&pWriter->term); }else{ dataBufferAppend2(&pWriter->last->data, c, n, pTerm+nPrefix, nTerm-nPrefix); dataBufferReplace(&pWriter->term, pTerm, nTerm); } ASSERT_VALID_INTERIOR_BLOCK(pWriter->last); } /* Free the space used by pWriter, including the linked-list of ** InteriorBlocks, and parentWriter, if present. */ |
︙ | ︙ | |||
3781 3782 3783 3784 3785 3786 3787 3788 3789 3790 3791 3792 3793 3794 | dataBufferDestroy(&b->data); free(b); } if( pWriter->parentWriter!=NULL ){ interiorWriterDestroy(pWriter->parentWriter); free(pWriter->parentWriter); } SCRAMBLE(pWriter); return SQLITE_OK; } /* If pWriter can fit entirely in ROOT_MAX, return it as the root info ** directly, leaving *piEndBlockid unchanged. Otherwise, flush ** pWriter to %_segments, building a new layer of interior nodes, and | > | 3831 3832 3833 3834 3835 3836 3837 3838 3839 3840 3841 3842 3843 3844 3845 | dataBufferDestroy(&b->data); free(b); } if( pWriter->parentWriter!=NULL ){ interiorWriterDestroy(pWriter->parentWriter); free(pWriter->parentWriter); } dataBufferDestroy(&pWriter->term); SCRAMBLE(pWriter); return SQLITE_OK; } /* If pWriter can fit entirely in ROOT_MAX, return it as the root info ** directly, leaving *piEndBlockid unchanged. Otherwise, flush ** pWriter to %_segments, building a new layer of interior nodes, and |
︙ | ︙ | |||
3837 3838 3839 3840 3841 3842 3843 | /* Parent node gets the chance to be the root. */ return interiorWriterRootInfo(v, pWriter->parentWriter, ppRootInfo, pnRootInfo, piEndBlockid); } /****************************************************************/ /* InteriorReader is used to read off the data from an interior node | | < > > | > > > > > > > > > > > > > > > | < < | < < | < > > > > > > > > > | > > > > > > | | | > | 3888 3889 3890 3891 3892 3893 3894 3895 3896 3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 3938 3939 3940 3941 3942 3943 3944 3945 3946 3947 3948 3949 3950 3951 3952 3953 3954 3955 3956 3957 3958 3959 3960 3961 3962 3963 3964 3965 3966 3967 3968 3969 3970 3971 3972 3973 3974 3975 3976 3977 3978 3979 3980 3981 3982 3983 3984 3985 3986 3987 3988 | /* Parent node gets the chance to be the root. */ return interiorWriterRootInfo(v, pWriter->parentWriter, ppRootInfo, pnRootInfo, piEndBlockid); } /****************************************************************/ /* InteriorReader is used to read off the data from an interior node ** (see comment at top of file for the format). */ typedef struct InteriorReader { const char *pData; int nData; DataBuffer term; /* previous term, for decoding term delta. */ sqlite_int64 iBlockid; } InteriorReader; static void interiorReaderDestroy(InteriorReader *pReader){ SCRAMBLE(pReader); } static void interiorReaderInit(const char *pData, int nData, InteriorReader *pReader){ int n, nTerm; /* Require at least the leading flag byte */ assert( nData>0 ); assert( pData[0]!='\0' ); CLEAR(pReader); /* Decode the base blockid, and set the cursor to the first term. */ n = getVarint(pData+1, &pReader->iBlockid); assert( 1+n<=nData ); pReader->pData = pData+1+n; pReader->nData = nData-(1+n); /* A single-child interior node (such as when a leaf node was too ** large for the segment directory) won't have any terms. ** Otherwise, decode the first term. */ if( pReader->nData==0 ){ dataBufferInit(&pReader->term, 0); }else{ n = getVarint32(pReader->pData, &nTerm); dataBufferInit(&pReader->term, nTerm); dataBufferReplace(&pReader->term, pReader->pData+n, nTerm); assert( n+nTerm<=pReader->nData ); pReader->pData += n+nTerm; pReader->nData -= n+nTerm; } } static int interiorReaderAtEnd(InteriorReader *pReader){ return pReader->term.nData==0; } static sqlite_int64 interiorReaderCurrentBlockid(InteriorReader *pReader){ return pReader->iBlockid; } static int interiorReaderTermBytes(InteriorReader *pReader){ assert( !interiorReaderAtEnd(pReader) ); return pReader->term.nData; } static const char *interiorReaderTerm(InteriorReader *pReader){ assert( !interiorReaderAtEnd(pReader) ); return pReader->term.pData; } /* Step forward to the next term in the node. */ static void interiorReaderStep(InteriorReader *pReader){ assert( !interiorReaderAtEnd(pReader) ); /* If the last term has been read, signal eof, else construct the ** next term. */ if( pReader->nData==0 ){ dataBufferReset(&pReader->term); }else{ int n, nPrefix, nSuffix; n = getVarint32(pReader->pData, &nPrefix); n += getVarint32(pReader->pData+n, &nSuffix); /* Truncate the current term and append suffix data. */ pReader->term.nData = nPrefix; dataBufferAppend(&pReader->term, pReader->pData+n, nSuffix); assert( n+nSuffix<=pReader->nData ); pReader->pData += n+nSuffix; pReader->nData -= n+nSuffix; } pReader->iBlockid++; } /* Compare the current term to pTerm[nTerm], returning strcmp-style ** results. */ static int interiorReaderTermCmp(InteriorReader *pReader, |
︙ | ︙ |