Overview
Comment: Add an optimization to the fts5 unicode tokenizer code.
SHA1: f5db489250029678fce845dfb2b1109f
User & Date: dan 2015-03-11 14:51:39.375
Context
2015-03-21
  15:37  When scanning the full-text index as part of the fts5 integrity-check, also run a point query for every term and verify that these results are consistent with those found by the linear scan. (check-in: ce972f6aab user: dan tags: fts5)

2015-03-11
  14:51  Add an optimization to the fts5 unicode tokenizer code. (check-in: f5db489250 user: dan tags: fts5)

2015-03-10
  19:24  Avoid redundant string comparisons while merging fts5 segment b-trees. (check-in: 5c46820d9b user: dan tags: fts5)
Changes
Changes to ext/fts5/fts5_tokenize.c.
︙

    unsigned char *zTerm = (unsigned char*)&pText[nText];
    unsigned char *zCsr = (unsigned char *)pText;

    /* Output buffer */
    char *aFold = p->aFold;
    int nFold = p->nFold;
>   const char *pEnd = &aFold[nFold-6];

    /* Each iteration of this loop gobbles up a contiguous run of separators,
    ** then the next token.  */
    while( rc==SQLITE_OK ){
      int iCode;                    /* non-ASCII codepoint read from input */
      char *zOut = aFold;
      int is;
︙

      /* Run through the tokenchars. Fold them into the output buffer along
      ** the way. */
      while( zCsr<zTerm ){

        /* Grow the output buffer so that there is sufficient space to fit the
        ** largest possible utf-8 character.  */
|       if( zOut>pEnd ){
          aFold = sqlite3_malloc(nFold*2);
          if( aFold==0 ){
            rc = SQLITE_NOMEM;
            goto tokenize_done;
          }
          zOut = &aFold[zOut - p->aFold];
          memcpy(aFold, p->aFold, nFold);
          sqlite3_free(p->aFold);
          p->aFold = aFold;
          p->nFold = nFold = nFold*2;
>         pEnd = &aFold[nFold-6];
        }

        if( *zCsr & 0x80 ){
          /* An non-ascii-range character. Fold it into the output buffer if
          ** it is a token character, or break out of the loop if it is not. */
          READ_UTF8(zCsr, zTerm, iCode);
          if( fts5UnicodeIsAlnum(p,iCode)||sqlite3Fts5UnicodeIsdiacritic(iCode) ){
︙
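The optimization itself is small: rather than recomputing &aFold[nFold-6] on every pass through the hot folding loop, the tokenizer keeps a precomputed end-of-buffer pointer (pEnd) with six bytes of headroom and only recomputes it in the rare branch that grows the buffer. Below is a minimal standalone sketch of that pattern, not the fts5 code itself: the buffer names (aOut, zOut, nOut), the plain malloc/memcpy/free growth, and the tolower() fold are stand-ins for sqlite3_malloc/sqlite3_free and the real Unicode case folding.

/* sketch.c: a precomputed end-of-buffer sentinel with headroom, in the
** spirit of the fts5 change above. Hypothetical names; not SQLite code. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

int main(void){
  const char *zIn = "Some Input Text To Fold";
  int nIn = (int)strlen(zIn);

  int nOut = 8;                        /* deliberately tiny initial buffer */
  char *aOut = malloc(nOut);
  if( aOut==0 ) return 1;
  char *zOut = aOut;
  char *pEnd = &aOut[nOut-6];          /* sentinel with 6 bytes of headroom,
                                       ** recomputed only when the buffer grows */

  for(int i=0; i<nIn; i++){
    if( zOut>pEnd ){                   /* hot path: a single pointer comparison */
      int nNew = nOut*2;
      char *aNew = malloc(nNew);
      if( aNew==0 ){ free(aOut); return 1; }
      memcpy(aNew, aOut, nOut);        /* carry over what was already folded */
      zOut = &aNew[zOut - aOut];       /* re-point the write cursor */
      free(aOut);
      aOut = aNew;
      nOut = nNew;
      pEnd = &aOut[nOut-6];            /* re-establish the sentinel */
    }
    *zOut++ = (char)tolower((unsigned char)zIn[i]);   /* stand-in "fold" */
  }

  printf("%.*s\n", (int)(zOut-aOut), aOut);
  free(aOut);
  return 0;
}

The design point is that the common path pays only one pointer comparison per output byte; the address arithmetic is confined to the growth branch, which runs only a logarithmic number of times per tokenize call.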