Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Modify the key encoding so that integer values use one less byte of space. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
af96bd359f96e882f19f7b6e0b21c680 |
User & Date: | drh 2012-02-23 17:56:30.812 |
Context
2012-03-01
| ||
14:47 | Minor changes so that the code builds on Mac. check-in: a03018e6b8 user: drh tags: trunk | |
2012-02-23
| ||
17:56 | Modify the key encoding so that integer values use one less byte of space. check-in: af96bd359f user: drh tags: trunk | |
2012-02-22
| ||
14:20 | Add the "kvdump" pragma for debugging. check-in: 4746d99cc1 user: drh tags: trunk | |
Changes
Changes to notes/key_encoding.txt.
︙ | ︙ | |||
33 34 35 36 37 38 39 | Each SQL value that is a NULL encodes as a single byte of 0x05. Since every other SQL value encoding begins with a byte greater than 0x05, this forces NULL values to sort first. TEXT ENCODING | | | | 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | Each SQL value that is a NULL encodes as a single byte of 0x05. Since every other SQL value encoding begins with a byte greater than 0x05, this forces NULL values to sort first. TEXT ENCODING Each SQL value that is TEXT begins with a single byte of 0x24 and ends with a single byte of 0x00. There are zero or more intervening bytes that encode the text value. The intervening bytes are chosen so that the encoding will sort in the desired collating order. The default sequence of bytes is simply UTF8. The intervening bytes may not contain a 0x00 character; the only 0x00 byte allowed in a text encoding is the final byte. The text encoding ends in 0x00 in order to ensure that when there are two strings where one is a prefix of the other that the shorter string will sort first. BINARY ENCODING Each SQL value that is BINARY begins with a single byte of 0x25 and ends with a single byte of 0x00. There are zero or more intervening bytes that encode the binary value. None of the intervening bytes may be zero. Each of the intervening bytes contains 7 bits of blob content with a 1 in the high-order bit (the 0x80 bit). The final byte before the 0x00 contains any left-over bits of the blob content. The initial byte of a binary value, 0x0f, is larger than the initial |
︙ | ︙ | |||
73 74 75 76 77 78 79 | If the numeric value is a negative infinity then the encoding is a single byte of 0x07. Since every other numeric value except NaN has a larger initial byte, this encoding ensures that negative infinity will sort prior to every other numeric value other than NaN. If the numeric value is a positive infinity then the encoding is a single | | | | < | < < < | | | | 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 | If the numeric value is a negative infinity then the encoding is a single byte of 0x07. Since every other numeric value except NaN has a larger initial byte, this encoding ensures that negative infinity will sort prior to every other numeric value other than NaN. If the numeric value is a positive infinity then the encoding is a single byte of 0x23. Every other numeric value encoding begins with a smaller byte, ensuring that positive infinity always sorts last among numeric values. 0x0d is also smaller than 0x0e, the initial byte of a text value, ensuring that every numeric value sorts before every text value. If the numeric value is exactly zero then then is encoded is a single byte of 0x15. Finite negative values will have initial bytes of 0x08 through 0x14 and finite positive values will have initial bytes of 0x16 through 0x22. For all values, we compute a mantissa M and an exponent E. The mantissa is a base-100 representation of the value. The exponent E determines where to put the decimal point. Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is X (hence X>=0 and X<=99) then the byte value will be 2*X+1 for every byte of the mantissa, except for the last byte which will be 2*X+0. If we assume all digits of the mantissa occur to the right of the |
︙ | ︙ | |||
135 136 137 138 139 140 141 | 1234.5 3 19 45 64 12.345 2 19 45 64 0.123 0 19 3c 0.0123 0 03 2e 0.00123 -1 19 3c 9223372036854775807 10 13 2d 43 91 07 89 6d 9b 75 0e | | | | < < > > | | > | > | | | | > > > | | | | | > | | | > | | | | 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 | 1234.5 3 19 45 64 12.345 2 19 45 64 0.123 0 19 3c 0.0123 0 03 2e 0.00123 -1 19 3c 9223372036854775807 10 13 2d 43 91 07 89 6d 9b 75 0e Values are classified as large, medium, or small according to the value of E. If E is 11 or more, the value is large. For E between 0 and 10, the value is medium. For E less than zero, the value is small. Large positive values are encoded as a single byte 0x22 followed by E as a varint and then M. Medium positive values are a single byte of 0x17+E followed by M. Small positive values are encoded as a single byte 0x16 followed by the ones-complement of the varint for -E followed by M. Small negative values are encoded as a single byte 0x14 followed by -E as a varint and then the ones-complement of M. Medium negative values are encoded as a byte 0x13-E followed by the ones-complement of M. Large negative values consist of the single byte 0x08 followed by the ones-complement of the varint encoding of E followed by the ones-complement of M. SUMMARY Each SQL value is encoded as one or more bytes. The first byte of the encoding, its meaning, and a terse description of the bytes that follow is given by the following table: Content Type Encoding ------------------ ----------------- NULL 0x05 NaN 0x06 negative infinity 0x07 negative large 0x08 ~E ~M negative medium 0x13-E ~M negative small 0x14 -E ~M zero 0x15 positive small 0x16 ~-E M positive medium 0x17+E M positive large 0x22 E M positive infinity 0x23 text 0x24 T binary 0x25 B |
Changes to src/vdbecodec.c.
︙ | ︙ | |||
354 355 356 357 358 359 360 | ** is the number of digits in the integer, followed by the digits of ** the integer from most significant to least significant, packed to ** digits to a byte. Each digit is represented by a number between 1 ** and 10 with 1 representing 0 and 10 representing 9. A zero value ** marks the end of the significand. An extra zero is added to fill out ** the final byte, if necessary. */ | | > | > | < | > > > > > > | < | 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 | ** is the number of digits in the integer, followed by the digits of ** the integer from most significant to least significant, packed to ** digits to a byte. Each digit is represented by a number between 1 ** and 10 with 1 representing 0 and 10 representing 9. A zero value ** marks the end of the significand. An extra zero is added to fill out ** the final byte, if necessary. */ static int encodeIntKey(sqlite4_uint64 m, KeyEncoder *p){ int i = 0; int e; unsigned char aDigits[20]; assert( m>0 ); do{ aDigits[i++] = m%100; m /= 100; }while( m ); e = i; while( i ) p->aOut[p->nOut++] = aDigits[--i]*2 + 1; p->aOut[p->nOut-1] &= 0xfe; return e; } /* ** Encode a single integer using the key encoding. The caller must ** ensure that sufficient space exits in a[] (at least 12 bytes). ** The return value is the number of bytes of a[] used. */ int sqlite4VdbeEncodeIntKey(u8 *a, sqlite4_int64 v){ int i, e; KeyEncoder s; s.aOut = a; s.nOut = 1; if( v<0 ){ e = encodeIntKey((sqlite4_uint64)-v, &s); assert( e<=10 ); a[0] = 0x13-e; for(i=1; i<s.nOut; i++) a[i] ^= 0xff; }else if( v>0 ){ e = encodeIntKey((sqlite4_uint64)v, &s); assert( e<=10 ); a[0] = 0x17+e; }else{ a[0] = 0x15; } return s.nOut; } /* ** Encode the small positive floating point number r using the key ** encoding. The caller guarantees that r will be less than 1.0 and |
︙ | ︙ | |||
428 429 430 431 432 433 434 | ** ** The key encoding is the exponent E followed by the mantessa M. ** The exponent E is one less than the number of digits to the left ** of the decimal point. Since r is at least than 1.0, E will always ** be non-negative here. The mantissa is stored two-digits per byte ** as described for the integer encoding above. */ | | > | | > > | | | > > | | > | | | > | | > | | > | 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 | ** ** The key encoding is the exponent E followed by the mantessa M. ** The exponent E is one less than the number of digits to the left ** of the decimal point. Since r is at least than 1.0, E will always ** be non-negative here. The mantissa is stored two-digits per byte ** as described for the integer encoding above. */ static int encodeLargeFloatKey(double r, KeyEncoder *p){ int e = 0; int i, n; assert( r>=1.0 ); while( r>=1e32 && e<=350 ){ r *= 1e-32; e+=16; } while( r>=1e8 && e<=350 ){ r *= 1e-8; e+=4; } while( r>=100.0 && e<=350 ){ r *= 0.01; e++; } while( r<1.0 ){ r *= 10.0; e--; } if( e>10 ){ n = sqlite4PutVarint64(p->aOut+p->nOut, e); p->nOut += n; } for(i=0; i<18 && r!=0.0; i++){ int d = r; p->aOut[p->nOut++] = 2*d + 1; r -= d; r *= 100.0; } p->aOut[p->nOut-1] &= 0xfe; return e; } /* ** Encode a single column of the key */ static int encodeOneKeyValue( KeyEncoder *p, Mem *pMem, u8 sortOrder, CollSeq *pColl ){ int flags = pMem->flags; int i, e; int n; int iStart = p->nOut; if( flags & MEM_Null ){ if( enlargeEncoderAllocation(p, 1) ) return SQLITE_NOMEM; p->aOut[p->nOut++] = 0x05; /* NULL */ }else if( flags & MEM_Int ){ sqlite4_int64 v = pMem->u.i; if( enlargeEncoderAllocation(p, 11) ) return SQLITE_NOMEM; if( v==0 ){ p->aOut[p->nOut++] = 0x15; /* Numeric zero */ }else if( v<0 ){ p->aOut[p->nOut++] = 0x08; /* Large negative number */ i = p->nOut; e = encodeIntKey((sqlite4_uint64)-v, p); if( e<=10 ) p->aOut[i-1] = 0x13-e; while( i<p->nOut ) p->aOut[i++] ^= 0xff; }else{ i = p->nOut; p->aOut[p->nOut++] = 0x22; /* Large positive number */ e = encodeIntKey((sqlite4_uint64)v, p); if( e<=10 ) p->aOut[i] = 0x17+e; } }else if( flags & MEM_Real ){ double r = pMem->r; if( enlargeEncoderAllocation(p, 16) ) return SQLITE_NOMEM; if( r==0.0 ){ p->aOut[p->nOut++] = 0x15; /* Numeric zero */ }else if( sqlite4IsNaN(r) ){ p->aOut[p->nOut++] = 0x06; /* NaN */ }else if( (n = sqlite4IsInf(r))!=0 ){ p->aOut[p->nOut++] = n<0 ? 0x07 : 0x23; /* Neg and Pos infinity */ }else if( r<=-1.0 ){ p->aOut[p->nOut++] = 0x08; /* Large negative values */ i = p->nOut; e = encodeLargeFloatKey(-r, p); if( e<=10 ) p->aOut[i-1] = 0x13-e; while( i<p->nOut ) p->aOut[i++] ^= 0xff; }else if( r<0.0 ){ p->aOut[p->nOut++] = 0x14; /* Small negative values */ i = p->nOut; encodeSmallFloatKey(-r, p); while( i<p->nOut ) p->aOut[i++] ^= 0xff; }else if( r<1.0 ){ p->aOut[p->nOut++] = 0x16; /* Small positive values */ encodeSmallFloatKey(r, p); }else{ i = p->nOut; p->aOut[p->nOut++] = 0x22; /* Large positive values */ e = encodeLargeFloatKey(r, p); if( e<=10 ) p->aOut[i] = 0x17+e; } }else if( flags & MEM_Str ){ if( enlargeEncoderAllocation(p, pMem->n*4 + 2) ) return SQLITE_NOMEM; p->aOut[p->nOut++] = 0x0e; /* Text */ if( pColl==0 || pColl->xMkKey==0 ){ memcpy(p->aOut+p->nOut, pMem->z, pMem->n); |
︙ | ︙ | |||
624 625 626 627 628 629 630 | */ int sqlite4VdbeDecodeIntKey( const KVByteArray *aKey, /* Input encoding */ KVSize nKey, /* Number of bytes in aKey[] */ sqlite4_int64 *pVal /* Write the result here */ ){ int isNeg; | | | | > > | > < < < < | | 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 | */ int sqlite4VdbeDecodeIntKey( const KVByteArray *aKey, /* Input encoding */ KVSize nKey, /* Number of bytes in aKey[] */ sqlite4_int64 *pVal /* Write the result here */ ){ int isNeg; int e, x; int i, n; sqlite4_int64 m; KVByteArray aBuf[12]; if( nKey<2 ) return 0; if( nKey>sizeof(aBuf) ) nKey = sizeof(aBuf); x = aKey[0]; if( x>=0x09 && x<=0x13 ){ isNeg = 1; memcpy(aBuf, aKey, nKey); aKey = aBuf; for(i=1; i<nKey; i++) aBuf[i] ^= 0xff; e = 0x13-x; }else if( x>=0x17 && x<=0x21 ){ isNeg = 0; e = x-0x17; }else{ return 0; } m = 0; i = 1; do{ m = m*100 + aKey[i]/2; e--; }while( aKey[i++] & 1 ); if( isNeg ){ *pVal = -m; }else{ *pVal = m; } return m==0 ? 0 : i; } |
Changes to src/vdbecursor.c.
︙ | ︙ | |||
37 38 39 40 41 42 43 | KVSize nKey; KVSize nProbe; int rc; KVByteArray aProbe[16]; assert( iEnd==(+1) || iEnd==(-1) ); nProbe = sqlite4PutVarint64(aProbe, pC->iRoot); | | | 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 | KVSize nKey; KVSize nProbe; int rc; KVByteArray aProbe[16]; assert( iEnd==(+1) || iEnd==(-1) ); nProbe = sqlite4PutVarint64(aProbe, pC->iRoot); aProbe[nProbe++] = 16 - iEnd*12; rc = sqlite4KVCursorSeek(pCur, aProbe, nProbe, iEnd); if( rc==SQLITE_OK ){ return SQLITE_CORRUPT; } if( rc==SQLITE_INEXACT ){ rc = sqlite4KVCursorKey(pCur, &aKey, &nKey); if( rc==SQLITE_OK && (nKey<nProbe-1 || memcmp(aKey, aProbe, nProbe-1)!=0) ){ |
︙ | ︙ |