- File www/key_encoding.wiki — part of check-in [4beefed2c9] at 2013-07-18 00:58:38 on branch trunk — Fix a typo in the "key format" documentation. (user: drh size: 9706)
This note describes how record keys are encoded. The encoding is designed such that memcmp() can be used to sort the keys into their proper order.
A key consists of a table number followed by a list of one or more SQL values. Each SQL value in the list has one of the following types: NULL, numeric, text, or binary. Keys are compared value by value, from left to right, until a difference is found. The first difference determines the key order.
The table number is a varint that identifies the table to which the key belongs. Table numbers always sort in ASCENDING order.
Each SQL value has a sort-order which is either ASCENDING or DESCENDING. The default and the usual case is ASCENDING.
To generate the encoding, the SQL values of the key are visited from left to right. Each SQL value generates one or more bytes that are appended to the encoding. If the SQL value is DESCENDING, then its encoding bytes are inverted (ones complement) prior to being appended. The complete key encoding is the concatenation of the individual SQL value encodings, in the same order as the SQL values.
Two key encodings are only compariable if they have the same number of SQL values and if corresponding SQL values have the same sort-order.
The first byte of a key past the table number will be in the range of 0x05..0x0f if ascending or 0xf0..0xfa if descending. This leaves large chunks of key space available for other uses. For example, the three-byte key 0x00 0x00 0x01 stores the schema cookie for the database as a 64-bit big-endian integer.
Each SQL value that is a NULL encodes as a single byte of 0x05. Since every other SQL value encoding begins with a byte greater than 0x05, this forces NULL values to sort first.
Each SQL value that is TEXT begins with a single byte of 0x24 and ends with a single byte of 0x00. There are zero or more intervening bytes that encode the text value. The intervening bytes are chosen so that the encoding will sort in the desired collating order. The default sequence of bytes is simply UTF8. The intervening bytes may not contain a 0x00 character; the only 0x00 byte allowed in a text encoding is the final byte.
Note that all key-encoded text with the BINARY collating sequence is simply UTF8 text. UTF8 not UTF16. Strings must be converted to UTF8 so that equivalent strings in different encodings compare the same and so that the strings contain no embedded 0x00 bytes. If a collating function is used, it needs to work like ucol_getSortKey() in the ICU library. In other words, strcmp() should be sufficient for comparing two text keys.
The text encoding ends in 0x00 in order to ensure that when there are two strings where one is a prefix of the other that the shorter string will sort first.
The encoding of binaries fields is different depending on whether or not the value to be encoded is the last value (the right-most value) in the key.
Each SQL value that is BINARY that is not the last value of the key begins with a single byte of 0x25 and ends with a single byte of 0x00. There are zero or more intervening bytes that encode the binary value. None of the intervening bytes may be zero. Each of the intervening bytes contains 7 bits of blob content with a 1 in the high-order bit (the 0x80 bit). The final byte before the 0x00 contains any left-over bits of the blob content.
When the very last value of a key is BINARY, then it is encoded as a single byte of 0x26 and is followed by a byte-for-byte copy of the BINARY value. This alternative encoding is more efficient, but it only works if there are no subsequent values in the key, since there is no termination mark on the BLOB being encoded.
The initial byte of a binary value, 0x25 or 0x26, is larger than the initial byte of a text value, 0x24, ensuring that every binary value will sort after every text value.
Numeric SQL values must be coded so as to sort in numeric order. We assume that numeric SQL values can be both integer and floating point values.
Simpliest cases first: If the numeric value is a NaN, then the encoding is a single byte of 0x06. This causes NaN values to sort prior to every other numeric value. The only value that is less than a NaN is a NULL.
If the numeric value is a negative infinity then the encoding is a single byte of 0x07. Since every other numeric value except NaN has a larger initial byte, this encoding ensures that negative infinity will sort prior to every other numeric value other than NaN.
If the numeric value is a positive infinity then the encoding is a single byte of 0x23. Every other numeric value encoding begins with a smaller byte, ensuring that positive infinity always sorts last among numeric values. 0x0d is also smaller than 0x0e, the initial byte of a text value, ensuring that every numeric value sorts before every text value.
If the numeric value is exactly zero then it is encoded as a single byte of 0x15. Finite negative values will have initial bytes of 0x08 through 0x14 and finite positive values will have initial bytes of 0x16 through 0x22.
For all values, we compute a mantissa M and an exponent E. The mantissa is a base-100 representation of the value. The exponent E determines where to put the decimal point.
Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is X (hence X>=0 and X<=99) then the byte value will be 2*X+1 for every byte of the mantissa, except for the last byte which will be 2*X+0. The mantissa must be the minimum number of bytes necessary to represent the value; trailing X==0 digits are omitted. This means that the mantissa will never contain a byte with the value 0x00.
If we assume all digits of the mantissa occur to the right of the decimal point, then the exponent E is the power of one hundred by which one must multiply the mantissa to recover the original value.
Value Exponent E Significand M (in hex) 1.0 1 02 10.0 1 14 99.0 1 c6 99.01 1 c7 02 99.0001 1 c7 01 02 100.0 2 02 100.01 2 03 01 02 100.1 2 03 01 14 1234 2 19 44 9999 2 c7 c6 9999.000001 2 c7 c7 01 01 02 9999.000009 2 c7 c7 01 01 12 9999.00001 2 c7 c7 01 01 14 9999.00009 2 c7 c7 01 01 b4 9999.000099 2 c7 c7 01 01 c6 9999.0001 2 c7 c7 01 02 9999.001 2 c7 c7 01 14 9999.01 2 c7 c7 02 9999.1 2 c7 c7 14 10000 3 02 10001 3 03 01 02 12345 3 03 2f 5a 123450 4 19 45 64 1234.5 3 19 45 64 12.345 2 19 45 64 0.123 0 19 3c 0.0123 0 03 2e 0.00123 -1 19 3c 9223372036854775807 10 13 2d 43 91 07 89 6d 9b 75 0e
Values are classified as large, medium, or small according to the value of E. If E is 11 or more, the value is large. For E between 0 and 10, the value is medium. For E less than zero, the value is small.
Large positive values are encoded as a single byte 0x22 followed by E as a varint and then M. Medium positive values are a single byte of 0x17+E followed by M. Small positive values are encoded as a single byte 0x16 followed by the ones-complement of the varint for -E followed by M.
Small negative values are encoded as a single byte 0x14 followed by -E as a varint and then the ones-complement of M. Medium negative values are encoded as a byte 0x13-E followed by the ones-complement of M. Large negative values consist of the single byte 0x08 followed by the ones-complement of the varint encoding of E followed by the ones-complement of M.
Each SQL value is encoded as one or more bytes. The first byte of the encoding, its meaning, and a terse description of the bytes that follow is given by the following table:
Content Type Encoding NULL 0x05 NaN 0x06 negative infinity 0x07 negative large 0x08, ~E, ~M negative medium 0x13-E, ~M negative small 0x14, -E, ~M zero 0x15 positive small 0x16, ~-E, M positive medium 0x17+E, M positive large 0x22, E, M positive infinity 0x23 text 0x24, T binary 0x25, B final binary 0x26, X