Documentation Source Text

Check-in [149d28ec0e]

Overview
Comment: Enhance the file format document to make it clear that records might contain fewer values than there are columns in the table schema.
SHA1: 149d28ec0eced12a534522685043981b8680941d
User & Date: drh 2016-06-20 10:51:27
Context
2016-06-20 11:12 Additional clarification of the file format. check-in: df46867c0f user: drh tags: trunk
2016-06-20 10:51 Enhance the file format document to make it clear that records might contain fewer values than there are columns in the table schema. check-in: 149d28ec0e user: drh tags: trunk
2016-06-04 17:25 Fix an error in fts5.html. check-in: 627d913e58 user: dan tags: trunk
Changes

Changes to pages/fileformat2.in.

Old version (lines 755-769, 772-786, 949-972, 993-1007):

<dd><p>
^(Let X be U-35.  If the payload size P is less than or equal to X then
the entire payload is stored on the b-tree leaf page.)^
^(Let M be ((U-12)*32/255)-23 and let K be M+((P-M)%(U-4)).
If P is greater than X
then the number of bytes stored on the table b-tree leaf page is K
if K is less or equal to X or M otherwise.)^
^(Note that number of bytes stored on the leaf page is never less than M.)^
</p></dd>

<dt><p>Table B-Tree Interior Cell:</dt>
<dd><p>
Interior pages of table b-trees have no payload and so there is never
any payload to spill.
</p></dd>
................................................................................
<dd><p>
^(Let X be ((U-12)*64/255)-23).  If the payload size P is less than
or equal to X then the entire payload is stored on the b-tree page.)^
^(Let M be ((U-12)*32/255)-23 and let K be M+((P-M)%(U-4)).
If P is greater than X then the number
of bytes stored on the index b-tree page is K if K is less than or
equal to X or M otherwise.)^
^(Note that number of bytes stored on the index page is never less than M.)^
</p></dd>
</dl>

<p>Here is an alternative description of the same computation:

<ul>
<li>X is U-35 for table btree leaf pages or
................................................................................
^Value is a BLOB that is (N-12)/2 bytes in length.
<tr><td valign=top align=center>N&#x2265;13 and odd
    <td valign=top align=center>(N-13)/2<td align=left>
^Value is a string in the [text encoding] and (N-13)/2 bytes in length.
^The nul terminator is not stored.
</table></center>

<p>Note that because of the way varints are defined, the header size varint
and serial type varints will usually consist of a single byte.  The
serial type varints for large strings and BLOBs might extend to two or three
byte varints, but that is the exception rather than the rule. 
The varint format is very efficient at coding the record header.</p>

<p>^The values for each column in the record immediately follow the header.
^(Note that for serial types 0, 8, 9, 12, and 13, the value is zero bytes in
length.  If all columns are of these types then the body section of the
record is empty.)^</p>











<h2>Record Sort Order</h2>

<p>The order of keys in an index b-tree is determined by the sort order of
the records that the keys represent.  Record comparison progresses column
by column.  Columns of a record are examined from left to right.  The
first pair of columns that are not equal determines the relative order
................................................................................
    <td> ^(The built-in BINARY collation compares strings byte by byte
        using the memcmp() function
        from the standard C library.)^
<tr><td valign=top>NOCASE
    <td> ^(The NOCASE collation is like BINARY except that uppercase
        ASCII characters ('A' through 'Z')
        are folded into their lowercase equivalents prior to running the
        comparison.  Note that only ASCII characters are case-folded.)^
        ^(NOCASE
        does not implement a general purpose unicode caseless comparison.)^
<tr><td valign=top>RTRIM
    <td> ^(RTRIM is like BINARY except that extra spaces at the end of either
         string do not change the result.  In other words, strings will
         compare equal to one another as long as they
         differ only in the number of spaces at the end.)^







New version (lines 755-769, 772-786, 949-982, 1003-1017):

<dd><p>
^(Let X be U-35.  If the payload size P is less than or equal to X then
the entire payload is stored on the b-tree leaf page.)^
^(Let M be ((U-12)*32/255)-23 and let K be M+((P-M)%(U-4)).
If P is greater than X
then the number of bytes stored on the table b-tree leaf page is K
if K is less or equal to X or M otherwise.)^
^(The number of bytes stored on the leaf page is never less than M.)^
</p></dd>

<dt><p>Table B-Tree Interior Cell:</dt>
<dd><p>
Interior pages of table b-trees have no payload and so there is never
any payload to spill.
</p></dd>
................................................................................
<dd><p>
^(Let X be ((U-12)*64/255)-23).  If the payload size P is less than
or equal to X then the entire payload is stored on the b-tree page.)^
^(Let M be ((U-12)*32/255)-23 and let K be M+((P-M)%(U-4)).
If P is greater than X then the number
of bytes stored on the index b-tree page is K if K is less than or
equal to X or M otherwise.)^
^(The number of bytes stored on the index page is never less than M.)^
</p></dd>
</dl>
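
<p>The spill rules above can be sketched in a few lines of Python.
This is an illustrative sketch, not SQLite's own code: the function name
<tt>local_payload_bytes</tt> and the <tt>table_leaf</tt> flag are
hypothetical, and it assumes that "/" in the formulas means integer
division that discards the remainder.</p>

```python
def local_payload_bytes(P, U, table_leaf):
    """Return how many payload bytes stay on the b-tree page.

    P is the payload size and U is the usable page size, as in the text.
    table_leaf selects the table b-tree leaf rule (X = U-35); otherwise
    the index b-tree rule is used. Integer division is an assumption.
    """
    if table_leaf:
        X = U - 35
    else:
        X = ((U - 12) * 64 // 255) - 23
    if P <= X:
        return P  # the whole payload fits on the page, nothing spills
    M = ((U - 12) * 32 // 255) - 23
    K = M + ((P - M) % (U - 4))
    return K if K <= X else M


# Example with a usable page size of 4096 bytes.
on_page = local_payload_bytes(5000, 4096, table_leaf=True)
spilled = 5000 - on_page
```

<p>With U=4096 this gives X=4061 for table leaves, X=1002 for index pages,
and M=489. A 5000-byte table payload keeps 908 bytes on the leaf and spills
4092 bytes, which is exactly U-4.</p>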

<p>Here is an alternative description of the same computation:

<ul>
<li>X is U-35 for table btree leaf pages or
................................................................................
^Value is a BLOB that is (N-12)/2 bytes in length.
<tr><td valign=top align=center>N&#x2265;13 and odd
    <td valign=top align=center>(N-13)/2<td align=left>
^Value is a string in the [text encoding] and (N-13)/2 bytes in length.
^The nul terminator is not stored.
</table></center>

<p>The header size varint
and serial type varints will usually consist of a single byte.  The
serial type varints for large strings and BLOBs might extend to two or three
byte varints, but that is the exception rather than the rule. 
The varint format is very efficient at coding the record header.</p>
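
<p>A rough way to see when a serial type varint grows past one byte is
sketched below. The helper names are hypothetical. The sketch assumes the
varint layout defined elsewhere in the file format document (seven payload
bits per byte, at most nine bytes) and takes the serial types for strings
and BLOBs from the table above (2*len+13 and 2*len+12 respectively).</p>

```python
def varint_len(v):
    """Number of bytes needed to encode the non-negative integer v.

    Assumes 7 payload bits in each of the first 8 bytes and a maximum
    encoded length of 9 bytes.
    """
    n = 1
    while v > 0x7F and n < 9:
        v >>= 7
        n += 1
    return n


def text_serial_type(nbytes):
    """Serial type for a string of nbytes bytes: odd and >= 13."""
    return 2 * nbytes + 13


def blob_serial_type(nbytes):
    """Serial type for a BLOB of nbytes bytes: even and >= 12."""
    return 2 * nbytes + 12


# Largest string length whose serial type still fits in one byte.
one_byte_limit = max(n for n in range(200)
                     if varint_len(text_serial_type(n)) == 1)
```

<p>Under these assumptions a string needs a two-byte serial type varint
only once it exceeds 57 bytes, and a three-byte varint only once it exceeds
8185 bytes.</p>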

<p>^The values for each column in the record immediately follow the header.
^(For serial types 0, 8, 9, 12, and 13, the value is zero bytes in
length.  If all columns are of these types then the body section of the
record is empty.)^</p>

<p>^A record might have fewer values than the number of columns in the
corresponding table.  This can happen, for example, after an
[ALTER TABLE|ALTER TABLE ... ADD COLUMN] SQL statement has increased
the number of columns in the table schema without modifying preexisting rows
in the table.
^Missing values at the end of the record are filled in using the
[default value] for the corresponding columns defined in the table schema.
</p>
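
<p>The following sketch illustrates the effect described above. The helper
<tt>fill_missing</tt> is hypothetical and only models the rule; the
<tt>sqlite3</tt> portion shows the behavior as seen from SQL, and does not
inspect the stored record itself.</p>

```python
import sqlite3


def fill_missing(stored_values, column_defaults):
    """Pad a short record with the schema defaults of the trailing columns."""
    if len(stored_values) > len(column_defaults):
        raise ValueError("record has more values than the table has columns")
    return list(stored_values) + list(column_defaults[len(stored_values):])


# Model: a two-value record read against a three-column schema.
padded = fill_missing([1, "x"], [None, None, 42])

# Observable behavior: a row inserted before ADD COLUMN reports the
# default for the new column when it is read back.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a, b)")
con.execute("INSERT INTO t VALUES (1, 'x')")
con.execute("ALTER TABLE t ADD COLUMN c DEFAULT 42")
rows = con.execute("SELECT a, b, c FROM t").fetchall()
con.close()
```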


<h2>Record Sort Order</h2>

<p>The order of keys in an index b-tree is determined by the sort order of
the records that the keys represent.  Record comparison progresses column
by column.  Columns of a record are examined from left to right.  The
first pair of columns that are not equal determines the relative order
................................................................................
    <td> ^(The built-in BINARY collation compares strings byte by byte
        using the memcmp() function
        from the standard C library.)^
<tr><td valign=top>NOCASE
    <td> ^(The NOCASE collation is like BINARY except that uppercase
        ASCII characters ('A' through 'Z')
        are folded into their lowercase equivalents prior to running the
        comparison.  Only ASCII characters are case-folded.)^
        ^(NOCASE
        does not implement a general purpose unicode caseless comparison.)^
<tr><td valign=top>RTRIM
    <td> ^(RTRIM is like BINARY except that extra spaces at the end of either
         string do not change the result.  In other words, strings will
         compare equal to one another as long as they
         differ only in the number of spaces at the end.)^
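
The three built-in collations described above can be approximated as key functions over byte strings. This is a simplified sketch with hypothetical names, not SQLite's implementation. It relies on Python's `bytes` comparison, which orders byte strings by unsigned byte value with a shorter prefix sorting first, as a stand-in for memcmp() followed by a length comparison.

```python
def binary_key(s):
    """BINARY: compare the raw bytes unchanged."""
    return s


def nocase_key(s):
    """NOCASE: fold ASCII 'A'..'Z' to lowercase; other bytes are untouched."""
    return bytes(c + 32 if 0x41 <= c <= 0x5A else c for c in s)


def rtrim_key(s):
    """RTRIM: ignore any spaces at the end of the string."""
    return s.rstrip(b" ")


def compare(a, b, key):
    """Return -1, 0, or 1 for a<b, a==b, a>b under the given collation key."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


same_ascii = compare(b"abc", b"ABC", nocase_key)
same_accented = compare("é".encode("utf-8"), "É".encode("utf-8"), nocase_key)
same_padded = compare(b"abc   ", b"abc", rtrim_key)
```

Here `same_ascii` and `same_padded` are 0, while `same_accented` is nonzero because only ASCII letters are folded.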