Documentation Source Text

Check-in [8925c8c2e1]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Initial identification of requirements in the fileformat2.html document.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 8925c8c2e1ab502eb1f261ff0717146b286a6012
User & Date: drh 2010-08-12 17:55:32
Context
2010-08-13
17:44
Enhancements to the description of how the COLLATE operator works. check-in: 17093bc7d6 user: drh tags: trunk
2010-08-12
17:55
Initial identification of requirements in the fileformat2.html document. check-in: 8925c8c2e1 user: drh tags: trunk
16:26
Work on the queryplanner.html document. check-in: e8152063fb user: drh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to pages/fileformat2.in.

    31     31   scenario and so are uncommon, but they are part of the state of an SQLite
    32     32   database and so should not be ignored.  This document will define the format
    33     33   of a rollback journal, but most of the content of this document will focus
    34     34   on the main database file.</p>
    35     35   
    36     36   <h3>1.1 Pages</h3>
    37     37   
    38         -<p>The main database file consists of one or more pages.  The size of a
    39         -page is a power of two between 512 and 32768 inclusive.  All pages within
    40         -the same database are the same size.  The page size for a database file
           38  +<p>The main database file consists of one or more pages.  ^The size of a
           39  +page is a power of two between 512 and 65536 inclusive.  All pages within
           40  +the same database are the same size.  ^The page size for a database file
    41     41   is determined by the 2-byte integer located at an offset of
    42     42   16 bytes from the beginning of the database file.</p>
    43     43   
    44     44   <p>Pages are numbered beginning with 1.  The maximum page number is
    45     45   2147483646 (2<sup><small>31</small></sup> - 2).  The minimum size
    46     46   SQLite database is a single 512-byte page.
    47         -The maximum size database would be 2147483646 pages at 32768 bytes per
    48         -page or 70,368,744,112,128 bytes (about 70 terabytes).  Usually SQLite will
           47  +The maximum size database would be 2147483646 pages at 65536 bytes per
           48  +page or 140,737,488,224,256 bytes (about 140 terabytes).  Usually SQLite will
    49     49   hit the maximum file size limit of the underlying filesystem or disk
    50     50   hardware size limit long before it hits its own internal size limit.</p>
    51     51   
    52     52   <p>In common use, SQLite databases tend to range in size from a few kilobytes
    53     53   to a few gigabytes.</p>
    54     54   
    55     55   <p>At any point in time, every page in the main database has a single
................................................................................
    69     69   <li>An index b-tree leaf page
    70     70   </ul>
    71     71   <li>A payload overflow page
    72     72   <li>A pointer map page
    73     73   </ul>
    74     74   </p>
    75     75   
    76         -<p>All reads from and writes to the main database file begin at a page
    77         -boundary and all writes are an integer number of pages in size.  Reads
           76  +<p>^All reads from and writes to the main database file begin at a page
           77  +boundary and all writes are an integer number of pages in size.  ^Reads
    78     78   are also usually an integer number of pages in size, with the one exception
    79     79   that when the database is first opened, the first 100 bytes of the
    80     80   database file (the database file header) are read as a sub-page size unit.</p>
    81     81   
    82         -<p>Before any information-bearing page of the database is modified, 
           82  +<p>^Before any information-bearing page of the database is modified, 
    83     83   the original unmodified content of that page is written into the 
    84     84   rollback journal.  If a transaction is interrupted and needs to be 
    85     85   rolled back, the rollback journal can then be used to restore the
    86         -database to its original state.  Freelist leaf pages bear no
           86  +database to its original state.  ^Freelist leaf pages bear no
    87     87   information that would need to be restored on a rollback and so they
    88     88   are not written to the journal prior to modification, in order to
    89     89   reduce disk I/O.</p>
    90     90   
    91     91   <tcl>hd_fragment database_header {database header}</tcl>
    92     92   <h3>1.2 The Database Header</h3>
    93     93   
................................................................................
   147    147   The [version-valid-for number].
   148    148   <tr><td valign=top align=center>96<td valign=top align=center>4<td align=left>
   149    149   [SQLITE_VERSION_NUMBER]
   150    150   </table></center>
   151    151   
   152    152   <h4>1.2.1 Magic Header String</h4>
   153    153   
   154         -<p>Every SQLite database file begins with the following 16 bytes (in hex):
   155         -53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00.  This byte sequence
          154  +<p>^Every valid SQLite database file begins with the following 16 bytes 
          155  +(in hex): 53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00.  This byte sequence
   156    156   corresponds to the UTF-8 string "SQLite format 3" including the nul
   157    157   terminator character at the end.</p>
   158    158   
   159    159   <h4>1.2.2 Page Size</h4>
   160    160   
   161    161   <p>The two-byte value beginning at offset 16 determines the page size of 
   162    162   the database.  For SQLite versions 3.7.0.1 and earlier, this value is 
................................................................................
   164    164   512 and 32768, inclusive.  Beginning with SQLite version 3.7.1, a page
   165    165   size of 65536 bytes is supported.  The value 65536 will not fit in a
   166    166   two-byte integer, so to specify a 65536-byte page size, the value is
   167    167   at offset 16 is 0x00 0x01.
   168    168   This value can be interpreted as a big-endian
   169    169   1 and thought of is as a magic number to represent the 65536 page size.
   170    170   Or one can view the two-byte field as a little endian number and say
   171         -that it represents the page size divided by 256.</p>
          171  +that it represents the page size divided by 256.  These two 
          172  +interpretations of the page-size field are equivelent.</p>
   172    173   
   173    174   <h4>1.2.3 File format version numbers</h4>
   174    175   
   175    176   <p>The file format write version and file format read version at offsets
   176    177   18 and 19 are intended to allow for enhancements of the file format
   177    178   in future versions of SQLite.  In current versions of SQLite, both of
   178    179   these values are 1 for rollback journalling modes and 2 for [WAL]
................................................................................
   183    184   greater than 2 is encounter, then that database cannot be read or written.</p>
   184    185   
   185    186   <h4>1.2.4 Reserved bytes per page</h4>
   186    187   
   187    188   <p>SQLite has the ability to set aside a small number of extra bytes at
   188    189   the end of every page for use by extensions.  These extra bytes are
   189    190   used, for example, by the SQLite Encryption Extension to store a nonce
   190         -and/or cryptographic checksum associated with each page.  The 
          191  +and/or cryptographic checksum associated with each page.  ^The 
   191    192   "reserved space" size in the 1-byte integer at offset 20 is the number
   192    193   of bytes of space at the end of each page to reserve for extensions.
   193    194   This value is usually 0.  The value can be odd.</p>
   194    195   
   195    196   <tcl>hd_fragment usable_size {usable size}</tcl>
   196    197   <p>The "usable size" of a database page is the page size specify by the
   197    198   2-byte integer at offset 16 in the header less the "reserved" space size
   198    199   recorded in the 1-byte integer at offset 20 in the header.  The usable
   199         -size of a page might be an odd number.  However, the usable size is not
          200  +size of a page might be an odd number.  ^(However, the usable size is not
   200    201   allowed to be less than 480.  In other words, if the page size is 512,
   201         -then the reserved space size cannot exceed 32.</p>
          202  +then the reserved space size cannot exceed 32.)^</p>
   202    203   
   203    204   <h4>1.2.5 Payload fractions</h4>
   204    205   
   205         -<p>The maximum and minimum embedded payload fractions and the leaf
          206  +<p>^The maximum and minimum embedded payload fractions and the leaf
   206    207   payload fraction values must be 64, 32, and 32.  These values were
   207    208   originally intended to as tunable parameters that could be used to
   208    209   modify the storage format of the b-tree algorithm.  However, that
   209    210   functionality is not supported and there are no current plans to add
   210    211   support in the future.  Hence, these three bytes are fixed at the
   211    212   values specified.</p>
   212    213   
   213    214   <h4>1.2.6 File change counter</h4>
   214    215   
   215    216   <tcl>hd_fragment chngctr {change counter}</tcl>
   216         -<p>The file change counter is a 4-byte big-endian integer which is
   217         -incremented whenever the database file is changed in rollback mode.  
          217  +<p>^The file change counter is a 4-byte big-endian integer which is
          218  +incremented whenever the database file is unlocked after having
          219  +been modified.
   218    220   When two or more processes are reading the same database file, each 
   219    221   process can detect database changes from other processes by monitoring 
   220    222   the change counter.
   221    223   A process will normally want to flush its database page cache when
   222    224   another process modified the database, since the cache has become stale.
   223    225   The file change counter facilitates this.</p>
   224    226   
................................................................................
   225    227   <p>In WAL mode, changes to the database are detected using the wal-index
   226    228   and so the change counter is not needed.  Hence, the change counter might
   227    229   not be incremented on each transaction in WAL mode.</p>
   228    230   
   229    231   <h4>1.2.7 In-header database size</h4>
   230    232   
   231    233   <tcl>hd_fragment filesize {in-header database size}</tcl>
   232         -<p>The 4-byte big-endian integer at offset 28 into the header 
   233         -stores the size of the database file in pages.  If this in-header
          234  +<p>^The 4-byte big-endian integer at offset 28 into the header 
          235  +stores the size of the database file in pages.  ^If this in-header
   234    236   datasize size is not valid (see the next paragraph), then the database 
   235    237   size is computed by looking
   236    238   at the actual size of the database file. Older versions of SQLite
   237    239   ignored the in-header database size and used the actual file size
   238         -exclusively.  Newer versions of SQLite use the in-header database
          240  +exclusively.  ^Newer versions of SQLite use the in-header database
   239    241   size if it is available but fall back to the actual file size if
   240    242   the in-header database size is not valid.</p>
   241    243   
   242         -<p>The in-header database size is only considered to be valid if
          244  +<p>^The in-header database size is only considered to be valid if
   243    245   it is non-zero and if the 4-byte [change counter] at offset 24
   244    246   exactly matches the 4-byte [version-valid-for number] at offset 92.
   245         -The in-header database size should always be valid 
          247  +^The in-header database size is always valid 
   246    248   when the database is only modified using recent versions of SQLite
   247    249   (versions 3.7.0 and later).
   248    250   If a legacy version of SQLite writes to the database, it will not
   249    251   know to update the in-header database size and so the in-header
   250    252   database size could be incorrect.  But legacy versions of SQLite
   251    253   will also leave the version-valid-for number at offset 92 unchanged
   252    254   so it will not match the change-counter.  Hence, invalid in-header
   253    255   database sizes can be detected (and ignored) by observing when
   254    256   the change-counter does not match the version-valid-for number.</p>
   255    257   
   256    258   <h4>1.2.8 Free page list</h4>
   257    259   
   258         -<p>Unused pages in the database file are stored on a freelist.  The
          260  +<p>Unused pages in the database file are stored on a freelist.  ^The
   259    261   4-byte big-endian integer at offset 32 stores the page number of
   260    262   the first page of the freelist, or zero if the freelist is empty.
   261         -The 4-byte big-endian integer at offset 36 stores stores the total 
          263  +^The 4-byte big-endian integer at offset 36 stores stores the total 
   262    264   number of pages on the freelist.</p>
   263    265   
   264    266   <h4>1.2.9 Schema cookie</h4>
   265    267   
   266         -<p>The schema cookie is a 4-byte big-endian integer at offset 40
          268  +<p>^The schema cookie is a 4-byte big-endian integer at offset 40
   267    269   that is incremented whenever the database schema changes.  A 
   268    270   prepared statement is compiled against a specific version of the
   269         -database schema.  Whenever the database schema changes, the statement
   270         -must be reprepared.  Whenever a prepared statement runs, it first checks
          271  +database schema.  ^Whenever the database schema changes, the statement
          272  +must be reprepared.  ^Whenever a prepared statement runs, it first checks
   271    273   the schema cookie to make sure the value is the same as when the statement
   272         -was prepared and if not it aborts to force the statement to be reprepared.</p>
          274  +was prepared and if the schema cookie has changed, the statement aborts
          275  +in order to force the statement to be reprepared.</p>
   273    276   
   274    277   <h4>1.2.10 Schema format number</h4>
   275    278   
   276    279   <p>The schema format number is a 4-byte big-endian integer at offset 44.
   277    280   The schema format number is similar to the file format read and write
   278    281   version numbers at offsets 18 and 19 except that the schema format number
   279    282   refers to the high-level SQL formatting rather than the low-level b-tree
................................................................................
   287    290   [ALTER TABLE | ALTER TABLE ... ADD COLUMN] functionality.  Support for
   288    291   reading and writing format 2 was added in SQLite version 3.1.3 
   289    292   on 2005-02-19.</li>
   290    293   <li value=3>Format 3 adds the ability of extra columns added by
   291    294   [ALTER TABLE | ALTER TABLE ... ADD COLUMN] to have non-NULL default
   292    295   values.  This capability was added in SQLite version 3.1.4 
   293    296   on 2005-03-11.</li>
   294         -<li value=4>Format 4 causes SQLite to respect the DESC keyword on
   295         -index declarations.  (The DESC keyword is ignored in indices for 
          297  +<li value=4>^Format 4 causes SQLite to respect the DESC keyword on
          298  +index declarations.  (^The DESC keyword is ignored in indices for 
   296    299   formats 1, 2, and 3.)
   297         -Format 4 also adds two new boolean record type values ([serial types]
          300  +^Format 4 also adds two new boolean record type values ([serial types]
   298    301   8 and 9.)  Support for format 4 was added in SQLite 3.3.0 on
   299    302   2006-01-10.</li>
   300    303   </ol>
   301    304   
   302         -<p>New database files created by SQLite use format 1 by default, so
          305  +<p>^New database files created by SQLite use format 1 by default, so
   303    306   that database files created by newer versions of SQLite can still
   304    307   be read by older versions of SQLite.
   305         -The [legacy_file_format pragma] can be used to cause SQLite
          308  +^The [legacy_file_format pragma] can be used to cause SQLite
   306    309   to create new database files using format 4.  Future versions of 
   307    310   SQLite may begin to create files using format 4 by default.</p>
   308    311   
   309    312   <h4>1.2.11 Suggested cache size</h4>
   310    313   
   311    314   <p>The 4-byte big-endian signed integer at offset 48 is the suggest
   312    315   cache size in pages for the database file.  The value is a suggestion
................................................................................
   313    316   only and SQLite is under no obligation to honor it.  The absolute value
   314    317   of the integer is used as the suggested size.  The suggested cache size
   315    318   can be set using the [default_cache_size pragma].</p>
   316    319   
   317    320   <h4>1.2.12 Incremental vacuum settings</h4>
   318    321   
   319    322   <p>The two 4-byte big-endian integers at offsets 52 and 64 are used
   320         -to manage the [auto_vacuum] and [incremental_vacuum] modes.  If
          323  +to manage the [auto_vacuum] and [incremental_vacuum] modes.  ^If
   321    324   the integer at offset 52 is zero then pointer-map (ptrmap) pages are
   322    325   omitted from the database file and neither auto_vacuum nor
   323         -incremental_vacuum are supported.  If the integer at offset 52 is
          326  +incremental_vacuum are supported.  ^If the integer at offset 52 is
   324    327   non-zero then it is the page number of the largest root page in the
   325    328   database file, the database file contain ptrmap pages, and the
   326         -mode must be either auto_vacuum or incremental_vacuum.  In this latter
          329  +mode must be either auto_vacuum or incremental_vacuum.  ^In this latter
   327    330   case, the integer at offset 64 is true for incremental_vacuum and
   328         -false for auto_vacuum.  If the integer at offset 52 is zero then
          331  +false for auto_vacuum.  ^If the integer at offset 52 is zero then
   329    332   the integer at offset 64 must also be zero.</p>
   330    333   
   331    334   <h4>1.2.13 Text encoding</h4>
   332    335   
   333         -<p>The 4-byte big-endian integer at offset 56 determines the encoding
   334         -used for all text strings stored in the database.  A value of 1 means
   335         -UTF-8.  A value of 2 means UTF-16le.  A value of 3 means UTF-16be.
          336  +<p>^The 4-byte big-endian integer at offset 56 determines the encoding
          337  +used for all text strings stored in the database.  ^A value of 1 means
          338  +UTF-8.  ^A value of 2 means UTF-16le.  ^A value of 3 means UTF-16be.
   336    339   No other values are allowed.</p>
   337    340   
   338    341   <h4>1.2.14 User version number</h4>
   339    342   
   340         -<p>The 4-byte big-endian integer at offset 60 is the user version which
          343  +<p>^The 4-byte big-endian integer at offset 60 is the user version which
   341    344   is set and queried by the [user_version pragma].  The user version is
   342    345   not used by SQLite.</p>
   343    346   
   344    347   <tcl>hd_fragment validfor {version-valid-for number}</tcl>
   345    348   <h4>1.2.15 Write library version number and version-valid-for number</h4>
   346    349   
   347         -<p>The 4-byte big-endian integer at offset 96 stores the 
   348         -[SQLITE_VERSION_NUMBER] value.  The 4-byte big-ending integer at
          350  +<p>^The 4-byte big-endian integer at offset 96 stores the 
          351  +[SQLITE_VERSION_NUMBER] value for the SQLite library that most
          352  +recently modified the database file.  ^The 4-byte big-ending integer at
   349    353   offset 92 is the value of the [change counter] when the version number
   350    354   was stored.  The integer at offset 92 indicates which transaction
   351    355   the version number is valid for and is sometimes called the
   352    356   "version-valid-for number".
   353    357   
   354    358   <h4>1.2.16 Header space reserved for expansion</h4>
   355    359   
................................................................................
   363    367   inclusive.  A database file that is less than or equal to 1073741824 bytes 
   364    368   in size contains no lock-byte page.  A database file larger than
   365    369   1073741824 contains exactly one lock-byte page.
   366    370   </p>
   367    371   
   368    372   <p>The lock-byte page is set aside for use by the operating-system specific
   369    373   VFS implementation in implementing the database file locking primitives.
   370         -SQLite will not use the lock-byte page; it will never be read or written
          374  +^SQLite will not use the lock-byte page; it will never be read or written
   371    375   by the SQLite core, though operating-system specific VFS implementations may
   372    376   choose to read or write bytes on the lock-byte page according to the 
   373    377   needs and proclivities of the underlying system.  The unix and win32
   374    378   VFS implementations that come built into SQLite do not write to the
   375    379   lock-byte page, but we are aware of third-party VFS implementations for
   376    380   other operating systems that do sometimes write to the lock-byte page.</p>
   377    381   
................................................................................
   385    389   <p>The freelist is organized as a linked list of freelist trunk pages
   386    390   with each trunk pages containing page numbers for zero or more freelist
   387    391   leaf pages.</p>
   388    392   
   389    393   <p>A freelist trunk page consists of an array of 4-byte big-endian integers.
   390    394   The size of the array is as many integers as will fit in the usable space
   391    395   of a page.  The minimum usable space is 480 bytes so the array will always
   392         -be at least 120 entries in length.  The first integer in the array 
          396  +be at least 120 entries in length.  ^The first integer in the array 
   393    397   is the page number of the next freelist trunk page in the list or zero 
   394         -if this is the last freelist trunk page.  The second integer in the array
          398  +if this is the last freelist trunk page.  ^The second integer in the array
   395    399   is the number of leaf page pointers to follow.  Call the second integer L.
   396         -If L is greater than zero then integers with array indexes between 2 and
          400  +^If L is greater than zero then integers with array indexes between 2 and
   397    401   L+1 inclusive contain page numbers for freelist leaf pages.</p>
   398    402   
   399         -<p>Freelist leaf pages contain no information.  SQLite avoid reading or
          403  +<p>Freelist leaf pages contain no information.  ^SQLite avoid reading or
   400    404   writing freelist leaf pages in order to reduce disk I/O.</p>
   401    405   
   402    406   <p>A bug in SQLite versions prior to 3.6.0 caused the database to be
   403    407   reported as corrupt if any last 6 entries in the freelist trunk page 
   404    408   array contained non-zero values.  Newer versions of SQLite do not have
   405         -this problem.  However, newer versions of SQLite still avoid using the 
          409  +this problem.  ^However, newer versions of SQLite still avoid using the 
   406    410   last six entries in the freelist trunk page array in order that database
   407    411   files created by newer versions of SQLite can be read by older versions
   408    412   of SQLite.</p>
   409    413   
   410         -<p>The number of freelist pages is stored as a 4-byte big-endian integer
          414  +<p>^The number of freelist pages is stored as a 4-byte big-endian integer
   411    415   in the database header at an offset of 36 from the beginning of the file.
   412         -The database header also stores the page number of the first freelist trunk
          416  +^The database header also stores the page number of the first freelist trunk
   413    417   page as a 4-byte big-endian integer at an offset of 32 from the beginning
   414    418   of the file.</p>
   415    419   
   416    420   <h3>1.5 B-tree Pages</h3>
   417    421   
   418    422   <p>A b-tree page is either an interior page or a leaf page.
   419    423   A leaf page contains keys and in the case of a table b-tree each
................................................................................
   420    424   key has associated content.  An interior page contains
   421    425   K keys without content but with K+1 pointers to child b-tree pages.
   422    426   A "pointer" in an interior b-tree page is just the 31-bit integer
   423    427   page number of the child page.</p>
   424    428   
   425    429   
   426    430   <p>Define the depth
   427         -of a leaf b-tree to be 1 and that the depth of any interior b-tree to be one
   428         -more than the maximum depth of any of its children.  In a well-formed
          431  +of a leaf b-tree to be 1 and the the depth of any interior b-tree to be one
          432  +more than the maximum depth of any of its children.  ^In a well-formed
   429    433   database, all children of any one interior b-tree have the same depth.</p>
   430    434   
   431    435   <p>In an interior b-tree page, the pointers and keys logically alternate 
   432    436   with a pointer on both ends. (The previous sentence is to be understood
   433    437   conceptually - the actual layout of the keys and
   434    438   pointers within the page is more complicated and will be described in
   435         -the sequel.)  All keys within the same page are unique and are organized
   436         -in ascending order from left to right.  For any key X, pointers to the left
          439  +the sequel.)  All keys within the same page are unique and are logically
          440  +organized in ascending order from left to right.  (Again, this ordering
          441  +is logical, not physical.  The actual location of keys within the page
          442  +is arbitrary.) ^For any key X, pointers to the left
   437    443   of a X refer to b-tree pages on which alls keys are less than or equal to X.
   438         -Pointers to the right of X refer to pages where all keys are greater than X.</p>
          444  +^Pointers to the right of X refer to pages where all keys are 
          445  +greater than X.</p>
   439    446   
   440    447   <p>Within an interior b-tree page, each key and the pointer to its
   441    448   immediate left are combined into a structure called a "cell".  The
   442    449   right-most pointer is held separately.  A leaf b-tree page has no
   443    450   pointers, but it still uses the cell structure to hold keys for
   444    451   index b-trees or keys and content for table b-trees.</p>
   445    452   </p>
................................................................................
   498    505   
   499    506   <p>The 100-byte database file header is found only on page 1, which is
   500    507   always a table b-tree page.  All other b-tree pages in the database file
   501    508   omit this 100-byte header.</p>
   502    509   
   503    510   <p>The reserved region is an area of unused space at the end of every
   504    511   page (except the locking page) that extensions can use to hold per-page
   505         -information.  The size of the reserved region is determined by the one-byte
          512  +information.  ^The size of the reserved region is determined by the one-byte
   506    513   unsigned integer found at an offset of 20 into the database file header.
   507    514   The size of the reserved region is usually zero.</p>
   508    515   
   509    516   <p>The b-tree page header is 8 bytes in size for leaf pages and 12
   510    517   bytes for interior pages.  All multibyte values in the page header
   511    518   are big-endian.
   512    519   The b-tree page header is composed of the following fields:</p>
................................................................................
   513    520   
   514    521   <center>
   515    522   <i>B-tree Page Header Format</i><br>
   516    523   <table border=1 width="80%">
   517    524   <tr><th>Offset<th>Size<th>Description
   518    525   <tr><td align=center valign=top>0<td align=center valign=top>1<td align=left>
   519    526   A flag indicating the b-tree page type
   520         -A value of 2 means the page is an interior index b-tree page.
   521         -A value of 5 means the page is an interior table b-tree page.
   522         -A value of 10 means the page is a leaf index b-tree page.
   523         -A value of 13 means the page is a leaf table b-tree page.
   524         -Any other value is an error.
          527  +^A value of 2 means the page is an interior index b-tree page.
          528  +^A value of 5 means the page is an interior table b-tree page.
          529  +^A value of 10 means the page is a leaf index b-tree page.
          530  +^A value of 13 means the page is a leaf table b-tree page.
          531  +^Any other value for the b-tree page type is an error.
   525    532   <tr><td align=center valign=top>1<td align=center valign=top>2<td align=left>
   526    533   Byte offset into the page of the first freeblock
   527    534   <tr><td align=center valign=top>3<td align=center valign=top>2<td align=left>
   528    535   Number of cells on this page
   529    536   <tr><td align=center valign=top>5<td align=center valign=top>2<td align=left>
   530    537   Offset to the first byte of the cell content area.  A zero value is used to represent an offset of 65536, which occurs on an empty root page when using a 65536-byte page size.
   531    538   <tr><td align=center valign=top>7<td align=center valign=top>1<td align=left>
   532    539   Number of fragmented free bytes within the cell content area
   533    540   <tr><td align=center valign=top>8<td align=center valign=top>4<td align=left>
   534    541   The right-most pointer (interior b-tree pages only)
   535    542   </table></blockquote></center>
   536    543   
   537         -<p>The cell pointer array of a b-tree page immediately follows the b-tree
   538         -page header.  Let K be the number of cells on the btree.  The cell pointer
   539         -array consists of K 2-byte integer offsets to the cell contents.  The
          544  +<p>^The cell pointer array of a b-tree page immediately follows the b-tree
          545  +page header.  Let K be the number of cells on the btree.  ^The cell pointer
          546  +array consists of K 2-byte integer offsets to the cell contents.  ^The
   540    547   cell pointers are arranged in key order with left-most cell (the cell with the
   541    548   smallest key) first and the right-most cell (the cell with the largest
   542    549   key) last.</p>
   543    550   
   544    551   <p>Cell content is stored in the cell content region of the b-tree page.
   545    552   SQLite strives to place cells as far toward the end of the b-tree page as
   546    553   it can, in order to leave space for future growth of the cell pointer array.
   547    554   The area in between the last cell pointer array entry and the beginning of
   548    555   the first cell is the unallocated region.
   549    556   </p>
   550    557   
   551         -<p>If a page contains no cells (which is only possible for a root page
          558  +<p>^If a page contains no cells (which is only possible for a root page
   552    559   of a table that contains no rows) then the offset to the cell content
   553         -area will equal the page size minus the bytes of reserved space.  If
          560  +area will equal the page size minus the bytes of reserved space.  ^(If
   554    561   the database uses a 65536-byte page size and the reserved space is zero
   555         -(the usual value for reserved space) then the cell content offset would
   556         -want to be 65536.  However, that integer is too large to be stored in a
   557         -2-byte unsigned integer, so a value of 0 is used in its place.
          562  +(the usual value for reserved space) then the cell content offset of an
          563  +empty page wants to be 65536.  
          564  +However, that integer is too large to be stored in a
          565  +2-byte unsigned integer, so a value of 0 is used in its place.)^
   558    566   
   559    567   <p>A freeblock is a structure used to identify unallocated space within
   560         -a b-tree page.  Freeblocks are organized on a chain.  The first 2 bytes of
          568  +a b-tree page.  Freeblocks are organized as a chain.  ^The first 2 bytes of
   561    569   a freeblock are a big-endian integer which is the offset in the b-tree page
   562    570   of the next freeblock in the chain, or zero if the freeblock is the last on
   563         -the chain.  The third and fourth bytes of each freeblock form
          571  +the chain.  ^The third and fourth bytes of each freeblock form
   564    572   a big-endian integer which is the size of the freeblock in bytes, including
   565         -the 4-byte header.  Freeblocks are always connected in order 
   566         -of increasing offset.  The second field of the b-tree page header is the
          573  +the 4-byte header.  ^Freeblocks are always connected in order 
          574  +of increasing offset.  ^The second field of the b-tree page header is the
   567    575   offset of the first freeblock, or zero if there are no freeblocks on the
   568         -page.  In a well-formed b-tree page, there will always be at least one cell
          576  +page.  ^In a well-formed b-tree page, there will always be at least one cell
   569    577   before the first freeblock.</p>
   570    578   
   571    579   <p>A freeblock requires at least 4 bytes of space.  If there is an isolated
   572    580   group of 1, 2, or 3 unused bytes within the cell content area, those bytes
   573         -comprise a fragment.  The total number of bytes in all fragments is stored
   574         -in the fifth field of the b-tree page header.  In a well-formed b-tree page,
          581  +comprise a fragment.  ^The total number of bytes in all fragments is stored
          582  +in the fifth field of the b-tree page header.  ^In a well-formed b-tree page,
   575    583   the total number of bytes in fragments may not exceed 60.</p>
   576    584   
   577    585   <p>The total amount of free space on a b-tree page consists of the size
   578    586   of the unallocated region plus the total size of all freeblocks plus the
   579         -number of fragmented free bytes.  SQLite may from time to time reorganize
          587  +number of fragmented free bytes.  ^SQLite may from time to time reorganize
   580    588   a b-tree page so that there are no freeblocks or fragment bytes, all
   581    589   unused bytes are contained in the unallocated space region, and all
   582    590   cells are packed tightly at the end of the page.  This is called 
   583    591   "defragmenting" the b-tree page.</p>
   584    592   
   585    593   <tcl>hd_fragment varint {variable-length integer} {varint}</tcl>
   586    594   
................................................................................
   692    700   the page type.  For the following computations, let U be the usable size
   693    701   of a database page, the total page size less the reserved space at the
   694    702   end of each page.  And let P be the payload size.</p>
   695    703   
   696    704   <blockquote><dl>
   697    705   <dt>Table B-Tree Leaf Cell:</dt>
   698    706   <dd><p>
   699         -If the payload size P is less than U-36 then the entire payload is stored
   700         -on the b-tree page.  Let M be ((U-12)*32/255)-23.  If P is greater than U-35
          707  +^If the payload size P is less than U-36 then the entire payload is stored
          708  +on the b-tree page.  ^(Let M be ((U-12)*32/255)-23.  If P is greater than U-35
   701    709   then the number of byte stored on the b-tree page is the lessor of
   702         -M+((P-M)%(U-4)) and U-35.
          710  +M+((P-M)%(U-4)) and U-35.)^
   703    711   </p></dd>
   704    712   
   705    713   <dt>Table B-Tree Interior Cell:</dt>
   706    714   <dd><p>
   707    715   Interior pages of table b-trees have no payload and so there is never
   708    716   any payload to spill.
   709    717   </p></dd>
   710    718   
   711    719   <dt>Index B-Tree Leaf Or Interior Cell:</dt>
   712    720   <dd><p>
   713         -If the payload size P is less than ((U-12)*64/255)-22 then the entire
          721  +^If the payload size P is less than ((U-12)*64/255)-22 then the entire
   714    722   payload is stored on the b-tree page.  
   715         -Let M be ((U-12)*32/255)-23.  If P is greater than ((U-12)*64/255)-23
          723  +^(Let M be ((U-12)*32/255)-23.  If P is greater than ((U-12)*64/255)-23
   716    724   then the number of byte stored on the b-tree page is the lessor of
   717         -M+((P-M)%(U-4)) and ((U-12)*64/255)-23.
          725  +M+((P-M)%(U-4)) and ((U-12)*64/255)-23.)^
   718    726   </p></dd>
   719    727   </dl></blockquote>
   720    728   
   721    729   <p>The overflow thresholds are designed to give a minimum fanout of
   722    730   4 for index b-trees and to make sure enough of the payload
   723    731   is on the b-tree page that the record header can usually be accessed
   724    732   without consulting an overflow page.  In hindsight, the designers of
................................................................................
   725    733   the SQLite b-tree logic realize that these thresholds could have been
   726    734   made much simpler.  However, the computations cannot be now be changed
   727    735   without resulting in an incompatible file format.  And the current computations
   728    736   work well, even if they are a little complex.</p>
   729    737   
   730    738   <h3>1.6 Cell Payload Overflow Pages</h3>
   731    739   
   732         -<p>When the payload of a b-tree cell is too large for the b-tree page,
   733         -the surplus is spilled onto overflow pages.  Overflow pages form a linked
   734         -list.  The first four bytes of each overflow page are a big-endian
          740  +<p>^When the payload of a b-tree cell is too large for the b-tree page,
          741  +the surplus is spilled onto overflow pages.  ^Overflow pages form a linked
          742  +list.  ^The first four bytes of each overflow page are a big-endian
   735    743   integer which is the page number of the next page in the chain, or zero
   736         -for the final page in the chain.  The fifth byte through the last usable
          744  +for the final page in the chain.  ^The fifth byte through the last usable
   737    745   byte are used to hold overflow content.</p>
   738    746   
   739    747   <h3>1.7 Pointer Map or Ptrmap Pages</h3>
   740    748   
   741    749   <p>Pointer map or ptrmap pages are extra pages inserted into the database
   742    750   to make the operation of [auto_vacuum] and [incremental_vacuum] modes
   743    751   more efficient.  Other page types in the database typically have pointers
   744    752   from parent to child.  For example, a interior b-tree page contains pointers
   745    753   to its child b-tree pages and an overflow chain has a pointer
   746    754   from earlier to later links in the chain.  A ptrmap page contains linkage
   747    755   information going in the opposite direction, from child to parent.</p>
   748    756   
   749         -<p>Ptrmap pages must exist in any database file which has a non-zero
          757  +<p>^Ptrmap pages must exist in any database file which has a non-zero
   750    758   largest root b-tree page value at offset 52 in the database header.
   751         -If the largest root b-tree page value is zero, then the database must not
          759  +^If the largest root b-tree page value is zero, then the database must not
   752    760   contain ptrmap pages.</p>
   753    761   
   754         -<p>In a database with ptrmap pages, the first ptrmap page is page 2.
          762  +<p>^In a database with ptrmap pages, the first ptrmap page is page 2.
   755    763   A ptrmap page consists of an array of 5-byte entries.  Let J be the
   756    764   number of 5-byte entries that will fit in the usable space of a page.
   757         -(In other words, J=U/5.)  The first ptrmap page will contain back pointer
   758         -information for pages 3 through J+2, inclusive.  The next pointer map
          765  +(In other words, J=U/5.)  ^The first ptrmap page will contain back pointer
          766  +information for pages 3 through J+2, inclusive.  ^The second pointer map
   759    767   page will be on page J+3 and that ptrmap page will provide back pointer
   760    768   information for pages J+4 through 2*J+3 inclusive.  An so forth for
   761    769   the entire database file.</p>
   762    770   
   763         -<p>In a database that uses ptrmap pages, all pages at locations identified
          771  +<p>^(In a database that uses ptrmap pages, all pages at locations identified
   764    772   by the computation in the previous paragraph must be ptrmap page and no
   765    773   other page may be a ptrmap page.  Except, if the byte-lock page happens to
   766    774   fall on the same page number as a ptrmap page, then the ptrmap is moved
   767         -to the following page for that one case.</p>
          775  +to the following page for that one case.)^</p>
   768    776   
   769    777   <p>Each 5-byte entry on a ptrmap page provides back-link information about 
   770         -one of pages that immediately follow the pointer map.  If page B is
          778  +one of pages that immediately follow the pointer map.  ^(If page B is a
   771    779   ptrmap page then back-link information about page B+1 is provided by
   772    780   the first entry on the pointer map.  Information about page B+2 is
   773         -provided by the second entry.  And so forth.</p>
          781  +provided by the second entry.  And so forth.)^</p>
   774    782   
   775    783   <p>Each 5-byte ptrmap entry consists of one byte of "page type" information
   776    784   followed by a 4-byte big-endian page number.  Five page types are recognized:
   777    785   </p>
   778    786   
   779    787   <ol>
   780    788   <li>A b-tree root page.  The
................................................................................
   787    795   <li>A page in an overflow chain
   788    796   other than the first page.  The page number is the prior page of the
   789    797   overflow chain.
   790    798   <li>A non-root b-tree page.  The
   791    799   page number is the parent b-tree page.
   792    800   </ol>
   793    801   
   794         -<p>In any database file that contains ptrmap pages, all b-tree root pages 
          802  +<p>^In any database file that contains ptrmap pages, all b-tree root pages 
   795    803   must come before any non-root b-tree page, cell payload overflow page, or
   796         -freelist page.</p>
          804  +freelist page.  This restriction ensures that a root page will never
          805  +be moved during an auto-vacuum or incremental-vacuum.  The auto-vacuum
          806  +logic is not to update the root_page field of the sqlite_master
          807  +table and so it is necessary to prevent root pages from being moved
          808  +during an auto-vacuum in order to preserve the integrity of the
          809  +sqlite_master table.  ^Root pages are moved to the beginning of the
          810  +database file by the CREATE TABLE, CREATE INDEX, DROP TABLE, and
          811  +DROP INDEX operations.</p>
   797    812   
   798    813   <h2>2.0 Schema Layer</h2>
   799    814   
   800    815   <p>The foregoing text describes low-level aspects of the SQLite file
   801         -format.  The b-tree mechanism provides powerful and efficient means of
          816  +format.  The b-tree mechanism provides a powerful and efficient means of
   802    817   accessing a large data set.  This section will describe how the
   803    818   low-level b-tree layer is used to implement higher-level SQL
   804    819   capabilities.</p>
   805    820   
   806    821   <tcl>hd_fragment record_format {record format}</tcl>
   807    822   <h3>2.1 Record Format</h3>
   808    823   
   809    824   <p>The content of a table b-tree leaf page and the key
   810    825   of any index b-tree page was characterized above
   811    826   as an arbitrary sequence of bytes.
   812    827   The prior discussion mentioned one key being less than another, but
   813    828   did not define what "less than" meant.  The current section will address
   814         -these deficiencies.</p>
          829  +these omissions.</p>
   815    830   
   816    831   <p>Payload, be it table content or index keys, is always in the "record
   817    832   format".  The record format defines a sequence of values corresponding to
   818    833   to columns in a table or index.  The record format specifies the number
   819    834   of columns, the datatype of each column, and the content of each column.</p>
   820    835   
   821    836   <p>The record format makes extensive use of the 
   822    837   [variable-length integer] or [varint]
   823    838   representation of 64-bit signed integers defined above.</p>
   824    839   
   825    840   <tcl>hd_fragment serialtype {serial type} {serial types}</tcl>
   826    841   <p>A record contains a header and a body, in that order.  
   827         -The header begins with a single varint which determines the total number
          842  +^(The header begins with a single varint which determines the total number
   828    843   of bytes in the header.  The varint value is the size of the header in
   829         -byte including the size varint itself.  Following the size varint are
          844  +bytes including the size varint itself.)^  ^Following the size varint are
   830    845   one or more additional varints, one per column.  These additional varints
   831    846   are called "serial type" numbers and
   832    847   determine the datatype of each column, according to the following key:</p>
   833    848   
   834    849   <center>
   835         -<i>Serial Type Codes Of The Record Format</i><br>
          850  +^(<i>Serial Type Codes Of The Record Format</i><br>
   836    851   <table width="80%" border=1>
   837    852   <tr><th>Serial Type<th>Content Size<th>Meaning
   838    853   <tr><td valign=top align=center>0<td valign=top align=center>0<td align=left>
   839    854   NULL
   840    855   <tr><td valign=top align=center>1<td valign=top align=center>1<td align=left>
   841    856   8-bit twos-complement integer
   842    857   <tr><td valign=top align=center>2<td valign=top align=center>2<td align=left>
................................................................................
   861    876   <tr><td valign=top align=center>N&#x2265;12 and even
   862    877       <td valign=top align=center>(N-12)/2<td align=left>
   863    878   A string in the database encoding and (N-12)/2 bytes in length.
   864    879   The nul terminator is omitted.
   865    880   <tr><td valign=top align=center>N&#x2265;13 and odd
   866    881       <td valign=top align=center>(N-13)/2<td align=left>
   867    882   A BLOB that is (N-13)/2 bytes in length
   868         -</table></center>
          883  +</table></center>)^
   869    884   
   870    885   <p>Note that because of the way varints are defined, the header size varint
   871    886   and serial type varints will usually consist of a single byte.  The
   872    887   serial type varints for large strings and BLOBs might extend to two or three
   873    888   byte varints, but that is the exception rather than the rule. 
   874    889   The varint format is very efficient at coding the record header.</p>
   875    890   
   876    891   <p>The values for each column in the record immediately follow the header.
   877         -Note that for serial types 0, 8, 9, 12, and 13, the value is zero bytes in
          892  +^(Note that for serial types 0, 8, 9, 12, and 13, the value is zero bytes in
   878    893   length.  If all columns are of these types then the body section of the
   879         -record is empty.</p>
          894  +record is empty.)^</p>
   880    895   
   881    896   <h3>2.2 Record Sort Order</h3>
   882    897   
   883    898   <p>The order of keys in an index b-tree is determined by the sort order of
   884    899   the records that the keys represent.  Record comparison proceeds column
   885    900   by column.  Columns of a record are examined from left to right.  The
   886    901   first pair of columns that are not equal determines the relative order
   887    902   of the two records.  The sort order of individual columns is as
   888    903   follows:</p>
   889    904   
   890    905   <ol>
   891         -<li>NULL values (serial type 0) sort first
   892         -<li>Numeric values (serial types 1 through 9) sort next and in numeric order
   893         -<li>Text values (even serial types 12 and larger) sort next in the order
          906  +<li>^NULL values (serial type 0) sort first
          907  +<li>^Numeric values (serial types 1 through 9) sort next and in numeric order
          908  +<li>^Text values (even serial types 12 and larger) sort next in the order
   894    909       determined by the columns collating function
   895         -<li>BLOB values (odd serial types 13 and larger) sort last in order 
          910  +<li>^BLOB values (odd serial types 13 and larger) sort last in order 
   896    911       determined by memcmp().
   897    912   </ol>
   898    913   
   899    914   <p>A collating function for each column is necessary in order to compute
   900    915   the order of text fields.  SQLite defines three built-in collating functions:
   901    916   </p>
   902    917   
   903    918   <blockquote><table border=0 cellspacing=10>
   904         -<tr><td valign=top>BINARY
          919  +<tr>^<td valign=top>BINARY
   905    920       <td>Strings are compared byte by byte using the memcmp() function
   906    921           from the standard C library.
   907         -<tr><td valign=top>NOCASE
          922  +<tr>^<td valign=top>NOCASE
   908    923       <td>Like BINARY except that uppercase ASCII characters ('A' through 'Z')
   909    924           are folded into their lowercase equivalents prior to running the
   910         -        comparison.  Note that only ASCII characters are case-folded.  NOCASE
          925  +        comparison.  Note that only ASCII characters are case-folded.  ^NOCASE
   911    926           does not implement a general purpose unicode caseless comparison.
   912         -<tr><td valign=top>RTRIM
          927  +<tr>^<td valign=top>RTRIM
   913    928       <td>Like BINARY except that spaces at the end of the string are elided
   914    929           prior to comparison.
   915    930   </table></blockquote>
   916    931   
   917         -<p>Additional application-specific collating functions can be added to
          932  +<p>^Additional application-specific collating functions can be added to
   918    933   SQLite using the [sqlite3_create_collation()] interface.</p>
   919    934   
   920         -<p>The default collating function for all strings is BINARY.
   921         -Alternative collating functions for table columns can be specified in the
          935  +<p>^The default collating function for all strings is BINARY.
          936  +^Alternative collating functions for table columns can be specified in the
   922    937   [CREATE TABLE] statement using the COLLATE clause on the column definition.
   923         -When a column is indexed, the same collating function specified in the
          938  +^When a column is indexed, the same collating function specified in the
   924    939   [CREATE TABLE] statement is used for the column in the index, by default,
   925    940   though this can be overridden using a COLLATE clause in the 
   926    941   [CREATE INDEX] statement.
   927    942   
   928    943   <h3>2.3 Representation Of SQL Tables</h3>
   929    944   
   930    945   <p>Each ordinary SQL table in the database schema is represented on disk
................................................................................
   931    946   by a table b-tree.  Each entry in the table b-tree corresponds to a row
   932    947   of the SQL table.  The [rowid] of the SQL table is the 64-bit signed
   933    948   integer key for each entry in the table b-tree.</p>
   934    949   
   935    950   <p>The content of each SQL table row is stored in the database file by
   936    951   first combining the values in the various columns into a byte array
   937    952   in the record format, then storing that byte array as the payload in
   938         -an entry in the table b-tree.  The order of values in the record is
          953  +an entry in the table b-tree.  ^The order of values in the record is
   939    954   the same as the order of columns in the SQL table definition.
   940         -When an SQL table that includes an
          955  +^When an SQL table that includes an
   941    956   [INTEGER PRIMARY KEY] column (which aliases the [rowid]) then that
   942         -column appears in the record as a NULL value.  SQLite will know to use
          957  +column appears in the record as a NULL value.  ^SQLite will always use
   943    958   the table b-tree key rather than the NULL value when referencing the
   944    959   [INTEGER PRIMARY KEY] column.</p>
   945    960   
   946         -<p>If the [affinity] of a column is REAL and that column contains a
          961  +<p>^If the [affinity] of a column is REAL and that column contains a
   947    962   value that can be converted to an integer without loss of information
   948    963   (if the value contains no fractional part and is not too large to be
   949    964   represented as an integer) then the column may be stored in the record
   950         -as an integer.  SQLite will know to convert the value back to floating
          965  +as an integer.  ^SQLite will convert the value back to floating
   951    966   point when extracting it from the record.</p>
   952    967   
   953    968   
   954    969   <h3>2.4 Representation Of SQL Indices</h3>
   955    970   
   956         -<p>Each SQL index, whether explicitly declared via a [CREATE INDEX] statement
          971  +<p>^Each SQL index, whether explicitly declared via a [CREATE INDEX] statement
   957    972   or implied by a UNIQUE constraint, corresponds to an index b-tree in the
   958    973   database file.
   959         -There is one entry in index b-tree for each row in the corresponding table.
   960         -The key to an index b-tree is
          974  +^There is one entry in index b-tree for each row in the corresponding table.
          975  +^The key to an index b-tree is
   961    976   a record composed of the columns that are being indexed followed by the
   962    977   [rowid] of the table row.  Because every row in a table has a unique
   963    978   rowid and all keys in an index contain the rowid, all keys in an index
   964    979   are unique.</p>
   965    980   
   966         -<p>There is a one-to-one mapping between rows in a table and
          981  +<p>^There is a one-to-one mapping between rows in a table and
   967    982   entries in each index associated with that table.
   968         -Corresponding rows int the index and table b-trees share the same rowid
          983  +^Corresponding rows int the index and table b-trees share the same rowid
   969    984   value, and contain the same value for all indexed columns.</p>
   970    985   
   971    986   <h3>2.5 Storage Of The SQL Database Schema</h3>
   972    987   
   973         -<p>Page 1 of a database file is the root page of a table b-tree that
          988  +<p>^Page 1 of a database file is the root page of a table b-tree that
   974    989   holds a special table named "sqlite_master" (or "sqlite_temp_master" in
   975    990   the case of a TEMP database) which stores the complete
   976         -database schema.  The structure of the sqlite_master table is as
          991  +database schema.  ^(The structure of the sqlite_master table is as
   977    992   if it had been created using the following SQL:</p>
   978    993   
   979    994   <blockquote><pre>
   980    995   CREATE TABLE sqlite_master(
   981    996     type text,
   982    997     name text,
   983    998     tbl_name text,
   984    999     rootpage integer,
   985   1000     sql text
   986   1001   );
   987         -</pre></blockquote>
         1002  +</pre></blockquote>)^
   988   1003   
   989         -<p>The sqlite_master table contains a row for each table, index, view,
         1004  +<p>^The sqlite_master table contains a row for each table, index, view,
   990   1005   and trigger in the database schema, except there is no entry for the
   991   1006   sqlite_master table itself.</p>
   992   1007   
   993         -<p>The sqlite_master.type column will be one
         1008  +<p>^(The sqlite_master.type column will be one
   994   1009   of the following text strings:  'table', 'index', 'view', or 'trigger'
   995         -according to the type of object defined.  The 'table' string is used
   996         -for both ordinary and [virtual tables].</p>
         1010  +according to the type of object defined.  ^The 'table' string is used
         1011  +for both ordinary and [virtual tables].)^</p>
   997   1012   
   998         -</p>The sqlite_master.name column will hold the name of the object.
   999         -For indices that are automatically created by UNIQUE constraints, the
  1000         -name is "sqlite_autoindex_TABLE_N" where TABLE is replaced by the name
  1001         -of the table that contains the UNIQUE constraint and N is an integer beginning
  1002         -with 1 and increasing by one with each UNIQUE constraint seen.</p>
         1013  +</p>^(The sqlite_master.name column will hold the name of the object.
         1014  +^For indices that are automatically created by UNIQUE or PRIMARY KEY
         1015  +constraints, the name is "sqlite_autoindex_TABLE_N" where TABLE is 
         1016  +replaced by the name of the table that contains the constraint and N 
         1017  +is an integer beginning
         1018  +with 1 and increasing by one with each constraint seen.)^</p>
  1003   1019   
  1004   1020   <p>The sqlite_master.tbl_name column holds the name of a table or view
  1005         -that the object is associated with.  For a table or view, the
  1006         -tbl_name column is a copy of the name column.  For an index, the tbl_name
  1007         -is the name of the table that is indexed.  For a trigger, the tbl_name
         1021  +that the object is associated with.  ^For a table or view, the
         1022  +tbl_name column is a copy of the name column.  ^For an index, the tbl_name
         1023  +is the name of the table that is indexed.  ^For a trigger, the tbl_name
  1008   1024   column stores the name of the table or view that causes the trigger 
  1009   1025   to fire.</p>
  1010   1026   
  1011         -<p>The sqlite_master.rootpage column stores the page number of the root
  1012         -b-tree page for tables and indices.  For rows that define views, triggers,
         1027  +<p>^(The sqlite_master.rootpage column stores the page number of the root
         1028  +b-tree page for tables and indices.)^  ^For rows that define views, triggers,
  1013   1029   and virtual tables, the rootpage column is 0 or NULL.</p>
  1014   1030   
  1015         -<p>The sqlite_master.sql column stores SQL text that describes the
         1031  +<p>^(The sqlite_master.sql column stores SQL text that describes the
  1016   1032   object.  This SQL text is a [CREATE TABLE], [CREATE VIRTUAL TABLE],
  1017   1033   [CREATE INDEX],
  1018   1034   [CREATE VIEW], or [CREATE TRIGGER] statement that if evaluated against
  1019   1035   the database file when it is the main database of a [database connection]
  1020         -would recreated the object.  The text is usually a copy of the original
         1036  +would recreated the object.)  The text is usually a copy of the original
  1021   1037   statement used to create the object but with normalizations applied so
  1022   1038   that the text conforms to the following rules:
  1023   1039   
  1024   1040   <ul>
  1025         -<li>The CREATE, TABLE, VIEW, TRIGGER, and INDEX keywords at the beginning
         1041  +<li>^The CREATE, TABLE, VIEW, TRIGGER, and INDEX keywords at the beginning
  1026   1042   of the statement are converted to all upper case letters.
  1027         -<li>The TEMP or TEMPORARY keyword is removed if it occurs after the 
         1043  +<li>^The TEMP or TEMPORARY keyword is removed if it occurs after the 
  1028   1044   initial CREATE keyword.
  1029         -<li>Any database name qualifier that occurs prior to the name of the
         1045  +<li>^Any database name qualifier that occurs prior to the name of the
  1030   1046   object being created is removed.
  1031         -<li>Leading spaces are removed.
  1032         -<li>All spaces following the first two keywords are converted into a single
         1047  +<li>^Leading spaces are removed.
         1048  +<li>^All spaces following the first two keywords are converted into a single
  1033   1049   space.
  1034   1050   </ul>
  1035   1051   
  1036         -<p>The text in the sqlite_master.sql column is a copy of the original
         1052  +<p>^(The text in the sqlite_master.sql column is a copy of the original
  1037   1053   CREATE statement text that created the object, except normalized as
  1038         -described above and as modified by subsequent [ALTER TABLE] statements.</p>
         1054  +described above and as modified by subsequent [ALTER TABLE] statements.)^</p>
  1039   1055   
  1040         -<p>For indices that are automatically created by UNIQUE constraints,
  1041         -the sqlite_master.sql field is NULL.</p>
         1056  +<p>^(For indices that are automatically created by UNIQUE or
         1057  +PRIMARY KEY constraints, the sqlite_master.sql field is NULL.)^</p>
  1042   1058   
  1043   1059   
  1044   1060   <tcl>hd_fragment rollbackjournal {rollback journal format}</tcl>
  1045   1061   <h2>3.0 The Rollback Journal</h2>
  1046   1062   
  1047   1063   <p>The rollback journal is a file associated with each SQLite database
  1048   1064   fail that hold information used to restore the database file to its initial
  1049   1065   state during the course of a transaction.
  1050         -The rollback journal file is always located in the directory as the database
         1066  +^The rollback journal file is always located in the directory as the database
  1051   1067   file and has the same name as the database file but with the string
  1052   1068   "<tt>-journal</tt>" appended.  There can only be a single rollback journal
  1053   1069   associated with a give database and hence there can only be one write
  1054   1070   transaction open against a single database at one time.</p>
  1055   1071   
  1056   1072   <p>If a transaction is aborted due to an application crash, an operating
  1057   1073   system crash, or a hardware power failure or crash, then the database may
  1058         -be left in an inconsistent state.  The next time SQLite attempts to open
         1074  +be left in an inconsistent state.  ^The next time SQLite attempts to open
  1059   1075   the database file, the presence of the rollback journal file will be 
  1060   1076   detected and the journal will be automatically played back to restore the
  1061   1077   database to its state at the start of the incomplete transaction.</p>
  1062   1078   
  1063         -<p>A rollback journal is only considered to be valid if it exists and
         1079  +<p>^A rollback journal is only considered to be valid if it exists and
  1064   1080   contains a valid header.  Hence a transaction can be committed in one
  1065   1081   of three ways:
  1066   1082   <ol>
  1067         -<li>The rollback journal file can be deleted,
  1068         -<li>The rollback journal file can be truncated to zero length, or
  1069         -<li>The header of the rollback journal can be overwritten with
         1083  +<li>^The rollback journal file can be deleted,
         1084  +<li>^The rollback journal file can be truncated to zero length, or
         1085  +<li>^The header of the rollback journal can be overwritten with
  1070   1086   invalid header text (for example, all zeros).
  1071   1087   </ol>
  1072         -These three ways of committing a transaction correspond to the DELETE,
         1088  +^These three ways of committing a transaction correspond to the DELETE,
  1073   1089   TRUNCATE, and PERSIST settings, respectively, of the [journal_mode pragma].
  1074   1090   </p>
  1075   1091   
  1076   1092   
  1077   1093   <p>A valid rollback journal begins with a header in the following format:</p>
  1078   1094   
  1079   1095   <center>
  1080   1096   <i>Rollback Journal Header Format</i><br>
  1081   1097   <table width="80%" border=1>
  1082   1098   <tr><th>Offset<th>Size<th>Description
  1083         -<tr><td valign=top align=center>0
         1099  +<tr>^(<td valign=top align=center>0
  1084   1100       <td valign=top align=center>8
  1085         -    <td>Header string:  0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, 0xd7
  1086         -<tr><td valign=top align=center>8
         1101  +    <td>Header string:  0xd9, 0xd5, 0x05, 0xf9, 0x20, 0xa1, 0x63, 0xd7)^
         1102  +<tr>^(<td valign=top align=center>8
  1087   1103       <td valign=top align=center>4
  1088   1104       <td>The "Page Count" - The number of pages in the next segment of the 
  1089   1105           journal, or -1 to
  1090         -        mean all content to the end of the file
  1091         -<tr><td valign=top align=center>12
         1106  +        mean all content to the end of the file)^
         1107  +<tr>^(<td valign=top align=center>12
  1092   1108       <td valign=top align=center>4
  1093         -    <td>A random nonce for the checksum
  1094         -<tr><td valign=top align=center>16
         1109  +    <td>A random nonce for the checksum)^
         1110  +<tr>^(<td valign=top align=center>16
  1095   1111       <td valign=top align=center>4
  1096         -    <td>Initial size of the database in pages
  1097         -<tr><td valign=top align=center>20
         1112  +    <td>Initial size of the database in pages)^
         1113  +<tr>^(<td valign=top align=center>20
  1098   1114       <td valign=top align=center>4
  1099   1115       <td>Size of a disk sector assumed by the process that wrote this
  1100         -        journal.
  1101         -<tr><td valign=top align=center>24
         1116  +        journal.)^
         1117  +<tr>^(<td valign=top align=center>24
  1102   1118       <td valign=top align=center>4
  1103         -    <td>Size of pages in this journal.
         1119  +    <td>Size of pages in this journal.)^
  1104   1120   </table>
  1105   1121   </center>
  1106   1122   
  1107         -<p>A rollback journal header is padded with zeros out to the size of a 
         1123  +<p>^A rollback journal header is padded with zeros out to the size of a 
  1108   1124   single sector (as defined by the sector size integer at offset 20).
  1109   1125   The header is in a sector by itself so that if a power loss occurs while
  1110   1126   writing the sector, information that follows the header will be
  1111   1127   (hopefully) undamaged.</p>
  1112   1128   
  1113         -<p>After the header and zero padding are zero or more page records.  Each
         1129  +<p>^After the header and zero padding are zero or more page records.  ^Each
  1114   1130   page record stores a copy of the content of a page from the database file
  1115         -before it was changed.  The same page may not appear more than once
         1131  +before it was changed.  ^The same page may not appear more than once
  1116   1132   within a single rollback journal.
  1117   1133   To rollback an incomplete transaction, a process
  1118   1134   has merely to read the rollback journal from beginning to end and
  1119   1135   write pages found in the journal back into the database file at the
  1120   1136   appropriate location.</p>
  1121   1137   
  1122   1138   <p>Let the database page size (the value of the integer at offset 24 
................................................................................
  1123   1139   in the journal header) be N.
  1124   1140   Then the format of a page record is as follows:</p>
  1125   1141   
  1126   1142   <center>
  1127   1143   <i>Rollback Journal Page Record Format</i><br>
  1128   1144   <table width="80%" border=1>
  1129   1145   <tr><th>Offset<th>Size<th>Description
  1130         -<tr><td valign=top align=center>0
         1146  +<tr>^(<td valign=top align=center>0
  1131   1147       <td valign=top align=center>4
  1132         -    <td>The page number in the database file
  1133         -<tr><td valign=top align=center>4
         1148  +    <td>The page number in the database file)^
         1149  +<tr>^(<td valign=top align=center>4
  1134   1150       <td valign=top align=center>N
  1135         -    <td>Original content of the page prior to the start of the transaction
  1136         -<tr><td valign=top align=center>N+4
         1151  +    <td>Original content of the page prior to the start of the transaction)^
         1152  +<tr>^(<td valign=top align=center>N+4
  1137   1153       <td valign=top align=center>4
  1138         -    <td>Checksum
         1154  +    <td>Checksum)^
  1139   1155   </table>
  1140   1156   </center>
  1141   1157   
  1142   1158   
  1143         -<p>The checksum is an unsigned 32-bit integer computed as follows:</p>
         1159  +<p>^(The checksum is an unsigned 32-bit integer computed as follows:</p>
  1144   1160   
  1145   1161   <ol>
  1146   1162   <li>Initialize the checksum to the checksum nonce value found in the
  1147   1163   journal header at offset 12.
  1148   1164   <li>Initialize index X to be N-200 (where N is the size of a database page
  1149   1165   in bytes.
  1150   1166   <li>Interpret the four bytes at offset X into the page as a 4-byte big-endian
  1151   1167   unsigned integer.  Add the value of that integer to the checksum.
  1152   1168   <li>Subtrace 200 from X.
  1153   1169   <li>If X is greater than or equal to zero, go back to step 3.
  1154         -</ol>
         1170  +</ol>)^
  1155   1171   
  1156   1172   <p>The checksum value is used to guard against incomplete writes of
  1157   1173   a journal page record following a power failure.  A different random nonce
  1158   1174   is used each time a transaction is started in order to minimize the risk
  1159   1175   that unwritten sectors might by chance contain data from the same page
  1160   1176   that was a part of prior journals.  By changing the nonce for each
  1161   1177   transaction, stale data on disk will still generate an incorrect checksum
  1162   1178   and be detected with high probability.  The checksum only uses a sparce sample
  1163   1179   of 32-bit words from the data record for performance reasons - design studies 
  1164   1180   during the planning phases of SQLite 3.0.0 showed
  1165   1181   a significant performance hit in checksumming the entire page.</p>
  1166   1182   
  1167         -<p>Let the page count value at offset 8 in the journal header by M.
  1168         -If M is greater than zero then after M page records the journal file
         1183  +<p>Let the page count value at offset 8 in the journal header be M.
         1184  +^If M is greater than zero then after M page records the journal file
  1169   1185   may be zero padded out to the next multiple of the sector size and another
  1170         -journal header may be inserted.  All journal headers within the same
         1186  +journal header may be inserted.  ^All journal headers within the same
  1171   1187   journal must contain the same database page size and sector size.</p>
  1172   1188   
  1173         -<p>If M is -1 in the initial journal header, then the number of page records
  1174         -that follow is computed by figure out how many page records will fit in
         1189  +<p>^If M is -1 in the initial journal header, then the number of page records
         1190  +that follow is computed by computing how many page records will fit in
  1175   1191   the available space of the remainder of the journal file.</p>
  1176   1192   
  1177   1193   <tcl>hd_fragment walformat {WAL format}</tcl>
  1178   1194   <h2>4.0 The Write-Ahead Log</h2>
  1179   1195   
  1180   1196   <p>Beginning with [version 3.7.0], SQLite supports a new transaction
  1181   1197   control mechanism called "[WAL | write-ahead log]" or "[WAL]".
  1182         -When a database is in WAL mode, all connections to that database must
  1183         -use the WAL.  A particular database will use either a rollback journal
         1198  +^When a database is in WAL mode, all connections to that database must
         1199  +use the WAL.  ^A particular database will use either a rollback journal
  1184   1200   or a WAL, but not both at the same time.
  1185         -The WAL is always located in the directory as the database
         1201  +^The WAL is always located in the directory as the database
  1186   1202   file and has the same name as the database file but with the string
  1187   1203   "<tt>-wal</tt>" appended.</p>
  1188   1204   
  1189   1205   <h3>4.1 WAL File Format</h4>
  1190   1206   
  1191   1207   <p>A WAL file consists of a header followed by zero or more "frames".
  1192   1208   Each frame records the revised content of a single page from the
  1193   1209   database file.  All changes to the database are recorded by writing
  1194   1210   frames into the WAL.  Transactions commit when a frame is written that
  1195         -contains a commit marker.  A single WAL can and usually does record 
         1211  +contains a commit marker.  ^A single WAL can and usually does record 
  1196   1212   multiple transactions.  Periodically, the content of the WAL is
  1197   1213   transferred back into the database file in an operation called a
  1198   1214   "checkpoint".</p>
  1199   1215   
  1200         -<p>A single WAL file can be reused multiple times.  In other words, the
         1216  +<p>^A single WAL file can be reused multiple times.  ^In other words, the
  1201   1217   WAL can fill up with frames and then be checkpointed and then new
  1202         -frames can overwrite the old ones.  A WAL always grows from beginning
         1218  +frames can overwrite the old ones.  ^A WAL always grows from beginning
  1203   1219   toward the end.  Checksums and counters attached to each frame are
  1204   1220   used to determine which frames within the WAL are valid and which
  1205   1221   are leftovers from prior checkpoints.</p>
  1206   1222   
  1207   1223   <p>The WAL header is 32 bytes in size and consists of the following eight
  1208   1224   big-endian 32-bit unsigned integer values:</p>
  1209   1225   
................................................................................
  1251   1267   <tr><td valign=top align=center>16<td valign=top align=center>4
  1252   1268       <td>Checksum-1:  Cumulative checksum up through and including this page
  1253   1269   <tr><td valign=top align=center>20<td valign=top align=center>4
  1254   1270       <td>Checksum-2:  Second half of the cumulative checksum.
  1255   1271   </table>
  1256   1272   </center>
  1257   1273   
  1258         -<p>A frame is considered valid if and only if the following conditions are
         1274  +^(<p>A frame is considered valid if and only if the following conditions are
  1259   1275   true:</p>
  1260   1276   
  1261   1277   <ol>
  1262   1278   <li><p>The salt-1 and salt-2 values in the frame-header match
  1263   1279          salt values in the wal-header</p></li>
  1264   1280   
  1265   1281   <li><p>The checksum values in the final 8 bytes of the frame-header
  1266   1282          exactly match the checksum computed consecutively on the
  1267   1283          WAL header and the first 8 bytes and the content of all frames
  1268   1284          up to and including the current frame.</p></li></li>
  1269         -</ol>
         1285  +</ol>)^
  1270   1286   
  1271   1287   <tcl>hd_fragment walcksm {WAL checksum algorithm}</tcl>
  1272   1288   <h3>4.2 Checksum Algorithm</h3>
  1273   1289   
  1274   1290   <p>The checksum is computed by interpreting the input as
  1275   1291   an even number of unsigned 32-bit integers: x(0) through x(N).
  1276         -The 32-bit integers are big-endian if the
         1292  +^The 32-bit integers are big-endian if the
  1277   1293   magic number in the first 4 bytes of the WAL header is 0x377f0683 and
  1278   1294   the integers are little-endian the magic number is 0x377f0682.
  1279         -The checksum values are always stored in the frame header in a
         1295  +^The checksum values are always stored in the frame header in a
  1280   1296   big-endian format regardless of which byte order is used to compute
  1281   1297   the checksum.</p>
  1282   1298   
  1283   1299   <p>The checksum algorithm only works for content which is a multiple of
  1284   1300   8 bytes in length.  In other words, if the inputs are x(0) through x(N)
  1285   1301   then N must be odd.
  1286         -The checksum algorithm is as follows:
         1302  +^(The checksum algorithm is as follows:
  1287   1303   
  1288   1304   <blockquote><pre> 
  1289   1305   s0 = s1 = 0
  1290   1306   for i from 0 to n-1 step 2:
  1291   1307      s0 += x(i) + s1;
  1292   1308      s1 += x(i+1) + s0;
  1293   1309   endfor
  1294   1310   # result in s0 and s1
  1295         -</pre></blockquote>
         1311  +</pre></blockquote>)^
  1296   1312   
  1297   1313   <p>The outputs s0 and s1 are both weighted checksums using fibonacci weights
  1298   1314   in reverse order (the largest fibonacci weight occurs on the first element
  1299   1315   of the sequence being summed.)  The s1 value spans all 32-bit integer
  1300   1316   terms of the sequence whereas s0 omits the final term.</p>
  1301   1317   
  1302   1318   <h3>4.3 Checkpoint Algorithm</h3>
................................................................................
  1306   1322   Then valid content of the WAL is transferred into the database file.
  1307   1323   Finally, the database is flushed to persistent storage using another
  1308   1324   xSync method call.
  1309   1325   The xSync operations serve as write barriers - all writes launched
  1310   1326   before the xSync must complete before any write that launches after the
  1311   1327   xSync begins.</p>
  1312   1328   
  1313         -<p>After each checkpoint, the WAL header salt-1 value is incremented and the 
         1329  +<p>^After each checkpoint, the WAL header salt-1 value is incremented and the 
  1314   1330   salt-2 value is randomized.  This prevents old and new frames in the WAL from
  1315   1331   being considered valid at the same time and being checkpointing together
  1316   1332   following a crash.</p>
  1317   1333   
  1318   1334   <tcl>hd_fragment walread {WAL read algorithm}</tcl>
  1319   1335   <h3>4.4 Reader Algorithm</h3>
  1320   1336   
  1321         -<p>To read a page from the database (call it page number P), a reader
         1337  +<p>^(To read a page from the database (call it page number P), a reader
  1322   1338   first checks the WAL to see if it contains page P.  If so, then the
  1323         -last valid instance of page P that is a followed by a commit frame
  1324         -or is a commit frame itself becomes the value read.  If the WAL
         1339  +last valid instance of page P that is followed by a commit frame
         1340  +or is a commit frame itself becomes the value read.)^  ^If the WAL
  1325   1341   contains no copies of page P that are valid and which are a commit
  1326   1342   frame or are followed by a commit frame, then page P is read from
  1327   1343   the database file.</p>
  1328   1344   
  1329   1345   <p>To start a read transaction, the reader records the index of the last
  1330   1346   valid frame in the WAL.  The reader uses this recorded "mxFrame" value
  1331   1347   for all subsequent read operations.  New transactions can be appended
  1332   1348   to the WAL, but as long as the reader uses its original mxFrame value
  1333   1349   and ignores subsequently appended content, the reader will see a 
  1334   1350   consistent snapshot of the database from a single point in time.  
  1335         -This technique allows multiple concurrent readers to view different 
         1351  +^This technique allows multiple concurrent readers to view different 
  1336   1352   versions of the database content simultaneously.</p>
  1337   1353   
  1338   1354   <p>The reader algorithm in the previous paragraphs works correctly, but 
  1339   1355   because frames for page P can appear anywhere within the WAL, the
  1340   1356   reader has to scan the entire WAL looking for page P frames.  If the
  1341   1357   WAL is large (multiple megabytes is typical) that scan can be slow,
  1342         -and read performance suffers.  To overcome this problem, a separate
         1358  +and read performance suffers.  ^To overcome this problem, a separate
  1343   1359   data structure called the wal-index is maintained to expedite the
  1344   1360   search for frames of a particular page.</p>
  1345   1361   
  1346   1362   <tcl>hd_fragment walindexformat {wal-index} {WAL-index format}</tcl>
  1347   1363   <h3>4.5 WAL-Index Format</h3>
  1348   1364   
  1349   1365   <p>Conceptually, the wal-index is shared memory, though the current
................................................................................
  1365   1381   
  1366   1382   <p>The <i>M</i> value in the previous paragraph is the "mxFrame" value
  1367   1383   defined in [WAL read algorithm | section 4.4] that is read at the start 
  1368   1384   of a transaction and which defines the maximum frame from the WAL that 
  1369   1385   the reader will use.</p>
  1370   1386   
  1371   1387   <p>The wal-index is transient.  After a crash, the wal-index is
  1372         -reconstructed from the original WAL file.  The VFS is required
         1388  +reconstructed from the original WAL file.  ^The VFS is required
  1373   1389   to either truncate or zero the header of the wal-index when the last
  1374   1390   connection to it closes.  Because the wal-index is transient, it can
  1375   1391   use an architecture-specific format; it does not have to be cross-platform.
  1376   1392   Hence, unlike the database and WAL file formats which store all values
  1377   1393   as big endian, the wal-index stores multi-byte values in the native
  1378   1394   byte order of the host computer.</p>
  1379   1395   
  1380   1396   <p>This document is concerned with the persist state of the database
  1381   1397   file, and since the wal-index is a transient structure, no further 
  1382   1398   information about the format of the wal-index will be provided here.
  1383   1399   Complete details on the format of the wal-index are contained within
  1384   1400   comments in SQLite source code.</p>