Documentation Source Text

Check-in [8f5cfd48d2]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to the file format documentation.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 8f5cfd48d2c2bd135cd0b8e972494ae71bb0dff5
User & Date: drh 2010-10-06 13:47:26.000
Context
2010-10-08
02:37
Changes to the website for version 3.7.3. (check-in: c800d719f4 user: drh tags: trunk)
2010-10-06
13:47
Updates to the file format documentation. (check-in: 8f5cfd48d2 user: drh tags: trunk)
12:23
Update the change documentation for the 3.7.3 release. Fixes to the description of the index_info pragma. (check-in: d009514359 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fileformat2.in.
9
10
11
12
13
14
15
16
17


18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
format used by SQLite.</p>

<h2>1.0 The Database File</h2>

<p>Most of the time the complete state of an SQLite database 
is contained a single file on disk called the "main database file".</p>

<p>While performing a transaction, SQLite stores some temporary information
in a second file called the "rollback journal".  If the application or


host computer crashes before completing the transaction, then the rollback
journal contains critical state information needed to restore the
content of the main database file.  When a rollback journal contains
information necessary for recovering the state of the database, we say
that it is a "hot journal".  Hot journals only come up in an error recovery
scenario and so are uncommon, but they are part of the state of an SQLite
database and so should not be ignored.  This document will define the format
of a rollback journal, but most of the content of this document will focus
on the main database file.</p>

<h3>1.1 Pages</h3>

<p>The main database file consists of one or more pages.  ^The size of a
page is a power of two between 512 and 65536 inclusive.  All pages within
the same database are the same size.  ^The page size for a database file







|
|
>
>






|
|







9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
format used by SQLite.</p>

<h2>1.0 The Database File</h2>

<p>Most of the time the complete state of an SQLite database 
is contained a single file on disk called the "main database file".</p>

<p>While performing a transaction, the default behavior is to stores 
some temporary information in a second file called the "rollback journal".
(The alternative is to use a [write-ahead log] - described separately.)
If the application or
host computer crashes before completing the transaction, then the rollback
journal contains critical state information needed to restore the
content of the main database file.  When a rollback journal contains
information necessary for recovering the state of the database, we say
that it is a "hot journal".  Hot journals only come up in an error recovery
scenario and so are uncommon, but they are part of the state of an SQLite
database and so should not be ignored.  This document defines the format
of a rollback journal (and the write-ahead log file), but main focus is
on the main database file.</p>

<h3>1.1 Pages</h3>

<p>The main database file consists of one or more pages.  ^The size of a
page is a power of two between 512 and 65536 inclusive.  All pages within
the same database are the same size.  ^The page size for a database file
359
360
361
362
363
364
365
366
367
368

369
370
371
372
373
374
375
376
377
378
379
inclusive.  A database file that is less than or equal to 1073741824 bytes 
in size contains no lock-byte page.  A database file larger than
1073741824 contains exactly one lock-byte page.
</p>

<p>The lock-byte page is set aside for use by the operating-system specific
VFS implementation in implementing the database file locking primitives.
^SQLite will not use the lock-byte page; it will never be read or written
by the SQLite core, though operating-system specific VFS implementations may
choose to read or write bytes on the lock-byte page according to the 

needs and proclivities of the underlying system.  The unix and win32
VFS implementations that come built into SQLite do not write to the
lock-byte page, but we are aware of third-party VFS implementations for
other operating systems that do sometimes write to the lock-byte page.</p>

<tcl>hd_fragment {freelist} {freelist} {free-page list}</tcl>
<h3>1.4 The Freelist</h3>

<p>A database file might contain one or more pages that are not in
active use.  Unused pages can come about, for example, when information
is deleted from the database.  Unused pages are stored on the freelist







|
|
|
>


|
|







361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
inclusive.  A database file that is less than or equal to 1073741824 bytes 
in size contains no lock-byte page.  A database file larger than
1073741824 contains exactly one lock-byte page.
</p>

<p>The lock-byte page is set aside for use by the operating-system specific
VFS implementation in implementing the database file locking primitives.
^SQLite does not use the lock-byte page.  ^The SQLite core 
will never read or write, though operating-system specific VFS 
implementations may choose to read or write bytes on the lock-byte 
page according to the 
needs and proclivities of the underlying system.  The unix and win32
VFS implementations that come built into SQLite do not write to the
lock-byte page, but third-party VFS implementations for
other operating systems might.</p>

<tcl>hd_fragment {freelist} {freelist} {free-page list}</tcl>
<h3>1.4 The Freelist</h3>

<p>A database file might contain one or more pages that are not in
active use.  Unused pages can come about, for example, when information
is deleted from the database.  Unused pages are stored on the freelist
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
page number is the parent b-tree page.
</ol>

<p>^In any database file that contains ptrmap pages, all b-tree root pages 
must come before any non-root b-tree page, cell payload overflow page, or
freelist page.  This restriction ensures that a root page will never
be moved during an auto-vacuum or incremental-vacuum.  The auto-vacuum
logic is not to update the root_page field of the sqlite_master
table and so it is necessary to prevent root pages from being moved
during an auto-vacuum in order to preserve the integrity of the
sqlite_master table.  ^Root pages are moved to the beginning of the
database file by the CREATE TABLE, CREATE INDEX, DROP TABLE, and
DROP INDEX operations.</p>

<h2>2.0 Schema Layer</h2>







|







795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
page number is the parent b-tree page.
</ol>

<p>^In any database file that contains ptrmap pages, all b-tree root pages 
must come before any non-root b-tree page, cell payload overflow page, or
freelist page.  This restriction ensures that a root page will never
be moved during an auto-vacuum or incremental-vacuum.  The auto-vacuum
logic does not know how to update the root_page field of the sqlite_master
table and so it is necessary to prevent root pages from being moved
during an auto-vacuum in order to preserve the integrity of the
sqlite_master table.  ^Root pages are moved to the beginning of the
database file by the CREATE TABLE, CREATE INDEX, DROP TABLE, and
DROP INDEX operations.</p>

<h2>2.0 Schema Layer</h2>
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
<p>The content of a table b-tree leaf page and the key
of any index b-tree page was characterized above
as an arbitrary sequence of bytes.
The prior discussion mentioned one key being less than another, but
did not define what "less than" meant.  The current section will address
these omissions.</p>

<p>Payload, be it table content or index keys, is always in the "record
format".  The record format defines a sequence of values corresponding to
to columns in a table or index.  The record format specifies the number
of columns, the datatype of each column, and the content of each column.</p>

<p>The record format makes extensive use of the 
[variable-length integer] or [varint]
representation of 64-bit signed integers defined above.</p>

<tcl>hd_fragment serialtype {serial type} {serial types}</tcl>
<p>A record contains a header and a body, in that order.  
^(The header begins with a single varint which determines the total number
of bytes in the header.  The varint value is the size of the header in
bytes including the size varint itself.)^  ^Following the size varint are
one or more additional varints, one per column.  These additional varints
are called "serial type" numbers and
determine the datatype of each column, according to the following key:</p>

<center>
^(<i>Serial Type Codes Of The Record Format</i><br>
<table width="80%" border=1>
<tr><th>Serial Type<th>Content Size<th>Meaning
<tr><td valign=top align=center>0<td valign=top align=center>0<td align=left>
NULL







|
|














|







820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
<p>The content of a table b-tree leaf page and the key
of any index b-tree page was characterized above
as an arbitrary sequence of bytes.
The prior discussion mentioned one key being less than another, but
did not define what "less than" meant.  The current section will address
these omissions.</p>

<p>Payload, either table content or index keys, is always in the "record
format".  The record format defines a sequence of values corresponding
to columns in a table or index.  The record format specifies the number
of columns, the datatype of each column, and the content of each column.</p>

<p>The record format makes extensive use of the 
[variable-length integer] or [varint]
representation of 64-bit signed integers defined above.</p>

<tcl>hd_fragment serialtype {serial type} {serial types}</tcl>
<p>A record contains a header and a body, in that order.  
^(The header begins with a single varint which determines the total number
of bytes in the header.  The varint value is the size of the header in
bytes including the size varint itself.)^  ^Following the size varint are
one or more additional varints, one per column.  These additional varints
are called "serial type" numbers and
determine the datatype of each column, according to the following chart:</p>

<center>
^(<i>Serial Type Codes Of The Record Format</i><br>
<table width="80%" border=1>
<tr><th>Serial Type<th>Content Size<th>Meaning
<tr><td valign=top align=center>0<td valign=top align=center>0<td align=left>
NULL
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
^(Note that for serial types 0, 8, 9, 12, and 13, the value is zero bytes in
length.  If all columns are of these types then the body section of the
record is empty.)^</p>

<h3>2.2 Record Sort Order</h3>

<p>The order of keys in an index b-tree is determined by the sort order of
the records that the keys represent.  Record comparison proceeds column
by column.  Columns of a record are examined from left to right.  The
first pair of columns that are not equal determines the relative order
of the two records.  The sort order of individual columns is as
follows:</p>

<ol>
<li>^NULL values (serial type 0) sort first







|







888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
^(Note that for serial types 0, 8, 9, 12, and 13, the value is zero bytes in
length.  If all columns are of these types then the body section of the
record is empty.)^</p>

<h3>2.2 Record Sort Order</h3>

<p>The order of keys in an index b-tree is determined by the sort order of
the records that the keys represent.  Record comparison progresses column
by column.  Columns of a record are examined from left to right.  The
first pair of columns that are not equal determines the relative order
of the two records.  The sort order of individual columns is as
follows:</p>

<ol>
<li>^NULL values (serial type 0) sort first
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059

1060
1061
1062
1063
1064
1065
1066
PRIMARY KEY constraints, the sqlite_master.sql field is NULL.)^</p>


<tcl>hd_fragment rollbackjournal {rollback journal format}</tcl>
<h2>3.0 The Rollback Journal</h2>

<p>The rollback journal is a file associated with each SQLite database
fail that hold information used to restore the database file to its initial
state during the course of a transaction.
^The rollback journal file is always located in the directory as the database

file and has the same name as the database file but with the string
"<tt>-journal</tt>" appended.  There can only be a single rollback journal
associated with a give database and hence there can only be one write
transaction open against a single database at one time.</p>

<p>If a transaction is aborted due to an application crash, an operating
system crash, or a hardware power failure or crash, then the database may







|

|
>







1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
PRIMARY KEY constraints, the sqlite_master.sql field is NULL.)^</p>


<tcl>hd_fragment rollbackjournal {rollback journal format}</tcl>
<h2>3.0 The Rollback Journal</h2>

<p>The rollback journal is a file associated with each SQLite database
file that hold information used to restore the database file to its initial
state during the course of a transaction.
^The rollback journal file is always located in the same 
directory as the database
file and has the same name as the database file but with the string
"<tt>-journal</tt>" appended.  There can only be a single rollback journal
associated with a give database and hence there can only be one write
transaction open against a single database at one time.</p>

<p>If a transaction is aborted due to an application crash, an operating
system crash, or a hardware power failure or crash, then the database may
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
<h2>4.0 The Write-Ahead Log</h2>

<p>Beginning with [version 3.7.0], SQLite supports a new transaction
control mechanism called "[WAL | write-ahead log]" or "[WAL]".
^When a database is in WAL mode, all connections to that database must
use the WAL.  ^A particular database will use either a rollback journal
or a WAL, but not both at the same time.
^The WAL is always located in the directory as the database
file and has the same name as the database file but with the string
"<tt>-wal</tt>" appended.</p>

<h3>4.1 WAL File Format</h4>

<p>A WAL file consists of a header followed by zero or more "frames".
Each frame records the revised content of a single page from the







|







1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
<h2>4.0 The Write-Ahead Log</h2>

<p>Beginning with [version 3.7.0], SQLite supports a new transaction
control mechanism called "[WAL | write-ahead log]" or "[WAL]".
^When a database is in WAL mode, all connections to that database must
use the WAL.  ^A particular database will use either a rollback journal
or a WAL, but not both at the same time.
^The WAL is always located in the same directory as the database
file and has the same name as the database file but with the string
"<tt>-wal</tt>" appended.</p>

<h3>4.1 WAL File Format</h4>

<p>A WAL file consists of a header followed by zero or more "frames".
Each frame records the revised content of a single page from the