Documentation Source Text

Check-in [0ec9f2cb8f]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update the file format documentation for the new 64K page size. Add a caution to the WAL document. Omit annoying echos in the script that removes requirement marks from the documentation.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 0ec9f2cb8fcea0f08637f35fa94e496ca33d7f0d
User & Date: drh 2010-08-12 14:38:03.000
Context
2010-08-12
16:26
Work on the queryplanner.html document. (check-in: e8152063fb user: drh tags: trunk)
14:38
Update the file format documentation for the new 64K page size. Add a caution to the WAL document. Omit annoying echos in the script that removes requirement marks from the documentation. (check-in: 0ec9f2cb8f user: drh tags: trunk)
2010-08-10
05:58
Fix typos reported on the mailing list. (check-in: a85aaa0b62 user: drh tags: trunk)
Changes
Side-by-Side Diff Ignore Whitespace Patch
Changes to pages/fileformat.in.
826
827
828
829
830
831
832
833




834
835
836
837
838
839
840
826
827
828
829
830
831
832

833
834
835
836
837
838
839
840
841
842
843







-
+
+
+
+








    [h3 "Pages and Page Types" "pages_and_page_types"]
      <p>
        The entire database file is divided into pages, each page consisting
        of <i>page-size</i> bytes, where <i>page-size</i> is the 2-byte 
        integer value stored at offset 16 of the database header (see above).
        The <i>page-size</i> is always a power of two between 512 
        (2<sup>9</sup>) and 32768 (2<sup>15</sup>). SQLite database files
        (2<sup>9</sup>) and 32768 (2<sup>15</sup>) or the value 1 used to 
        represent a 65536-byte page. This field can equivalently be viewed
        as a little-endian number which is page size divided by 256.
        SQLite database files
        always consist of an exact number of pages.
      <p>
        Pages are numbered beginning from 1, not 0. Page 1 consists of
        the first <i>page-size</i> bytes of the database file. 
        The database header described in the previous section consumes
        the first 100 bytes of page 1.
      <p>
1374
1375
1376
1377
1378
1379
1380
1381


1382
1383
1384
1385
1386
1387
1388
1377
1378
1379
1380
1381
1382
1383

1384
1385
1386
1387
1388
1389
1390
1391
1392







-
+
+







        [Tr]<td>1..2  <td>2<td>Byte offset of first block of free space on 
                               this page. If there are no free blocks on this
                               page, this field is set to 0.
        [Tr]<td>3..4  <td>2<td>Number of cells (entries) on this page.
        [Tr]<td>5..6  <td>2<td>Byte offset of the first byte of the cell
                               content area (see figure 
                               <cite>figure_indexpage</cite>), relative to the 
                               start of the page.
                               start of the page.  If this value is zero, then
                               it should be interpreted as 65536.
        [Tr]<td>7     <td>1<td>Number of fragmented free bytes on page.
        [Tr]<td>8..11 <td>4<td>Page number of rightmost child-page (the
                               child-page that heads the sub-tree in which all
                               records are larger than all records stored on
                               this page). This field is not present for leaf
                               node pages.
      </table>
Changes to pages/fileformat2.in.
34
35
36
37
38
39
40
41

42
43
44
45
46
47
48
34
35
36
37
38
39
40

41
42
43
44
45
46
47
48







-
+







on the main database file.</p>

<h3>1.1 Pages</h3>

<p>The main database file consists of one or more pages.  The size of a
page is a power of two between 512 and 32768 inclusive.  All pages within
the same database are the same size.  The page size for a database file
is determined by the 2-byte big-endian integer located at an offset of
is determined by the 2-byte integer located at an offset of
16 bytes from the beginning of the database file.</p>

<p>Pages are numbered beginning with 1.  The maximum page number is
2147483646 (2<sup><small>31</small></sup> - 2).  The minimum size
SQLite database is a single 512-byte page.
The maximum size database would be 2147483646 pages at 32768 bytes per
page or 70,368,744,112,128 bytes (about 70 terabytes).  Usually SQLite will
100
101
102
103
104
105
106
107

108
109
110
111
112
113
114
100
101
102
103
104
105
106

107
108
109
110
111
112
113
114







-
+







<i>Database Header Format</i><br>
<table width="80%" border=1>
<tr><th>Offset<th>Size<th>Description
<tr><td valign=top align=center>0<td valign=top align=center>16<td align=left>
The header string: "SQLite format 3\000"
<tr><td valign=top align=center>16<td valign=top align=center>2<td align=left>
The database page size in bytes.  Must be a power of two between 512
and 32768 inclusive.
and 32768 inclusive, or the value 1 representing a page size of 65536.
<tr><td valign=top align=center>18<td valign=top align=center>1<td align=left>
File format write version.  1 for legacy; 2 for [WAL].
<tr><td valign=top align=center>19<td valign=top align=center>1<td align=left>
File format read version.  1 for legacy; 2 for [WAL].
<tr><td valign=top align=center>20<td valign=top align=center>1<td align=left>
Bytes of unused "reserved" space at the end of each page.  Usually 0.
<tr><td valign=top align=center>21<td valign=top align=center>1<td align=left>
152
153
154
155
156
157
158














159

160
161
162
163
164
165
166
167
168
169
170
171

172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189

190
191
192
193
194
195
196
197
198
199

200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215

216
217
218
219
220
221
222
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172

173
174
175
176
177
178
179
180
181
182
183
184

185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202

203
204
205
206
207
208
209
210
211
212

213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228

229
230
231
232
233
234
235
236







+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
+











-
+

















-
+









-
+















-
+







<h4>1.2.1 Magic Header String</h4>

<p>Every SQLite database file begins with the following 16 bytes (in hex):
53 51 4c 69 74 65 20 66 6f 72 6d 61 74 20 33 00.  This byte sequence
corresponds to the UTF-8 string "SQLite format 3" including the nul
terminator character at the end.</p>

<h4>1.2.2 Page Size</h4>

<p>The two-byte value beginning at offset 16 determines the page size of 
the database.  For SQLite versions 3.7.0.1 and earlier, this value is 
interpreted as a big-endian integer and must be a power of two between
512 and 32768, inclusive.  Beginning with SQLite version 3.7.1, a page
size of 65536 bytes is supported.  The value 65536 will not fit in a
two-byte integer, so to specify a 65536-byte page size, the value is
at offset 16 is 0x00 0x01.
This value can be interpreted as a big-endian
1 and thought of is as a magic number to represent the 65536 page size.
Or one can view the two-byte field as a little endian number and say
that it represents the page size divided by 256.</p>

<h4>1.2.2 File format version numbers</h4>
<h4>1.2.3 File format version numbers</h4>

<p>The file format write version and file format read version at offsets
18 and 19 are intended to allow for enhancements of the file format
in future versions of SQLite.  In current versions of SQLite, both of
these values are 1 for rollback journalling modes and 2 for [WAL]
journalling mode.  If a version of SQLite coded to the current
file format specification encounters a database file where the read
version is 1 or 2 but the write version is greater than 2, then the database
file must be treated as read-only.  If a database file with a read version
greater than 2 is encounter, then that database cannot be read or written.</p>

<h4>1.2.3 Reserved bytes per page</h4>
<h4>1.2.4 Reserved bytes per page</h4>

<p>SQLite has the ability to set aside a small number of extra bytes at
the end of every page for use by extensions.  These extra bytes are
used, for example, by the SQLite Encryption Extension to store a nonce
and/or cryptographic checksum associated with each page.  The 
"reserved space" size in the 1-byte integer at offset 20 is the number
of bytes of space at the end of each page to reserve for extensions.
This value is usually 0.  The value can be odd.</p>

<tcl>hd_fragment usable_size {usable size}</tcl>
<p>The "usable size" of a database page is the page size specify by the
2-byte integer at offset 16 in the header less the "reserved" space size
recorded in the 1-byte integer at offset 20 in the header.  The usable
size of a page might be an odd number.  However, the usable size is not
allowed to be less than 480.  In other words, if the page size is 512,
then the reserved space size cannot exceed 32.</p>

<h4>1.2.4 Payload fractions</h4>
<h4>1.2.5 Payload fractions</h4>

<p>The maximum and minimum embedded payload fractions and the leaf
payload fraction values must be 64, 32, and 32.  These values were
originally intended to as tunable parameters that could be used to
modify the storage format of the b-tree algorithm.  However, that
functionality is not supported and there are no current plans to add
support in the future.  Hence, these three bytes are fixed at the
values specified.</p>

<h4>1.2.5 File change counter</h4>
<h4>1.2.6 File change counter</h4>

<tcl>hd_fragment chngctr {change counter}</tcl>
<p>The file change counter is a 4-byte big-endian integer which is
incremented whenever the database file is changed in rollback mode.  
When two or more processes are reading the same database file, each 
process can detect database changes from other processes by monitoring 
the change counter.
A process will normally want to flush its database page cache when
another process modified the database, since the cache has become stale.
The file change counter facilitates this.</p>

<p>In WAL mode, changes to the database are detected using the wal-index
and so the change counter is not needed.  Hence, the change counter might
not be incremented on each transaction in WAL mode.</p>

<h4>1.2.6 In-header database size</h4>
<h4>1.2.7 In-header database size</h4>

<tcl>hd_fragment filesize {in-header database size}</tcl>
<p>The 4-byte big-endian integer at offset 28 into the header 
stores the size of the database file in pages.  If this in-header
datasize size is not valid (see the next paragraph), then the database 
size is computed by looking
at the actual size of the database file. Older versions of SQLite
235
236
237
238
239
240
241
242

243
244
245
246
247
248
249
250

251
252
253
254
255
256
257
258
259
260

261
262
263
264
265
266
267
249
250
251
252
253
254
255

256
257
258
259
260
261
262
263

264
265
266
267
268
269
270
271
272
273

274
275
276
277
278
279
280
281







-
+







-
+









-
+







know to update the in-header database size and so the in-header
database size could be incorrect.  But legacy versions of SQLite
will also leave the version-valid-for number at offset 92 unchanged
so it will not match the change-counter.  Hence, invalid in-header
database sizes can be detected (and ignored) by observing when
the change-counter does not match the version-valid-for number.</p>

<h4>1.2.7 Free page list</h4>
<h4>1.2.8 Free page list</h4>

<p>Unused pages in the database file are stored on a freelist.  The
4-byte big-endian integer at offset 32 stores the page number of
the first page of the freelist, or zero if the freelist is empty.
The 4-byte big-endian integer at offset 36 stores stores the total 
number of pages on the freelist.</p>

<h4>1.2.8 Schema cookie</h4>
<h4>1.2.9 Schema cookie</h4>

<p>The schema cookie is a 4-byte big-endian integer at offset 40
that is incremented whenever the database schema changes.  A 
prepared statement is compiled against a specific version of the
database schema.  Whenever the database schema changes, the statement
must be reprepared.  Whenever a prepared statement runs, it first checks
the schema cookie to make sure the value is the same as when the statement
was prepared and if not it aborts to force the statement to be reprepared.</p>

<h4>1.2.9 Schema format number</h4>
<h4>1.2.10 Schema format number</h4>

<p>The schema format number is a 4-byte big-endian integer at offset 44.
The schema format number is similar to the file format read and write
version numbers at offsets 18 and 19 except that the schema format number
refers to the high-level SQL formatting rather than the low-level b-tree
formatting.  Four schema format numbers are currently defined:</p>

288
289
290
291
292
293
294
295

296
297
298
299
300
301
302
303

304
305
306
307
308
309
310
311
312
313
314
315
316
317

318
319
320
321
322
323
324

325
326
327
328
329
330
331

332
333
334
335
336
337
338
339
340

341
342
343
344
345
346
347
302
303
304
305
306
307
308

309
310
311
312
313
314
315
316

317
318
319
320
321
322
323
324
325
326
327
328
329
330

331
332
333
334
335
336
337

338
339
340
341
342
343
344

345
346
347
348
349
350
351
352
353

354
355
356
357
358
359
360
361







-
+







-
+













-
+






-
+






-
+








-
+







<p>New database files created by SQLite use format 1 by default, so
that database files created by newer versions of SQLite can still
be read by older versions of SQLite.
The [legacy_file_format pragma] can be used to cause SQLite
to create new database files using format 4.  Future versions of 
SQLite may begin to create files using format 4 by default.</p>

<h4>1.2.10 Suggested cache size</h4>
<h4>1.2.11 Suggested cache size</h4>

<p>The 4-byte big-endian signed integer at offset 48 is the suggest
cache size in pages for the database file.  The value is a suggestion
only and SQLite is under no obligation to honor it.  The absolute value
of the integer is used as the suggested size.  The suggested cache size
can be set using the [default_cache_size pragma].</p>

<h4>1.2.11 Incremental vacuum settings</h4>
<h4>1.2.12 Incremental vacuum settings</h4>

<p>The two 4-byte big-endian integers at offsets 52 and 64 are used
to manage the [auto_vacuum] and [incremental_vacuum] modes.  If
the integer at offset 52 is zero then pointer-map (ptrmap) pages are
omitted from the database file and neither auto_vacuum nor
incremental_vacuum are supported.  If the integer at offset 52 is
non-zero then it is the page number of the largest root page in the
database file, the database file contain ptrmap pages, and the
mode must be either auto_vacuum or incremental_vacuum.  In this latter
case, the integer at offset 64 is true for incremental_vacuum and
false for auto_vacuum.  If the integer at offset 52 is zero then
the integer at offset 64 must also be zero.</p>

<h4>1.2.12 Text encoding</h4>
<h4>1.2.13 Text encoding</h4>

<p>The 4-byte big-endian integer at offset 56 determines the encoding
used for all text strings stored in the database.  A value of 1 means
UTF-8.  A value of 2 means UTF-16le.  A value of 3 means UTF-16be.
No other values are allowed.</p>

<h4>1.2.13 User version number</h4>
<h4>1.2.14 User version number</h4>

<p>The 4-byte big-endian integer at offset 60 is the user version which
is set and queried by the [user_version pragma].  The user version is
not used by SQLite.</p>

<tcl>hd_fragment validfor {version-valid-for number}</tcl>
<h4>1.2.14 Write library version number and version-valid-for number</h4>
<h4>1.2.15 Write library version number and version-valid-for number</h4>

<p>The 4-byte big-endian integer at offset 96 stores the 
[SQLITE_VERSION_NUMBER] value.  The 4-byte big-ending integer at
offset 92 is the value of the [change counter] when the version number
was stored.  The integer at offset 92 indicates which transaction
the version number is valid for and is sometimes called the
"version-valid-for number".

<h4>1.2.15 Header space reserved for expansion</h4>
<h4>1.2.16 Header space reserved for expansion</h4>

<p>All other bytes of the database file header are reserved for
future expansion and must be set to zero.</p>

<h3>1.3 The Lock-Byte Page</h3>

<p>The lock-byte page is the single page of the database file
509
510
511
512
513
514
515
516

517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535








536
537
538
539
540
541
542
523
524
525
526
527
528
529

530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564







-
+



















+
+
+
+
+
+
+
+







A value of 13 means the page is a leaf table b-tree page.
Any other value is an error.
<tr><td align=center valign=top>1<td align=center valign=top>2<td align=left>
Byte offset into the page of the first freeblock
<tr><td align=center valign=top>3<td align=center valign=top>2<td align=left>
Number of cells on this page
<tr><td align=center valign=top>5<td align=center valign=top>2<td align=left>
Offset to the first byte of the cell content area
Offset to the first byte of the cell content area.  A zero value is used to represent an offset of 65536, which occurs on an empty root page when using a 65536-byte page size.
<tr><td align=center valign=top>7<td align=center valign=top>1<td align=left>
Number of fragmented free bytes within the cell content area
<tr><td align=center valign=top>8<td align=center valign=top>4<td align=left>
The right-most pointer (interior b-tree pages only)
</table></blockquote></center>

<p>The cell pointer array of a b-tree page immediately follows the b-tree
page header.  Let K be the number of cells on the btree.  The cell pointer
array consists of K 2-byte integer offsets to the cell contents.  The
cell pointers are arranged in key order with left-most cell (the cell with the
smallest key) first and the right-most cell (the cell with the largest
key) last.</p>

<p>Cell content is stored in the cell content region of the b-tree page.
SQLite strives to place cells as far toward the end of the b-tree page as
it can, in order to leave space for future growth of the cell pointer array.
The area in between the last cell pointer array entry and the beginning of
the first cell is the unallocated region.
</p>

<p>If a page contains no cells (which is only possible for a root page
of a table that contains no rows) then the offset to the cell content
area will equal the page size minus the bytes of reserved space.  If
the database uses a 65536-byte page size and the reserved space is zero
(the usual value for reserved space) then the cell content offset would
want to be 65536.  However, that integer is too large to be stored in a
2-byte unsigned integer, so a value of 0 is used in its place.

<p>A freeblock is a structure used to identify unallocated space within
a b-tree page.  Freeblocks are organized on a chain.  The first 2 bytes of
a freeblock are a big-endian integer which is the offset in the b-tree page
of the next freeblock in the chain, or zero if the freeblock is the last on
the chain.  The third and fourth bytes of each freeblock form
a big-endian integer which is the size of the freeblock in bytes, including
Changes to pages/wal.in.
48
49
50
51
52
53
54





55
56
57
58
59
60
61
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66







+
+
+
+
+







<li>There is an additional quasi-persistent "<tt>-wal</tt>" file and
    "<tt>-shm</tt> shared memory file associated with each
    database, which can make SQLite less appealing for use as an 
    [application file-format].
<li>There is the extra operation of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
<li>WAL works best with smaller transactions.  WAL does
    not work as well as tranditional rollback journal modes when used on
    exceedingly large transactions (transactions where the size of the
    change to the database file reaches into the gigabyte range).
    
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
original unchanged database content into a separate rollback journal file
and then writing changes directly into the database file.  In the
Changes to remove_carets.sh.
8
9
10
11
12
13
14
15

16
17
18
19
20
21
22
8
9
10
11
12
13
14

15
16
17

18
19
20








-
+


-



-
# certain character sequences from those files.  Character sequences
# removed are:
#
#     ^(
#     )^
#     ^
#
echo -n 'Removing ^ characters '
echo 'Removing ^ characters '
find $1 -name '*.html' -print | grep -v matrix | while read file
do
  echo -n .
  mv $file x.html
  sed -e 's/\^(//g' -e 's/)^//g' -e 's/\^//g' x.html >$file
done
echo ' done'