Documentation Source Text

Check-in [3f66c70302]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Begin adding the "WAL-mode File Format" document. Much work left to be done.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 3f66c70302d1c85a139518a6c5b0a8da2a9c5ce5bebf2b7152b16ff3a7a578ff
User & Date: drh 2017-11-04 17:08:42.880
Context
2017-11-04
20:26
Continuing work on the WAL file format document. (check-in: 0d151f472f user: drh tags: trunk)
17:08
Begin adding the "WAL-mode File Format" document. Much work left to be done. (check-in: 3f66c70302 user: drh tags: trunk)
15:02
Merge fixes from the 3.21.0 release branch. (check-in: 411a4d6dde user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fileformat2.in.
167
168
169
170
171
172
173

174
175
176
177
178
179
180
at offset 16 is 0x00 0x01.
This value can be interpreted as a big-endian
1 and thought of is as a magic number to represent the 65536 page size.
Or one can view the two-byte field as a little endian number and say
that it represents the page size divided by 256.  These two 
interpretations of the page-size field are equivalent.</p>


<h3>File format version numbers</h3>

<p>The file format write version and file format read version at offsets
18 and 19 are intended to allow for enhancements of the file format
in future versions of SQLite.  In current versions of SQLite, both of
these values are 1 for rollback journalling modes and 2 for [WAL]
journalling mode.  If a version of SQLite coded to the current







>







167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
at offset 16 is 0x00 0x01.
This value can be interpreted as a big-endian
1 and thought of is as a magic number to represent the 65536 page size.
Or one can view the two-byte field as a little endian number and say
that it represents the page size divided by 256.  These two 
interpretations of the page-size field are equivalent.</p>

<tcl>hd_fragment vnums {file format version numbers}</tcl>
<h3>File format version numbers</h3>

<p>The file format write version and file format read version at offsets
18 and 19 are intended to allow for enhancements of the file format
in future versions of SQLite.  In current versions of SQLite, both of
these values are 1 for rollback journalling modes and 2 for [WAL]
journalling mode.  If a version of SQLite coded to the current
Added pages/walformat.in.












































































































































































































































































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
<title>WAL-mode File Format</title>
<tcl>hd_keywords {wal-mode file format}</tcl>

<table_of_contents>

<p>This document describes low-level details on how [WAL mode] is
implemented on unix and windows.

<p>The separate [file format] description provides details on the
structure of a database file and of the write-head log file for
[WAL mode].  But details of the locking protocol and of the format
of the [WAL-index format] are deliberately omitted since those details
are left to descretion of individual [VFS] implementations.  This
document fills in the missing details for the unix and windows [VFSes].

<p>For completeness, some of the higher level formatting information
contains in the [file format] document and elsewhere is replicated here,
when it pertains to WAL mode processing.

<h1>Files On Disk</h1>

<p>When in active use, the state of a WAL mode database is described
by three separate files:

<ol>
<li> The main database file with an arbitrary name "X".
<li> The write-ahead log file, usually named "X-wal".
<li> The wal-index file, usually named "X-shm".
</ol>

<p>The format of the main database file is as described in the
[file format] document.  The [file format version numbers] at offsets
18 and 19 into the main database must both be 2 to indicate that the
database is in WAL mode.  The main database may have an arbitrary
name allowed by the underlying filesystem.  No special file suffixes
are required, though ".db", ".sqlite", and ".sqlite3" seem to be
popular choices.

<p>The write-ahead log or "wal" file is a roll-forward style log file
that records transactions that have been committed but not yet applied
to the main database.  Details on the format of the wal file are
describe in the [WAL format] subsection of the main [file format]
document.  The wal file is named by appending the four characters
"-wal" to the end of the name of the main database file.  Except
on 8+3 filesystems, such names are not allowed, and in that case
the file suffix is changed to ".WAL".  But as 8+3 filesystems are
increasingly rare, that exceptional case can usually be ignored.

<p>The wal-index file or "shm" file is not actually used as a file.
Rather, individual database clients mmap the shm file and use it
as shared memory for coordinating access to the database and as a cache
for quickly locating frame within the wal file.  The name
of the shm file is the main database file name with the four characters
"-shm" appended.  Or, for 8+3 filesystems, the shm file is the main
database file with the suffix changed to ".SHM".

<p>The shm does not contain any database content and is not required
to recover the database following a crash.  For that reason, the first
client to connect to a quiescent database will normally truncate the
shm file if it exists.  Since the content of the shm file does not need
to be preserved across a crash, the shm file is never fsync()-ed to disk.
In fact, if there were a mechanism by which SQLite could tell the 
operating system to never persist the shm file to disk but always hold
it in cache memory, SQLite would use that mechanism to avoid any
unnecessary disk I/O associated with the shm file.  However, no such
mechanism exists in standard posix.

<p>When a WAL mode database is in active use, all three of the above
files exist.

<p>If the last client using the database shuts down cleanly by
calling [sqlite3_close()], then a [checkpoint] is run automatically
in order to transfer all information from the wal file
over into the main database, then both the shm file
and the wal file are unlinked.  Thus, when the database is not in
use by any client, it is usually the case that only the main
database file exists on disk.
However, if the last client did not call [sqlite3_close()] before it
shut down, then the final cleanup operation does not occur and the
shm and wal files may still exist on disk even when the database is
not in use.

<h2>Variations</h2>

<p>When [PRAGMA locking_mode=EXCLUSIVE] is set, that means that only a
single client is allowed to have the database open at one time.  Since
only a single client can use the database, the shm file is omitted.
The single client uses a buffer in heap memory as a substitute for the
memory-mapped shm file.

<h1>The WAL-Index File Format</h1>

<p>
The WAL-index or "shm" file is used to coordinate access to the database
by multiple clients, and as a cache to help clients quickly locate frames
within the wal file.

<p>
Because the shm file is not involved in recovery, the shm file does not
need to be machine byte-order independent.  Hence, numeric values in
the shm file are written in the native byte order of the host computer,
rather than being converted into a specific cross-platform byte order as
is done with the main database file and the wal file.

<p>
The shm file consists of one or more hash tables, where each hash table
is 32768 bytes in size.  Except, a 136-byte header is carved out of the
front of the very first hash table, so the first hash table is only
32632 bytes in size.  The total size of the shm file is always a multiple
of 32768.  In most cases, the total size of the shm file is exactly 32768
bytes.  The shm file only needs to grow beyond a single hash table if
when the wal file grows very large (more than 4079 frames).  Since the
default [sqlite3_wal_autocheckpoint|automatic checkpoint threshold] is
1000, WAL files rare reach the 4079 threshold needed to make the shm file
grow.

<h2>The WAL-Index Header</h2>

<p>The first 136 bytes of the shm file is a header.  The shm header has
three main divisions as follows:

<center>
<i>WAL-Index Header Divisions</i><br>
<table width="80%" border=1>
<tr><th>Bytes<th>Description
<tr><td valign=top>0..47</td><td>First copy of the WAL Index Information</td>
<tr><td valign=top>48..95</td><td>Second copy of the WAL Index Information</td>
<tr><td valign=top>96..135</td><td>Checkpoint Information and Locks</td>
</table>
</center>

<p>Individual fields of the shm header are all unsigned integers
in the native byte-order of the host machine.
The size of integers may be 8, 16, 32, or 64 bits.
A detailed breakout of the individual fields of the shm header
follows:

<center>
<i>WAL-Index Header Details</i><br>
<table width="80%" border=1>
<tr><th>Bytes</th><th>Meaning</th></tr>
<tr>
<td>0..3</td>
<td>The WAL-index format version number.  Always 3007000.</td>
</tr>
<tr>
<td>4..7</td>
<td>Unused padding space.  Must be zero.
</tr>
<tr>
<td>8..11</td>
<td>Unsigned integer counter, incremented with each transaction
</tr>
<tr>
<td>12</td>
<td>The "isInit" flag.  1 when the shm file has been initialized.
</tr>
<tr>
<td>13</td>
<td>True if the WAL file uses big-ending checksums.  0 if the WAL
    uses little-endian checksums.
</tr>
<tr>
<td>14..15</td>
<td>The database page size in bytes, or 1 if the page size is 65536.
</tr>
<tr>
<td>16..19</td>
<td>Index of the last valid frame in the WAL file.
</tr>
<tr>
<td>20..23</td>
<td>Size of the database file in pages.
</tr>
<tr>
<td>24..31</td>
<td>Checksum of the last frame in the WAL file.
</tr>
<tr>
<td>32..39</td>
<td>The two salt value copied from the WAL file header.
</tr>
<tr>
<td>40..47</td>
<td>A checksum over bytes 0 through 39 of this header.
</tr>
<tr>
<td>48..95</td>
<td>A copy of bytes 0 through 47 of this header.
</tr>
<tr>
<td>96..99</td>
<td>Number of WAL frames that have already been backfilled into the database
    by prior checkpoints
</tr>
<tr>
<td>100..119</td>
<td>Five "read marks".  Each read mark is a 32-bit unsigned integer (4 bytes).
</tr>
<tr>
<td>120..127</td>
<td>Unused space set aside for locking.
</tr>
<tr>
<td>128..132</td>
<td>Number of WAL frames that have attempted to be backfilled but which might
    not have been backfilled successfully.
</tr>
<tr>
<td>132..136</td>
<td>Unused space reserved for futher expansion.
</tr>
</table>
</center>