Documentation Source Text

Check-in [74c71c744b]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Continuing work on the walformat.html document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 74c71c744b4646507a0704b070ece46713ef13e4db8f5be677a5d234682400bf
User & Date: drh 2017-11-13 23:06:54.415
Context
2017-11-25
14:02
Add a description of the "^" syntax to fts5.in. (check-in: 58bbcf8b2f user: dan tags: trunk)
2017-11-13
23:06
Continuing work on the walformat.html document. (check-in: 74c71c744b user: drh tags: trunk)
17:03
Continuing work on the walformat.html document. This is an incremental check-in. (check-in: 627feecddc user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fileformat2.in.
1805
1806
1807
1808
1809
1810
1811












1812

1813
1814
1815
1816
1817





1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830

1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851

1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
^Then valid content of the WAL is transferred into the database file.
^Finally, the database is flushed to persistent storage using another
xSync method call.
The xSync operations serve as write barriers - all writes launched
before the xSync must complete before any write that launches after the
xSync begins.</p>













<p>^After a checkpoint, new write transactions overwrite

the WAL file from the beginning.  ^At the start of the first new
write transaction, the WAL header salt-1 value is incremented
and the salt-2 value is randomized.  These changes to the salts invalidate
old frames in the WAL that have already been checkpointed but not yet
overwritten, and prevent them from being checkpointed again.</p>






<tcl>hd_fragment walread {WAL read algorithm}</tcl>
<h2>Reader Algorithm</h2>

<p>^(To read a page from the database (call it page number P), a reader
first checks the WAL to see if it contains page P.  If so, then the
last valid instance of page P that is followed by a commit frame
or is a commit frame itself becomes the value read.)^  ^If the WAL
contains no copies of page P that are valid and which are a commit
frame or are followed by a commit frame, then page P is read from
the database file.</p>

<p>To start a read transaction, the reader records the index of the last

valid frame in the WAL.  The reader uses this recorded "mxFrame" value
for all subsequent read operations.  New transactions can be appended
to the WAL, but as long as the reader uses its original mxFrame value
and ignores subsequently appended content, the reader will see a 
consistent snapshot of the database from a single point in time.  
^This technique allows multiple concurrent readers to view different 
versions of the database content simultaneously.</p>

<p>The reader algorithm in the previous paragraphs works correctly, but 
because frames for page P can appear anywhere within the WAL, the
reader has to scan the entire WAL looking for page P frames.  If the
WAL is large (multiple megabytes is typical) that scan can be slow,
and read performance suffers.  ^To overcome this problem, a separate
data structure called the wal-index is maintained to expedite the
search for frames of a particular page.</p>

<tcl>hd_fragment walindexformat {wal-index} {WAL-index format}</tcl>
<h2>WAL-Index Format</h2>

<p>Conceptually, the wal-index is shared memory, though the current
VFS implementations use a mmapped file for the wal-index.  ^The mmapped

file is in the same directory as the database and has the same name
as the database with a "<tt>-shm</tt>" suffix appended.  Because
the wal-index is shared memory, SQLite does not support 
[PRAGMA journal_mode | journal_mode=WAL] 
on a network filesystem when clients are on different machines.
All users of the database must be able to share the same memory.</p>

<p>The purpose of the wal-index is to answer this question quickly:</p>

<blockquote><i>
Given a page number P and a maximum WAL frame index M,
return the largest WAL frame index for page P that does not exceed M, 
or return NULL if there are no frames for page P that do not exceed M.







>
>
>
>
>
>
>
>
>
>
>
>
|
>
|




>
>
>
>
>












|
>
|















|



|
>




|
|







1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
^Then valid content of the WAL is transferred into the database file.
^Finally, the database is flushed to persistent storage using another
xSync method call.
The xSync operations serve as write barriers - all writes launched
before the xSync must complete before any write that launches after the
xSync begins.</p>

<p>A checkpoint need not run to completion.  It might be that some
readers are still using older transactions with data that is contained
in the database file.  In that case, transferring content for newer
transactions from the WAL file into the database would delete the content
out from under readers still using the older transactions.  To avoid that,
checkpoints only run to completion if all reader are using the
last transaction in the WAL.

<tcl>hd_fragment walreset {reset the WAL} {WAL reset}</tcl>
<h2>WAL Reset</h2>

<p>^After a complete checkpoint, if no other connections are in transactions
that use the WAL, then subsequent write transactions can
overwrite the WAL file from the beginning.  This is called "resetting the
WAL".  ^At the start of the first new
write transaction, the WAL header salt-1 value is incremented
and the salt-2 value is randomized.  These changes to the salts invalidate
old frames in the WAL that have already been checkpointed but not yet
overwritten, and prevent them from being checkpointed again.</p>

<p>The WAL file can optionally be truncated on a reset, but it need not be.
Performance is usually a little better if the WAL is not truncate, since
filesystems generally will overwrite an existing file faster than they
will grow a file.

<tcl>hd_fragment walread {WAL read algorithm}</tcl>
<h2>Reader Algorithm</h2>

<p>^(To read a page from the database (call it page number P), a reader
first checks the WAL to see if it contains page P.  If so, then the
last valid instance of page P that is followed by a commit frame
or is a commit frame itself becomes the value read.)^  ^If the WAL
contains no copies of page P that are valid and which are a commit
frame or are followed by a commit frame, then page P is read from
the database file.</p>

<p>To start a read transaction, the reader records the number of value
frames in the WAL as "mxFrame".  ([mxFrame|More detail])
The reader uses this recorded mxFrame value
for all subsequent read operations.  New transactions can be appended
to the WAL, but as long as the reader uses its original mxFrame value
and ignores subsequently appended content, the reader will see a 
consistent snapshot of the database from a single point in time.  
^This technique allows multiple concurrent readers to view different 
versions of the database content simultaneously.</p>

<p>The reader algorithm in the previous paragraphs works correctly, but 
because frames for page P can appear anywhere within the WAL, the
reader has to scan the entire WAL looking for page P frames.  If the
WAL is large (multiple megabytes is typical) that scan can be slow,
and read performance suffers.  ^To overcome this problem, a separate
data structure called the wal-index is maintained to expedite the
search for frames of a particular page.</p>

<tcl>hd_fragment walindexformat</tcl>
<h2>WAL-Index Format</h2>

<p>Conceptually, the wal-index is shared memory, though the current
VFS implementations use a memory-mapped file for operating-system
portability.  ^The memory-mapped
file is in the same directory as the database and has the same name
as the database with a "<tt>-shm</tt>" suffix appended.  Because
the wal-index is shared memory, SQLite does not support 
[PRAGMA journal_mode | journal_mode=WAL] 
on a network filesystem when clients are on different machines, as
all clients of the database must be able to share the same memory.</p>

<p>The purpose of the wal-index is to answer this question quickly:</p>

<blockquote><i>
Given a page number P and a maximum WAL frame index M,
return the largest WAL frame index for page P that does not exceed M, 
or return NULL if there are no frames for page P that do not exceed M.
1877
1878
1879
1880
1881
1882
1883
1884
1885
Hence, unlike the database and WAL file formats which store all values
as big endian, the wal-index stores multi-byte values in the native
byte order of the host computer.</p>

<p>This document is concerned with the persistent state of the database
file, and since the wal-index is a transient structure, no further 
information about the format of the wal-index will be provided here.
Complete details on the format of the wal-index are contained within
comments in SQLite source code.</p>







|
|
1897
1898
1899
1900
1901
1902
1903
1904
1905
Hence, unlike the database and WAL file formats which store all values
as big endian, the wal-index stores multi-byte values in the native
byte order of the host computer.</p>

<p>This document is concerned with the persistent state of the database
file, and since the wal-index is a transient structure, no further 
information about the format of the wal-index will be provided here.
Additional details on the format of the wal-index are contained in
the separate [WAL-index File Format] document.</p>
Changes to pages/walformat.in.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
<title>WAL-mode File Format</title>
<tcl>hd_keywords {wal-mode file format}</tcl>

<table_of_contents>

<p>This document describes low-level details on how [WAL mode] is
implemented on unix and windows.

<p>The separate [file format] description provides details on the
structure of a database file and of the write-head log file used in
[WAL mode].  But details of the locking protocol and of the format
of the [WAL-index format] are deliberately omitted since those details
are left to descretion of individual [VFS] implementations.  This
document fills in those missing details for the unix and windows [VFSes].

<p>For completeness, some of the higher level formatting information
contains in the [file format] document and elsewhere is replicated here,
when it pertains to WAL mode processing.


|









|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
<title>WAL-mode File Format</title>
<tcl>hd_keywords {wal-index} {WAL-index format} {WAL-index File Format}</tcl>

<table_of_contents>

<p>This document describes low-level details on how [WAL mode] is
implemented on unix and windows.

<p>The separate [file format] description provides details on the
structure of a database file and of the write-head log file used in
[WAL mode].  But details of the locking protocol and of the format
of the WAL-index are deliberately omitted since those details
are left to descretion of individual [VFS] implementations.  This
document fills in those missing details for the unix and windows [VFSes].

<p>For completeness, some of the higher level formatting information
contains in the [file format] document and elsewhere is replicated here,
when it pertains to WAL mode processing.

84
85
86
87
88
89
90







91
92
93
94
95
96
97
<h2>Variations</h2>

<p>When [PRAGMA locking_mode=EXCLUSIVE] is set, only a
single client is allowed to have the database open at one time.  Since
only a single client can use the database, the shm file is omitted.
The single client uses a buffer in heap memory as a substitute for the
memory-mapped shm file.








<h1>The WAL-Index File Format</h1>

<p>
The WAL-index or "shm" file is used to coordinate access to the database
by multiple clients, and as a cache to help clients quickly locate frames
within the wal file.







>
>
>
>
>
>
>







84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
<h2>Variations</h2>

<p>When [PRAGMA locking_mode=EXCLUSIVE] is set, only a
single client is allowed to have the database open at one time.  Since
only a single client can use the database, the shm file is omitted.
The single client uses a buffer in heap memory as a substitute for the
memory-mapped shm file.

<p>If a read/write client invokes
[sqlite3_file_control]([SQLITE_FCNTL_PERSIST_WAL]) prior to shutdown,
then at shutdown a checkpoint is still run, but the shm file and wal
file are not deleted.
This allows subsequent read-only clients to connect to and read the
database.

<h1>The WAL-Index File Format</h1>

<p>
The WAL-index or "shm" file is used to coordinate access to the database
by multiple clients, and as a cache to help clients quickly locate frames
within the wal file.
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215






































216
217
218
219
220
221
222
The size of integers may be 8, 16, 32, or 64 bits.
A detailed breakout of the individual fields of the shm header
follows:

<center>
<i>WAL-Index Header Details</i><br>
<table width="80%" border=1>
<tr><th>Bytes</th><th>Meaning</th></tr>
<tr>
<td>0..3</td>
<td>The WAL-index format version number.  Always 3007000.</td>
</tr>
<tr>
<td>4..7</td>
<td>Unused padding space.  Must be zero.
</tr>
<tr>
<td>8..11</td>
<td>Unsigned integer counter, incremented with each transaction
</tr>
<tr>
<td>12</td>
<td>The "isInit" flag.  1 when the shm file has been initialized.
</tr>
<tr>
<td>13</td>
<td>True if the WAL file uses big-ending checksums.  0 if the WAL
    uses little-endian checksums.
</tr>
<tr>
<td>14..15</td>
<td>The database page size in bytes, or 1 if the page size is 65536.
</tr>
<tr>
<td>16..19</td>
<td>Index of the last valid frame in the WAL file.
</tr>
<tr>
<td>20..23</td>
<td>Size of the database file in pages.
</tr>
<tr>
<td>24..31</td>
<td>Checksum of the last frame in the WAL file.
</tr>
<tr>
<td>32..39</td>
<td>The two salt value copied from the WAL file header.
</tr>
<tr>
<td>40..47</td>
<td>A checksum over bytes 0 through 39 of this header.
</tr>
<tr>
<td>48..95</td>
<td>A copy of bytes 0 through 47 of this header.
</tr>
<tr>
<td>96..99</td>
<td>Number of WAL frames that have already been backfilled into the database
    by prior checkpoints
</tr>
<tr>
<td>100..119</td>
<td>Five "read marks".  Each read mark is a 32-bit unsigned integer (4 bytes).
</tr>
<tr>
<td>120..127</td>
<td>Unused space set aside for 8 file locks.
</tr>
<tr>
<td>128..132</td>
<td>Number of WAL frames that have attempted to be backfilled but which might
    not have been backfilled successfully.
</tr>
<tr>
<td>132..136</td>
<td>Unused space reserved for futher expansion.
</tr>
</table>
</center>







































<tcl>hd_fragment locks {WAL-mode locks}</tcl>
<h3>WAL Locks</h3>

<p>Eight bytes of space are set aside in the header to support
file locking using the xShmLock() method in the [sqlite3_io_methods]
object.  These eight bytes are never read nor written by SQLite since







|

|



|



|



|



|




|



|
|


|



|



|



|



|



|




|



|



|




|




>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
The size of integers may be 8, 16, 32, or 64 bits.
A detailed breakout of the individual fields of the shm header
follows:

<center>
<i>WAL-Index Header Details</i><br>
<table width="80%" border=1>
<tr><th>Bytes</th><th>Name</th><th>Meaning</th></tr>
<tr>
<td>0..3</td><td>iVersion</td>
<td>The WAL-index format version number.  Always 3007000.</td>
</tr>
<tr>
<td>4..7</td><td>&nbsp;</td>
<td>Unused padding space.  Must be zero.
</tr>
<tr>
<td>8..11</td><td>iChange</td>
<td>Unsigned integer counter, incremented with each transaction
</tr>
<tr>
<td>12</td><td>isInit</td>
<td>The "isInit" flag.  1 when the shm file has been initialized.
</tr>
<tr>
<td>13</td><td>bigEndCksum</td>
<td>True if the WAL file uses big-ending checksums.  0 if the WAL
    uses little-endian checksums.
</tr>
<tr>
<td>14..15</td><td>szPage</td>
<td>The database page size in bytes, or 1 if the page size is 65536.
</tr>
<tr>
<td>16..19</td><td>mxFrame</td>
<td>Number of valid frames in the WAL file.
</tr>
<tr>
<td>20..23</td><td>nPage</td>
<td>Size of the database file in pages.
</tr>
<tr>
<td>24..31</td><td>aFrameCksum</td>
<td>Checksum of the last frame in the WAL file.
</tr>
<tr>
<td>32..39</td><td>aSalt</td>
<td>The two salt value copied from the WAL file header.
</tr>
<tr>
<td>40..47</td><td>aCksum</td>
<td>A checksum over bytes 0 through 39 of this header.
</tr>
<tr>
<td>48..95</td><td>&nbsp;</td>
<td>A copy of bytes 0 through 47 of this header.
</tr>
<tr>
<td>96..99</td><td>nBackfill</td>
<td>Number of WAL frames that have already been backfilled into the database
    by prior checkpoints
</tr>
<tr>
<td>100..119</td><td>read-mark&#91;0..4&#93;
<td>Five "read marks".  Each read mark is a 32-bit unsigned integer (4 bytes).
</tr>
<tr>
<td>120..127</td><td>&nbsp;</td>
<td>Unused space set aside for 8 file locks.
</tr>
<tr>
<td>128..132</td><td>nBackfillAttempted</td>
<td>Number of WAL frames that have attempted to be backfilled but which might
    not have been backfilled successfully.
</tr>
<tr>
<td>132..136</td><td>&nbsp;</td>
<td>Unused space reserved for futher expansion.
</tr>
</table>
</center>

<tcl>hd_fragment mxframe mxFrame</tcl>
<h3>The mxFrame field</h3>

<p>The 32-bit unisgned integer at offset 16 (and repeated at offset 64)
is the number of valid frames in the WAL.
<p>
When this field is zero, it indicates that the WAL is empty and that
all content should be obtained directly from the database file.
<p>
When mxFrame is equal to [nBackfill], that indicates that all content
in the WAL has been written back into the database.
In that case, all content can be read directly from the database.
Furthermore, the next writer is free to [reset the WAL] if no other
connections are holding locks on WAL_READ_LOCK(N) for N>0.
<p>
The mxFrame value is always greater than or equal to both
[nBackfill] and nBackfillAttempted.


<tcl>hd_fragment nbackfill nBackfill</tcl>
<h3>The nBackfill field</h3>

<p>The 32-bit unsigned integer at offset 128 in the WAL-index header
is called the "nBackfill".
this field holds the number of frames in the WAL file which
have been copied back into the main database.

<p>The nBackfill number is never greater than [mxFrame].
When nBackfill equals [mxFrame], that means that the WAL content
has been completely written back into the database and it is
ok to [reset the WAL] if there are no locks held on any of WAL_READ_LOCK(N)
for N>0.

<p>The nBackfill can only be increased while holding the
WAL_CKPT_LOCK.  However, nBackfill is changed to zero
during a [WAL reset], and this happens while holding
the WAL_WRITE_LOCK.

<tcl>hd_fragment locks {WAL-mode locks}</tcl>
<h3>WAL Locks</h3>

<p>Eight bytes of space are set aside in the header to support
file locking using the xShmLock() method in the [sqlite3_io_methods]
object.  These eight bytes are never read nor written by SQLite since
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
<td>WAL_READ_LOCK(4)
<td>7
<td>127
</tr>
</table>
</center>

<tcl>hd_fragment nbackfill nBackfill</tcl>
<h3>The nBackfill field</h3>

<p>The 32-bit unsigned integer at offset 128 in the WAL-index header
is called the "nBackfill".
this field holds the number of frames in the WAL file for which
an attempt has been made to checkpoint.  In other words, this
is the number of frames that have been copied from the WAL back
into the main database.  The number is actually the number of
"attempts" since the number is updated prior to syncing the
main database, so 

<p>The nBackfill can only be increased while holding the
WAL_CKPT_LOCK.  However, nBackfill is changed to zero
when the WAL file resets, and this happens while holding
the WAL_WRITE_LOCK.



<p><i>TBD:  More information about the header</i>

<h2>WAL-Index Hash Tables</h2>

<p>The hash tables in the shm file are designed to answer the







<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<







320
321
322
323
324
325
326

















327
328
329
330
331
332
333
<td>WAL_READ_LOCK(4)
<td>7
<td>127
</tr>
</table>
</center>




















<p><i>TBD:  More information about the header</i>

<h2>WAL-Index Hash Tables</h2>

<p>The hash tables in the shm file are designed to answer the
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
This is different from rollback mode, where the SQLITE_LOCK_SHARED is
released at the end of each transaction.

<li><p><b>SQLITE_LOCK_EXCLUSIVE</b>
<p>
Connections hold an exclusive lock when change in between WAL mode
and any of the various rollback-modes.  Connections might also attempt to
obtain an EXCLUSIVE lock when they disconnection from WAL mode.  If
a connection is able to obtain an EXCLUSIVE lock, that means it is the
only connection to the database and so it may attempt to checkpoint
and then delete the WAL-index and WAL files.
<p>
When a connection is holding a SHARED lock on the main database,
that will prevent any other connection from acquiring the EXCLUSIVE
lock, which in turn prevents the WAL-index and WAL files from being
deleted out from under other users, and prevents a transition out of
WAL-mode while other users are accessing the database in WAL-mode.


<li><p><b>WAL_WRITE_LOCK</b>
<p>
The WAL_WRITE_LOCK is only locked exclusively.  There is never a shared
lock taken on WAL_WRITE_LOCK.
<p>
An EXCLUSIVE WAL_WRITE_LOCK is held by any connection that is appending
content to the end of the WAL.  Hence, only a single process at a time
can append content to the WAL.  If the WAL file resets as a consequence of
a write, then the [nBackfill] field of the WAL-index header is reset to
zero while holding this lock.
<p>An EXCLUSIVE is also held WAL_WRITE_LOCK, and on several other locking
bytes, when a connection is running [recovery] on the shared WAL-index.

<li><p><b>WAL_CKPT_LOCK</b>
<p>







|


















|







478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
This is different from rollback mode, where the SQLITE_LOCK_SHARED is
released at the end of each transaction.

<li><p><b>SQLITE_LOCK_EXCLUSIVE</b>
<p>
Connections hold an exclusive lock when change in between WAL mode
and any of the various rollback-modes.  Connections might also attempt to
obtain an EXCLUSIVE lock when they disconnect from WAL mode.  If
a connection is able to obtain an EXCLUSIVE lock, that means it is the
only connection to the database and so it may attempt to checkpoint
and then delete the WAL-index and WAL files.
<p>
When a connection is holding a SHARED lock on the main database,
that will prevent any other connection from acquiring the EXCLUSIVE
lock, which in turn prevents the WAL-index and WAL files from being
deleted out from under other users, and prevents a transition out of
WAL-mode while other users are accessing the database in WAL-mode.


<li><p><b>WAL_WRITE_LOCK</b>
<p>
The WAL_WRITE_LOCK is only locked exclusively.  There is never a shared
lock taken on WAL_WRITE_LOCK.
<p>
An EXCLUSIVE WAL_WRITE_LOCK is held by any connection that is appending
content to the end of the WAL.  Hence, only a single process at a time
can append content to the WAL.  If a [WAL reset] occurs as a consequence of
a write, then the [nBackfill] field of the WAL-index header is reset to
zero while holding this lock.
<p>An EXCLUSIVE is also held WAL_WRITE_LOCK, and on several other locking
bytes, when a connection is running [recovery] on the shared WAL-index.

<li><p><b>WAL_CKPT_LOCK</b>
<p>
514
515
516
517
518
519
520
521
522








































523
524
525
526
527
528
529
they are within a transaction.
Connections also obtain an exclusive lock on read locks, one at a time,
for the brief moment while they are updating the values of the corresponding
read-marks.
Read locks 1 through 4 are held exclusively when running [recovery].
<p>
Each read lock byte corresponds to one of the five 32-bit read-mark
integers located in bytes 100 through 119 of the WAL-index header.









































</ul>

<h3>Operations that require locks and which locks those operations use</h3>

<ul>
<li><p><b>Transition into and out of WAL-mode</b>








|
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
they are within a transaction.
Connections also obtain an exclusive lock on read locks, one at a time,
for the brief moment while they are updating the values of the corresponding
read-marks.
Read locks 1 through 4 are held exclusively when running [recovery].
<p>
Each read lock byte corresponds to one of the five 32-bit read-mark
integers located in bytes 100 through 119 of the WAL-index header, as
follows:
<center>
<table border=1>
<tr><th>Lock Name<th>Lock offset<th>Read-mark name<th>Read-mark offset
<tr><td>WAL_READ_LOCK(0)<td>123<td>read-mark[0]<td>100..103
<tr><td>WAL_READ_LOCK(1)<td>124<td>read-mark[1]<td>104..107
<tr><td>WAL_READ_LOCK(2)<td>125<td>read-mark[2]<td>108..111
<tr><td>WAL_READ_LOCK(3)<td>126<td>read-mark[3]<td>112..115
<tr><td>WAL_READ_LOCK(4)<td>127<td>read-mark[4]<td>116..119
</table>
</center>
<p>
When a connection holds a shared lock on WAL_READ_LOCK(N), that is a
promise by the connection that it will use the WAL and not the database
file for any database pages that are modified by the first
read-mark[N] entries in the WAL.
The read-mark[0] is always zero.  If a connection holds a shared lock
on WAL_READ_LOCK(0), that means the connection expects to be able to ignore
the WAL and read any content it wants from the main database.
If N>0 then the connection is free to use more of the WAL file beyond
read-mark[N] if it wants to, up to the first mxFrame frames.
But when a connection holds a shared lock on WAL_READ_LOCK(0), that is a
promise that it will never read content from the WAL
and will acquire all content directly from the main database.
<p>
When a checkpoint runs, if it sees a lock on WAL_READ_LOCK(N), then it
must not move WAL content into the main database for more than the first
read-mark[N] frames.  Were it to do so, it would overwrite content that
the process holding the lock was expecting to be able to read out of the
main database file.  A consequence of if this is that f the WAL file
contains more than read-mark[N] frames (if mxFrame>read-mark[N]
for any read-mark for which WAL_READ_LOCK(N) is held by another process),
then the checkpoint cannot run to completion.
<p>
When a writer wants to [reset the WAL], it must ensure that there are
no locks on WAL_READ_LOCK(N) for N>0 because such locks indicate
that some other connection is still using the current WAL file and 
a [WAL reset] would delete content out from those other connections.  It is
ok for a [WAL reset] to occur if other connections are holding WAL_READ_LOCK(0)
because by holding WAL_READ_LOCK(0), those other connections are promising
not to use any content from the WAL.
</ul>

<h3>Operations that require locks and which locks those operations use</h3>

<ul>
<li><p><b>Transition into and out of WAL-mode</b>

564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
without write permission on the database, WAL, or WAL-index files
to open the database read-only.
If the WAL and WAL-index files are missing, then a process that
lacks permission to create and initialize those files will not be
able to open the database, unless the database is designated
as immutable using the [immutable query parameter].


<li><p><b>Create a new WAL and WAL-index as the initial connection to
          a WAL-mode database</b>



<li><p><b>Reconstruct the global shared WAL-index during [recovery]</b>

<li><p><b>Append a new transaction to the end of the WAL</b>

<li><p><b>Read content from the database and WAL as part of a transaction</b>

<li><p><b>Run a checkpoint</b>








|
<
<


<
<







632
633
634
635
636
637
638
639


640
641


642
643
644
645
646
647
648
without write permission on the database, WAL, or WAL-index files
to open the database read-only.
If the WAL and WAL-index files are missing, then a process that
lacks permission to create and initialize those files will not be
able to open the database, unless the database is designated
as immutable using the [immutable query parameter].

<li><p><b>Reconstruct the global shared WAL-index during [recovery]</b>







<li><p><b>Append a new transaction to the end of the WAL</b>

<li><p><b>Read content from the database and WAL as part of a transaction</b>

<li><p><b>Run a checkpoint</b>