Documentation Source Text

Check-in [627feecddc]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Continuing work on the walformat.html document. This is an incremental check-in.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 627feecddc267ec7b8bfd2bd07961aab9bffeab78901f75703eef76f599c1db4
User & Date: drh 2017-11-13 17:03:01.815
Context
2017-11-13
23:06
Continuing work on the walformat.html document. (check-in: 74c71c744b user: drh tags: trunk)
17:03
Continuing work on the walformat.html document. This is an incremental check-in. (check-in: 627feecddc user: drh tags: trunk)
13:11
Typos in the new walindex.html document. (check-in: 59536a697b user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fileformat2.in.
380
381
382
383
384
385
386

387
388
389
390
391
392
393
"version-valid-for number".

<h3>Header space reserved for expansion</h3>

<p>All other bytes of the database file header are reserved for
future expansion and must be set to zero.</p>


<h2>The Lock-Byte Page</h2>

<p>The lock-byte page is the single page of the database file
that contains the bytes at offsets between 1073741824 and 1073742335,
inclusive.  A database file that is less than or equal to 1073741824 bytes 
in size contains no lock-byte page.  A database file larger than
1073741824 contains exactly one lock-byte page.







>







380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
"version-valid-for number".

<h3>Header space reserved for expansion</h3>

<p>All other bytes of the database file header are reserved for
future expansion and must be set to zero.</p>

<tcl>hd_fragment lockbyte {lock-byte page}</tcl>
<h2>The Lock-Byte Page</h2>

<p>The lock-byte page is the single page of the database file
that contains the bytes at offsets between 1073741824 and 1073742335,
inclusive.  A database file that is less than or equal to 1073741824 bytes 
in size contains no lock-byte page.  A database file larger than
1073741824 contains exactly one lock-byte page.
Changes to pages/walformat.in.
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215




















































































216
217
218
219
220
221
222
</tr>
<tr>
<td>100..119</td>
<td>Five "read marks".  Each read mark is a 32-bit unsigned integer (4 bytes).
</tr>
<tr>
<td>120..127</td>
<td>Unused space set aside for locking.
</tr>
<tr>
<td>128..132</td>
<td>Number of WAL frames that have attempted to be backfilled but which might
    not have been backfilled successfully.
</tr>
<tr>
<td>132..136</td>
<td>Unused space reserved for futher expansion.
</tr>
</table>
</center>





















































































<p><i>TBD:  More information about the header</i>

<h2>WAL-Index Hash Tables</h2>

<p>The hash tables in the shm file are designed to answer the
following question quickly:







|












>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
</tr>
<tr>
<td>100..119</td>
<td>Five "read marks".  Each read mark is a 32-bit unsigned integer (4 bytes).
</tr>
<tr>
<td>120..127</td>
<td>Unused space set aside for 8 file locks.
</tr>
<tr>
<td>128..132</td>
<td>Number of WAL frames that have attempted to be backfilled but which might
    not have been backfilled successfully.
</tr>
<tr>
<td>132..136</td>
<td>Unused space reserved for futher expansion.
</tr>
</table>
</center>

<tcl>hd_fragment locks {WAL-mode locks}</tcl>
<h3>WAL Locks</h3>

<p>Eight bytes of space are set aside in the header to support
file locking using the xShmLock() method in the [sqlite3_io_methods]
object.  These eight bytes are never read nor written by SQLite since
some VFSes (ex: Windows) might implement locks using mandatory file locks.

<p>These are the eight locks supported:

<center>
<i>WAL-Index Locks Controlled By xShmLock()</i><br>
<table border=1>
<tr><th rowspan=2>Name<th colspan=2>Offset
<tr><th>xShmLock<th>File
<tr>
<td>WAL_WRITE_LOCK
<td>0
<td>120
</tr>

<tr>
<td>WAL_CKPT_LOCK
<td>1
<td>121
</tr>

<tr>
<td>WAL_RECOVER_LOCK
<td>2
<td>122
</tr>

<tr>
<td>WAL_READ_LOCK(0)
<td>3
<td>123
</tr>

<tr>
<td>WAL_READ_LOCK(1)
<td>4
<td>124
</tr>

<tr>
<td>WAL_READ_LOCK(2)
<td>5
<td>125
</tr>

<tr>
<td>WAL_READ_LOCK(3)
<td>6
<td>126
</tr>

<tr>
<td>WAL_READ_LOCK(4)
<td>7
<td>127
</tr>
</table>
</center>

<tcl>hd_fragment nbackfill nBackfill</tcl>
<h3>The nBackfill field</h3>

<p>The 32-bit unsigned integer at offset 128 in the WAL-index header
is called the "nBackfill".
this field holds the number of frames in the WAL file for which
an attempt has been made to checkpoint.  In other words, this
is the number of frames that have been copied from the WAL back
into the main database.  The number is actually the number of
"attempts" since the number is updated prior to syncing the
main database, so 

<p>The nBackfill can only be increased while holding the
WAL_CKPT_LOCK.  However, nBackfill is changed to zero
when the WAL file resets, and this happens while holding
the WAL_WRITE_LOCK.



<p><i>TBD:  More information about the header</i>

<h2>WAL-Index Hash Tables</h2>

<p>The hash tables in the shm file are designed to answer the
following question quickly:
321
322
323
324
325
326
327




























































































































































































very small.

<p>Note that each 32768-byte unit of the shm file has its own aHash and
aPgno arrays.  The aHash array for a single unit is only helpful in finding
aPgno entries in that same unit.  The overall FindFrame(P,M) function
needs to do hash lookups beginning with the latest unit and working
backwards to the oldest unit until it finds an answer.



































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
very small.

<p>Note that each 32768-byte unit of the shm file has its own aHash and
aPgno arrays.  The aHash array for a single unit is only helpful in finding
aPgno entries in that same unit.  The overall FindFrame(P,M) function
needs to do hash lookups beginning with the latest unit and working
backwards to the oldest unit until it finds an answer.

<h2>Locking Matrix</h2>

<p>Access is coordinated in WAL mode using both the legacy DELETE-mode
locks controlled by the xLock and xUnlock methods of the [sqlite3_io_methods]
object and the WAL locks controlled by the xShmLock method of the
[sqlite3_io_methods] object.

<p>Conceptually, there is just a single DELETE-mode lock.  The DELETE-mode
lock for a single database connection can be in exactly one of the
following states:
<ol>
<li value='0'> SQLITE_LOCK_NONE (unlocked)
<li value='1'> SQLITE_LOCK_SHARED (reading)
<li value='2'> SQLITE_LOCK_RESERVED (reading, waiting to write)
<li value='3'> SQLITE_LOCK_PENDING (new readers blocked, waiting to write)
<li value='4'> SQLITE_LOCK_EXCLUSIVE (writing)
</ol>
<p>The DELETE-mode locks are stored on the [lock-byte page] of the
main database file.
Only SQLITE_LOCK_SHARED and SQLITE_LOCK_EXCLUSIVE are factors for WAL-mode
databases.
The other locking states are used in rollback-mode, but not in WAL-mode.

<p>The [WAL-mode locks] are described above.

<h3>How the various locks are used</h3>

<p>The following rules show how each of the locks is used.

<ul>
<li><p><b>SQLITE_LOCK_SHARED</b>
<p>
All connections hold SQLITE_LOCK_SHARED continuously while attached
to a WAL-mode database.  This is true for both read/write connections
and read-only connections.
The SQLITE_LOCK_SHARED lock is held even by connections that are
not within transaction.
This is different from rollback mode, where the SQLITE_LOCK_SHARED is
released at the end of each transaction.

<li><p><b>SQLITE_LOCK_EXCLUSIVE</b>
<p>
Connections hold an exclusive lock when change in between WAL mode
and any of the various rollback-modes.  Connections might also attempt to
obtain an EXCLUSIVE lock when they disconnection from WAL mode.  If
a connection is able to obtain an EXCLUSIVE lock, that means it is the
only connection to the database and so it may attempt to checkpoint
and then delete the WAL-index and WAL files.
<p>
When a connection is holding a SHARED lock on the main database,
that will prevent any other connection from acquiring the EXCLUSIVE
lock, which in turn prevents the WAL-index and WAL files from being
deleted out from under other users, and prevents a transition out of
WAL-mode while other users are accessing the database in WAL-mode.


<li><p><b>WAL_WRITE_LOCK</b>
<p>
The WAL_WRITE_LOCK is only locked exclusively.  There is never a shared
lock taken on WAL_WRITE_LOCK.
<p>
An EXCLUSIVE WAL_WRITE_LOCK is held by any connection that is appending
content to the end of the WAL.  Hence, only a single process at a time
can append content to the WAL.  If the WAL file resets as a consequence of
a write, then the [nBackfill] field of the WAL-index header is reset to
zero while holding this lock.
<p>An EXCLUSIVE is also held WAL_WRITE_LOCK, and on several other locking
bytes, when a connection is running [recovery] on the shared WAL-index.

<li><p><b>WAL_CKPT_LOCK</b>
<p>
The WAL_CKPT_LOCK is only locked exclusively.  There is never a shared
lock taken on WAL_CKPT_LOCK.
<p>
An EXCLUSIVE WAL_CKPT_LOCK is held by any connection that is running
a [checkpoint].  The [nBackfill] field of the WAL-index header may be
increased while holding this exclusive lock, but it may not be decreased.
<p>An EXCLUSIVE is also held WAL_CKPT_LOCK, and on several other locking
bytes, when a connection is running [recovery] on the shared WAL-index.

<li><p><b>WAL_RECOVER_LOCK</b>
<p>
The WAL_RECOVER_LOCK is only locked exclusively.  There is never a shared
lock taken on WAL_RECOVER_LOCK.
<p>
An EXCLUSIVE WAL_RECOVER_LOCK is held by any connection that is running
[recovery] to reconstruct the shared WAL-index.
<p>
A read-only connection that is rebuilding its private heap-memory WAL-index
does not hold this lock.  (It cannot, since read-only connections are not
allowed to hold any exclusive locks.)  This lock is only held when rebuilding
the global shared WAL-index contained in the memory-mapped SHM file.
<p>
In addition to locking this byte, a connection running [recovery] also
gets an exclusive lock on all other WAL locks except for WAL_READ_LOCK(0).

<li><p><b>WAL_READ_LOCK(N)</b>
<p>
There are five separate read locks, numbers 0 through 4.
Read locks may be either SHARED or EXCLUSIVE.
Connections obtain a shared lock on one of the read locks bytes while
they are within a transaction.
Connections also obtain an exclusive lock on read locks, one at a time,
for the brief moment while they are updating the values of the corresponding
read-marks.
Read locks 1 through 4 are held exclusively when running [recovery].
<p>
Each read lock byte corresponds to one of the five 32-bit read-mark
integers located in bytes 100 through 119 of the WAL-index header.

</ul>

<h3>Operations that require locks and which locks those operations use</h3>

<ul>
<li><p><b>Transition into and out of WAL-mode</b>

<p>The SQLITE_LOCK_EXCLUSIVE lock must be held by a connection that wants
to transition into our out of WAL mode.
Transitioning into WAL mode is, therefore, just like any other write
transaction, since every write transaction in rollback mode requires
the SQLITE_LOCK_EXCLUSIVE lock.
If the database file is already in WAL mode (hence if the desire it to change
it back into rollback mode) and if there are two
or more connections to the database, then each of these connections will
be holding an SQLITE_LOCK_SHARED lock.  That means that the
SQLITE_LOCK_EXCLUSIVE cannot be obtained, and the transition out of
WAL mode will not be allowed.  This prevents one connection from deleting
WAL mode out from under another.  It also means that the only way to move
a database from WAL mode into rollback mode is to close all but one
connection to the database.


<li><p><b>Close a connection to a WAL mode database</b>

<p>When a database connection closes (via [sqlite3_close()] or
[sqlite3_close_v2()]), an attempt is made to acquire
SQLITE_LOCK_EXCLUSIVE.  If this attempt is successful, that means
the connection that is closing is the last connection to the database.
In that case, it is desirable to clean up the WAL and WAL-index files,
so the closing connection runs a [checkpoint] (while holding
SQLITE_LOCK_EXCLUSIVE) and the deletes both the WAL and WAL-index files.
The SQLITE_LOCK_EXCLUSIVE is not released until after both the
WAL and WAL-index files have been deleted.

<p>If the application invokes
[sqlite3_file_control]([SQLITE_FCNTL_PERSIST_WAL]) on the database
connection prior to closing, then the final checkpoint is still
run but the WAL and WAL-index files are
not deleted as they normally would be.
This leaves the database in a state that allows other processes
without write permission on the database, WAL, or WAL-index files
to open the database read-only.
If the WAL and WAL-index files are missing, then a process that
lacks permission to create and initialize those files will not be
able to open the database, unless the database is designated
as immutable using the [immutable query parameter].


<li><p><b>Create a new WAL and WAL-index as the initial connection to
          a WAL-mode database</b>



<li><p><b>Reconstruct the global shared WAL-index during [recovery]</b>

<li><p><b>Append a new transaction to the end of the WAL</b>

<li><p><b>Read content from the database and WAL as part of a transaction</b>

<li><p><b>Run a checkpoint</b>

<li><p><b>Reset the WAL file</b>

</ul>

<tcl>hd_fragment recovery recovery {WAL-mode crash recovery}</tcl>
<h1>Crash Recovery</h1>

<p>
After an application crash, or a system crash, or a power failure, the
database might be left in an inconsistent state.  When that happens,
the first connection to the database will need to run a recovery
procedure to restore the database to a consistent state before it begins
using the database.  This section describes that procedure.

<p><i>TBD...</i></p>