Documentation Source Text

Check-in [400de5a3af]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Expand the description of the large-transaction limitatioo for WAL mode. Identify additional requirements in the second file format document.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 400de5a3af8b2179f3a027c823c717cf66765278
User & Date: drh 2010-08-31 15:53:57
Context
2010-08-31
16:14
Add a fragment for the custom r-tree query section of the r-tree docs. Add the addition of custom r-tree query to the "changes" page. check-in: c96a663926 user: drh tags: trunk
15:53
Expand the description of the large-transaction limitatioo for WAL mode. Identify additional requirements in the second file format document. check-in: 400de5a3af user: drh tags: trunk
15:03
Add a link to example geometry callback code to rtree.in. check-in: ed402895f8 user: dan tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/fileformat2.in.

1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
....
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
....
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
....
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
....
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
<p>^A single WAL file can be reused multiple times.  ^In other words, the
WAL can fill up with frames and then be checkpointed and then new
frames can overwrite the old ones.  ^A WAL always grows from beginning
toward the end.  Checksums and counters attached to each frame are
used to determine which frames within the WAL are valid and which
are leftovers from prior checkpoints.</p>

<p>The WAL header is 32 bytes in size and consists of the following eight
big-endian 32-bit unsigned integer values:</p>

<center>
<i>WAL Header Format</i><br>
<table width="80%" border=1>
<tr><th>Offset<th>Size<th>Description
<tr><td valign=top align=center>0<td valign=top align=center>4
................................................................................
<tr><td valign=top align=center>20<td valign=top align=center>4
    <td>Salt-2: a different random number for each checkpoint
<tr><td valign=top align=center>24<td valign=top align=center>4
    <td>Checksum-1: First part of a checksum on the first 24 bytes of header
<tr><td valign=top align=center>28<td valign=top align=center>4
    <td>Checksum-2: Second part of the checksum on the first 24 bytes of header
</table>
</center>

<p>Immediately following the wal-header are zero or more frames. Each
frame consists of a 24-byte frame-header followed by a <i>page-size</i> bytes
of page data. The frame-header is six big-endian 32-bit unsigned 
integer values, as follows:

<center>
<i>WAL Frame Header Format</i><br>
<table width="80%" border=1>
<tr><th>Offset<th>Size<th>Description
<tr><td valign=top align=center>0<td valign=top align=center>4
................................................................................
<tr><td valign=top align=center>12<td valign=top align=center>4
    <td>Salt-2 copied from the WAL header
<tr><td valign=top align=center>16<td valign=top align=center>4
    <td>Checksum-1:  Cumulative checksum up through and including this page
<tr><td valign=top align=center>20<td valign=top align=center>4
    <td>Checksum-2:  Second half of the cumulative checksum.
</table>
</center>

^(<p>A frame is considered valid if and only if the following conditions are
true:</p>

<ol>
<li><p>The salt-1 and salt-2 values in the frame-header match
       salt values in the wal-header</p></li>
................................................................................
for i from 0 to n-1 step 2:
   s0 += x(i) + s1;
   s1 += x(i+1) + s0;
endfor
# result in s0 and s1
</pre></blockquote>)^

<p>The outputs s0 and s1 are both weighted checksums using Fibonacci weights
in reverse order (the largest Fibonacci weight occurs on the first element
of the sequence being summed.)  The s1 value spans all 32-bit integer
terms of the sequence whereas s0 omits the final term.</p>

<h3>4.3 Checkpoint Algorithm</h3>

<p>On a [checkpoint], the WAL is first flushed to persistent storage using
the xSync method of the [sqlite3_io_methods | VFS]. 
Then valid content of the WAL is transferred into the database file.
Finally, the database is flushed to persistent storage using another
xSync method call.
The xSync operations serve as write barriers - all writes launched
before the xSync must complete before any write that launches after the
xSync begins.</p>

<p>^After each checkpoint, the WAL header salt-1 value is incremented and the 
salt-2 value is randomized.  This prevents old and new frames in the WAL from
................................................................................
data structure called the wal-index is maintained to expedite the
search for frames of a particular page.</p>

<tcl>hd_fragment walindexformat {wal-index} {WAL-index format}</tcl>
<h3>4.5 WAL-Index Format</h3>

<p>Conceptually, the wal-index is shared memory, though the current
VFS implementations use a mmapped file for the wal-index.  The mmapped
file is in the same directory as the database and has the same name
as the database with a "<tt>-shm</tt>" suffix appended.  Because
the wal-index is shared memory, SQLite does not support 
[PRAGMA journal_mode | journal_mode=WAL] 
on a network filesystem when clients are on different machines.
All users of the database must be able to share the same memory.</p>








|







 







|

|

|







 







|







 







|
|
|




|

|
|







 







|







1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
....
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
....
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
....
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
....
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
<p>^A single WAL file can be reused multiple times.  ^In other words, the
WAL can fill up with frames and then be checkpointed and then new
frames can overwrite the old ones.  ^A WAL always grows from beginning
toward the end.  Checksums and counters attached to each frame are
used to determine which frames within the WAL are valid and which
are leftovers from prior checkpoints.</p>

<p>^(The WAL header is 32 bytes in size and consists of the following eight
big-endian 32-bit unsigned integer values:</p>

<center>
<i>WAL Header Format</i><br>
<table width="80%" border=1>
<tr><th>Offset<th>Size<th>Description
<tr><td valign=top align=center>0<td valign=top align=center>4
................................................................................
<tr><td valign=top align=center>20<td valign=top align=center>4
    <td>Salt-2: a different random number for each checkpoint
<tr><td valign=top align=center>24<td valign=top align=center>4
    <td>Checksum-1: First part of a checksum on the first 24 bytes of header
<tr><td valign=top align=center>28<td valign=top align=center>4
    <td>Checksum-2: Second part of the checksum on the first 24 bytes of header
</table>
</center>)^

<p>^Immediately following the wal-header are zero or more frames. ^Each
frame consists of a 24-byte frame-header followed by a <i>page-size</i> bytes
of page data. ^(The frame-header is six big-endian 32-bit unsigned 
integer values, as follows:

<center>
<i>WAL Frame Header Format</i><br>
<table width="80%" border=1>
<tr><th>Offset<th>Size<th>Description
<tr><td valign=top align=center>0<td valign=top align=center>4
................................................................................
<tr><td valign=top align=center>12<td valign=top align=center>4
    <td>Salt-2 copied from the WAL header
<tr><td valign=top align=center>16<td valign=top align=center>4
    <td>Checksum-1:  Cumulative checksum up through and including this page
<tr><td valign=top align=center>20<td valign=top align=center>4
    <td>Checksum-2:  Second half of the cumulative checksum.
</table>
</center>)^

^(<p>A frame is considered valid if and only if the following conditions are
true:</p>

<ol>
<li><p>The salt-1 and salt-2 values in the frame-header match
       salt values in the wal-header</p></li>
................................................................................
for i from 0 to n-1 step 2:
   s0 += x(i) + s1;
   s1 += x(i+1) + s0;
endfor
# result in s0 and s1
</pre></blockquote>)^

<p>^The outputs s0 and s1 are both weighted checksums using Fibonacci weights
in reverse order.  (^The largest Fibonacci weight occurs on the first element
of the sequence being summed.)  ^The s1 value spans all 32-bit integer
terms of the sequence whereas s0 omits the final term.</p>

<h3>4.3 Checkpoint Algorithm</h3>

<p>^On a [checkpoint], the WAL is first flushed to persistent storage using
the xSync method of the [sqlite3_io_methods | VFS]. 
^Then valid content of the WAL is transferred into the database file.
^Finally, the database is flushed to persistent storage using another
xSync method call.
The xSync operations serve as write barriers - all writes launched
before the xSync must complete before any write that launches after the
xSync begins.</p>

<p>^After each checkpoint, the WAL header salt-1 value is incremented and the 
salt-2 value is randomized.  This prevents old and new frames in the WAL from
................................................................................
data structure called the wal-index is maintained to expedite the
search for frames of a particular page.</p>

<tcl>hd_fragment walindexformat {wal-index} {WAL-index format}</tcl>
<h3>4.5 WAL-Index Format</h3>

<p>Conceptually, the wal-index is shared memory, though the current
VFS implementations use a mmapped file for the wal-index.  ^The mmapped
file is in the same directory as the database and has the same name
as the database with a "<tt>-shm</tt>" suffix appended.  Because
the wal-index is shared memory, SQLite does not support 
[PRAGMA journal_mode | journal_mode=WAL] 
on a network filesystem when clients are on different machines.
All users of the database must be able to share the same memory.</p>

Changes to pages/wal.in.

49
50
51
52
53
54
55

56
57
58



59
60
61
62
63
64
65
    "<tt>-shm</tt> shared memory file associated with each
    database, which can make SQLite less appealing for use as an 
    [application file-format].
<li>There is the extra operation of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
<li>WAL works best with smaller transactions.  WAL does

    not work as well as traditional rollback journal modes when used on
    exceedingly large transactions (transactions where the size of the
    change to the database file reaches into the gigabyte range).



    
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
original unchanged database content into a separate rollback journal file







>
|
<
|
>
>
>







49
50
51
52
53
54
55
56
57

58
59
60
61
62
63
64
65
66
67
68
    "<tt>-shm</tt> shared memory file associated with each
    database, which can make SQLite less appealing for use as an 
    [application file-format].
<li>There is the extra operation of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
<li>WAL works best with smaller transactions.  WAL does
    not work well for very large transactions.  For transactions larger than
    about 100 megabytes, traditional rollback journal modes will likely

    be faster.  For transactions in excess of a gigabyte, WAL mode may 
    fail with an I/O or disk-full error.
    It is recommended that one of the rollback journal modes be used for
    transactions larger than a few dozen megabytes.
    
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
original unchanged database content into a separate rollback journal file