Documentation Source Text

Check-in [34cbc3f991]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Tweaks to the wal.html document.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 34cbc3f99191f8e20c7a82b8a718e3679dad1f19
User & Date: drh 2010-05-24 22:39:48
Context
2010-05-25
23:40
Update appearance with new logo and color scheme. check-in: 225857b614 user: drh tags: trunk
2010-05-24
22:39
Tweaks to the wal.html document. check-in: 34cbc3f991 user: drh tags: trunk
2010-05-08
14:05
Fix typos in the testing document. Change "insure" to "ensure" throughout the website. Ticket [6afbaac77a52917fc]. check-in: 8c1658c1e2 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/wal.in.

8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
..
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
..
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
..
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
...
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
...
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
...
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216


217

218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
Beginning with [version 3.7.0], a new "Write-Ahead Log" option
(hereafter referred to as "WAL") is available.</p>

<p>There are advantages and disadvantages to using WAL instead of
a rollback journal.  Advantages include:</p>

<ol>
<li>WAL is significately faster in the common case of many small transactions.
<li>WAL provides more concurrency as readers do not block writers and 
    are not blocked by a writer.  Reading and writing can proceed 
    concurrently.
<li>The sequence of disk I/O operations tends to be more sequential using WAL.
<li>WAL uses many fewer fsync() operations and is thus less vulnerable to
    problems on systems where the fsync() system call is broken.
</ol>

<p>But there are also disadvantages:</p>

<ol>
................................................................................
<li>Transactions that involve changes against multiple [ATTACH | ATTACHed]
    databases are atomic for each individual database, but are not
    atomic across all databases as a set.
<li>With WAL, it is not possible to change the database page sizes 
    using [VACUUM] or when restoring from a backup using the [backup API].
<li>WAL can be slower than the traditional rollback-journal approach
    for large transactions (such as when using the [VACUUM] command).
<li>There is the extra opertion of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
................................................................................
when the rollback journal is deleted.</p>

<p>The WAL approach inverts this.  The original content is preserved
in the database file and the changes are appended into a separate
WAL file.  A [COMMIT] occurs when a special record indicating a commit
is appended to the WAL.  Thus a COMMIT can happen without ever writing
to the original database, which allows reader to continue operating
from the original, unaltered database while changes are simultaneously
commited into the WAL.  Multiple transactions can be appended to the
end of a single WAL file.</p>

<tcl>hd_fragment ckpt checkpoint checkpointed checkpointing</tcl>
<h3>Checkpointing</h3>

<p>Of course, one wants to eventually transfer all the transactions that
are appended in the WAL file back into the original database.  Moving
................................................................................

<p>When a read operation begins on a WAL-mode database, it first
remembers the location of the last valid commit record in the WAL.
Call this point the "end mark".  Because the WAL can be growing and
adding new commit records while various readers connect to the database,
each reader can potentially have its own end mark.  But for any
particular reader, the end mark is unchanged for the duration of the
transaction, thus insuring that a single read transaction only sees
the database content as it existed at a single point in time.</p>

<p>When a reader needs a page of content, it first checks the WAL to
see if that page appears there, and if so it pulls in the last copy
of the page that occurs in the WAL prior to the reader's end mark.
If no copy of the page exists in the WAL prior to the reader's end mark,
then the page is read from the original database file.  Readers can
exist in separate processes, so to avoid forcing every reader to scan
the entire WAL looking for pages it needs (since WAL file can grow to
multiple megabytes, depending on how often checkpoints are run), a
data structure called the wal-index is maintained in shared memory
which helps readers locate pages in the WAL quickly and with a minimum
of I/O.  The wal-index greatly improves the performance of readers,
but the use of shared memory means that all readers must exist on the
same machine.  This is why the write-ahead log implementation will not
work on a network filesystem.</p>

<p>Writers merely append new content to the end of the WAL file.
................................................................................
a time.</p>

<p>A checkpoint operation takes all the content from the WAL file
and transfers it back into the original database file.
As long as all readers are looking at the entire WAL file (in other
words, as long as no reader is using an end mark that is earlier
than the actual end of the WAL) then it will be OK for the checkpoint
to run in parallel with the readers.  The database file changes
that the checkpointer makes will not interfere with the readers, because
the database pages changed all also exist in the WAL and so
the readers will never access them.  Thus, a checkpoint will never
block a reader and readers will not block a checkpoint as long as all
readers are looking at the entire WAL.  But, older readers that only
see a prefix of the WAL will block a checkpoint.  And writers will
block a checkpoint.  And a checkpoint will block a writer.</p>

<tcl>hd_fragment fast</tcl>
<h3>Performance Considerations</h3>
................................................................................
write transactions.</p>

<p>The default strategy is to allow successive write transactions to
grow the WAL until the WAL becomes about 1000 pages in size, then to
run a single checkpoint operation.  By default, the checkpoint will be
run automatically by the same thread that does the COMMIT that pushes
the WAL over its size limit.  This has the effect of causing most
COMMIT operations to be very fast but an occasional COMMIT that triggers
a checkpoint to be much slower.  If that effect is undersirable, then
the application can disable automatic checkpointing and run the
periodic checkpoints in a separate thread, or separate process.</p>

<p>Notice that there is a tradeoff between average read performance
and average write performance.  To maximize the read performance,
one wants to keep the WAL as small as possible and hence run checkpoints
frequently, perhaps as often as every COMMIT.  To maximize
write performance, one wants to amortize the cost of each checkpoint
over as many writes as possible, meaning that one wants to run checkpoints
infrequently and let the WAL grow as large as possible before each 
checkpoint.  The decision of how often to run checkpoints may therefore
vary from one application to another depending on the relative read
and write performance requirements of the application.
The default strategy of running a checkpoint once the WAL
reaches 1000 pages seems to work well in test applications on 
workstations, but other strategies might work better on different 
platforms or for different workloads.</p>

<h2>Activating And Configuring WAL Mode</h2>

<p>An SQLite database connection defaults to 
[journal_mode | journal_mode=DELETE].  To convert to WAL mode, use the
................................................................................
On success, the pragma will return the string "<tt>wal</tt>".  If 
the conversion to WAL could not be completed (for example, if the VFS
does not support the necessary shared-memory primitives) then the
journaling mode will be unchanged and the string returned from the
primitive will be the prior journaling mode (for example "<tt>delete</tt>").

<p>By default, SQLite will automatically checkpoint whenever the WAL
file reaches or exceeds 1000 pages of change, or when the last database
connection on a database file closes.  The default configuration is
intended to work well for most applications.  But programs that want
more control can force a checkpoint
using the [wal_checkpoint pragma] or by calling the
[sqlite3_wal_checkpoint()] C interface.  The automatic checkpoint
threshold can be changes or automatic checkpointing can be completely
disabled using the [wal_autocheckpoint pragma] or by calling the
[sqlite3_wal_autocheckpoint()] C interface.  A program can also 
use [sqlite3_wal_hook()] to register a callback to be invoked whenever
any transaction commits to the WAL.  This callback can then invoke
[sqlite3_wal_checkpoint()] to for a checkpoint based on whatever
criteria it thinks is appropriate.  (The automatic checkpoint mechanism
is implemented as a simple wrapper around [sqlite3_wal_hook()].)</p>

<h2>Backwards Compatibility</h2>

<p>The database file format is unchanged for WAL mode.  However, the


WAL file is a new concept and so older versions of SQLite will not know

how to recover a crashed SQLite database that was operating in WAL mode
when the crash occurred.
To prevent older versions of SQLite from trying to recover
a WAL-mode database (and making matters worse) the database file format
version numbers (bytes 18 and 19 in the [database header])
are incremented in WAL mode.
Thus, if an older version of SQLite attempts to connect to an SQLite
database that is operating in WAL mode, it will report an error along
the lines of "file is encrypted or is not a database".</p>

<p>One can explicitly changing out of WAL mode using a pragma such as
this:</p>

<blockquote><pre>
PRAGMA journal_mode=DELETE;
</pre></blockquote>

<p>Deliberately changing out of WAL mode lowers the database file format
version numbers so that older versions of SQLite can again access the
database file.</p>







|



|







 







|







 







|
|







 







|








|

|







 







|
|
|
|







 







|
|













|
|







 







|





|











>
>
|
>





|




|






|
|
|
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
..
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
..
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
..
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
...
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
...
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
...
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
Beginning with [version 3.7.0], a new "Write-Ahead Log" option
(hereafter referred to as "WAL") is available.</p>

<p>There are advantages and disadvantages to using WAL instead of
a rollback journal.  Advantages include:</p>

<ol>
<li>WAL is significantly faster in the common case of many small transactions.
<li>WAL provides more concurrency as readers do not block writers and 
    are not blocked by a writer.  Reading and writing can proceed 
    concurrently.
<li>Disk I/O operations tends to be more sequential using WAL.
<li>WAL uses many fewer fsync() operations and is thus less vulnerable to
    problems on systems where the fsync() system call is broken.
</ol>

<p>But there are also disadvantages:</p>

<ol>
................................................................................
<li>Transactions that involve changes against multiple [ATTACH | ATTACHed]
    databases are atomic for each individual database, but are not
    atomic across all databases as a set.
<li>With WAL, it is not possible to change the database page sizes 
    using [VACUUM] or when restoring from a backup using the [backup API].
<li>WAL can be slower than the traditional rollback-journal approach
    for large transactions (such as when using the [VACUUM] command).
<li>There is the extra operation of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
................................................................................
when the rollback journal is deleted.</p>

<p>The WAL approach inverts this.  The original content is preserved
in the database file and the changes are appended into a separate
WAL file.  A [COMMIT] occurs when a special record indicating a commit
is appended to the WAL.  Thus a COMMIT can happen without ever writing
to the original database, which allows reader to continue operating
from the original unaltered database while changes are simultaneously
committed into the WAL.  Multiple transactions can be appended to the
end of a single WAL file.</p>

<tcl>hd_fragment ckpt checkpoint checkpointed checkpointing</tcl>
<h3>Checkpointing</h3>

<p>Of course, one wants to eventually transfer all the transactions that
are appended in the WAL file back into the original database.  Moving
................................................................................

<p>When a read operation begins on a WAL-mode database, it first
remembers the location of the last valid commit record in the WAL.
Call this point the "end mark".  Because the WAL can be growing and
adding new commit records while various readers connect to the database,
each reader can potentially have its own end mark.  But for any
particular reader, the end mark is unchanged for the duration of the
transaction, thus ensuring that a single read transaction only sees
the database content as it existed at a single point in time.</p>

<p>When a reader needs a page of content, it first checks the WAL to
see if that page appears there, and if so it pulls in the last copy
of the page that occurs in the WAL prior to the reader's end mark.
If no copy of the page exists in the WAL prior to the reader's end mark,
then the page is read from the original database file.  Readers can
exist in separate processes, so to avoid forcing every reader to scan
the entire WAL looking for pages (the WAL file can grow to
multiple megabytes, depending on how often checkpoints are run), a
data structure called the "wal-index" is maintained in shared memory
which helps readers locate pages in the WAL quickly and with a minimum
of I/O.  The wal-index greatly improves the performance of readers,
but the use of shared memory means that all readers must exist on the
same machine.  This is why the write-ahead log implementation will not
work on a network filesystem.</p>

<p>Writers merely append new content to the end of the WAL file.
................................................................................
a time.</p>

<p>A checkpoint operation takes all the content from the WAL file
and transfers it back into the original database file.
As long as all readers are looking at the entire WAL file (in other
words, as long as no reader is using an end mark that is earlier
than the actual end of the WAL) then it will be OK for the checkpoint
to run in parallel with the readers.  The changes to the database file
that the checkpointer copies over from the WAL will not interfere with
the readers, because the modified database pages all also exist in the
WAL and so the readers will never access them.  Thus, a checkpoint will never
block a reader and readers will not block a checkpoint as long as all
readers are looking at the entire WAL.  But, older readers that only
see a prefix of the WAL will block a checkpoint.  And writers will
block a checkpoint.  And a checkpoint will block a writer.</p>

<tcl>hd_fragment fast</tcl>
<h3>Performance Considerations</h3>
................................................................................
write transactions.</p>

<p>The default strategy is to allow successive write transactions to
grow the WAL until the WAL becomes about 1000 pages in size, then to
run a single checkpoint operation.  By default, the checkpoint will be
run automatically by the same thread that does the COMMIT that pushes
the WAL over its size limit.  This has the effect of causing most
COMMIT operations to be very fast but an occasional COMMIT (those that trigger
a checkpoint) to be much slower.  If that effect is undesirable, then
the application can disable automatic checkpointing and run the
periodic checkpoints in a separate thread, or separate process.</p>

<p>Notice that there is a tradeoff between average read performance
and average write performance.  To maximize the read performance,
one wants to keep the WAL as small as possible and hence run checkpoints
frequently, perhaps as often as every COMMIT.  To maximize
write performance, one wants to amortize the cost of each checkpoint
over as many writes as possible, meaning that one wants to run checkpoints
infrequently and let the WAL grow as large as possible before each 
checkpoint.  The decision of how often to run checkpoints may therefore
vary from one application to another depending on the relative read
and write performance requirements of the application.
The default strategy is to run a checkpoint once the WAL
reaches 1000 pages and this strategy seems to work well in test applications on 
workstations, but other strategies might work better on different 
platforms or for different workloads.</p>

<h2>Activating And Configuring WAL Mode</h2>

<p>An SQLite database connection defaults to 
[journal_mode | journal_mode=DELETE].  To convert to WAL mode, use the
................................................................................
On success, the pragma will return the string "<tt>wal</tt>".  If 
the conversion to WAL could not be completed (for example, if the VFS
does not support the necessary shared-memory primitives) then the
journaling mode will be unchanged and the string returned from the
primitive will be the prior journaling mode (for example "<tt>delete</tt>").

<p>By default, SQLite will automatically checkpoint whenever the WAL
file reaches or exceeds 1000 pages, or when the last database
connection on a database file closes.  The default configuration is
intended to work well for most applications.  But programs that want
more control can force a checkpoint
using the [wal_checkpoint pragma] or by calling the
[sqlite3_wal_checkpoint()] C interface.  The automatic checkpoint
threshold can be changed or automatic checkpointing can be completely
disabled using the [wal_autocheckpoint pragma] or by calling the
[sqlite3_wal_autocheckpoint()] C interface.  A program can also 
use [sqlite3_wal_hook()] to register a callback to be invoked whenever
any transaction commits to the WAL.  This callback can then invoke
[sqlite3_wal_checkpoint()] to for a checkpoint based on whatever
criteria it thinks is appropriate.  (The automatic checkpoint mechanism
is implemented as a simple wrapper around [sqlite3_wal_hook()].)</p>

<h2>Backwards Compatibility</h2>

<p>The database file format is unchanged for WAL mode.  However, the
WAL file (located in the same directory or folder a the original database
file and with the same name as the original database with 
"<tt>-wal</tt>" appended) is a new concept and so older versions of 
SQLite will not know
how to recover a crashed SQLite database that was operating in WAL mode
when the crash occurred.
To prevent older versions of SQLite from trying to recover
a WAL-mode database (and making matters worse) the database file format
version numbers (bytes 18 and 19 in the [database header])
are increased to two in WAL mode.
Thus, if an older version of SQLite attempts to connect to an SQLite
database that is operating in WAL mode, it will report an error along
the lines of "file is encrypted or is not a database".</p>

<p>One can explicitly change out of WAL mode using a pragma such as
this:</p>

<blockquote><pre>
PRAGMA journal_mode=DELETE;
</pre></blockquote>

<p>Deliberately changing out of WAL mode changes the database file format
version numbers back to one so that older versions of SQLite can again 
access the database file.</p>