Documentation Source Text

Check-in [766ec51bfe]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to the WAL page.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 766ec51bfea7dcfbd2e9488614d6804f6ba74e10
User & Date: drh 2010-06-01 20:50:33.000
Context
2010-06-08
14:21
Fix problem in fts3.html - snippet() takes between 1 and 6 arguments, not between 1 and 4. (check-in: 1bbbb8afd0 user: dan tags: trunk)
2010-06-01
20:50
Updates to the WAL page. (check-in: 766ec51bfe user: drh tags: trunk)
2010-05-29
14:11
Change the icon background to #ffffff. (check-in: 34b189f235 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/wal.in.
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

<p>There are advantages and disadvantages to using WAL instead of
a rollback journal.  Advantages include:</p>

<ol>
<li>WAL is significantly faster in the common case of many small transactions.
<li>WAL provides more concurrency as readers do not block writers and 
    are not blocked by a writer.  Reading and writing can proceed 
    concurrently.
<li>Disk I/O operations tends to be more sequential using WAL.
<li>WAL uses many fewer fsync() operations and is thus less vulnerable to
    problems on systems where the fsync() system call is broken.
</ol>

<p>But there are also disadvantages:</p>







|







10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

<p>There are advantages and disadvantages to using WAL instead of
a rollback journal.  Advantages include:</p>

<ol>
<li>WAL is significantly faster in the common case of many small transactions.
<li>WAL provides more concurrency as readers do not block writers and 
    a writer does not block readers.  Reading and writing can proceed 
    concurrently.
<li>Disk I/O operations tends to be more sequential using WAL.
<li>WAL uses many fewer fsync() operations and is thus less vulnerable to
    problems on systems where the fsync() system call is broken.
</ol>

<p>But there are also disadvantages:</p>
33
34
35
36
37
38
39



40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
<li>Transactions that involve changes against multiple [ATTACH | ATTACHed]
    databases are atomic for each individual database, but are not
    atomic across all databases as a set.
<li>With WAL, it is not possible to change the database page sizes 
    using [VACUUM] or when restoring from a backup using the [backup API].
<li>WAL can be slower than the traditional rollback-journal approach
    for large transactions (such as when using the [VACUUM] command).



<li>There is the extra operation of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
original unchanged database content into a separate rollback journal file
and then writing changes directly into the database file.  In the
event of a crash or [ROLLBACK], the original content contained in the
rollback journal is played back into the database file to
revert the database file to its original state.  The [COMMIT] occurs
when the rollback journal is deleted.</p>

<p>The WAL approach inverts this.  The original content is preserved
in the database file and the changes are appended into a separate
WAL file.  A [COMMIT] occurs when a special record indicating a commit
is appended to the WAL.  Thus a COMMIT can happen without ever writing
to the original database, which allows reader to continue operating
from the original unaltered database while changes are simultaneously
committed into the WAL.  Multiple transactions can be appended to the
end of a single WAL file.</p>

<tcl>hd_fragment ckpt checkpoint checkpointed checkpointing</tcl>
<h3>Checkpointing</h3>

<p>Of course, one wants to eventually transfer all the transactions that







>
>
>




















|







33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
<li>Transactions that involve changes against multiple [ATTACH | ATTACHed]
    databases are atomic for each individual database, but are not
    atomic across all databases as a set.
<li>With WAL, it is not possible to change the database page sizes 
    using [VACUUM] or when restoring from a backup using the [backup API].
<li>WAL can be slower than the traditional rollback-journal approach
    for large transactions (such as when using the [VACUUM] command).
<li>There is an additional persistent "*-wal" file associated with each
    database, which can make SQLite less appealing for use as an 
    [application file-format].
<li>There is the extra operation of [checkpointing] which, though automatic
    by default, is still something that application developers need to
    be mindful of.
</ol>

<h2>How WAL Works</h2>

<p>The traditional rollback journal works by writing a copy of the
original unchanged database content into a separate rollback journal file
and then writing changes directly into the database file.  In the
event of a crash or [ROLLBACK], the original content contained in the
rollback journal is played back into the database file to
revert the database file to its original state.  The [COMMIT] occurs
when the rollback journal is deleted.</p>

<p>The WAL approach inverts this.  The original content is preserved
in the database file and the changes are appended into a separate
WAL file.  A [COMMIT] occurs when a special record indicating a commit
is appended to the WAL.  Thus a COMMIT can happen without ever writing
to the original database, which allows reader to continue operating
from the original unaltered database while changes are simultaneously being
committed into the WAL.  Multiple transactions can be appended to the
end of a single WAL file.</p>

<tcl>hd_fragment ckpt checkpoint checkpointed checkpointing</tcl>
<h3>Checkpointing</h3>

<p>Of course, one wants to eventually transfer all the transactions that
105
106
107
108
109
110
111
112
113
114
115
116
117
118


119


120
121


122

123
124
125
126
127
128
129
130
131
132
133



134
135
136
137
138
139
140
141
142
143
144
145
146

147
148
149
150
151
152
153
154
155
156
157

158
159
160
161
162
163
164
165











166
167
168
169
170
171
172
173

<p>Writers merely append new content to the end of the WAL file.
Because writers do nothing that would interfere with the actions of
readers, writers and readers can run at the same time.  However,
since there is only one WAL file, there can only be one writer at
a time.</p>

<p>A checkpoint operation takes all the content from the WAL file
and transfers it back into the original database file.
As long as all readers are looking at the entire WAL file (in other
words, as long as no reader is using an end mark that is earlier
than the actual end of the WAL) then it will be OK for the checkpoint
to run in parallel with the readers.  The changes to the database file
that the checkpointer copies over from the WAL will not interfere with


the readers, because the modified database pages all also exist in the


WAL and so the readers will never access them.  Thus, a checkpoint will never
block a reader and readers will not block a checkpoint as long as all


readers are looking at the entire WAL.  But, older readers that only

see a prefix of the WAL will block a checkpoint.  And writers will
block a checkpoint.  And a checkpoint will block a writer.</p>

<tcl>hd_fragment fast</tcl>
<h3>Performance Considerations</h3>

<p>Write transactions are very fast since they only involve writing
the content once (versus twice for rollback-journal transactions)
and because the writes are all sequential.  Further, syncing the
content to the disk is not required, as long as the application is
willing to sacrifice durability following a power loss or hard reboot.</p>




<p>On the other hand, read performance deteriorates as the WAL file
grows in size since reader must check the WAL file for the content
they need and the time needed to check the WAL file is proportional
to the size of the WAL file.  The wal-index helps readers find content
in the WAL file much faster, but performance still falls off with
increasing WAL file size.  Hence, to maintain good read performance 
it is important to keep the WAL file size down by
run checkpoints at regular intervals.</p>

<p>Checkpointing does require sync operations in order to avoid
the possibility of database corruption following a power loss
or hard reboot.  The WAL must be synced to persistent storage

prior to the start of the checkpoint and the database file must by synced
prior to resetting the WAL.  Checkpoint also requires more seeking.
The checkpointer makes an effort to
do as many sequential page writes to the database as it can (the pages
are transferred from WAL to database in ascending order) but even
then there will typically be many seek operations interspersed among
the page writes.  These factors combine to make checkpoints slower than
write transactions.</p>

<p>The default strategy is to allow successive write transactions to
grow the WAL until the WAL becomes about 1000 pages in size, then to

run a single checkpoint operation.  By default, the checkpoint will be
run automatically by the same thread that does the COMMIT that pushes
the WAL over its size limit.  This has the effect of causing most
COMMIT operations to be very fast but an occasional COMMIT (those that trigger
a checkpoint) to be much slower.  If that effect is undesirable, then
the application can disable automatic checkpointing and run the
periodic checkpoints in a separate thread, or separate process.</p>












<p>Notice that there is a tradeoff between average read performance
and average write performance.  To maximize the read performance,
one wants to keep the WAL as small as possible and hence run checkpoints
frequently, perhaps as often as every COMMIT.  To maximize
write performance, one wants to amortize the cost of each checkpoint
over as many writes as possible, meaning that one wants to run checkpoints
infrequently and let the WAL grow as large as possible before each 
checkpoint.  The decision of how often to run checkpoints may therefore







|

|
|
|
|
|
>
>
|
>
>
|
|
>
>
|
>
|
|








|
>
>
>


|
|
|








>
|
|









>
|







>
>
>
>
>
>
>
>
>
>
>
|







108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199

<p>Writers merely append new content to the end of the WAL file.
Because writers do nothing that would interfere with the actions of
readers, writers and readers can run at the same time.  However,
since there is only one WAL file, there can only be one writer at
a time.</p>

<p>A checkpoint operation takes content from the WAL file
and transfers it back into the original database file.
A checkpoint can run concurrently with readers, however the checkpoint
must stop when it reaches a page in the WAL that is past the read mark
of any current reader.  The checkpoint has to stop at that point because
otherwise it might overwrite part of the database file that the reader
is actively using.  The checkpoint remembers (in the wal-index) how far
it got and will resume transferring content from the WAL to the database
from where it left off on the next invocation.</p>

<p>Thus a long-running read transaction can prevent a checkpointer from
making progress.  But presumably every read transactions will eventually
end and the checkpointer will be able to continue.</p>

<p>Whenever a write operation occurs, the writer checks how much progress
the checkpointer has made, and if the entire WAL has been transferred into
the database and synced and if no readers are making use of the WAL, then
the writer will rewind the WAL back to the beginning and start putting new
transactions at the beginning of the WAL.  This mechanism prevents a WAL
file from growing without bound.</p>

<tcl>hd_fragment fast</tcl>
<h3>Performance Considerations</h3>

<p>Write transactions are very fast since they only involve writing
the content once (versus twice for rollback-journal transactions)
and because the writes are all sequential.  Further, syncing the
content to the disk is not required, as long as the application is
willing to sacrifice durability following a power loss or hard reboot.
(Writers sync the WAL on every transaction commit if
[PRAGMA synchronous] is set to FULL but omit this sync if
[PRAGMA synchronous] is set to NORMAL.)</p>

<p>On the other hand, read performance deteriorates as the WAL file
grows in size since each reader must check the WAL file for the content
and the time needed to check the WAL file is proportional
to the size of the WAL file.  The wal-index helps find content
in the WAL file much faster, but performance still falls off with
increasing WAL file size.  Hence, to maintain good read performance 
it is important to keep the WAL file size down by
run checkpoints at regular intervals.</p>

<p>Checkpointing does require sync operations in order to avoid
the possibility of database corruption following a power loss
or hard reboot.  The WAL must be synced to persistent storage
prior to moving content from the WAL into the database and the
database file must by synced prior to resetting the WAL.
Checkpoint also requires more seeking.
The checkpointer makes an effort to
do as many sequential page writes to the database as it can (the pages
are transferred from WAL to database in ascending order) but even
then there will typically be many seek operations interspersed among
the page writes.  These factors combine to make checkpoints slower than
write transactions.</p>

<p>The default strategy is to allow successive write transactions to
grow the WAL until the WAL becomes about 1000 pages in size, then to
run a checkpoint operation for each subsequent COMMIT until the WAL
is reset to be smalller than 1000 pages.  By default, the checkpoint will be
run automatically by the same thread that does the COMMIT that pushes
the WAL over its size limit.  This has the effect of causing most
COMMIT operations to be very fast but an occasional COMMIT (those that trigger
a checkpoint) to be much slower.  If that effect is undesirable, then
the application can disable automatic checkpointing and run the
periodic checkpoints in a separate thread, or separate process.</p>

<p>Note that with [PRAGMA synchronous] set to NORMAL, the checkpoint
is the only operation to issue an I/O barrier or sync operation
(fsync() on unix or FlushFileBuffers() on windows).  If an application
therefore runs checkpoint in a separate thread or process, the main
thread or process that is doing database queries and updates will never
block on a sync operation.  This helps to prevent "latch-up" in applications
running on a busy disk drive.  The downside to
this configuration is that transactions are no longer durable and
might rollback following a power failure or hard reset.</p>


<p>Notice too that there is a tradeoff between average read performance
and average write performance.  To maximize the read performance,
one wants to keep the WAL as small as possible and hence run checkpoints
frequently, perhaps as often as every COMMIT.  To maximize
write performance, one wants to amortize the cost of each checkpoint
over as many writes as possible, meaning that one wants to run checkpoints
infrequently and let the WAL grow as large as possible before each 
checkpoint.  The decision of how often to run checkpoints may therefore
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
<p>The journal_mode pragma returns a string which is the new journal mode.
On success, the pragma will return the string "<tt>wal</tt>".  If 
the conversion to WAL could not be completed (for example, if the VFS
does not support the necessary shared-memory primitives) then the
journaling mode will be unchanged and the string returned from the
primitive will be the prior journaling mode (for example "<tt>delete</tt>").

<p>By default, SQLite will automatically checkpoint whenever the WAL
file reaches or exceeds 1000 pages, or when the last database
connection on a database file closes.  The default configuration is
intended to work well for most applications.  But programs that want
more control can force a checkpoint
using the [wal_checkpoint pragma] or by calling the
[sqlite3_wal_checkpoint()] C interface.  The automatic checkpoint
threshold can be changed or automatic checkpointing can be completely
disabled using the [wal_autocheckpoint pragma] or by calling the
[sqlite3_wal_autocheckpoint()] C interface.  A program can also 
use [sqlite3_wal_hook()] to register a callback to be invoked whenever
any transaction commits to the WAL.  This callback can then invoke







|
|
|
|
|







217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
<p>The journal_mode pragma returns a string which is the new journal mode.
On success, the pragma will return the string "<tt>wal</tt>".  If 
the conversion to WAL could not be completed (for example, if the VFS
does not support the necessary shared-memory primitives) then the
journaling mode will be unchanged and the string returned from the
primitive will be the prior journaling mode (for example "<tt>delete</tt>").

<p>By default, SQLite will automatically checkpoint whenever a [COMMIT]
occurs on the WAL file that is larger than 1000 pages, or when the 
last database connection on a database file closes.  The default 
configuration is intended to work well for most applications.
But programs that want more control can force a checkpoint
using the [wal_checkpoint pragma] or by calling the
[sqlite3_wal_checkpoint()] C interface.  The automatic checkpoint
threshold can be changed or automatic checkpointing can be completely
disabled using the [wal_autocheckpoint pragma] or by calling the
[sqlite3_wal_autocheckpoint()] C interface.  A program can also 
use [sqlite3_wal_hook()] to register a callback to be invoked whenever
any transaction commits to the WAL.  This callback can then invoke
Changes to pages/whentouse.in.
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
It is designed to replace [http://man.he.net/man3/fopen | fopen()].
</p>

<h2>Situations Where SQLite Works Well</h2>

<ul>

<a name="appfileformat"></a>
<li><p><b>Application File Format</b></p>

<p>
SQLite has been used with great success as the on-disk file format
for desktop applications such as financial analysis tools, CAD
packages, record keeping programs, and so forth.  The traditional
File/Open operation does an sqlite3_open() and executes a







|







52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
It is designed to replace [http://man.he.net/man3/fopen | fopen()].
</p>

<h2>Situations Where SQLite Works Well</h2>

<ul>

<tcl>hd_fragment appfileformat {application file-format}</tcl>
<li><p><b>Application File Format</b></p>

<p>
SQLite has been used with great success as the on-disk file format
for desktop applications such as financial analysis tools, CAD
packages, record keeping programs, and so forth.  The traditional
File/Open operation does an sqlite3_open() and executes a