Documentation Source Text

Check-in [9f38773d3b]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to WAL documentation.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 9f38773d3b6a0d384ca9d47c94d244ff8ed7812e
User & Date: drh 2010-06-15 12:07:58
Context
2010-06-15
12:50
Add the new book by Kay Droessler to the "Books About SQLite" page. check-in: 089a50e46d user: drh tags: trunk
12:07
Updates to WAL documentation. check-in: 9f38773d3b user: drh tags: trunk
2010-06-08
14:21
Fix problem in fts3.html - snippet() takes between 1 and 6 arguments, not between 1 and 4. check-in: 1bbbb8afd0 user: dan tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/atomiccommit.in.

24
25
26
27
28
29
30




31
32
33
34
35
36
37
<p>SQLite has the important property that transactions appear
to be atomic even if the transaction is interrupted by an
operating system crash or power failure.</p>

<p>This article describes the techniques used by SQLite to create the
illusion of atomic commit.</p>





<tcl>hd_fragment hardware</tcl>
<h2>2.0 Hardware Assumptions</h2>

<p>Throughout this article, we will call the mass storage device "disk"
even though the mass storage device might really be flash memory.</p>

<p>We assume that disk is written in chunks which we call a "sector".







>
>
>
>







24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
<p>SQLite has the important property that transactions appear
to be atomic even if the transaction is interrupted by an
operating system crash or power failure.</p>

<p>This article describes the techniques used by SQLite to create the
illusion of atomic commit.</p>

<p>The information in this article applies only when SQLite is not
using a [write-ahead log].  Support for write-ahead logging was added
in SQLite [version 3.7.0].  The write-ahead log is turned off by default.</p>

<tcl>hd_fragment hardware</tcl>
<h2>2.0 Hardware Assumptions</h2>

<p>Throughout this article, we will call the mass storage device "disk"
even though the mass storage device might really be flash memory.</p>

<p>We assume that disk is written in chunks which we call a "sector".

Changes to pages/pragma.in.

371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389

    <p>^The last two forms change the journaling mode.  ^The 4th form
    changes the journaling mode for a specific database connection.
    ^Use "main" for the main database (the database that was opened by
    the original [sqlite3_open()], [sqlite3_open16()], or
    [sqlite3_open_v2()] interface call) and use "temp" for database
    that holds TEMP tables.  ^The 3rd form changes the journaling mode
    on all databases and if the specified mode is not "WAL"
    it also changes the default journaling mode that
    will be used for new databases added by subsequent [ATTACH]
    commands.  ^The default journaling mode is never set to WAL.
    ^The new journal mode is returned.  ^If the journal mode
    could not be changed, the original journal mode is returned.</p>

    <p>^The DELETE journaling mode is the normal behavior.  ^In the DELETE
    mode, the rollback journal is deleted at the conclusion of each
    transaction.  Indeed, the delete operation is the action that causes
    the transaction to commit.
    (See the documented titled <a href="atomiccommit.html">







<
|

<
|







371
372
373
374
375
376
377

378
379

380
381
382
383
384
385
386
387

    <p>^The last two forms change the journaling mode.  ^The 4th form
    changes the journaling mode for a specific database connection.
    ^Use "main" for the main database (the database that was opened by
    the original [sqlite3_open()], [sqlite3_open16()], or
    [sqlite3_open_v2()] interface call) and use "temp" for database
    that holds TEMP tables.  ^The 3rd form changes the journaling mode

    on all databases and it also changes the default journaling mode that
    will be used for new databases added by subsequent [ATTACH]

    commands.  ^The new journal mode is returned.  ^If the journal mode
    could not be changed, the original journal mode is returned.</p>

    <p>^The DELETE journaling mode is the normal behavior.  ^In the DELETE
    mode, the rollback journal is deleted at the conclusion of each
    transaction.  Indeed, the delete operation is the action that causes
    the transaction to commit.
    (See the documented titled <a href="atomiccommit.html">

Changes to pages/wal.in.

8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
..
70
71
72
73
74
75
76
77
78
79
80







81
82
83
84
85
86
87
...
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
...
172
173
174
175
176
177
178
179



180
181
182
183
184
185
186
...
217
218
219
220
221
222
223

224
225

226
227
228
229
230
231
232
...
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
Beginning with [version 3.7.0], a new "Write-Ahead Log" option
(hereafter referred to as "WAL") is available.</p>

<p>There are advantages and disadvantages to using WAL instead of
a rollback journal.  Advantages include:</p>

<ol>
<li>WAL is significantly faster in the common case of many small transactions.
<li>WAL provides more concurrency as readers do not block writers and 
    a writer does not block readers.  Reading and writing can proceed 
    concurrently.
<li>Disk I/O operations tends to be more sequential using WAL.
<li>WAL uses many fewer fsync() operations and is thus less vulnerable to
    problems on systems where the fsync() system call is broken.
</ol>
................................................................................
<p>Of course, one wants to eventually transfer all the transactions that
are appended in the WAL file back into the original database.  Moving
the WAL file transactions back into the database is called a
"<i>checkpoint</i>".<p>

<p>Another way to think about the difference between rollback and 
write-ahead log is that in the rollback-journal
approach, there are two primitive operations:  reading from the database
and writing changes into the database but in the write-ahead log approach,
there are now three primitive operations:  reading, writing, and
checkpointing.</p>








<tcl>hd_fragment concurrency {WAL concurrency}</tcl>
<h3>Concurrency</h3>

<p>When a read operation begins on a WAL-mode database, it first
remembers the location of the last valid commit record in the WAL.
Call this point the "end mark".  Because the WAL can be growing and
................................................................................
<p>On the other hand, read performance deteriorates as the WAL file
grows in size since each reader must check the WAL file for the content
and the time needed to check the WAL file is proportional
to the size of the WAL file.  The wal-index helps find content
in the WAL file much faster, but performance still falls off with
increasing WAL file size.  Hence, to maintain good read performance 
it is important to keep the WAL file size down by
run checkpoints at regular intervals.</p>

<p>Checkpointing does require sync operations in order to avoid
the possibility of database corruption following a power loss
or hard reboot.  The WAL must be synced to persistent storage
prior to moving content from the WAL into the database and the
database file must by synced prior to resetting the WAL.
Checkpoint also requires more seeking.
................................................................................
run a checkpoint operation for each subsequent COMMIT until the WAL
is reset to be smalller than 1000 pages.  By default, the checkpoint will be
run automatically by the same thread that does the COMMIT that pushes
the WAL over its size limit.  This has the effect of causing most
COMMIT operations to be very fast but an occasional COMMIT (those that trigger
a checkpoint) to be much slower.  If that effect is undesirable, then
the application can disable automatic checkpointing and run the
periodic checkpoints in a separate thread, or separate process.</p>




<p>Note that with [PRAGMA synchronous] set to NORMAL, the checkpoint
is the only operation to issue an I/O barrier or sync operation
(fsync() on unix or FlushFileBuffers() on windows).  If an application
therefore runs checkpoint in a separate thread or process, the main
thread or process that is doing database queries and updates will never
block on a sync operation.  This helps to prevent "latch-up" in applications
................................................................................
<p>The journal_mode pragma returns a string which is the new journal mode.
On success, the pragma will return the string "<tt>wal</tt>".  If 
the conversion to WAL could not be completed (for example, if the VFS
does not support the necessary shared-memory primitives) then the
journaling mode will be unchanged and the string returned from the
primitive will be the prior journaling mode (for example "<tt>delete</tt>").


<p>By default, SQLite will automatically checkpoint whenever a [COMMIT]
occurs on the WAL file that is larger than 1000 pages, or when the 

last database connection on a database file closes.  The default 
configuration is intended to work well for most applications.
But programs that want more control can force a checkpoint
using the [wal_checkpoint pragma] or by calling the
[sqlite3_wal_checkpoint()] C interface.  The automatic checkpoint
threshold can be changed or automatic checkpointing can be completely
disabled using the [wal_autocheckpoint pragma] or by calling the
................................................................................
[sqlite3_wal_checkpoint()] to for a checkpoint based on whatever
criteria it thinks is appropriate.  (The automatic checkpoint mechanism
is implemented as a simple wrapper around [sqlite3_wal_hook()].)</p>

<h2>Backwards Compatibility</h2>

<p>The database file format is unchanged for WAL mode.  However, the
WAL file (located in the same directory or folder a the original database
file and with the same name as the original database with 
"<tt>-wal</tt>" appended) is a new concept and so older versions of 
SQLite will not know
how to recover a crashed SQLite database that was operating in WAL mode
when the crash occurred.
To prevent older versions of SQLite from trying to recover
a WAL-mode database (and making matters worse) the database file format
version numbers (bytes 18 and 19 in the [database header])
are increased to two in WAL mode.
Thus, if an older version of SQLite attempts to connect to an SQLite
database that is operating in WAL mode, it will report an error along
the lines of "file is encrypted or is not a database".</p>

<p>One can explicitly change out of WAL mode using a pragma such as
this:</p>

<blockquote><pre>
PRAGMA journal_mode=DELETE;
</pre></blockquote>

<p>Deliberately changing out of WAL mode changes the database file format
version numbers back to one so that older versions of SQLite can again 
access the database file.</p>







|







 







|
|


>
>
>
>
>
>
>







 







|







 







|
>
>
>







 







>

<
>







 







|








|












|

8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
..
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
...
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
...
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
...
227
228
229
230
231
232
233
234
235

236
237
238
239
240
241
242
243
...
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
Beginning with [version 3.7.0], a new "Write-Ahead Log" option
(hereafter referred to as "WAL") is available.</p>

<p>There are advantages and disadvantages to using WAL instead of
a rollback journal.  Advantages include:</p>

<ol>
<li>WAL is significantly faster in common usage patterns.
<li>WAL provides more concurrency as readers do not block writers and 
    a writer does not block readers.  Reading and writing can proceed 
    concurrently.
<li>Disk I/O operations tends to be more sequential using WAL.
<li>WAL uses many fewer fsync() operations and is thus less vulnerable to
    problems on systems where the fsync() system call is broken.
</ol>
................................................................................
<p>Of course, one wants to eventually transfer all the transactions that
are appended in the WAL file back into the original database.  Moving
the WAL file transactions back into the database is called a
"<i>checkpoint</i>".<p>

<p>Another way to think about the difference between rollback and 
write-ahead log is that in the rollback-journal
approach, there are two primitive operations, reading and writing,
whereas with a write-ahead log
there are now three primitive operations:  reading, writing, and
checkpointing.</p>

<p>By default, SQLite does a checkpoint automatically when the WAL file
reaches a threshold size of 1000 pages.  Applications using WAL do
not have to do anything in order to for these checkpoints to occur.  
But if they want to, applications can adjust the automatic checkpoint
threshold.  Or they can turn off automatic the checkpoint and run 
checkpoints during idle moments or in a separate thread or process.</p>

<tcl>hd_fragment concurrency {WAL concurrency}</tcl>
<h3>Concurrency</h3>

<p>When a read operation begins on a WAL-mode database, it first
remembers the location of the last valid commit record in the WAL.
Call this point the "end mark".  Because the WAL can be growing and
................................................................................
<p>On the other hand, read performance deteriorates as the WAL file
grows in size since each reader must check the WAL file for the content
and the time needed to check the WAL file is proportional
to the size of the WAL file.  The wal-index helps find content
in the WAL file much faster, but performance still falls off with
increasing WAL file size.  Hence, to maintain good read performance 
it is important to keep the WAL file size down by
running checkpoints at regular intervals.</p>

<p>Checkpointing does require sync operations in order to avoid
the possibility of database corruption following a power loss
or hard reboot.  The WAL must be synced to persistent storage
prior to moving content from the WAL into the database and the
database file must by synced prior to resetting the WAL.
Checkpoint also requires more seeking.
................................................................................
run a checkpoint operation for each subsequent COMMIT until the WAL
is reset to be smalller than 1000 pages.  By default, the checkpoint will be
run automatically by the same thread that does the COMMIT that pushes
the WAL over its size limit.  This has the effect of causing most
COMMIT operations to be very fast but an occasional COMMIT (those that trigger
a checkpoint) to be much slower.  If that effect is undesirable, then
the application can disable automatic checkpointing and run the
periodic checkpoints in a separate thread, or separate process.
(Links to commands and interfaces to accomplish this are
<a href="#how_to_checkpoint">shown below</a>.)</p>


<p>Note that with [PRAGMA synchronous] set to NORMAL, the checkpoint
is the only operation to issue an I/O barrier or sync operation
(fsync() on unix or FlushFileBuffers() on windows).  If an application
therefore runs checkpoint in a separate thread or process, the main
thread or process that is doing database queries and updates will never
block on a sync operation.  This helps to prevent "latch-up" in applications
................................................................................
<p>The journal_mode pragma returns a string which is the new journal mode.
On success, the pragma will return the string "<tt>wal</tt>".  If 
the conversion to WAL could not be completed (for example, if the VFS
does not support the necessary shared-memory primitives) then the
journaling mode will be unchanged and the string returned from the
primitive will be the prior journaling mode (for example "<tt>delete</tt>").

<a name="how_to_checkpoint">
<p>By default, SQLite will automatically checkpoint whenever a [COMMIT]

occurs that causes the WAL file to be 1000 pages or more in size, or when the 
last database connection on a database file closes.  The default 
configuration is intended to work well for most applications.
But programs that want more control can force a checkpoint
using the [wal_checkpoint pragma] or by calling the
[sqlite3_wal_checkpoint()] C interface.  The automatic checkpoint
threshold can be changed or automatic checkpointing can be completely
disabled using the [wal_autocheckpoint pragma] or by calling the
................................................................................
[sqlite3_wal_checkpoint()] to for a checkpoint based on whatever
criteria it thinks is appropriate.  (The automatic checkpoint mechanism
is implemented as a simple wrapper around [sqlite3_wal_hook()].)</p>

<h2>Backwards Compatibility</h2>

<p>The database file format is unchanged for WAL mode.  However, the
WAL file (located in the same directory or folder as the original database
file and with the same name as the original database with 
"<tt>-wal</tt>" appended) is a new concept and so older versions of 
SQLite will not know
how to recover a crashed SQLite database that was operating in WAL mode
when the crash occurred.
To prevent older versions of SQLite from trying to recover
a WAL-mode database (and making matters worse) the database file format
version numbers (bytes 18 and 19 in the [database header])
are increased from 1 to 2 in WAL mode.
Thus, if an older version of SQLite attempts to connect to an SQLite
database that is operating in WAL mode, it will report an error along
the lines of "file is encrypted or is not a database".</p>

<p>One can explicitly change out of WAL mode using a pragma such as
this:</p>

<blockquote><pre>
PRAGMA journal_mode=DELETE;
</pre></blockquote>

<p>Deliberately changing out of WAL mode changes the database file format
version numbers back to 1 so that older versions of SQLite can once again 
access the database file.</p>