Documentation Source Text

Check-in [c354e08449]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to the lockingv3.html document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: c354e084499b83e6949fd2b8007d7bab544b28c8
User & Date: drh 2010-07-13 13:57:53.000
Context
2010-07-13
14:50
Fix a typo on the temp_store_directory pragma documentation. (check-in: 40fa307f4e user: drh tags: trunk)
13:57
Updates to the lockingv3.html document. (check-in: c354e08449 user: drh tags: trunk)
2010-07-08
18:12
Update PRAGMA journal_mode documentation. Fix typos in the wal.html document. (check-in: b1e171c029 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/lockingv3.in.
27
28
29
30
31
32
33
34
35

36
37
38
39
40
41
42









43
44
45
46
47
48
49
set pnum(6) 0
set pnum(7) 0
set pnum(8) 0

HEADING 1 {File Locking And Concurrency In SQLite Version 3}
</tcl>

<p>Version 3 of SQLite introduces a more complex locking and journaling 
mechanism designed to improve concurrency and reduce the writer starvation 

problem.  The new mechanism also allows atomic commits of transactions
involving multiple database files.
This document describes the new locking mechanism.
The intended audience is programmers who want to understand and/or modify
the pager code and reviewers working to verify the design
of SQLite version 3.
</p>










<tcl>HEADING 1 {Overview} overview</tcl>

<p>
Locking and concurrency control are handled by the the 
<a href="http://www.sqlite.org/src/finfo?name=src/pager.c">
pager module</a>.







|
|
>







>
>
>
>
>
>
>
>
>







27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
set pnum(6) 0
set pnum(7) 0
set pnum(8) 0

HEADING 1 {File Locking And Concurrency In SQLite Version 3}
</tcl>

<p>SQLite [Version 3.0.0] introduced a new locking and journaling 
mechanism designed to improve concurrency over SQLite version 2
and to reduce the writer starvation 
problem.  The new mechanism also allows atomic commits of transactions
involving multiple database files.
This document describes the new locking mechanism.
The intended audience is programmers who want to understand and/or modify
the pager code and reviewers working to verify the design
of SQLite version 3.
</p>

<p>This document was originally created in early 2004 when SQLite version 2
was still in widesprad use and was written to introduce
the new concepts of SQLite version 3 to readers who were already familiar
with SQLite version 2.  But these days, most readers of this document have
probably never seen SQLite version 2 and are only familiar with SQLite
version 3.  Nevertheless, this document continues to serve as an
authoritative reference to how database file locking works in SQLite 
version 3.</p>

<tcl>HEADING 1 {Overview} overview</tcl>

<p>
Locking and concurrency control are handled by the the 
<a href="http://www.sqlite.org/src/finfo?name=src/pager.c">
pager module</a>.
146
147
148
149
150
151
152
153


154

155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170

171
172
173
174
175
176
177
178
A PENDING lock is always just a temporary
stepping stone on the path to an EXCLUSIVE lock and so the pager module
does not track PENDING locks.
</p>

<tcl>HEADING 1 {The Rollback Journal} rollback {rollback journal}</tcl>

<p>Any time a process wants to make a changes to a database file, it


first records enough information in the <em>rollback journal</em> to

restore the database file back to its initial condition.  Thus, before
altering any page of the database, the original contents of that page
must be written into the journal.  The journal also records the initial
size of the database so that if the database file grows it can be truncated
back to its original size on a rollback.</p>

<p>The rollback journal is a ordinary disk file that has the same name as
the database file with the suffix "<tt>-journal</tt>" added.</p>

<p>If SQLite is working with multiple databases at the same time
(using the ATTACH command) then each database has its own journal.
But there is also a separate aggregate journal
called the <em>master journal</em>.
The master journal does not contain page data used for rolling back
changes.  Instead the master journal contains the names of the
individual file journals for each of the ATTACHed databases.   Each of

the individual file journals also contain the name of the master journal.
If there are no ATTACHed databases (or if none of the ATTACHed database
is participating in the current transaction) no master journal is
created and the normal rollback journal contains an empty string
in the place normally reserved for recording the name of the master
journal.</p>

<tcl>hd_fragment hotjrnl {hot journal}</tcl>







|
>
>
|
>
|
|
|



<
<
<

|




|
>
|







156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173



174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
A PENDING lock is always just a temporary
stepping stone on the path to an EXCLUSIVE lock and so the pager module
does not track PENDING locks.
</p>

<tcl>HEADING 1 {The Rollback Journal} rollback {rollback journal}</tcl>

<p>When a process wants to change a database file (and it is not
in [WAL] mode), it
first records the original unchanged database content
in a <em>rollback journal</em>.  The rollback journal is an ordinary
disk file that is always located
in the same directory or folder as the database file and has the
same name as the database file with the addition of a <tt>-journal</tt>
suffix. The rollback journal also records the initial
size of the database so that if the database file grows it can be truncated
back to its original size on a rollback.</p>




<p>If SQLite is working with multiple databases at the same time
(using the [ATTACH] command) then each database has its own rollback journal.
But there is also a separate aggregate journal
called the <em>master journal</em>.
The master journal does not contain page data used for rolling back
changes.  Instead the master journal contains the names of the
individual database rollback journals for each of the ATTACHed databases. 
Each of the individual database rollback journals also contain the name 
of the master journal.
If there are no ATTACHed databases (or if none of the ATTACHed database
is participating in the current transaction) no master journal is
created and the normal rollback journal contains an empty string
in the place normally reserved for recording the name of the master
journal.</p>

<tcl>hd_fragment hotjrnl {hot journal}</tcl>
187
188
189
190
191
192
193
194

195
196
197
198
199
200
201
202
203
204
205
206


207
208
209
210
211
212
213
If everything is working correctly 
(that is, if there are no crashes or power failures)
you will never get a hot journal.
</p>

<p>
If no master journal is involved, then
a journal is hot if it exists and its corresponding database file

does not have a RESERVED lock.
If a master journal is named in the file journal, then the file journal
is hot if its master journal exists and there is no RESERVED
lock on the corresponding database file.
It is important to understand when a journal is hot so the
preceding rules will be repeated in bullets:
</p>

<ul>
<li>A journal is hot if...
    <ul>
    <li>It exists, and</li>


    <li>Its master journal exists or the master journal name is an
        empty string, and</li>
    <li>There is no RESERVED lock on the corresponding database file.</li>
    </ul>
</li>
</ul>








|
>












>
>







198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
If everything is working correctly 
(that is, if there are no crashes or power failures)
you will never get a hot journal.
</p>

<p>
If no master journal is involved, then
a journal is hot if it exists and has a non-zero header
and its corresponding database file
does not have a RESERVED lock.
If a master journal is named in the file journal, then the file journal
is hot if its master journal exists and there is no RESERVED
lock on the corresponding database file.
It is important to understand when a journal is hot so the
preceding rules will be repeated in bullets:
</p>

<ul>
<li>A journal is hot if...
    <ul>
    <li>It exists, and</li>
    <li>Its size is greater than 512 bytes, and</li>
    <li>The journal header is non-zero and well-formed, and</li>
    <li>Its master journal exists or the master journal name is an
        empty string, and</li>
    <li>There is no RESERVED lock on the corresponding database file.</li>
    </ul>
</li>
</ul>

235
236
237
238
239
240
241
242
243
244



245
246
247
248
249
250
251
    (Note: Do not acquire a RESERVED lock because that would make
    other processes think the journal was no longer hot.)  If we
    fail to acquire these locks it means another process
    is already trying to do the rollback.  In that case,
    drop all locks, close the database, and return SQLITE_BUSY. </li>
<li>Read the journal file and roll back the changes.</li>
<li>Wait for the rolled back changes to be written onto 
    the surface of the disk.  This protects the integrity of the database
    in case another power failure or crash occurs.</li>
<li>Delete the journal file.</li>



<li>Delete the master journal file if it is safe to do so.
    This step is optional.  It is here only to prevent stale
    master journals from cluttering up the disk drive.
    See the discussion below for details.</li>
<li>Drop the EXCLUSIVE and PENDING locks but retain the SHARED lock.</li>
</ol>








|

|
>
>
>







249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
    (Note: Do not acquire a RESERVED lock because that would make
    other processes think the journal was no longer hot.)  If we
    fail to acquire these locks it means another process
    is already trying to do the rollback.  In that case,
    drop all locks, close the database, and return SQLITE_BUSY. </li>
<li>Read the journal file and roll back the changes.</li>
<li>Wait for the rolled back changes to be written onto 
    persistent storage.  This protects the integrity of the database
    in case another power failure or crash occurs.</li>
<li>Delete the journal file (or truncate the journal to zero bytes in
    length if [PRAGMA journal_mode | PRAGMA journal_mode=TRUNCATE] is
    set, or zero the journal header if
    [PRAGMA journal_mode | PRAGMA journal_mode=PERSIST] is set).</li>
<li>Delete the master journal file if it is safe to do so.
    This step is optional.  It is here only to prevent stale
    master journals from cluttering up the disk drive.
    See the discussion below for details.</li>
<li>Drop the EXCLUSIVE and PENDING locks but retain the SHARED lock.</li>
</ol>

338
339
340
341
342
343
344


345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
<ol>
<li value="4">
   Obtain an EXCLUSIVE lock on the database file and
   make sure all memory changes have been written to the database file
   using the algorithm of steps 1-3 above.</li>
<li>Flush all database file changes to the disk.  Wait for those changes
    to actually be written onto the disk surface.</li>


<li>Delete the journal file.  This is the instant when the changes are
    committed.  Prior to deleting the journal file, if a power failure
    or crash occurs, the next process to open the database will see that
    it has a hot journal and will roll the changes back.
    After the journal is deleted, there will no longer be a hot journal
    and the changes will persist.
    </li>
<li>Drop the EXCLUSIVE and PENDING locks from the database file.
    </li>
</ol>

<p>As soon as PENDING lock is released from the database file, other
processes can begin reading the database again.  In the current implementation,
the RESERVED lock is also released, but that is not essential.  Future
versions of SQLite might provide a "CHECKPOINT" SQL command that will
commit all changes made so far within a transaction but retain the
RESERVED lock so that additional changes can be made without given
any other process an opportunity to write.</p>








>
>
|










|







355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
<ol>
<li value="4">
   Obtain an EXCLUSIVE lock on the database file and
   make sure all memory changes have been written to the database file
   using the algorithm of steps 1-3 above.</li>
<li>Flush all database file changes to the disk.  Wait for those changes
    to actually be written onto the disk surface.</li>
<li>Delete the journal file.  (Or if the [PRAGMA journal_mode] is TRUNCATE or
    PERSIST, truncate the journal file or zero the header of the journal file,
    respectively.)  This is the instant when the changes are
    committed.  Prior to deleting the journal file, if a power failure
    or crash occurs, the next process to open the database will see that
    it has a hot journal and will roll the changes back.
    After the journal is deleted, there will no longer be a hot journal
    and the changes will persist.
    </li>
<li>Drop the EXCLUSIVE and PENDING locks from the database file.
    </li>
</ol>

<p>As soon as the PENDING lock is released from the database file, other
processes can begin reading the database again.  In the current implementation,
the RESERVED lock is also released, but that is not essential.  Future
versions of SQLite might provide a "CHECKPOINT" SQL command that will
commit all changes made so far within a transaction but retain the
RESERVED lock so that additional changes can be made without given
any other process an opportunity to write.</p>

409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
will prevent new readers from coming in.  Assuming existing readers do
eventually complete, all SHARED locks will eventually clear and the
writer will be given a chance to make its changes.</p>

<tcl>HEADING 1 {How To Corrupt Your Database Files} how_to_corrupt \
{How To Corrupt Your Database Files}</tcl>

<p>The pager module is robust but it is not completely failsafe.
It can be subverted.  This section attempts to identify and explain
the risks.
(See also the [Things That Can Go Wrong] section of the article
on [Atomic Commit].</p>

<p>
Clearly, a hardware or operating system fault that introduces incorrect data
into the middle of the database file or journal will cause problems.
Likewise, 







|
|
<







428
429
430
431
432
433
434
435
436

437
438
439
440
441
442
443
will prevent new readers from coming in.  Assuming existing readers do
eventually complete, all SHARED locks will eventually clear and the
writer will be given a chance to make its changes.</p>

<tcl>HEADING 1 {How To Corrupt Your Database Files} how_to_corrupt \
{How To Corrupt Your Database Files}</tcl>

<p>The pager module is very robust but it can be subverted.  
This section attempts to identify and explain the risks.

(See also the [Things That Can Go Wrong] section of the article
on [Atomic Commit].</p>

<p>
Clearly, a hardware or operating system fault that introduces incorrect data
into the middle of the database file or journal will cause problems.
Likewise, 
452
453
454
455
456
457
458











459
460
461
462
463
464
465
in volatile cache memory in the disk drive electronics.  There are also
reports that Windows sometimes chooses to ignore FlushFileBuffers() for
unspecified reasons.  The author cannot verify any of these reports.
But if they are true, it means that database corruption is a possibility
following an unexpected power loss.  These are hardware and/or operating
system bugs that SQLite is unable to defend against.
</p>












<p>
If a crash or power failure occurs and results in a hot journal but that
journal is deleted, the next process to open the database will not
know that it contains changes that need to be rolled back.  The rollback
will not occur and the database will be left in an inconsistent state.
Rollback journals might be deleted for any number of reasons:







>
>
>
>
>
>
>
>
>
>
>







470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
in volatile cache memory in the disk drive electronics.  There are also
reports that Windows sometimes chooses to ignore FlushFileBuffers() for
unspecified reasons.  The author cannot verify any of these reports.
But if they are true, it means that database corruption is a possibility
following an unexpected power loss.  These are hardware and/or operating
system bugs that SQLite is unable to defend against.
</p>

<p>In the Linux <a href="http://en.wikipedia.org/wiki/Ext3">ext3</a>
file system, if the filesystem is not mounted with the "barrier=1" option
in the <a href="http://en.wikipedia.org/wiki/fstab">/etc/fstab</a>
then filesystem corruption can occur following a power loss or OS crash.
We are told that most Linux distributions do not use barrier=1 and so most
Linux distributions are vulnerable to this problem.  Note that this is an
operating system issue and that there is nothing that SQLite can do to
work around it.  
<a href="http://ozlabs.org/~rusty/index.cgi/tech/2009-10-20.html">
Other database engines</a> have also run into this same problem.</p>

<p>
If a crash or power failure occurs and results in a hot journal but that
journal is deleted, the next process to open the database will not
know that it contains changes that need to be rolled back.  The rollback
will not occur and the database will be left in an inconsistent state.
Rollback journals might be deleted for any number of reasons: