Documentation Source Text

Check-in [b17469386d]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix typos in the atomiccommit.html document. Update the proposed release date for 3.7.16.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: b17469386db3bd138dad5b0ed2668796db266d53
User & Date: drh 2013-03-11 20:22:24.400
Context
2013-03-13
17:11
Documentation updates and typo fixes in preparation for the next release. (check-in: d29bab5808 user: drh tags: trunk)
2013-03-11
20:22
Fix typos in the atomiccommit.html document. Update the proposed release date for 3.7.16. (check-in: b17469386d user: drh tags: trunk)
2013-03-09
15:12
Fix various documentation typos. (check-in: 8a8b9f4250 user: drh tags: trunk)
Changes
Unified Diff Show Whitespace Changes Patch
Changes to pages/atomiccommit.in.
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
<p>This article describes the techniques used by SQLite to create the
illusion of atomic commit.</p>

<p>The information in this article applies only when SQLite is operating
in "rollback mode", or in other words when SQLite is not 
using a [write-ahead log].  SQLite still supports atomic commit when
write-ahead logging is enabled, but it accomplishes atomic commit by
a different mechanism from the one describe in this article.  See
the [WAL | write-ahead log documentation] for additional information on how
SQLite supports atomic commit in that context.</p>

<tcl>hd_fragment hardware</tcl>
<h2>2.0 Hardware Assumptions</h2>

<p>Throughout this article, we will call the mass storage device "disk"







|







28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
<p>This article describes the techniques used by SQLite to create the
illusion of atomic commit.</p>

<p>The information in this article applies only when SQLite is operating
in "rollback mode", or in other words when SQLite is not 
using a [write-ahead log].  SQLite still supports atomic commit when
write-ahead logging is enabled, but it accomplishes atomic commit by
a different mechanism from the one described in this article.  See
the [WAL | write-ahead log documentation] for additional information on how
SQLite supports atomic commit in that context.</p>

<tcl>hd_fragment hardware</tcl>
<h2>2.0 Hardware Assumptions</h2>

<p>Throughout this article, we will call the mass storage device "disk"
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
SQLite does not add any redundancy to the database file for
the purpose of detecting corruption or I/O errors.
SQLite assumes that the data it reads is exactly the same data 
that it previously wrote.</p>

<p>By default, SQLite assumes that an operating system call to write
a range of bytes will not damage or alter any bytes outside of that range
even if a power lose or OS crash occurs during that write.  We
call this the "[PSOW | powersafe overwrite]" property.  Prior to version 3.7.9,
SQLite did not assume powersafe overwrite.  But with the standard
sector size increasing from 512 to 4096 bytes on most disk drives, it
has become necessary to assume powersafe overwrite in order to maintain
historical performance levels and so powersafe overwrite is assumed by
default in recent versions of SQLite.  The assumption of powersafe 
overwrite property can be disabled at compile-time or a run-time if







|







158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
SQLite does not add any redundancy to the database file for
the purpose of detecting corruption or I/O errors.
SQLite assumes that the data it reads is exactly the same data 
that it previously wrote.</p>

<p>By default, SQLite assumes that an operating system call to write
a range of bytes will not damage or alter any bytes outside of that range
even if a power loss or OS crash occurs during that write.  We
call this the "[PSOW | powersafe overwrite]" property.  Prior to version 3.7.9,
SQLite did not assume powersafe overwrite.  But with the standard
sector size increasing from 512 to 4096 bytes on most disk drives, it
has become necessary to assume powersafe overwrite in order to maintain
historical performance levels and so powersafe overwrite is assumed by
default in recent versions of SQLite.  The assumption of powersafe 
overwrite property can be disabled at compile-time or a run-time if
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
lock allows two or more database connections to read from the
database file at the same time.  But a shared lock prevents
another database connection from writing to the database file
while we are reading it.  This is necessary because if another
database connection were writing to the database file at the
same time we are reading from the database file, we might read
some data before the change and other data after the change.
This would make it appears as if the change made by the other
process is not atomic.</p>

<p>Notice that the shared lock is on the operating system
disk cache, not on the disk itself.  File locks
really are just flags within the operating system kernel,
usually.  (The details depend on the specific OS layer
interface.)  Hence, the lock will instantly vanish if the







|







222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
lock allows two or more database connections to read from the
database file at the same time.  But a shared lock prevents
another database connection from writing to the database file
while we are reading it.  This is necessary because if another
database connection were writing to the database file at the
same time we are reading from the database file, we might read
some data before the change and other data after the change.
This would make it appear as if the change made by the other
process is not atomic.</p>

<p>Notice that the shared lock is on the operating system
disk cache, not on the disk itself.  File locks
really are just flags within the operating system kernel,
usually.  (The details depend on the specific OS layer
interface.)  Hence, the lock will instantly vanish if the
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
and shared lock allow other processes to read from the database
file.  A single reserve lock can coexist with multiple shared
locks from other processes.  However, there can only be a
single reserved lock on the database file.  Hence only a
single process can be attempting to write to the database
at one time.</p>

<p>The idea behind a reserved locks is that it signals that
a process intends to modify the database file in the near
future but has not yet started to make the modifications.
And because the modifications have not yet started, other
processes can continue to read from the database.  However,
no other process should also begin trying to write to the
database.</p>








|







273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
and shared lock allow other processes to read from the database
file.  A single reserve lock can coexist with multiple shared
locks from other processes.  However, there can only be a
single reserved lock on the database file.  Hence only a
single process can be attempting to write to the database
at one time.</p>

<p>The idea behind a reserved lock is that it signals that
a process intends to modify the database file in the near
future but has not yet started to make the modifications.
And because the modifications have not yet started, other
processes can continue to read from the database.  However,
no other process should also begin trying to write to the
database.</p>

402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429

<p>Another flush must occur to make sure that all the
database changes are written into nonvolatile storage.
This is a critical step to ensure that the database will
survive a power loss without damage.  However, because
of the inherent slowness of writing to disk or flash memory, 
this step together with the rollback journal file flush in section
3.7 above takes up most the time required to complete a
transaction commit in SQLite.</p>

<br clear="both">
<a name="section_3_11"></a>
<h3>3.11 Deleting The Rollback Journal</h3>
<img src="images/ac/commit-A.gif" align="right" hspace="15">

<p>After the database changes are all safely on the mass
storage device, the rollback journal file is deleted.
This is the instant where the transaction commits.
If a power failure or system crash occurs prior to this
point, then recovery processes to be described later make
it appears as if no changes were ever made to the database
file.  If a power failure or system crash occurs after
the rollback journal is deleted, then it appears as if
all changes have been written to disk.  Thus, SQLite gives
the appearance of having made no changes to the database
file or having made the complete set of changes to the
database file depending on whether or not the rollback
journal file exists.</p>







|












|







402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429

<p>Another flush must occur to make sure that all the
database changes are written into nonvolatile storage.
This is a critical step to ensure that the database will
survive a power loss without damage.  However, because
of the inherent slowness of writing to disk or flash memory, 
this step together with the rollback journal file flush in section
3.7 above takes up most of the time required to complete a
transaction commit in SQLite.</p>

<br clear="both">
<a name="section_3_11"></a>
<h3>3.11 Deleting The Rollback Journal</h3>
<img src="images/ac/commit-A.gif" align="right" hspace="15">

<p>After the database changes are all safely on the mass
storage device, the rollback journal file is deleted.
This is the instant where the transaction commits.
If a power failure or system crash occurs prior to this
point, then recovery processes to be described later make
it appear as if no changes were ever made to the database
file.  If a power failure or system crash occurs after
the rollback journal is deleted, then it appears as if
all changes have been written to disk.  Thus, SQLite gives
the appearance of having made no changes to the database
file or having made the complete set of changes to the
database file depending on whether or not the rollback
journal file exists.</p>
499
500
501
502
503
504
505
506

507
508
509
510
511
512
513
514
515

516
517
518
519
520
521
522
523
524
525
526

527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
have to "rollback" any partial changes and restore the database to
the state it was in prior to the beginning of the transaction.</p>

<tcl>hd_fragment crisis</tcl>
<h3>4.1 When Something Goes Wrong...</h3>
<img src="images/ac/rollback-0.gif" align="right" hspace="15">

<p>Suppose the power loss occurred during step 3.10 above,

while the database changes were being written to disk.
After power is restored, the situation might be something
like what is shown to the right.  We were trying to change
three pages of the database file but only one page was
successfully written.  Another page was partially written
and a third page was not written at all.</p>

<p>The rollback journal is complete and intact on disk when
the power is restored.  This is a key point.  The reason for

the flush operation in step 3.7 is to make absolutely sure that
all of the rollback journal is safely on nonvolatile storage
prior to making any changes to the database file itself.</p>

<br clear="both">
<a name="section_4_2"></a>
<h3>4.2 Hot Rollback Journals</h3>
<img src="images/ac/rollback-1.gif" align="right" hspace="15">

<p>The first time that any SQLite process attempts to access
the database file, it obtains a shared lock as described in

section 3.2 above.  But then it notices that there is a 
rollback journal file present.  SQLite then checks to see if
the rollback journal is a "hot journal".   A hot journal is
a rollback journal that needs to be played back in order to
restore the database to a sane state.  A hot journal only
exists when an earlier process was in the middle of committing
a transaction when it crashed or lost power.</p>

<p>A rollback journal is a "hot" journal if all of the following
are true:</p>

<ul>
<li>The rollback journal exist.
<li>The rollback journal is not an empty file.
<li>There is no reserved lock on the main database file.
<li>The header of the rollback journal is well-formed and in particular
    has not been zeroed out.
<li>The rollback journal does not
contain the name of a master journal file (see
<a href="#section_5_5">section 5.5</a> below) or if does







|
>









>
|










>
|











|







499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
have to "rollback" any partial changes and restore the database to
the state it was in prior to the beginning of the transaction.</p>

<tcl>hd_fragment crisis</tcl>
<h3>4.1 When Something Goes Wrong...</h3>
<img src="images/ac/rollback-0.gif" align="right" hspace="15">

<p>Suppose the power loss occurred
during <a href="#section_3_10">step 3.10</a> above,
while the database changes were being written to disk.
After power is restored, the situation might be something
like what is shown to the right.  We were trying to change
three pages of the database file but only one page was
successfully written.  Another page was partially written
and a third page was not written at all.</p>

<p>The rollback journal is complete and intact on disk when
the power is restored.  This is a key point.  The reason for
the flush operation in <a href="#section_3_7">step 3.7</a>
is to make absolutely sure that
all of the rollback journal is safely on nonvolatile storage
prior to making any changes to the database file itself.</p>

<br clear="both">
<a name="section_4_2"></a>
<h3>4.2 Hot Rollback Journals</h3>
<img src="images/ac/rollback-1.gif" align="right" hspace="15">

<p>The first time that any SQLite process attempts to access
the database file, it obtains a shared lock as described in
<a href="section_3_2">section 3.2</a> above.
But then it notices that there is a 
rollback journal file present.  SQLite then checks to see if
the rollback journal is a "hot journal".   A hot journal is
a rollback journal that needs to be played back in order to
restore the database to a sane state.  A hot journal only
exists when an earlier process was in the middle of committing
a transaction when it crashed or lost power.</p>

<p>A rollback journal is a "hot" journal if all of the following
are true:</p>

<ul>
<li>The rollback journal exists.
<li>The rollback journal is not an empty file.
<li>There is no reserved lock on the main database file.
<li>The header of the rollback journal is well-formed and in particular
    has not been zeroed out.
<li>The rollback journal does not
contain the name of a master journal file (see
<a href="#section_5_5">section 5.5</a> below) or if does
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
<a name="section_4_4"></a>
<h3>4.4 Rolling Back Incomplete Changes</h3>
<img src="images/ac/rollback-3.gif" align="right" hspace="15">

<p>Once a process obtains an exclusive lock, it is permitted
to write to the database file.  It then proceeds to read the
original content of pages out of the rollback journal and write
that content back to were it came from in the database file.
Recall that the header of the rollback journal records the original
size of the database file prior to the start of the aborted
transaction.  SQLite uses this information to truncate the
database file back to its original size in cases where the
incomplete transaction caused the database to grow.  At the
end of this step, the database should be the same size and
contain the same information as it did before the start of







|







572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
<a name="section_4_4"></a>
<h3>4.4 Rolling Back Incomplete Changes</h3>
<img src="images/ac/rollback-3.gif" align="right" hspace="15">

<p>Once a process obtains an exclusive lock, it is permitted
to write to the database file.  It then proceeds to read the
original content of pages out of the rollback journal and write
that content back to where it came from in the database file.
Recall that the header of the rollback journal records the original
size of the database file prior to the start of the aborted
transaction.  SQLite uses this information to truncate the
database file back to its original size in cases where the
incomplete transaction caused the database to grow.  At the
end of this step, the database should be the same size and
contain the same information as it did before the start of
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
played back into the database file (and flushed to disk in case
we encounter yet another power failure), the hot rollback journal
can be deleted.</p>

<p>As in <a href="#section_3_11">section 3.11</a>, the journal
file might be truncated to zero length or its header might
be overwritten with zeros as an optimization on systems where
deleting a file is expense.  Either way, the journal is no 
long hot after this step.</p>

<br clear="both">
<tcl>hd_fragment cont</tcl>
<h3>4.6 Continue As If The Uncompleted Writes Had Never Happened</h3>
<img src="images/ac/rollback-5.gif" align="right" hspace="15">

<p>The final recovery step is to reduce the exclusive lock back







|
|







595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
played back into the database file (and flushed to disk in case
we encounter yet another power failure), the hot rollback journal
can be deleted.</p>

<p>As in <a href="#section_3_11">section 3.11</a>, the journal
file might be truncated to zero length or its header might
be overwritten with zeros as an optimization on systems where
deleting a file is expensive.  Either way, the journal is no 
longer hot after this step.</p>

<br clear="both">
<tcl>hd_fragment cont</tcl>
<h3>4.6 Continue As If The Uncompleted Writes Had Never Happened</h3>
<img src="images/ac/rollback-5.gif" align="right" hspace="15">

<p>The final recovery step is to reduce the exclusive lock back
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962

<a name="section_7_2"></a>
<h3>7.2 Exclusive Access Mode</h3>

<p>SQLite version 3.3.14 adds the concept of "Exclusive Access Mode".
In exclusive access mode, SQLite retains the exclusive
database lock at the conclusion of each transaction.  This prevents
other processes for accessing the database, but in many deployments
only a single process is using a database so this is not a
serious problem.  The advantage of exclusive access mode is that
disk I/O can be reduced in three ways:</p>

<ol>
<li><p>It is not necessary to increment the change counter in the
database header for transactions after the first transaction.  This







|







951
952
953
954
955
956
957
958
959
960
961
962
963
964
965

<a name="section_7_2"></a>
<h3>7.2 Exclusive Access Mode</h3>

<p>SQLite version 3.3.14 adds the concept of "Exclusive Access Mode".
In exclusive access mode, SQLite retains the exclusive
database lock at the conclusion of each transaction.  This prevents
other processes from accessing the database, but in many deployments
only a single process is using a database so this is not a
serious problem.  The advantage of exclusive access mode is that
disk I/O can be reduced in three ways:</p>

<ol>
<li><p>It is not necessary to increment the change counter in the
database header for transactions after the first transaction.  This
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
writes.  When power is lost, the drive uses energy stored in capacitors
and/or the angular momentum of the disk platter to provide power to 
complete any operation in progress.  Nevertheless, there are so many
layers in between the write system call and the on-board disk drive
electronics that we take the safe approach in both Unix and w32 VFS
implementations and assume that sector writes are not atomic.  On the
other hand, device
manufactures with more control over their filesystems might want
to consider enabling the atomic write property of xDeviceCharacteristics
if their hardware really does do atomic writes.</p>

<p>When sector writes are atomic and the page size of a database is
the same as a sector size, and when there is a database change that
only touches a single database page, then SQLite skips the whole
journaling and syncing process and simply writes the modified page







|







1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
writes.  When power is lost, the drive uses energy stored in capacitors
and/or the angular momentum of the disk platter to provide power to 
complete any operation in progress.  Nevertheless, there are so many
layers in between the write system call and the on-board disk drive
electronics that we take the safe approach in both Unix and w32 VFS
implementations and assume that sector writes are not atomic.  On the
other hand, device
manufacturers with more control over their filesystems might want
to consider enabling the atomic write property of xDeviceCharacteristics
if their hardware really does do atomic writes.</p>

<p>When sector writes are atomic and the page size of a database is
the same as a sector size, and when there is a database change that
only touches a single database page, then SQLite skips the whole
journaling and syncing process and simply writes the modified page
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
<tcl>hd_fragment mvhotjrnl</tcl>
<h3>9.5 Deleting Or Renaming A Hot Journal</h3>

<p>If a crash or power loss does occur and a hot journal is left on
the disk, it is essential that the original database file and the hot
journal remain on disk with their original names until the database
file is opened by another SQLite process and rolled back.  
During recovery at <a href="section_4_2">step 4.2</a> SQLite locates
the hot journal by looking for a file in the same directory as the
database being opened and whose name is derived from the name of the
file being opened.  If either the original database file or the
hot journal have been moved or renamed, then the hot journal will
not be seen and the database will not be rolled back.</p>

<p>We suspect that a common failure mode for SQLite recovery happens







|







1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
<tcl>hd_fragment mvhotjrnl</tcl>
<h3>9.5 Deleting Or Renaming A Hot Journal</h3>

<p>If a crash or power loss does occur and a hot journal is left on
the disk, it is essential that the original database file and the hot
journal remain on disk with their original names until the database
file is opened by another SQLite process and rolled back.  
During recovery at <a href="#section_4_2">step 4.2</a> SQLite locates
the hot journal by looking for a file in the same directory as the
database being opened and whose name is derived from the name of the
file being opened.  If either the original database file or the
hot journal have been moved or renamed, then the hot journal will
not be seen and the database will not be rolled back.</p>

<p>We suspect that a common failure mode for SQLite recovery happens
Changes to pages/changes.in.
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
      http://www.sqlite.org/src/timeline</a>.</p>
    }
    hd_close_aux
    hd_enable_main 1
  }
}

chng {2013-03-12 (3.7.16)} {
<li>Added the [PRAGMA foreign_key_check] command.
<li>Added new extended error codes for all SQLITE_CONSTRAINT errors
<li>Added SQL functions [unicode(A)] and [char(X1,...,XN)].
<li>Performance improvements for [PRAGMA incremental_vacuum], especially in
    cases where the number of free pages is greater than what will fit on a 
    single trunk page of the freelist.
<li>Improved optimization of queries containing aggregate min() or max().







|







38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
      http://www.sqlite.org/src/timeline</a>.</p>
    }
    hd_close_aux
    hd_enable_main 1
  }
}

chng {2013-03-18 (3.7.16)} {
<li>Added the [PRAGMA foreign_key_check] command.
<li>Added new extended error codes for all SQLITE_CONSTRAINT errors
<li>Added SQL functions [unicode(A)] and [char(X1,...,XN)].
<li>Performance improvements for [PRAGMA incremental_vacuum], especially in
    cases where the number of free pages is greater than what will fit on a 
    single trunk page of the freelist.
<li>Improved optimization of queries containing aggregate min() or max().