Documentation Source Text

Check-in [ee0e82d0aa]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update the atomiccommit document to reflect the latest PRAGMA journal_mode changes.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: ee0e82d0aaaf6eda0549c8814171cdcda7cac3a7
User & Date: drh 2008-04-25 12:31:16.000
Context
2008-04-27
15:10
Updates to the CREATE TABLE documentation. (check-in: be570dc031 user: drh tags: trunk)
2008-04-25
12:31
Update the atomiccommit document to reflect the latest PRAGMA journal_mode changes. (check-in: ee0e82d0aa user: drh tags: trunk)
02:42
Updates to the temporary-files document. Create a link to the document from the main documentation index page. Also patch the journal_mode pragma documentation. (check-in: a192161bbc user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/atomiccommit.in.
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
after power is restored the file is only partially deleted,
if some of its data has been altered or erased,
or the file has been truncated but not completely removed, then
database corruption will likely result.</p>

<p>SQLite assumes that the detection and/or correction of 
bit errors caused by cosmic rays, thermal noise, quantum
fluctuations, device driver bugs, or some other mechanism, is the 
responsibility of the underlying hardware and operating system.  
SQLite does not add any redundancy to the database file for
the purpose of detecting corruption or I/O errors.
SQLite assumes that the data it reads is exactly the same data 
that it previously wrote.</p>

<a name="section_3_0"></a>







|







139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
after power is restored the file is only partially deleted,
if some of its data has been altered or erased,
or the file has been truncated but not completely removed, then
database corruption will likely result.</p>

<p>SQLite assumes that the detection and/or correction of 
bit errors caused by cosmic rays, thermal noise, quantum
fluctuations, device driver bugs, or other mechanisms, is the 
responsibility of the underlying hardware and operating system.  
SQLite does not add any redundancy to the database file for
the purpose of detecting corruption or I/O errors.
SQLite assumes that the data it reads is exactly the same data 
that it previously wrote.</p>

<a name="section_3_0"></a>
413
414
415
416
417
418
419














420
421
422
423
424
425
426

<p>The existence of a transaction depends on whether or
not the rollback journal file exists and the deletion
of a file appears to be an atomic operation from the point of
view of a user-space process.  Therefore, 
a transaction appears to be an atomic operation.</p>















<br clear="both">
<a name="section_3_12"></a>
<h3>3.12 Releasing The Lock</h3>
<img src="images/ac/commit-B.gif" align="right" hspace="15">

<p>The last step in the commit process is to release the
exclusive lock so that other processes can once again







>
>
>
>
>
>
>
>
>
>
>
>
>
>







413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440

<p>The existence of a transaction depends on whether or
not the rollback journal file exists and the deletion
of a file appears to be an atomic operation from the point of
view of a user-space process.  Therefore, 
a transaction appears to be an atomic operation.</p>

<p>The act of deleting a file is expensive on many systems.
As an optimization, SQLite can be configured to truncate
the journal file to zero bytes in length
or overwrite the journal file header with zeros.  In either
case, the resulting journal file is no longer capable of rolling
back and so the transaction still commits.  Truncating a file
to zero length, like deleting a file, is assumed to be an atomic
operation from the point of view of a user process.  Overwriting
the header of the journal with zeros is not atomic, but if any
part of the header is malformed the journal will not roll back.
Hence, one can say that the commit occurs as soon as the header
is sufficiently changed to make it invalid.  Typically this happens
as soon as the first byte of the header is zeroed.</p>

<br clear="both">
<a name="section_3_12"></a>
<h3>3.12 Releasing The Lock</h3>
<img src="images/ac/commit-B.gif" align="right" hspace="15">

<p>The last step in the commit process is to release the
exclusive lock so that other processes can once again
491
492
493
494
495
496
497


498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
<p>A rollback journal is a "hot" journal if all of the following
are true:</p>

<ul>
<li>The rollback journal exist.
<li>The rollback journal is not an empty file.
<li>There is no reserved lock on the main database file.


<li>The rollback journal header does not
contain the name of a master journal file (see
<a href="#section_5_5">section 5.5</a> below) or if does
contain the name of a master journal, then that master journal
file exists.
</ul>

<p>The presence of a hot journal is our indication
that a previous process was trying to commit a transaction but
it aborted for some reason prior to the completion of the
commit.  The presence of a hot journal is our indication that
the database file is in an inconsistent state and needs to
be repaired (by rollback) prior to being used.</p>

<br clear="both">
<h3>4.3 Obtaining An Exclusive Lock On The Database</h3>
<img src="images/ac/rollback-2.gif" align="right" hspace="15">








>
>
|









|







505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
<p>A rollback journal is a "hot" journal if all of the following
are true:</p>

<ul>
<li>The rollback journal exist.
<li>The rollback journal is not an empty file.
<li>There is no reserved lock on the main database file.
<li>The header of the rollback journal is well-formed and in particular
    has not been zeroed out.
<li>The rollback journal does not
contain the name of a master journal file (see
<a href="#section_5_5">section 5.5</a> below) or if does
contain the name of a master journal, then that master journal
file exists.
</ul>

<p>The presence of a hot journal is our indication
that a previous process was trying to commit a transaction but
it aborted for some reason prior to the completion of the
commit.  A hot journal means that
the database file is in an inconsistent state and needs to
be repaired (by rollback) prior to being used.</p>

<br clear="both">
<h3>4.3 Obtaining An Exclusive Lock On The Database</h3>
<img src="images/ac/rollback-2.gif" align="right" hspace="15">

540
541
542
543
544
545
546






547
548
549
550
551
552
553
<h3>4.5 Deleting The Hot Journal</h3>
<img src="images/ac/rollback-4.gif" align="right" hspace="15">

<p>After all information in the rollback journal has been
played back into the database file (and flushed to disk in case
we encounter yet another power failure), the hot rollback journal
can be deleted.</p>







<br clear="both">
<h3>4.6 Continue As If The Uncompleted Writes Had Never Happened</h3>
<img src="images/ac/rollback-5.gif" align="right" hspace="15">

<p>The final recovery step is to reduce the exclusive lock back
to a shared lock.  Once this happens, the database is back in the







>
>
>
>
>
>







556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
<h3>4.5 Deleting The Hot Journal</h3>
<img src="images/ac/rollback-4.gif" align="right" hspace="15">

<p>After all information in the rollback journal has been
played back into the database file (and flushed to disk in case
we encounter yet another power failure), the hot rollback journal
can be deleted.</p>

<p>As in <a href="#section_3_11">section 3.11</a>, the journal
file might be truncated to zero length or its header might
be overwritten with zeros as an optimization on systems where
deleting a file is expense.  Either way, the journal is no 
long hot after this step.</p>

<br clear="both">
<h3>4.6 Continue As If The Uncompleted Writes Had Never Happened</h3>
<img src="images/ac/rollback-5.gif" align="right" hspace="15">

<p>The final recovery step is to reduce the exclusive lock back
to a shared lock.  Once this happens, the database is back in the
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
and thus corrupt the database file.</p>

<p>SQLite uses two defenses against this problem.  In the first place,
SQLite records the number of pages in the rollback journal in the header
of the rollback journal.  This number is initially zero.  So during an
attempt to rollback an incomplete (and possibly corrupt) rollback
journal, the process doing the rollback will see that the journal
contains zero pages will thus make no changes to the database.  Prior
to a commit, the rollback journal is flushed to disk to ensure that
all content has been synched to disk and there is no "garbage" left
in the file, and only then is the page count in the header increased
true number of pages in the rollback journal.  The rollback journal
header is always kept in a separate sector from any page data so that
it can be overwritten and flushed without risking damage to a data
page if a power outage occurs.  Notice that the rollback journal
is flushed to disk twice: once to write the page content and a second
time to write the page count in the header.</p>

<p>The previous paragraph describes what happens when the







|


|
|







778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
and thus corrupt the database file.</p>

<p>SQLite uses two defenses against this problem.  In the first place,
SQLite records the number of pages in the rollback journal in the header
of the rollback journal.  This number is initially zero.  So during an
attempt to rollback an incomplete (and possibly corrupt) rollback
journal, the process doing the rollback will see that the journal
contains zero pages and will thus make no changes to the database.  Prior
to a commit, the rollback journal is flushed to disk to ensure that
all content has been synched to disk and there is no "garbage" left
in the file, and only then is the page count in the header changed from
zero to true number of pages in the rollback journal.  The rollback journal
header is always kept in a separate sector from any page data so that
it can be overwritten and flushed without risking damage to a data
page if a power outage occurs.  Notice that the rollback journal
is flushed to disk twice: once to write the page content and a second
time to write the page count in the header.</p>

<p>The previous paragraph describes what happens when the
875
876
877
878
879
880
881

882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901

902


903

904
905
906

907
908
909
910
911
912
913

914
915
916
917
918
919
920
SQLite uses a counter in the database header (in bytes 24 through 27)
which is incremented during every change operation.  SQLite saves a copy
of this counter prior to releasing its database lock.  Then after
acquiring the next database lock it compares the saved counter value
against the current counter value and erases the cache if the values
are different, or reuses the cache if they are the same.</p>


<h3>7.2 Exclusive Access Mode</h3>

<p>SQLite version 3.3.14 adds the concept of "Exclusive Access Mode".
In exclusive access mode, SQLite retains the exclusive
database lock at the conclusion of each transaction.  This prevents
other processes for accessing the database, but in many deployments
only a single process is using a database so often this is not a
serious problem.  The advantage of exclusive access mode is that
disk I/O can be reduced in three ways:</p>

<ol>
<li><p>It is not necessary to increment the change counter in the
database header for transactions after the first transaction.  This
will often save a write of page one to both the rollback
journal and the main database file.</p></li>

<li><p>No other processes can change the database so there is never
a need to check the change counter and clear the user-space cache
at the beginning of a transaction.</p></li>


<li><p>The rollback journal file can be truncated rather than deleted


at the end of each transaction.  On many operating systems, truncating

a file is much faster than deleting it.</p></li>
</ol>


<p>The third optimization, truncating rather than deleting the rollback
journal file, does not depend on holding an exclusive lock at all times.
We could, in theory, do that optimization at all times, not just when
exclusive access mode is enabled, and we may well choose to do so in 
some future release of SQLite.  But for now (version 3.5.0) the
rollback journal truncation optimization is only enabled in conjunction
with exclusive access mode.</p>


<h3>7.3 Do Not Journal Freelist Pages</h3>

<p>When information is deleted from an SQLite database, the pages used
to hold the deleted information are added to a "freelist".  Subsequent
inserts will draw pages off of this freelist rather than expanding the
database file.</p>







>






|













>
|
>
>
|
>
|


>
|
|
|
<
<
|
<
>







897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937


938

939
940
941
942
943
944
945
946
SQLite uses a counter in the database header (in bytes 24 through 27)
which is incremented during every change operation.  SQLite saves a copy
of this counter prior to releasing its database lock.  Then after
acquiring the next database lock it compares the saved counter value
against the current counter value and erases the cache if the values
are different, or reuses the cache if they are the same.</p>

<a name="section_7_2"></a>
<h3>7.2 Exclusive Access Mode</h3>

<p>SQLite version 3.3.14 adds the concept of "Exclusive Access Mode".
In exclusive access mode, SQLite retains the exclusive
database lock at the conclusion of each transaction.  This prevents
other processes for accessing the database, but in many deployments
only a single process is using a database so this is not a
serious problem.  The advantage of exclusive access mode is that
disk I/O can be reduced in three ways:</p>

<ol>
<li><p>It is not necessary to increment the change counter in the
database header for transactions after the first transaction.  This
will often save a write of page one to both the rollback
journal and the main database file.</p></li>

<li><p>No other processes can change the database so there is never
a need to check the change counter and clear the user-space cache
at the beginning of a transaction.</p></li>

<li><p>Each transaction can be committed by overwriting the rollback
journal header with zeros rather than deleting the journal file.
This avoids having to modify the directory entry for the journal file
and it avoids having to deallocate disk sectors associated with the 
journal.  Furthermore, the next transaction will overwrite existing
journal file content rather than append new content and on most systems
overwriting is much faster than appending.</p></li>
</ol>

<p>The third optimization, zeroing the journal file header rather than
deleting the rollback journal file,
does not depend on holding an exclusive lock at all times.
This optimization can be set independently of exclusive lock mode


using the <a href="pragma.html#pragma_journal_mode">journal_mode</a>

pragma as described in <a href="#section_7_6">section 7.6</a> below.</p>

<h3>7.3 Do Not Journal Freelist Pages</h3>

<p>When information is deleted from an SQLite database, the pages used
to hold the deleted information are added to a "freelist".  Subsequent
inserts will draw pages off of this freelist rather than expanding the
database file.</p>
996
997
998
999
1000
1001
1002













































1003
1004
1005
1006
1007
1008
1009
size.  This -1 value is never changed.  So that when a commit
occurs, we save a single flush operation and a sector write of
the first page of the journal file.  Furthermore, when a cache
spill occurs we no longer need to append a new journal header
to the end of the journal; we can simply continue appending
new pages to the end of the existing journal.</p>














































<h2>8.0 Testing Atomic Commit Behavior</h2>

<p>The developers of SQLite are confident that it is robust
in the face of power failures and system crashes because the
automatic test procedures do extensive checks on
the ability of SQLite to recover from simulated power loss.
We call these the "crash tests".</p>







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
size.  This -1 value is never changed.  So that when a commit
occurs, we save a single flush operation and a sector write of
the first page of the journal file.  Furthermore, when a cache
spill occurs we no longer need to append a new journal header
to the end of the journal; we can simply continue appending
new pages to the end of the existing journal.</p>

<a name="section_7_6"></a>
<h3>7.6 Persistent Rollback Journals</h3>

<p>Deleting a file is an expensive operation on many systems.
So as an optimization, SQLite can be configured to avoid the
delete operation of <a href="#section_3_11">section 3.11</a>.
Instead of deleting the journal file in order to commit a transaction,
the file is either truncated to zero bytes in length or its
header is overwritten with zeros.  Truncating the file to zero
length saves having to make modifications to the directory containing
the file since the file is not removed from the directory. 
Overwriting the header has the additional savings of not having
to update the length of the file (in the "inode" on many systems)
and not having to deal with newly freed disk sectors.  Furthermore,
at the next transaction the journal will be created by overwriting
existing content rather than appending new content onto the end
of a file, and overwriting is often much faster than appending.</p>

<p>SQLite can be configured to commit transactions by overwriting
the journal header with zeros instead of deleting the journal file
by setting the "PERSIST" journaling mode using the 
<a href="pragma.html#pragma_journal_mode">journal_mode</a> PRAGMA.
For example:</p>

<blockquote><pre>
PRAGMA journal_mode=PERSIST;
</per></blockquote>

<p>The use of persistent journal mode provide a noticable performance
improvement on many systems.  Of course, the drawback is that the 
journal files remain on the disk, using disk space and cluttering
directories, long after the transaction commits.  The only safe way
to delete a persistent journal file is to commit a transaction
with journaling mode set to DELETE:</p>

<blockquote><pre>
PRAGMA journal_mode=DELETE;
BEGIN EXCLUSIVE;
COMMIT;
</per></blockquote>

<p>Beware of deleting persistent journal files by any other means
since the journal file might be hot, in which case deleting it will
corrupt the corresponding database file.</p>

<h2>8.0 Testing Atomic Commit Behavior</h2>

<p>The developers of SQLite are confident that it is robust
in the face of power failures and system crashes because the
automatic test procedures do extensive checks on
the ability of SQLite to recover from simulated power loss.
We call these the "crash tests".</p>