Documentation Source Text: Check-in [b17469386d]

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Comment:	Fix typos in the atomiccommit.html document. Update the proposed release date for 3.7.16.
Downloads:	Tarball \| ZIP archive
Timelines:	family \| ancestors \| descendants \| both \| trunk
Files:	files \| file ages \| folders
SHA1:	b17469386db3bd138dad5b0ed2668796db266d53
User & Date:	drh 2013-03-11 20:22:24.400

Context

2013-03-13
17:11		Documentation updates and typo fixes in preparation for the next release. (check-in: d29bab5808 user: drh tags: trunk)
2013-03-11
20:22		Fix typos in the atomiccommit.html document. Update the proposed release date for 3.7.16. (check-in: b17469386d user: drh tags: trunk)
2013-03-09
15:12		Fix various documentation typos. (check-in: 8a8b9f4250 user: drh tags: trunk)

Changes

Changes to pages/atomiccommit.in.

Changes to pages/changes.in.

︙			︙
28 29 30 31 32 33 34 35 36 37 38 39 40 41 42	<p>This article describes the techniques used by SQLite to create the illusion of atomic commit.</p> <p>The information in this article applies only when SQLite is operating in "rollback mode", or in other words when SQLite is not using a [write-ahead log]. SQLite still supports atomic commit when write-ahead logging is enabled, but it accomplishes atomic commit by ~~a different mechanism from the one describe in this article. See~~ the [WAL \| write-ahead log documentation] for additional information on how SQLite supports atomic commit in that context.</p> <tcl>hd_fragment hardware</tcl> <h2>2.0 Hardware Assumptions</h2> <p>Throughout this article, we will call the mass storage device "disk"	\|	28 29 30 31 32 33 34 35 36 37 38 39 40 41 42	<p>This article describes the techniques used by SQLite to create the illusion of atomic commit.</p> <p>The information in this article applies only when SQLite is operating in "rollback mode", or in other words when SQLite is not using a [write-ahead log]. SQLite still supports atomic commit when write-ahead logging is enabled, but it accomplishes atomic commit by a different mechanism from the one described in this article. See the [WAL \| write-ahead log documentation] for additional information on how SQLite supports atomic commit in that context.</p> <tcl>hd_fragment hardware</tcl> <h2>2.0 Hardware Assumptions</h2> <p>Throughout this article, we will call the mass storage device "disk"
︙			︙
158 159 160 161 162 163 164 ~~165~~ 166 167 168 169 170 171 172	SQLite does not add any redundancy to the database file for the purpose of detecting corruption or I/O errors. SQLite assumes that the data it reads is exactly the same data that it previously wrote.</p> <p>By default, SQLite assumes that an operating system call to write a range of bytes will not damage or alter any bytes outside of that range ~~even if a power lose or OS crash occurs during that write. We~~ call this the "[PSOW \| powersafe overwrite]" property. Prior to version 3.7.9, SQLite did not assume powersafe overwrite. But with the standard sector size increasing from 512 to 4096 bytes on most disk drives, it has become necessary to assume powersafe overwrite in order to maintain historical performance levels and so powersafe overwrite is assumed by default in recent versions of SQLite. The assumption of powersafe overwrite property can be disabled at compile-time or a run-time if	\|	158 159 160 161 162 163 164 165 166 167 168 169 170 171 172	SQLite does not add any redundancy to the database file for the purpose of detecting corruption or I/O errors. SQLite assumes that the data it reads is exactly the same data that it previously wrote.</p> <p>By default, SQLite assumes that an operating system call to write a range of bytes will not damage or alter any bytes outside of that range even if a power loss or OS crash occurs during that write. We call this the "[PSOW \| powersafe overwrite]" property. Prior to version 3.7.9, SQLite did not assume powersafe overwrite. But with the standard sector size increasing from 512 to 4096 bytes on most disk drives, it has become necessary to assume powersafe overwrite in order to maintain historical performance levels and so powersafe overwrite is assumed by default in recent versions of SQLite. The assumption of powersafe overwrite property can be disabled at compile-time or a run-time if
︙			︙
222 223 224 225 226 227 228 ~~229~~ 230 231 232 233 234 235 236	lock allows two or more database connections to read from the database file at the same time. But a shared lock prevents another database connection from writing to the database file while we are reading it. This is necessary because if another database connection were writing to the database file at the same time we are reading from the database file, we might read some data before the change and other data after the change. ~~This would make it appears as if the change made by the other~~ process is not atomic.</p> <p>Notice that the shared lock is on the operating system disk cache, not on the disk itself. File locks really are just flags within the operating system kernel, usually. (The details depend on the specific OS layer interface.) Hence, the lock will instantly vanish if the	\|	222 223 224 225 226 227 228 229 230 231 232 233 234 235 236	lock allows two or more database connections to read from the database file at the same time. But a shared lock prevents another database connection from writing to the database file while we are reading it. This is necessary because if another database connection were writing to the database file at the same time we are reading from the database file, we might read some data before the change and other data after the change. This would make it appear as if the change made by the other process is not atomic.</p> <p>Notice that the shared lock is on the operating system disk cache, not on the disk itself. File locks really are just flags within the operating system kernel, usually. (The details depend on the specific OS layer interface.) Hence, the lock will instantly vanish if the
︙			︙
273 274 275 276 277 278 279 ~~280~~ 281 282 283 284 285 286 287	and shared lock allow other processes to read from the database file. A single reserve lock can coexist with multiple shared locks from other processes. However, there can only be a single reserved lock on the database file. Hence only a single process can be attempting to write to the database at one time.</p> ~~<p>The idea behind a reserved locks is that it signals that~~ a process intends to modify the database file in the near future but has not yet started to make the modifications. And because the modifications have not yet started, other processes can continue to read from the database. However, no other process should also begin trying to write to the database.</p>	\|	273 274 275 276 277 278 279 280 281 282 283 284 285 286 287	and shared lock allow other processes to read from the database file. A single reserve lock can coexist with multiple shared locks from other processes. However, there can only be a single reserved lock on the database file. Hence only a single process can be attempting to write to the database at one time.</p> <p>The idea behind a reserved lock is that it signals that a process intends to modify the database file in the near future but has not yet started to make the modifications. And because the modifications have not yet started, other processes can continue to read from the database. However, no other process should also begin trying to write to the database.</p>
︙			︙
402 403 404 405 406 407 408 ~~409~~ 410 411 412 413 414 415 416 417 418 419 420 421 ~~422~~ 423 424 425 426 427 428 429	<p>Another flush must occur to make sure that all the database changes are written into nonvolatile storage. This is a critical step to ensure that the database will survive a power loss without damage. However, because of the inherent slowness of writing to disk or flash memory, this step together with the rollback journal file flush in section ~~3.7 above takes up most the time required to complete a~~ transaction commit in SQLite.</p> <br clear="both"> <a name="section_3_11"></a> <h3>3.11 Deleting The Rollback Journal</h3> <img src="images/ac/commit-A.gif" align="right" hspace="15"> <p>After the database changes are all safely on the mass storage device, the rollback journal file is deleted. This is the instant where the transaction commits. If a power failure or system crash occurs prior to this point, then recovery processes to be described later make ~~it appears as if no changes were ever made to the database~~ file. If a power failure or system crash occurs after the rollback journal is deleted, then it appears as if all changes have been written to disk. Thus, SQLite gives the appearance of having made no changes to the database file or having made the complete set of changes to the database file depending on whether or not the rollback journal file exists.</p>	\| \|	402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429	<p>Another flush must occur to make sure that all the database changes are written into nonvolatile storage. This is a critical step to ensure that the database will survive a power loss without damage. However, because of the inherent slowness of writing to disk or flash memory, this step together with the rollback journal file flush in section 3.7 above takes up most of the time required to complete a transaction commit in SQLite.</p> <br clear="both"> <a name="section_3_11"></a> <h3>3.11 Deleting The Rollback Journal</h3> <img src="images/ac/commit-A.gif" align="right" hspace="15"> <p>After the database changes are all safely on the mass storage device, the rollback journal file is deleted. This is the instant where the transaction commits. If a power failure or system crash occurs prior to this point, then recovery processes to be described later make it appear as if no changes were ever made to the database file. If a power failure or system crash occurs after the rollback journal is deleted, then it appears as if all changes have been written to disk. Thus, SQLite gives the appearance of having made no changes to the database file or having made the complete set of changes to the database file depending on whether or not the rollback journal file exists.</p>
︙			︙
499 500 501 502 503 504 505 ~~506~~ 507 508 509 510 511 512 513 514 515 ~~516~~ 517 518 519 520 521 522 523 524 525 526 ~~527~~ 528 529 530 531 532 533 534 535 536 537 538 ~~539~~ 540 541 542 543 544 545 546	have to "rollback" any partial changes and restore the database to the state it was in prior to the beginning of the transaction.</p> <tcl>hd_fragment crisis</tcl> <h3>4.1 When Something Goes Wrong...</h3> <img src="images/ac/rollback-0.gif" align="right" hspace="15"> ~~<p>Suppose the power loss occurred ~~during step 3.10 above,~~~~ while the database changes were being written to disk. After power is restored, the situation might be something like what is shown to the right. We were trying to change three pages of the database file but only one page was successfully written. Another page was partially written and a third page was not written at all.</p> <p>The rollback journal is complete and intact on disk when the power is restored. This is a key point. The reason for ~~~~the flush operation in step 3.7~~ is to make absolutely sure that~~ all of the rollback journal is safely on nonvolatile storage prior to making any changes to the database file itself.</p> <br clear="both"> <a name="section_4_2"></a> <h3>4.2 Hot Rollback Journals</h3> <img src="images/ac/rollback-1.gif" align="right" hspace="15"> <p>The first time that any SQLite process attempts to access the database file, it obtains a shared lock as described in ~~~~section 3.2 above.~~ But then it notices that there is a~~ rollback journal file present. SQLite then checks to see if the rollback journal is a "hot journal". A hot journal is a rollback journal that needs to be played back in order to restore the database to a sane state. A hot journal only exists when an earlier process was in the middle of committing a transaction when it crashed or lost power.</p> <p>A rollback journal is a "hot" journal if all of the following are true:</p> <ul> ~~<li>The rollback journal exist.~~ <li>The rollback journal is not an empty file. <li>There is no reserved lock on the main database file. <li>The header of the rollback journal is well-formed and in particular has not been zeroed out. <li>The rollback journal does not contain the name of a master journal file (see <a href="#section_5_5">section 5.5</a> below) or if does	\| > > \| > \| \|	499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549	have to "rollback" any partial changes and restore the database to the state it was in prior to the beginning of the transaction.</p> <tcl>hd_fragment crisis</tcl> <h3>4.1 When Something Goes Wrong...</h3> <img src="images/ac/rollback-0.gif" align="right" hspace="15"> <p>Suppose the power loss occurred during <a href="#section_3_10">step 3.10</a> above, while the database changes were being written to disk. After power is restored, the situation might be something like what is shown to the right. We were trying to change three pages of the database file but only one page was successfully written. Another page was partially written and a third page was not written at all.</p> <p>The rollback journal is complete and intact on disk when the power is restored. This is a key point. The reason for the flush operation in <a href="#section_3_7">step 3.7</a> is to make absolutely sure that all of the rollback journal is safely on nonvolatile storage prior to making any changes to the database file itself.</p> <br clear="both"> <a name="section_4_2"></a> <h3>4.2 Hot Rollback Journals</h3> <img src="images/ac/rollback-1.gif" align="right" hspace="15"> <p>The first time that any SQLite process attempts to access the database file, it obtains a shared lock as described in <a href="section_3_2">section 3.2</a> above. But then it notices that there is a rollback journal file present. SQLite then checks to see if the rollback journal is a "hot journal". A hot journal is a rollback journal that needs to be played back in order to restore the database to a sane state. A hot journal only exists when an earlier process was in the middle of committing a transaction when it crashed or lost power.</p> <p>A rollback journal is a "hot" journal if all of the following are true:</p> <ul> <li>The rollback journal exists. <li>The rollback journal is not an empty file. <li>There is no reserved lock on the main database file. <li>The header of the rollback journal is well-formed and in particular has not been zeroed out. <li>The rollback journal does not contain the name of a master journal file (see <a href="#section_5_5">section 5.5</a> below) or if does
︙			︙
569 570 571 572 573 574 575 ~~576~~ 577 578 579 580 581 582 583	<a name="section_4_4"></a> <h3>4.4 Rolling Back Incomplete Changes</h3> <img src="images/ac/rollback-3.gif" align="right" hspace="15"> <p>Once a process obtains an exclusive lock, it is permitted to write to the database file. It then proceeds to read the original content of pages out of the rollback journal and write ~~that content back to were it came from in the database file.~~ Recall that the header of the rollback journal records the original size of the database file prior to the start of the aborted transaction. SQLite uses this information to truncate the database file back to its original size in cases where the incomplete transaction caused the database to grow. At the end of this step, the database should be the same size and contain the same information as it did before the start of	\|	572 573 574 575 576 577 578 579 580 581 582 583 584 585 586	<a name="section_4_4"></a> <h3>4.4 Rolling Back Incomplete Changes</h3> <img src="images/ac/rollback-3.gif" align="right" hspace="15"> <p>Once a process obtains an exclusive lock, it is permitted to write to the database file. It then proceeds to read the original content of pages out of the rollback journal and write that content back to where it came from in the database file. Recall that the header of the rollback journal records the original size of the database file prior to the start of the aborted transaction. SQLite uses this information to truncate the database file back to its original size in cases where the incomplete transaction caused the database to grow. At the end of this step, the database should be the same size and contain the same information as it did before the start of
︙			︙
592 593 594 595 596 597 598 ~~599 600~~ 601 602 603 604 605 606 607	played back into the database file (and flushed to disk in case we encounter yet another power failure), the hot rollback journal can be deleted.</p> <p>As in <a href="#section_3_11">section 3.11</a>, the journal file might be truncated to zero length or its header might be overwritten with zeros as an optimization on systems where ~~deleting a file is expense. Either way, the journal is no long hot after this step.</p>~~ <br clear="both"> <tcl>hd_fragment cont</tcl> <h3>4.6 Continue As If The Uncompleted Writes Had Never Happened</h3> <img src="images/ac/rollback-5.gif" align="right" hspace="15"> <p>The final recovery step is to reduce the exclusive lock back	\| \|	595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610	played back into the database file (and flushed to disk in case we encounter yet another power failure), the hot rollback journal can be deleted.</p> <p>As in <a href="#section_3_11">section 3.11</a>, the journal file might be truncated to zero length or its header might be overwritten with zeros as an optimization on systems where deleting a file is expensive. Either way, the journal is no longer hot after this step.</p> <br clear="both"> <tcl>hd_fragment cont</tcl> <h3>4.6 Continue As If The Uncompleted Writes Had Never Happened</h3> <img src="images/ac/rollback-5.gif" align="right" hspace="15"> <p>The final recovery step is to reduce the exclusive lock back
︙			︙
948 949 950 951 952 953 954 ~~955~~ 956 957 958 959 960 961 962	<a name="section_7_2"></a> <h3>7.2 Exclusive Access Mode</h3> <p>SQLite version 3.3.14 adds the concept of "Exclusive Access Mode". In exclusive access mode, SQLite retains the exclusive database lock at the conclusion of each transaction. This prevents ~~other processes for accessing the database, but in many deployments~~ only a single process is using a database so this is not a serious problem. The advantage of exclusive access mode is that disk I/O can be reduced in three ways:</p> <ol> <li><p>It is not necessary to increment the change counter in the database header for transactions after the first transaction. This	\|	951 952 953 954 955 956 957 958 959 960 961 962 963 964 965	<a name="section_7_2"></a> <h3>7.2 Exclusive Access Mode</h3> <p>SQLite version 3.3.14 adds the concept of "Exclusive Access Mode". In exclusive access mode, SQLite retains the exclusive database lock at the conclusion of each transaction. This prevents other processes from accessing the database, but in many deployments only a single process is using a database so this is not a serious problem. The advantage of exclusive access mode is that disk I/O can be reduced in three ways:</p> <ol> <li><p>It is not necessary to increment the change counter in the database header for transactions after the first transaction. This
︙			︙
1030 1031 1032 1033 1034 1035 1036 ~~1037~~ 1038 1039 1040 1041 1042 1043 1044	writes. When power is lost, the drive uses energy stored in capacitors and/or the angular momentum of the disk platter to provide power to complete any operation in progress. Nevertheless, there are so many layers in between the write system call and the on-board disk drive electronics that we take the safe approach in both Unix and w32 VFS implementations and assume that sector writes are not atomic. On the other hand, device ~~manufactures with more control over their filesystems might want~~ to consider enabling the atomic write property of xDeviceCharacteristics if their hardware really does do atomic writes.</p> <p>When sector writes are atomic and the page size of a database is the same as a sector size, and when there is a database change that only touches a single database page, then SQLite skips the whole journaling and syncing process and simply writes the modified page	\|	1033 1034 1035 1036 1037 1038 1039 1040 1041 1042 1043 1044 1045 1046 1047	writes. When power is lost, the drive uses energy stored in capacitors and/or the angular momentum of the disk platter to provide power to complete any operation in progress. Nevertheless, there are so many layers in between the write system call and the on-board disk drive electronics that we take the safe approach in both Unix and w32 VFS implementations and assume that sector writes are not atomic. On the other hand, device manufacturers with more control over their filesystems might want to consider enabling the atomic write property of xDeviceCharacteristics if their hardware really does do atomic writes.</p> <p>When sector writes are atomic and the page size of a database is the same as a sector size, and when there is a database change that only touches a single database page, then SQLite skips the whole journaling and syncing process and simply writes the modified page
︙			︙
1278 1279 1280 1281 1282 1283 1284 ~~1285~~ 1286 1287 1288 1289 1290 1291 1292	<tcl>hd_fragment mvhotjrnl</tcl> <h3>9.5 Deleting Or Renaming A Hot Journal</h3> <p>If a crash or power loss does occur and a hot journal is left on the disk, it is essential that the original database file and the hot journal remain on disk with their original names until the database file is opened by another SQLite process and rolled back. ~~During recovery at <a href="section_4_2">step 4.2</a> SQLite locates~~ the hot journal by looking for a file in the same directory as the database being opened and whose name is derived from the name of the file being opened. If either the original database file or the hot journal have been moved or renamed, then the hot journal will not be seen and the database will not be rolled back.</p> <p>We suspect that a common failure mode for SQLite recovery happens	\|	1281 1282 1283 1284 1285 1286 1287 1288 1289 1290 1291 1292 1293 1294 1295	<tcl>hd_fragment mvhotjrnl</tcl> <h3>9.5 Deleting Or Renaming A Hot Journal</h3> <p>If a crash or power loss does occur and a hot journal is left on the disk, it is essential that the original database file and the hot journal remain on disk with their original names until the database file is opened by another SQLite process and rolled back. During recovery at <a href="#section_4_2">step 4.2</a> SQLite locates the hot journal by looking for a file in the same directory as the database being opened and whose name is derived from the name of the file being opened. If either the original database file or the hot journal have been moved or renamed, then the hot journal will not be seen and the database will not be rolled back.</p> <p>We suspect that a common failure mode for SQLite recovery happens
︙			︙