Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Fix typos in the new locking document. (CVS 1571) |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
022075517cc0edc5f21de4b92ef188bd |
User & Date: | drh 2004-06-11 22:04:31.000 |
Context
2004-06-12
| ||
00:42 | Use the SQLITE_UTF* symbols instead of the old internal TEXT_Utf* symbols. (CVS 1572) (check-in: 9b84f2f488 user: danielk1977 tags: trunk) | |
2004-06-11
| ||
22:04 | Fix typos in the new locking document. (CVS 1571) (check-in: 022075517c user: drh tags: trunk) | |
17:48 | Documentation of the new pager locking mechanism. (CVS 1570) (check-in: 13cf1ba825 user: drh tags: trunk) | |
Changes
Changes to www/lockingv3.tcl.
︙ | ︙ | |||
27 28 29 30 31 32 33 | set pnum(6) 0 set pnum(7) 0 set pnum(8) 0 HEADING 1 {File Locking And Concurrency In SQLite Version 3} puts { | | | > > | | | | > > > > > | | | | | 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 | set pnum(6) 0 set pnum(7) 0 set pnum(8) 0 HEADING 1 {File Locking And Concurrency In SQLite Version 3} puts { <p>Version 3 of SQLite introduces a more complex locking and journaling mechanism designed to improve concurrency and reduce the writer starvation problem. The new mechanism also allows atomic commits of transactions involving multiple database files. This document describes the new locking mechanism. The intended audience is programmers who want to understand and/or modify the pager code and reviewers working to verify the design of SQLite version 3. </p> } HEADING 1 {Overview} puts { <p> Locking and concurrency control are handled by the the <a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/pager.c"> pager module</a>. The pager module is responsible for making SQLite "ACID" (Atomic, Consistent, Isolated, and Durable). The pager module makes sure changes happen all at once, that either all changes occur or none of them do, that two or more processes do not try to access the database in incompatible ways at the same time, and that once changes have been written they persist until explicitly deleted. The pager also provides an memory cache of some of the contents of the disk file.</p> <p>The pager is unconcerned with the details of B-Trees, text encodings, indices, and so forth. From the point of view of the pager the database consists of a single file of uniform-sized blocks. Each block is called a "page" and is usually 1024 bytes in size. The pages are numbered beginning with 1. So the first 1024 bytes of the database are called "page 1" and the second 1024 bytes are call "page 2" and so forth. All other encoding details are handled by higher layers of the library. The pager communicates with the operating system using one of several modules (Examples: <a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/os_unix.c"> os_unix.c</a>, <a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/os_win.c"> os_win.c</a>) that provides a uniform abstraction for operating system services. </p> <p>The pager module effectively controls access for separate threads, or separate processes, or both. Throughout this document whenever the word "process" is written you may substitute the word "thread" without changing the truth of the statement.</p> } HEADING 1 {Locking} puts { <p> From the point of view of a single process, a database file can be in one of five locking states: </p> <p> <table cellpadding="20"> <tr><td valign="top">UNLOCKED</td> <td valign="top"> No locks are held on the database. The database may be neither read nor written. Any internally cached data is considered suspect and subject to verification against the database file before being used. Other processes can read or write the database as their own locking states permit. This is the default state. </td></tr> <tr><td valign="top">SHARED</td> <td valign="top"> The database may be read but not written. Any number of processes can hold SHARED locks at the same time, hence there can be many simultaneous readers. But no other thread or process is allowed to write to the database file while one or more SHARED locks are active. </td></tr> <tr><td valign="top">RESERVED</td> <td valign="top"> |
︙ | ︙ | |||
130 131 132 133 134 135 136 | EXCLUSIVE locks are held. </td></tr> </table> </p> <p> The operating system interface layer understands and tracks all five | | | | | 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 | EXCLUSIVE locks are held. </td></tr> </table> </p> <p> The operating system interface layer understands and tracks all five locking states described above. The pager module only tracks four of the five locking states. A PENDING lock is always just a temporary stepping stone on the path to an EXCLUSIVE lock and so the pager module does not track PENDING locks. </p> } HEADING 1 {The Rollback Journal} |
︙ | ︙ | |||
154 155 156 157 158 159 160 | back to its original size on a rollback.</p> <p>The rollback journal is a ordinary disk file that has the same name as the database file with the suffix "<tt>-journal</tt>" added.</p> <p>If SQLite is working with multiple databases at the same time (using the ATTACH command) then each database has its own journal. | | > | > > > > > > > > > > > > > > | | | | | | 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 | back to its original size on a rollback.</p> <p>The rollback journal is a ordinary disk file that has the same name as the database file with the suffix "<tt>-journal</tt>" added.</p> <p>If SQLite is working with multiple databases at the same time (using the ATTACH command) then each database has its own journal. But there is also a separate aggregate journal called the <em>master journal</em>. The master journal does not contain page data used for rolling back changes. Instead the master journal contains the names of the individual file journals for each of the ATTACHed databases. Each of the individual file journals also contain the name of the master journal. If there are no ATTACHed databases (or if none of the ATTACHed database is participating in the current transaction) no master journal is created and the normal rollback journal contains an empty string in the place normally reserved for recording the name of the master journal.</p> <p>A individual file journal is said to be <em>hot</em> if it needs to be rolled back in order to restore the integrity of its database. A hot journal is created when a process is in the middle of a database update and a program or operating system crash or power failure prevents the update from completing. Hot journals are an exception condition. Hot journals exist to facility recovery from crashes and power failures. If everything is working correctly (that is, if there are no crashes or power failures) you will never get a hot journal. </p> <p> If no master journal is involved, then a journal is hot if it exists and its corresponding database file does not have a RESERVED lock. If a master journal is named in the file journal, then the file journal is hot if its master journal exists and there is no RESERVED lock on the corresponding database file. It is important to understand when a journal is hot so the preceding rules will be repeated in bullets: </p> <ul> <li>A journal is hot if... <ul> <li>It exists, and</li> <li>It's master journal exists or the master journal name is an empty string, and</li> <li>There is no RESERVED lock on the corresponding database file.</li> </ul> </li> </ul> } HEADING 2 {Dealing with hot journals} puts { <p> Before reading from a a database file, SQLite always checks to see if that database file has a hot journal. If the file does have a hot journal, then the journal is rolled back before the file is read. In this way, we ensure that the database file is in a consistent state before it is read. </p> <p>When a process wants to read from a database file, it followed the following sequence of steps: </p> <ol> <li>Open the database file and obtain a SHARED lock. If the SHARED lock cannot be obtained, fail immediately and return SQLITE_BUSY.</li> <li>Check to see if the database file has a hot journal. If the file does not have a hot journal, we are done. Return immediately. If there is a hot journal, that journal must be rolled back by the subsequent steps of this algorithm.</li> <li>Acquire a PENDING lock then an EXCLUSIVE lock on the database file. (Note: Do not acquire a RESERVED lock because that would make other processes think the journal was no longer hot.) If we fail to acquire these locks it means another process is already trying to do the rollback. In that case, drop all locks, close the database, and return SQLITE_BUSY. </li> <li>Read the journal file and roll back the changes.</li> <li>Wait for the rolled back changes to be written onto the surface of the disk. This protects the integrity of the database in case another power failure or crash occurs.</li> <li>Delete the journal file.</li> |
︙ | ︙ | |||
248 249 250 251 252 253 254 | we check each of those file journals. If any of the file journals named in the master journal exists and points back to the master journal, then the master journal is not stale. If all file journals are either missing or refer to other master journals or no master journal at all, then the master journal we are testing is stale and can be safely deleted.</p> } | | | | | | 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 | we check each of those file journals. If any of the file journals named in the master journal exists and points back to the master journal, then the master journal is not stale. If all file journals are either missing or refer to other master journals or no master journal at all, then the master journal we are testing is stale and can be safely deleted.</p> } HEADING 1 {Writing to a database file} puts { <p>To write to a database, a process must first acquire a SHARED lock as described above (possibly rolling back incomplete changes if there is a hot journal). After a SHARED lock is obtained, a RESERVED lock must be acquired. The RESERVED lock signals that the process intends to write to the database at some point in the future. Only one process at a time can hold a RESERVED lock. But other processes can continue to read the database while the RESERVED lock is held. </p> <p>If the process that wants to write is unable to obtain a RESERVED lock, it must mean that another process already has a RESERVED lock. In that case, the write attempt fails and returns SQLITE_BUSY.</p> <p>After obtaining a RESERVED lock, the process that wants to write creates a rollback journal. The header of the journal is initialized with the original size of the database file. Space in the journal header is also reserved for a master journal name, though the master journal name is initially empty.</p> <p>Before making changes to any page of the database, the process writes the original content of that page into the rollback journal. Changes to pages are held in memory at first and are not written to the disk. The original database file remains unaltered, which means that other processes can continue to read the database.</p> <p>Eventually, the writing process will want to update the database file, either because its memory cache has filled up or because it is ready to commit its changes. Before this happens, the writer must |
︙ | ︙ | |||
305 306 307 308 309 310 311 | <p> If the reason for writing to the database file is because the memory cache was full, then the writer will not commit right away. Instead, the writer might continue to make changes to other pages. Before subsequent changes are written to the database file, the rollback journal must be flushed to disk again. Note also that the EXCLUSIVE lock that the writer obtained in order to write to the database initially | | > | | | 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 | <p> If the reason for writing to the database file is because the memory cache was full, then the writer will not commit right away. Instead, the writer might continue to make changes to other pages. Before subsequent changes are written to the database file, the rollback journal must be flushed to disk again. Note also that the EXCLUSIVE lock that the writer obtained in order to write to the database initially must be held until all changes are committed. That means that no other processes are able to access the database from the time the memory cache first spills to disk until the transaction commits. </p> <p> When a writer is ready to commit its changes, it executes the following steps: </p> |
︙ | ︙ | |||
373 374 375 376 377 378 379 380 381 382 383 384 | will persist. </li> <li>Delete all individual journal files. <li>Drop the EXCLUSIVE and PENDING locks from all database files. </li> </ol> } HEADING 1 {How To Corrupt Your Database Files} puts { <p>The pager module is robust but it is not completely failsafe. | > > > > > > > > > > > > > > > > > > > | | | | 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 | will persist. </li> <li>Delete all individual journal files. <li>Drop the EXCLUSIVE and PENDING locks from all database files. </li> </ol> } HEADING 2 {Writer starvation} puts { <p>In SQLite version 2, if many processes are reading from the database, it might be the case that there is never a time when there are no active readers. And if there is always at least one read lock on the database, no process would ever be able to make changes to the database because it would be impossible to acquire a write lock. This situation is called <em>writer starvation</em>.</p> <p>SQLite version 3 seeks to avoid writer starvation through the use of the PENDING lock. The PENDING lock allows existing readers to continue but prevents new readers from connecting to the database. So when a process wants to write a busy database, it can set a PENDING lock which will prevent new readers from coming in. Assuming existing readers do eventually complete, all SHARED locks will eventually clear and the writer will be given a chance to make its changes.</p> } HEADING 1 {How To Corrupt Your Database Files} puts { <p>The pager module is robust but it is not completely failsafe. It can be subverted. This section attempts to identify and explain the risks.</p> <p> Clearly, a hardware or operating system fault that introduces incorrect data into the middle of the database file or journal will cause problems. Likewise, if a rogue process opens a database file or journal and writes malformed data into the middle of it, then the database will become corrupt. There is not much that can be done about these kinds of problems so they are given no further attention. </p> <p> SQLite uses POSIX advisory locks to implement locking on Unix. On windows it uses the LockFile(), LockFileEx(), and UnlockFile() system calls. SQLite assumes that these system calls all work as advertised. If that is not the case, then database corruption can result. One should note that POSIX advisory locking is known to be buggy or even unimplemented on many NFS implementations (including recent versions of Mac OS X) and that there are reports of locking problems for network filesystems under windows. Your best defense is to not use SQLite for files on a network filesystem. </p> <p> SQLite uses the fsync() system call to flush data to the disk under Unix and it uses the FlushFileBuffers() to do the same under windows. Once again, |
︙ | ︙ | |||
447 448 449 450 451 452 453 | <p> The last (fourth) bullet above merits additional comment. When SQLite creates a journal file on Unix, it opens the directory that contains that file and calls fsync() on the directory, in an effort to push the directory information to disk. But suppose some other process is adding or removing unrelated files to the directory that contains the database and journal at the the moment of a power failure. The supposedly unrelated actions of this other | | | 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 | <p> The last (fourth) bullet above merits additional comment. When SQLite creates a journal file on Unix, it opens the directory that contains that file and calls fsync() on the directory, in an effort to push the directory information to disk. But suppose some other process is adding or removing unrelated files to the directory that contains the database and journal at the the moment of a power failure. The supposedly unrelated actions of this other process might result in the journal file being dropped from the directory and moved into "lost+found". This is an unlikely scenario, but it could happen. The best defenses are to use a journaling filesystem or to keep the database and journal in a directory by themselves. </p> <p> For a commit involving multiple databases and a master journal, if the |
︙ | ︙ | |||
476 477 478 479 480 481 482 | HEADING 1 {Transaction Control At The SQL Level} puts { <p> The changes to locking and concurrency control in SQLite version 3 also introduce some subtle changes in the way transactions work at the SQL language level. | | > | > > > > > > > > > | 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 | HEADING 1 {Transaction Control At The SQL Level} puts { <p> The changes to locking and concurrency control in SQLite version 3 also introduce some subtle changes in the way transactions work at the SQL language level. By default, SQLite version 3 operates in <em>autocommit</em> mode. In autocommit mode, all changes to the database are committed as soon as all operations associated with the current database connection complete.</p> <p>The SQL command "BEGIN TRANSACTION" (the TRANSACTION keyword is optional) is used to take SQLite out of autocommit mode. Note that the BEGIN command does not acquire any locks on the database. After a BEGIN command, a SHARED lock will be acquired when the first SELECT statement is executed. A RESERVED lock will be acquired when the first INSERT, UPDATE, or DELETE statement is executed. No EXCLUSIVE lock is acquired until either the memory cache fills up and must be spilled to disk or until the transaction commits. In this way, the system delays blocking read access to the file file until the last possible moment. </p> <p>The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur. The SQL command "ROLLBACK" also operates by turning autocommit back on, but it also sets a flag that tells the autocommit logic to rollback rather than commit.</p> <p>If the SQL COMMIT command turns autocommit on and the autocommit logic then tries to commit change but fails because some other process is holding a SHARED lock, then autocommit is turned back off automatically. This allows the user to retry the COMMIT at a later time after the SHARED lock has had an opportunity to clear.</p> <p>If multiple commands are being executed against the same SQLite database connection at the same time, the autocommit is deferred until the very last command completes. For example, if a SELECT statement is being executed, the execution of the command will pause as each row of the result is returned. During this pause other INSERT, UPDATE, or DELETE commands can be executed against other tables in the database. But none of these changes will commit until the original SELECT statement finishes. </p> } footer $rcsid |