Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Continuing work on the walformat.html document. This is an incremental check-in. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
627feecddc267ec7b8bfd2bd07961aab |
User & Date: | drh 2017-11-13 17:03:01.815 |
Context
2017-11-13
| ||
23:06 | Continuing work on the walformat.html document. (check-in: 74c71c744b user: drh tags: trunk) | |
17:03 | Continuing work on the walformat.html document. This is an incremental check-in. (check-in: 627feecddc user: drh tags: trunk) | |
13:11 | Typos in the new walindex.html document. (check-in: 59536a697b user: drh tags: trunk) | |
Changes
Changes to pages/fileformat2.in.
︙ | ︙ | |||
380 381 382 383 384 385 386 387 388 389 390 391 392 393 | "version-valid-for number". <h3>Header space reserved for expansion</h3> <p>All other bytes of the database file header are reserved for future expansion and must be set to zero.</p> <h2>The Lock-Byte Page</h2> <p>The lock-byte page is the single page of the database file that contains the bytes at offsets between 1073741824 and 1073742335, inclusive. A database file that is less than or equal to 1073741824 bytes in size contains no lock-byte page. A database file larger than 1073741824 contains exactly one lock-byte page. | > | 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 | "version-valid-for number". <h3>Header space reserved for expansion</h3> <p>All other bytes of the database file header are reserved for future expansion and must be set to zero.</p> <tcl>hd_fragment lockbyte {lock-byte page}</tcl> <h2>The Lock-Byte Page</h2> <p>The lock-byte page is the single page of the database file that contains the bytes at offsets between 1073741824 and 1073742335, inclusive. A database file that is less than or equal to 1073741824 bytes in size contains no lock-byte page. A database file larger than 1073741824 contains exactly one lock-byte page. |
︙ | ︙ |
Changes to pages/walformat.in.
︙ | ︙ | |||
196 197 198 199 200 201 202 | </tr> <tr> <td>100..119</td> <td>Five "read marks". Each read mark is a 32-bit unsigned integer (4 bytes). </tr> <tr> <td>120..127</td> | | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 | </tr> <tr> <td>100..119</td> <td>Five "read marks". Each read mark is a 32-bit unsigned integer (4 bytes). </tr> <tr> <td>120..127</td> <td>Unused space set aside for 8 file locks. </tr> <tr> <td>128..132</td> <td>Number of WAL frames that have attempted to be backfilled but which might not have been backfilled successfully. </tr> <tr> <td>132..136</td> <td>Unused space reserved for futher expansion. </tr> </table> </center> <tcl>hd_fragment locks {WAL-mode locks}</tcl> <h3>WAL Locks</h3> <p>Eight bytes of space are set aside in the header to support file locking using the xShmLock() method in the [sqlite3_io_methods] object. These eight bytes are never read nor written by SQLite since some VFSes (ex: Windows) might implement locks using mandatory file locks. <p>These are the eight locks supported: <center> <i>WAL-Index Locks Controlled By xShmLock()</i><br> <table border=1> <tr><th rowspan=2>Name<th colspan=2>Offset <tr><th>xShmLock<th>File <tr> <td>WAL_WRITE_LOCK <td>0 <td>120 </tr> <tr> <td>WAL_CKPT_LOCK <td>1 <td>121 </tr> <tr> <td>WAL_RECOVER_LOCK <td>2 <td>122 </tr> <tr> <td>WAL_READ_LOCK(0) <td>3 <td>123 </tr> <tr> <td>WAL_READ_LOCK(1) <td>4 <td>124 </tr> <tr> <td>WAL_READ_LOCK(2) <td>5 <td>125 </tr> <tr> <td>WAL_READ_LOCK(3) <td>6 <td>126 </tr> <tr> <td>WAL_READ_LOCK(4) <td>7 <td>127 </tr> </table> </center> <tcl>hd_fragment nbackfill nBackfill</tcl> <h3>The nBackfill field</h3> <p>The 32-bit unsigned integer at offset 128 in the WAL-index header is called the "nBackfill". this field holds the number of frames in the WAL file for which an attempt has been made to checkpoint. In other words, this is the number of frames that have been copied from the WAL back into the main database. The number is actually the number of "attempts" since the number is updated prior to syncing the main database, so <p>The nBackfill can only be increased while holding the WAL_CKPT_LOCK. However, nBackfill is changed to zero when the WAL file resets, and this happens while holding the WAL_WRITE_LOCK. <p><i>TBD: More information about the header</i> <h2>WAL-Index Hash Tables</h2> <p>The hash tables in the shm file are designed to answer the following question quickly: |
︙ | ︙ | |||
321 322 323 324 325 326 327 | very small. <p>Note that each 32768-byte unit of the shm file has its own aHash and aPgno arrays. The aHash array for a single unit is only helpful in finding aPgno entries in that same unit. The overall FindFrame(P,M) function needs to do hash lookups beginning with the latest unit and working backwards to the oldest unit until it finds an answer. | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 | very small. <p>Note that each 32768-byte unit of the shm file has its own aHash and aPgno arrays. The aHash array for a single unit is only helpful in finding aPgno entries in that same unit. The overall FindFrame(P,M) function needs to do hash lookups beginning with the latest unit and working backwards to the oldest unit until it finds an answer. <h2>Locking Matrix</h2> <p>Access is coordinated in WAL mode using both the legacy DELETE-mode locks controlled by the xLock and xUnlock methods of the [sqlite3_io_methods] object and the WAL locks controlled by the xShmLock method of the [sqlite3_io_methods] object. <p>Conceptually, there is just a single DELETE-mode lock. The DELETE-mode lock for a single database connection can be in exactly one of the following states: <ol> <li value='0'> SQLITE_LOCK_NONE (unlocked) <li value='1'> SQLITE_LOCK_SHARED (reading) <li value='2'> SQLITE_LOCK_RESERVED (reading, waiting to write) <li value='3'> SQLITE_LOCK_PENDING (new readers blocked, waiting to write) <li value='4'> SQLITE_LOCK_EXCLUSIVE (writing) </ol> <p>The DELETE-mode locks are stored on the [lock-byte page] of the main database file. Only SQLITE_LOCK_SHARED and SQLITE_LOCK_EXCLUSIVE are factors for WAL-mode databases. The other locking states are used in rollback-mode, but not in WAL-mode. <p>The [WAL-mode locks] are described above. <h3>How the various locks are used</h3> <p>The following rules show how each of the locks is used. <ul> <li><p><b>SQLITE_LOCK_SHARED</b> <p> All connections hold SQLITE_LOCK_SHARED continuously while attached to a WAL-mode database. This is true for both read/write connections and read-only connections. The SQLITE_LOCK_SHARED lock is held even by connections that are not within transaction. This is different from rollback mode, where the SQLITE_LOCK_SHARED is released at the end of each transaction. <li><p><b>SQLITE_LOCK_EXCLUSIVE</b> <p> Connections hold an exclusive lock when change in between WAL mode and any of the various rollback-modes. Connections might also attempt to obtain an EXCLUSIVE lock when they disconnection from WAL mode. If a connection is able to obtain an EXCLUSIVE lock, that means it is the only connection to the database and so it may attempt to checkpoint and then delete the WAL-index and WAL files. <p> When a connection is holding a SHARED lock on the main database, that will prevent any other connection from acquiring the EXCLUSIVE lock, which in turn prevents the WAL-index and WAL files from being deleted out from under other users, and prevents a transition out of WAL-mode while other users are accessing the database in WAL-mode. <li><p><b>WAL_WRITE_LOCK</b> <p> The WAL_WRITE_LOCK is only locked exclusively. There is never a shared lock taken on WAL_WRITE_LOCK. <p> An EXCLUSIVE WAL_WRITE_LOCK is held by any connection that is appending content to the end of the WAL. Hence, only a single process at a time can append content to the WAL. If the WAL file resets as a consequence of a write, then the [nBackfill] field of the WAL-index header is reset to zero while holding this lock. <p>An EXCLUSIVE is also held WAL_WRITE_LOCK, and on several other locking bytes, when a connection is running [recovery] on the shared WAL-index. <li><p><b>WAL_CKPT_LOCK</b> <p> The WAL_CKPT_LOCK is only locked exclusively. There is never a shared lock taken on WAL_CKPT_LOCK. <p> An EXCLUSIVE WAL_CKPT_LOCK is held by any connection that is running a [checkpoint]. The [nBackfill] field of the WAL-index header may be increased while holding this exclusive lock, but it may not be decreased. <p>An EXCLUSIVE is also held WAL_CKPT_LOCK, and on several other locking bytes, when a connection is running [recovery] on the shared WAL-index. <li><p><b>WAL_RECOVER_LOCK</b> <p> The WAL_RECOVER_LOCK is only locked exclusively. There is never a shared lock taken on WAL_RECOVER_LOCK. <p> An EXCLUSIVE WAL_RECOVER_LOCK is held by any connection that is running [recovery] to reconstruct the shared WAL-index. <p> A read-only connection that is rebuilding its private heap-memory WAL-index does not hold this lock. (It cannot, since read-only connections are not allowed to hold any exclusive locks.) This lock is only held when rebuilding the global shared WAL-index contained in the memory-mapped SHM file. <p> In addition to locking this byte, a connection running [recovery] also gets an exclusive lock on all other WAL locks except for WAL_READ_LOCK(0). <li><p><b>WAL_READ_LOCK(N)</b> <p> There are five separate read locks, numbers 0 through 4. Read locks may be either SHARED or EXCLUSIVE. Connections obtain a shared lock on one of the read locks bytes while they are within a transaction. Connections also obtain an exclusive lock on read locks, one at a time, for the brief moment while they are updating the values of the corresponding read-marks. Read locks 1 through 4 are held exclusively when running [recovery]. <p> Each read lock byte corresponds to one of the five 32-bit read-mark integers located in bytes 100 through 119 of the WAL-index header. </ul> <h3>Operations that require locks and which locks those operations use</h3> <ul> <li><p><b>Transition into and out of WAL-mode</b> <p>The SQLITE_LOCK_EXCLUSIVE lock must be held by a connection that wants to transition into our out of WAL mode. Transitioning into WAL mode is, therefore, just like any other write transaction, since every write transaction in rollback mode requires the SQLITE_LOCK_EXCLUSIVE lock. If the database file is already in WAL mode (hence if the desire it to change it back into rollback mode) and if there are two or more connections to the database, then each of these connections will be holding an SQLITE_LOCK_SHARED lock. That means that the SQLITE_LOCK_EXCLUSIVE cannot be obtained, and the transition out of WAL mode will not be allowed. This prevents one connection from deleting WAL mode out from under another. It also means that the only way to move a database from WAL mode into rollback mode is to close all but one connection to the database. <li><p><b>Close a connection to a WAL mode database</b> <p>When a database connection closes (via [sqlite3_close()] or [sqlite3_close_v2()]), an attempt is made to acquire SQLITE_LOCK_EXCLUSIVE. If this attempt is successful, that means the connection that is closing is the last connection to the database. In that case, it is desirable to clean up the WAL and WAL-index files, so the closing connection runs a [checkpoint] (while holding SQLITE_LOCK_EXCLUSIVE) and the deletes both the WAL and WAL-index files. The SQLITE_LOCK_EXCLUSIVE is not released until after both the WAL and WAL-index files have been deleted. <p>If the application invokes [sqlite3_file_control]([SQLITE_FCNTL_PERSIST_WAL]) on the database connection prior to closing, then the final checkpoint is still run but the WAL and WAL-index files are not deleted as they normally would be. This leaves the database in a state that allows other processes without write permission on the database, WAL, or WAL-index files to open the database read-only. If the WAL and WAL-index files are missing, then a process that lacks permission to create and initialize those files will not be able to open the database, unless the database is designated as immutable using the [immutable query parameter]. <li><p><b>Create a new WAL and WAL-index as the initial connection to a WAL-mode database</b> <li><p><b>Reconstruct the global shared WAL-index during [recovery]</b> <li><p><b>Append a new transaction to the end of the WAL</b> <li><p><b>Read content from the database and WAL as part of a transaction</b> <li><p><b>Run a checkpoint</b> <li><p><b>Reset the WAL file</b> </ul> <tcl>hd_fragment recovery recovery {WAL-mode crash recovery}</tcl> <h1>Crash Recovery</h1> <p> After an application crash, or a system crash, or a power failure, the database might be left in an inconsistent state. When that happens, the first connection to the database will need to run a recovery procedure to restore the database to a consistent state before it begins using the database. This section describes that procedure. <p><i>TBD...</i></p> |