Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Reduce the number of requirements in fileformat.in governing updating the database file. It is not possible to have too much detail without also defining the expectations SQLite has of a file-system, which is not in scope. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
fcd47d8a184e741e21a7c775ada39a64 |
User & Date: | dan 2009-05-20 08:58:52.000 |
Context
2009-05-20
| ||
11:36 | Modifications to the introduction of the file format document. (check-in: 0f0af25556 user: dan tags: trunk) | |
08:58 | Reduce the number of requirements in fileformat.in governing updating the database file. It is not possible to have too much detail without also defining the expectations SQLite has of a file-system, which is not in scope. (check-in: fcd47d8a18 user: dan tags: trunk) | |
2009-05-18
| ||
23:21 | Last minute tweaks to the 3.6.14.1 release documentation. (check-in: f8fba45763 user: drh tags: trunk) | |
Changes
Changes to pages/fileformat.in.
1 2 3 4 5 6 7 8 9 10 11 12 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <link type="text/css" rel="stylesheet" href="images/fileformat/rtdocs.css"> </head> <body> <div id=document_title>SQLite Database File Format</div> <div id=toc_header>Table Of Contents</div> <tcl> | > | 1 2 3 4 5 6 7 8 9 10 11 12 13 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <link type="text/css" rel="stylesheet" href="images/fileformat/rtdocs.css"> <title>SQLite Database File Format</title> </head> <body> <div id=document_title>SQLite Database File Format</div> <div id=toc_header>Table Of Contents</div> <tcl> |
︙ | ︙ | |||
151 152 153 154 155 156 157 | <p> It is intended that this document shall provide all the information required to create an system that reads and writes SQLite databases in a way that is completely compatible with SQLite itself. Or, put another way, this document defines the protocols that all SQLite database users (including SQLite) are required to follow. The availability of this information makes an SQLite | | | | 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 | <p> It is intended that this document shall provide all the information required to create an system that reads and writes SQLite databases in a way that is completely compatible with SQLite itself. Or, put another way, this document defines the protocols that all SQLite database users (including SQLite) are required to follow. The availability of this information makes an SQLite database an even safer choice for long-term data storage. If at some point in the future the SQLite software library cannot be used to access an SQLite database that contains useful data, a procedure or software module may be developed based on the content of this document to extract the required data. <p> As well as file format descriptions, this document also describes the way in which SQLite compatible clients are required to lock database files when |
︙ | ︙ | |||
1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 | the page) of the next block in the list stored as a big-endian unsigned integer. The first two bytes of the final block in the list are set to zero. The third and fourth bytes of each free block contain the total size of the free block in bytes, stored as a 2 byte big-endian unsigned integer. </ul> [fileformat_import_requirement2 H30810] [fileformat_import_requirement2 H30820] <p> The following requirements describe the <i>B-Tree page header</i> present at the start of both index and table B-Tree pages. | > > > | 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 | the page) of the next block in the list stored as a big-endian unsigned integer. The first two bytes of the final block in the list are set to zero. The third and fourth bytes of each free block contain the total size of the free block in bytes, stored as a 2 byte big-endian unsigned integer. </ul> <p class=todo> The list of free blocks is kept in order, sorted by offset. Right? [fileformat_import_requirement2 H30810] [fileformat_import_requirement2 H30820] <p> The following requirements describe the <i>B-Tree page header</i> present at the start of both index and table B-Tree pages. |
︙ | ︙ | |||
1451 1452 1453 1454 1455 1456 1457 | consumed by the cell's record, stored as a variable length integer (see section <cite>varint_format</cite>). <p> If the record is small enough, it is stored verbatim in the cell. A record is deemed to be small enough to be completely stored in the cell if it consists of less than: <pre> | | | 1455 1456 1457 1458 1459 1460 1461 1462 1463 1464 1465 1466 1467 1468 1469 | consumed by the cell's record, stored as a variable length integer (see section <cite>varint_format</cite>). <p> If the record is small enough, it is stored verbatim in the cell. A record is deemed to be small enough to be completely stored in the cell if it consists of less than: <pre> <i>max-local</i> := (<i>usable-size</i> - 12) * <i>max-embedded-fraction</i> / 255 - 23 </pre> <p> bytes. In the formula above, <i>usable-size</i> is the page-size in bytes less the number of unused bytes left at the end of every page (as read from byte offset 20 of the file header), and <i>max-embedded-fraction</i> is the value read from byte offset 21 of the file header. |
︙ | ︙ | |||
1541 1542 1543 1544 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 | a way to navigate the tree structure. [fileformat_import_requirement2 H31020] [fileformat_import_requirement2 H31030] [fileformat_import_requirement2 H31040] [fileformat_import_requirement2 H31050] <p> The precise way in which table B-Tree pages and cells are formatted is described in subsequent sections. [h4 "Table B-Tree Content" table_btree_content] <p> The database file contains one table B-Tree for each database table | > > > > | 1545 1546 1547 1548 1549 1550 1551 1552 1553 1554 1555 1556 1557 1558 1559 1560 1561 1562 | a way to navigate the tree structure. [fileformat_import_requirement2 H31020] [fileformat_import_requirement2 H31030] [fileformat_import_requirement2 H31040] [fileformat_import_requirement2 H31050] <p class=todo> The special case for root page 1. Root page 1 may contain zero cells, just a right-child pointer to the only other b-tree page in the tree. <p> The precise way in which table B-Tree pages and cells are formatted is described in subsequent sections. [h4 "Table B-Tree Content" table_btree_content] <p> The database file contains one table B-Tree for each database table |
︙ | ︙ | |||
1965 1966 1967 1968 1969 1970 1971 | Sub-section <cite>reading_from_files</cite> contains a precise description of the various ways a database image may be distributed between the database file and journal file, and the rules that must be followed to extract it. In other words, a description of how SQLite or compatible software reads the database image from the file-system. | < < < < < < < < < | 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 | Sub-section <cite>reading_from_files</cite> contains a precise description of the various ways a database image may be distributed between the database file and journal file, and the rules that must be followed to extract it. In other words, a description of how SQLite or compatible software reads the database image from the file-system. [h2 "Journal File Formats" journal_file_formats] <p> The following sub-sections describe the formats used by SQLite journal files (section <cite>journal_file_format</cite>) and master journal files (section <cite>masterjournal_file_format</cite>). |
︙ | ︙ | |||
2417 2418 2419 2420 2421 2422 2423 | The following requirements dictate the way in which the data for each page of the database image can be located within the file-system by a database reader. [fileformat_import_requirement2 H32070] [fileformat_import_requirement2 H32080] | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | < | | > > > > > > > | > > | > > > > > > | > | > > | > | | | | 2416 2417 2418 2419 2420 2421 2422 2423 2424 2425 2426 2427 2428 2429 2430 2431 2432 2433 2434 2435 2436 2437 2438 2439 2440 2441 2442 2443 2444 2445 2446 2447 2448 2449 2450 2451 2452 2453 2454 2455 2456 2457 2458 2459 2460 2461 2462 2463 2464 2465 2466 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 | The following requirements dictate the way in which the data for each page of the database image can be located within the file-system by a database reader. [fileformat_import_requirement2 H32070] [fileformat_import_requirement2 H32080] [h1 "SQLite Interoperabilty Requirements" interoperability_requirements] <p> This section contains requirements that further constrains the behaviour of software that accesses (reads and/or writes) SQLite databases stored within the file-system. These requirements need only be implemented by systems that access databases while other clients may also be doing so. More specifically, they need only be implemented by software operating within a system where one or more of the database clients writes to the database. If the database file-system representation remains constant at all times, or if there is only ever a single database client for each database within the system, the requirements in this section can be ignored. <p> The requirements in this section fall into three categories: <ul> <li> <p><b>Database Writer Requirements</b>. Section <cite>writing_database</cite> contains notes on and requirements that must be observed by software systems that update an existing SQLite database image within the file-system. <li> <p><b>Locking Requirements</b>. Section <cite>locking_protocol</cite> contains a description of the file-system locks that must be obtained on the database file, and how locks placed by other database clients should be interpreted. <li> <p><b>Header Cookie Requirements</b>. An SQLite database image header (see section <cite>file_header</cite>) contains two "cookie" values that must sometimes be incremented when the database image stored in the file-system is updated. Section <cite>database_header_cookies_protocol</cite> contains requirements identifying exactly when the cookie values must be incremented, and how they can be used by a database client to determine if cached data is valid or not. </ul> [h2 "Writing to an SQLite Database File" writing_database] <p> When writing to an SQLite database, the database representation on disk must be modified to reflect the new, modified, database image. Exactly how this is done in terms of raw IO operations depends on the characteristics of the file-system in which the database is stored and the degree to which the application is required to handle failures within the system. A failure may be an application crash, an operating system crash, a power failure or other unexpected event that terminates processing. For example, SQLite itself runs in several different modes with various levels of guarantees on how failures are handled as follows: <ul> <li> <b>In-memory journal mode</b> (PRAGMA journal_mode=memory). In this mode any failure may cause database file-system corruption, including an application crash or unexpected exit. <li> <b>Non-synchronous mode</b> (PRAGMA sychronous=off). In this mode an application crash or unexpected exit may not cause database corruption, however an operating system crash or power failure may. <li> <b>Synchronous mode</b> (PRAGMA sychronous=full). In this mode neither an application crash, operating system crash or power failure may cause database file-system corruption. </ul> <p> If a process attempts to modify a database so as to replace database image A with database image B and a failure occurs while doing so, then following recovery the file-system must contain a database image equivalent to A or B. Otherwise, the database file-system is considered corrupt. <p> Two database images are considered to be equivalent if each of the following are true: <ul> <li> <p> The two database images have the same page-size. <li> <p> The two database images have the same number of pages. |
︙ | ︙ | |||
2456 2457 2458 2459 2460 2461 2462 | different blob stored on a free-list leaf page and still be considered equivalent. This concept can sometimes be exploited to more efficiently update an SQLite database file-system representation. [fileformat_import_requirement2 H32290] <p> | | | | > > > > < | | | | > | | < | > | > | > | | | | 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 2521 2522 2523 2524 2525 2526 2527 2528 2529 2530 2531 2532 2533 2534 2535 2536 2537 2538 2539 2540 2541 2542 2543 2544 2545 2546 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 | different blob stored on a free-list leaf page and still be considered equivalent. This concept can sometimes be exploited to more efficiently update an SQLite database file-system representation. [fileformat_import_requirement2 H32290] <p> The following requirement constrains the way in which a database file-system representation may be updated. In many ways, it is equivalent to "do not corrupt the database file-system representation under those conditions where the file-system should not be corrupted". The definition of "handled failure" depends on the mode that SQLite is running in (or on the requirements of the external system accessing the database file-system representation). [fileformat_import_requirement2 H32300] <p> The following two sections, <cite>rollback_journal_method</cite> and <cite>atomic_write_method</cite>, are somewhat advisory in nature. They contain descriptions of two different methods used by SQLite to modify a database image within a database file-system representation in accordance with the above requirements. These are not the only methods that could be used. So long as the above requirements (and those in sections <cite>locking_protocol</cite> and <cite>database_header_cookies_protocol</cite>) are honoured, any method may be used by an SQLite database writer to update the database file-system representation. Sections <cite>rollback_journal_method</cite> and <cite>atomic_write_method</cite> do not contain formal requirements. Formal requirements governing the way in which SQLite safely updates database file-system representations may be found in <span class=todo>ref</span>. <p class=todo> Refer to webpage "How SQLite Implements Atomic Commit" too? [h3 "The Rollback-Journal Method" rollback_journal_method] <p> This section describes the method usually used by SQLite to update a database image within a database file-system representation. This is one way to modify a database image in accordance with the requirements in the parent and other sections. When overwriting database image A with database image B using this method, assuming that to begin with database image A is entirely contained within the database file and that the page-size of database image B is the same as that of database image A, the following steps are taken: <ol> <li> <p>The start of the journal file is populated with data that is not a valid journal header. |
︙ | ︙ | |||
2557 2558 2559 2560 2561 2562 2563 | [Figure filesystem2.gif figure_filesystem2 "Interim file-system state used to atomically overwrite database image ABCD with AEC"] <p> The procedure described above can be onourous to implement, as it requires that the data for all modified pages of database image B be available (presumably in main memory) at the same time, when step 4 is performed. For transactions that write to a large number of database pages, this | | > > > > > > > > > > > > | > > > > > > > > > > > > | > > > > > > > > | > > > | > > > > > > > > > < > | 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658 2659 2660 2661 2662 2663 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 | [Figure filesystem2.gif figure_filesystem2 "Interim file-system state used to atomically overwrite database image ABCD with AEC"] <p> The procedure described above can be onourous to implement, as it requires that the data for all modified pages of database image B be available (presumably in main memory) at the same time, when step 4 is performed. For transactions that write to a large number of database pages, this may be undesirable. A solution is to create a journal-file containing two or more journal headers. If, while modifying a database image within main-memory, a client wishes to reduce the amount of data held in memory, it may perform steps 3 and 4 of the above procedure in order to write modified content out to the file-system. Once the modified pages have been written into the database file, the in-memory copies may be discarded. The writer process may then continue accumulating changes in memory. When it is ready to write these changes out to the file-system, either to free up main-memory or because all changes associated with the transaction have been prepared, it adds a second (or subsequent) journal header to the journal file, followed by journal records containing the original data for pages about to be modified. It may then write the changes accumulated in-memory to the database file, as described in step 4 above. <p> This technique can also be modified to support atomic modification of multiple databases. In this case the first 4 steps of the procedure outlined above are followed for each individual database. Following this a master-journal file is created somewhere within the file-system and a master-journal pointer added to each individual journal file. Since a journal-file that contains a master-journal pointer to a master-journal file that does not exist is considered invalid (requirement H32000), all journal-files may be simultaneously invalidated by deleting the master-journal file from the file-system. This delete operation takes the place of step 5 of the procedure as outlined above. [h3 "The Atomic-Write Method" atomic_write_method] <p> On some systems, SQLite is able to overwrite a single page of the database file as an atomic operation. If, while updating the page, a failure occurs, the system guarantees that following recovery, the page will be found to have been correctly and completely updated or not modified at all. When running in such an environment, if SQLite is required to update a database image so that only a single page is modified, it can do so simply by overwriting the page. <p> Assuming the database page being updated is not page 1, if requirement H33040 requires that the database header change counter be updated, then the database image modification is no longer confined to a single page. In this case it can be split in two: SQLite first atomically updates page 1 of the database file to increment the database header change counter, then updates the page that it is actually required to update using a second atomic write operation. If a failure occurs some time between the two write operations, following recovery the database image may be found to be in its original state except for the value of the database header change counter <span class=todo>It would be good to have some requirement to say that that is Ok. Some modification to the definition of equivalent databases perhaps.</span> <!-- <p> The following requirements require that the journal header at the start of a journal file is set to contain the original database page-size and page-count and written to non-volatile storage before the size of the database file is modified. And that once the size of the database file has been modified, the journal header does not become unstable and the page-size |
︙ | ︙ | |||
2591 2592 2593 2594 2595 2596 2597 | <p> Journal before truncate: [fileformat_import_requirement2 H32360] [fileformat_import_requirement2 H32370] | | < < | 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 | <p> Journal before truncate: [fileformat_import_requirement2 H32360] [fileformat_import_requirement2 H32370] h4 "Multiple Database Transactions" multi_db_transactions <p> SQLite is required to do make all modifications associated with the transaction such that the database image is modified atomically. If an application, OS or power failure occurs while SQLite is updating the database, upon recovery the contents of the database must reflect either that all modifications |
︙ | ︙ | |||
2665 2666 2667 2668 2669 2670 2671 | operation can be used to effect all required changes to the logical database contents. <p> The following two sub-sections describe the specific ways in which SQLite achieves this for single and multiple database transactions. | | | 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 | operation can be used to effect all required changes to the logical database contents. <p> The following two sub-sections describe the specific ways in which SQLite achieves this for single and multiple database transactions. h4 "Single Database Transactions" single_db_transactions <p> In order to atomically modify the database image stored in the file-system from database image A to database image B, the file-system must first be manipulated to a state where it contains the database image A, but can by a single atomic operation be modified to contain database image B. A file-system state that has the following properties satisfies |
︙ | ︙ | |||
2703 2704 2705 2706 2707 2708 2709 | <li> Truncating the journal file to zero bytes in size, or <li> Overwriting some or all of the first 8 bytes in the journal file so that the journal file no longer contains a well-formed journal header. </ul> | | < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < < | 2805 2806 2807 2808 2809 2810 2811 2812 2813 2814 2815 2816 2817 2818 2819 2820 2821 2822 2823 2824 | <li> Truncating the journal file to zero bytes in size, or <li> Overwriting some or all of the first 8 bytes in the journal file so that the journal file no longer contains a well-formed journal header. </ul> h4 "Multiple Database Transactions" multi_db_transactions <p class=todo> Deleting the master-journal is used as the atomic operation. --> [h2 "SQLite Locking Protocol" locking_protocol] <p> An SQLite database client may hold at any time one of four different types of locks on a database file-system representation. This document does not describe how these locks are to be implemented. Possible implementation techniques include mapping the four SQLite locks to operating system file |
︙ | ︙ |
Changes to req/hlr30000.txt.
︙ | ︙ | |||
850 851 852 853 854 855 856 | corresponding pointer-map entry is set to the value 0x05 and the parent page number field is set to the page number of the parent node in the B-Tree structure. HLR H32000 | | | < | | 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 | corresponding pointer-map entry is set to the value 0x05 and the parent page number field is set to the page number of the parent node in the B-Tree structure. HLR H32000 If a journal file contains a well-formed master-journal pointer and the named master-journal file does not exist then the journal file shall be considered invalid. HLR H32010 If the first 28 bytes of a <i>journal file</i> do not contain a well-formed <i>journal header</i>, then the <i>journal file</i> shall be considered invalid. HLR H32020 |
︙ | ︙ | |||
1024 1025 1026 1027 1028 1029 1030 | HLR H32290 Two database images shall be considered to be equivalent if they (a) have the same page size, (b) contain the same number of pages and (c) the content of each page of the first database image that is not a free-list leaf page is the same as the content of the corresponding page in the second database image. HLR H32300 | < < < < < | | | 1023 1024 1025 1026 1027 1028 1029 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039 | HLR H32290 Two database images shall be considered to be equivalent if they (a) have the same page size, (b) contain the same number of pages and (c) the content of each page of the first database image that is not a free-list leaf page is the same as the content of the corresponding page in the second database image. HLR H32300 If, while writing to an SQLite database file-system representation in order to replace database image A with database image B, a failure that should be handled gracefully occurs, then following recovery the database file-system representation shall contain a database image equivalent to either A or B. HLR H32320 When using the rollback-journal method to modify the file-system representation |
︙ | ︙ | |||
1114 1115 1116 1117 1118 1119 1120 | If another database client holds either a RESERVED or PENDING lock on the database file-system representation, then any journal file that exists within the file system shall be considered invalid. HLR H33040 A database writer shall increment the value of the database header change | | | 1108 1109 1110 1111 1112 1113 1114 1115 1116 1117 1118 1119 1120 1121 1122 | If another database client holds either a RESERVED or PENDING lock on the database file-system representation, then any journal file that exists within the file system shall be considered invalid. HLR H33040 A database writer shall increment the value of the database header change counter field (H30100) as part of the first database image modification that it performs after obtaining an EXCLUSIVE lock. HLR H33050 A database writer shall increment the value of the database schema version field (H30110) as part of the first database image modification that includes a schema change that it performs after obtaining an EXCLUSIVE lock. |
︙ | ︙ |