Documentation Source Text

Check-in [fcd47d8a18]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Reduce the number of requirements in fileformat.in governing updating the database file. It is not possible to have too much detail without also defining the expectations SQLite has of a file-system, which is not in scope.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1:fcd47d8a184e741e21a7c775ada39a640da1b55e
User & Date: dan 2009-05-20 08:58:52
Context
2009-05-20
11:36
Modifications to the introduction of the file format document. check-in: 0f0af25556 user: dan tags: trunk
08:58
Reduce the number of requirements in fileformat.in governing updating the database file. It is not possible to have too much detail without also defining the expectations SQLite has of a file-system, which is not in scope. check-in: fcd47d8a18 user: dan tags: trunk
2009-05-18
23:21
Last minute tweaks to the 3.6.14.1 release documentation. check-in: f8fba45763 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/fileformat.in.

1
2
3
4
5

6
7
8
9
10
11
12
...
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
....
1404
1405
1406
1407
1408
1409
1410



1411
1412
1413
1414
1415
1416
1417
....
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
....
1541
1542
1543
1544
1545
1546
1547




1548
1549
1550
1551
1552
1553
1554
....
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
....
2417
2418
2419
2420
2421
2422
2423





































2424
2425
2426
2427
2428
2429
2430


















2431
2432
2433




2434
2435
2436



2437
2438
2439
2440
2441
2442
2443
2444
....
2456
2457
2458
2459
2460
2461
2462
2463
2464
2465




2466
2467
2468
2469
2470
2471
2472
2473
2474
2475
2476
2477


2478
2479
2480
2481
2482





2483
2484
2485
2486
2487
2488
2489
2490
2491
2492
2493
2494
2495
2496
2497
2498
....
2557
2558
2559
2560
2561
2562
2563
2564












2565











2566
2567
2568
2569

2570
2571
























2572
2573
2574
2575
2576
2577
2578
....
2591
2592
2593
2594
2595
2596
2597
2598
2599
2600
2601
2602
2603
2604
2605
2606
2607
....
2665
2666
2667
2668
2669
2670
2671
2672
2673
2674
2675
2676
2677
2678
2679
....
2703
2704
2705
2706
2707
2708
2709
2710
2711
2712
2713
2714
2715
2716
2717
2718
2719
2720
2721
2722
2723
2724
2725
2726
2727
2728
2729
2730
2731
2732
2733
2734
2735
2736
2737
2738
2739
2740
2741
2742
2743
2744
2745
2746
2747
2748
2749
2750
2751
2752
2753
2754
2755
2756
2757
2758
2759
2760
2761
2762
2763
2764
2765
2766
2767
2768
2769
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
  <link type="text/css" rel="stylesheet" href="images/fileformat/rtdocs.css">

</head>
<body>

<div id=document_title>SQLite Database File Format</div>
<div id=toc_header>Table Of Contents</div>

<tcl>
................................................................................

  <p>
    It is intended that this document shall provide all the information required to
    create an system that reads and writes SQLite databases in a way that is
    completely compatible with SQLite itself. Or, put another way, this document
    defines the protocols that all SQLite database users (including SQLite) are
    required to follow. The availability of this information makes an SQLite
    database an even safer choice for long-term data storage. If, at some point
    in the future, the SQLite software library cannot be used to access an
    SQLite database that contains useful data, a procedure or software module
    may be developed based on the content of this document to extract the
    required data.

  <p>
    As well as file format descriptions, this document also describes the way in
    which SQLite compatible clients are required to lock database files when
................................................................................
            the page) of the next block in the list stored as a big-endian
            unsigned integer. The first two bytes of the final block in the 
            list are set to zero. The third and fourth bytes of each free
            block contain the total size of the free block in bytes, stored
            as a 2 byte big-endian unsigned integer.
      </ul>




          [fileformat_import_requirement2 H30810]
          [fileformat_import_requirement2 H30820]

      <p>
        The following requirements describe the <i>B-Tree page header</i>
        present at the start of both index and table B-Tree pages.

................................................................................
          consumed by the cell's record, stored as a variable length integer
          (see section <cite>varint_format</cite>). 
        <p> 
          If the record is small enough, it is stored verbatim in the cell.
          A record is deemed to be small enough to be completely stored in
          the cell if it consists of less than:
        <pre>
            <i>max-local</i> := <i>usable-size</i> * (<i>max-embedded-fraction</i> / 255 ) - 23
</pre>
        <p>
          bytes. In the formula above, <i>usable-size</i> is the page-size
          in bytes less the number of unused bytes left at the end of every
          page (as read from byte offset 20 of the file header), and
          <i>max-embedded-fraction</i> is the value read from byte offset 
          21 of the file header.
................................................................................
        a way to navigate the tree structure.

          [fileformat_import_requirement2 H31020]
          [fileformat_import_requirement2 H31030]
          [fileformat_import_requirement2 H31040]
          [fileformat_import_requirement2 H31050]





      <p>
        The precise way in which table B-Tree pages and cells are formatted is
        described in subsequent sections.

      [h4 "Table B-Tree Content" table_btree_content]
        <p>
          The database file contains one table B-Tree for each database table
................................................................................
      Sub-section <cite>reading_from_files</cite> contains a precise 
      description of the various ways a database image may be
      distributed between the database file and journal file, 
      and the rules that must be followed to extract it. In other words, a 
      description of how SQLite or compatible software reads the database 
      image from the file-system.

    <p>
      Sub-section <cite>writing_to_files</cite> describes the guidelines that
      must be followed by SQLite or compatible software to safely modify the
      contents of an SQLite database within the file-system. These techniques
      allow a complex set of modifications to be made to the database image
      atomically, eliminating the risk of database corruption due to 
      application, Operating System (OS) or power failure while updating the
      database.

[h2 "Journal File Formats" journal_file_formats]

  <p>
    The following sub-sections describe the formats used by SQLite journal
    files (section <cite>journal_file_format</cite>) and master journal files
    (section <cite>masterjournal_file_format</cite>).

................................................................................
    The following requirements dictate the way in which the data for each
    page of the database image can be located within the file-system
    by a database reader.

    [fileformat_import_requirement2 H32070]
    [fileformat_import_requirement2 H32080]






































[h2 "Writing to an SQLite Database" writing_to_files]

  <p>
    When an SQLite user commits a transaction that modifies the contents
    of the database, the database representation on disk must be modified
    to reflect the new contents of the database image. In doing so, SQLite
    is required to ensure that if an application, operating system or power 


















    failure occurs while updating the database file-system representation, 
    the database image stored within the file-system is found to be in a 
    valid state following recovery. If the transaction being committed to the 




    database file replaces database image A with database image B, then all
    database images equivalent to A or B are considered valid database image
    states.




  <p>
    Two database images are considered to be equivalent if each of the 
    following are true:

    <ul>
      <li> <p> The two database images have the same page-size.
      <li> <p> The two database images have the same number of pages.
................................................................................
    different blob stored on a free-list leaf page and still be considered
    equivalent. This concept can sometimes be exploited to more efficiently
    update an SQLite database file-system representation.

    [fileformat_import_requirement2 H32290]

  <p>
    The following two requirements constrain the way in which a database 
    file-system representation may be updated. In many ways, they are 
    equivalent to "do not corrupt the database file-system representation".





    [fileformat_import_requirement2 H32300]
    [fileformat_import_requirement2 H32310]

  <p>
    The following two sections, sections <cite>rollback_journal_method</cite>
    and <cite>rollforward_journal_method</cite> are somewhat advisory in nature.
    They contain descriptions of two different methods that could be used to
    modify a database image within a database file-system representation in
    accordance with the above requirements. These are not the only methods
    which could be used. So long as the above requirements (and, if applicable,
    those in section <cite>interoperability_requirements</cite>) are honoured,


    any method may be used by an SQLite database writer to update the database
    file-system representation. SQLite itself uses the "rollback-journal method"
    described in section <cite>rollback_journal_method</cite>. For this reason,
    section <cite>rollback_journal_method</cite> contains numbered requirements
    but section <cite>rollforward_journal_method</cite> does not.






  [h3 "The Rollback-Journal Method" rollback_journal_method]

    <p>
      This section describes the method used by SQLite to update a database 
      image within a database file-system representation. This is one way
      to modify a database image in accordance with the requirements in the
      parent section. When overwriting database image A with database image B 
      using this method, assuming that to begin with database image A is 
      entirely contained within the database file and that the page-size of
      database image B is the same as that of database image A, the following 
      steps are taken:

    <ol>
      <li> <p>The start of the journal file is populated with data that is not
           a valid journal header.
................................................................................
    [Figure filesystem2.gif figure_filesystem2 "Interim file-system state used to atomically overwrite database image ABCD with AEC"]

  <p>
    The procedure described above can be onourous to implement, as it requires
    that the data for all modified pages of database image B be available
    (presumably in main memory) at the same time, when step 4 is performed.
    For transactions that write to a large number of database pages, this 
    may be undesirable.












    












    <p class=todo>
      The above does not explain multiple header journals. Or multi-file 
      transactions.


    <p class=todo> FINISH THIS!

























  <p>
    The following requirements require that the journal header at the start of
    a journal file is set to contain the original database page-size and 
    page-count and written to non-volatile storage before the size of the
    database file is modified. And that once the size of the database file has
    been modified, the journal header does not become unstable and the page-size 
................................................................................
  <p>
    Journal before truncate:

       [fileformat_import_requirement2 H32360]
       [fileformat_import_requirement2 H32370]
      

  [h4 "Multiple Database Transactions" multi_db_transactions]

<!--

  <p>
    SQLite is required 
    to do make all modifications associated with the transaction such that 
    the database image is modified atomically. If an application, OS or 
    power failure occurs while SQLite is updating the database, upon recovery 
    the contents of the database must reflect either that all modifications 
................................................................................
    operation can be used to effect all required changes to the logical
    database contents.

  <p>
    The following two sub-sections describe the specific ways in which 
    SQLite achieves this for single and multiple database transactions.

  [h4 "Single Database Transactions" single_db_transactions]

  <p>
    In order to atomically modify the database image stored in the 
    file-system from database image A to database image B, the file-system must
    first be manipulated to a state where it contains the database image A,
    but can by a single atomic operation be modified to contain database 
    image B. A file-system state that has the following properties satisfies
................................................................................
    <li> Truncating the journal file to zero bytes in size, or
    <li> Overwriting some or all of the first 8 bytes in the journal file
         so that the journal file no longer contains a well-formed journal
         header.
  </ul>


  [h4 "Multiple Database Transactions" multi_db_transactions]

  <p class=todo>
    Deleting the master-journal is used as the atomic operation.
-->

  [h3 "The Atomic-Write Method" atomic_write_method]

    <p class=todo>
      Describe the special atomic-write option.

  [h3 "The Rollforward-Journal Method" rollforward_journal_method]

    <p class=todo>
      Describe the how the journal file can also be used as a roll 
      forward journal or "transaction log". This section does not 
      contain requirements. It is only here to make the point that
      alternatives to the rollback journal method exist.


[h1 "SQLite Interoperabilty Requirements" interoperability_requirements]

  <p>
    This section contains requirements that further constrains the behaviour
    of software that accesses (reads and/or writes) SQLite databases stored
    within the file-system. These requirements need only be implemented by
    systems that access databases while other clients may also be doing so.
    More specifically, they need only be implemented by software operating
    within a system where one or more of the database clients writes to the
    database. If the database file-system representation remains constant
    at all times, or if there is only ever a single database client for each
    database within the system, the requirements in this section can be 
    ignored.

  <p>
    The requirements in this section fall into two categories:

  <ul>
    <li> <p><b>Locking Requirements</b>. Section <cite>locking_protocol</cite>
         contains a description of the file-system locks that must be obtained
         on the database file, and how locks placed by other database clients 
         should be interpreted.

    <li> <p><b>Header Cookie Requirements</b>. An SQLite database image header 
         (see section <cite>file_header</cite>) contains two "cookie" values
         that must sometimes be incremented when the database image stored in
         the file-system is updated. Section 
         <cite>database_header_cookies_protocol</cite> contains requirements
         identifying exactly when the cookie values must be incremented, and
         how they can be used by a database client to determine if cached
         data is valid or not.
  </ul> 

  [h2 "SQLite Locking Protocol" locking_protocol]

    <p>
      An SQLite database client may hold at any time one of four different types 
      of locks on a database file-system representation. This document does not
      describe how these locks are to be implemented. Possible implementation
      techniques include mapping the four SQLite locks to operating system file





>







 







|
|







 







>
>
>







 







|







 







>
>
>
>







 







<
<
<
<
<
<
<
<
<







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|


<
|
<
<
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
<
<
>
>
>
>
|
<
<
>
>
>
|







 







|
|
|
>
>
>
>


<


|
|
|


|
<
>
>
|
<
<
|
<
>
>
>
>
>




|


|
|







 







|
>
>
>
>
>
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>

<
<
<
>

<
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|
<
<







 







|







 







|





<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<







1
2
3
4
5
6
7
8
9
10
11
12
13
...
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
....
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
....
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
....
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
1555
1556
1557
1558
1559
1560
1561
1562
....
1973
1974
1975
1976
1977
1978
1979









1980
1981
1982
1983
1984
1985
1986
....
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
2448
2449
2450
2451
2452
2453
2454
2455
2456
2457
2458
2459
2460
2461
2462

2463


2464
2465
2466
2467
2468
2469
2470
2471
2472
2473
2474
2475
2476
2477
2478
2479
2480
2481
2482


2483
2484
2485
2486
2487


2488
2489
2490
2491
2492
2493
2494
2495
2496
2497
2498
....
2510
2511
2512
2513
2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
2524
2525

2526
2527
2528
2529
2530
2531
2532
2533

2534
2535
2536


2537

2538
2539
2540
2541
2542
2543
2544
2545
2546
2547
2548
2549
2550
2551
2552
2553
2554
2555
2556
2557
2558
....
2617
2618
2619
2620
2621
2622
2623
2624
2625
2626
2627
2628
2629
2630
2631
2632
2633
2634
2635
2636
2637
2638
2639
2640
2641
2642
2643
2644
2645
2646
2647
2648
2649



2650
2651

2652
2653
2654
2655
2656
2657
2658
2659
2660
2661
2662
2663
2664
2665
2666
2667
2668
2669
2670
2671
2672
2673
2674
2675
2676
2677
2678
2679
2680
2681
2682
....
2695
2696
2697
2698
2699
2700
2701
2702


2703
2704
2705
2706
2707
2708
2709
....
2767
2768
2769
2770
2771
2772
2773
2774
2775
2776
2777
2778
2779
2780
2781
....
2805
2806
2807
2808
2809
2810
2811
2812
2813
2814
2815
2816
2817















































2818
2819
2820
2821
2822
2823
2824
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
  <link type="text/css" rel="stylesheet" href="images/fileformat/rtdocs.css">
  <title>SQLite Database File Format</title>
</head>
<body>

<div id=document_title>SQLite Database File Format</div>
<div id=toc_header>Table Of Contents</div>

<tcl>
................................................................................

  <p>
    It is intended that this document shall provide all the information required to
    create an system that reads and writes SQLite databases in a way that is
    completely compatible with SQLite itself. Or, put another way, this document
    defines the protocols that all SQLite database users (including SQLite) are
    required to follow. The availability of this information makes an SQLite
    database an even safer choice for long-term data storage. If at some point
    in the future the SQLite software library cannot be used to access an
    SQLite database that contains useful data, a procedure or software module
    may be developed based on the content of this document to extract the
    required data.

  <p>
    As well as file format descriptions, this document also describes the way in
    which SQLite compatible clients are required to lock database files when
................................................................................
            the page) of the next block in the list stored as a big-endian
            unsigned integer. The first two bytes of the final block in the 
            list are set to zero. The third and fourth bytes of each free
            block contain the total size of the free block in bytes, stored
            as a 2 byte big-endian unsigned integer.
      </ul>

      <p class=todo>
        The list of free blocks is kept in order, sorted by offset. Right?

          [fileformat_import_requirement2 H30810]
          [fileformat_import_requirement2 H30820]

      <p>
        The following requirements describe the <i>B-Tree page header</i>
        present at the start of both index and table B-Tree pages.

................................................................................
          consumed by the cell's record, stored as a variable length integer
          (see section <cite>varint_format</cite>). 
        <p> 
          If the record is small enough, it is stored verbatim in the cell.
          A record is deemed to be small enough to be completely stored in
          the cell if it consists of less than:
        <pre>
            <i>max-local</i> := (<i>usable-size</i> - 12) * <i>max-embedded-fraction</i> / 255 - 23
</pre>
        <p>
          bytes. In the formula above, <i>usable-size</i> is the page-size
          in bytes less the number of unused bytes left at the end of every
          page (as read from byte offset 20 of the file header), and
          <i>max-embedded-fraction</i> is the value read from byte offset 
          21 of the file header.
................................................................................
        a way to navigate the tree structure.

          [fileformat_import_requirement2 H31020]
          [fileformat_import_requirement2 H31030]
          [fileformat_import_requirement2 H31040]
          [fileformat_import_requirement2 H31050]

      <p class=todo>
        The special case for root page 1. Root page 1 may contain zero cells,
        just a right-child pointer to the only other b-tree page in the tree.

      <p>
        The precise way in which table B-Tree pages and cells are formatted is
        described in subsequent sections.

      [h4 "Table B-Tree Content" table_btree_content]
        <p>
          The database file contains one table B-Tree for each database table
................................................................................
      Sub-section <cite>reading_from_files</cite> contains a precise 
      description of the various ways a database image may be
      distributed between the database file and journal file, 
      and the rules that must be followed to extract it. In other words, a 
      description of how SQLite or compatible software reads the database 
      image from the file-system.










[h2 "Journal File Formats" journal_file_formats]

  <p>
    The following sub-sections describe the formats used by SQLite journal
    files (section <cite>journal_file_format</cite>) and master journal files
    (section <cite>masterjournal_file_format</cite>).

................................................................................
    The following requirements dictate the way in which the data for each
    page of the database image can be located within the file-system
    by a database reader.

    [fileformat_import_requirement2 H32070]
    [fileformat_import_requirement2 H32080]

[h1 "SQLite Interoperabilty Requirements" interoperability_requirements]

  <p>
    This section contains requirements that further constrains the behaviour
    of software that accesses (reads and/or writes) SQLite databases stored
    within the file-system. These requirements need only be implemented by
    systems that access databases while other clients may also be doing so.
    More specifically, they need only be implemented by software operating
    within a system where one or more of the database clients writes to the
    database. If the database file-system representation remains constant
    at all times, or if there is only ever a single database client for each
    database within the system, the requirements in this section can be 
    ignored.

  <p>
    The requirements in this section fall into three categories:

  <ul>
    <li> <p><b>Database Writer Requirements</b>. Section <cite>writing_database</cite>
         contains notes on and requirements that must be observed by software 
	 systems that update an existing SQLite database image within the file-system.

    <li> <p><b>Locking Requirements</b>. Section <cite>locking_protocol</cite>
         contains a description of the file-system locks that must be obtained
         on the database file, and how locks placed by other database clients 
         should be interpreted.

    <li> <p><b>Header Cookie Requirements</b>. An SQLite database image header 
         (see section <cite>file_header</cite>) contains two "cookie" values
         that must sometimes be incremented when the database image stored in
         the file-system is updated. Section 
         <cite>database_header_cookies_protocol</cite> contains requirements
         identifying exactly when the cookie values must be incremented, and
         how they can be used by a database client to determine if cached
         data is valid or not.
  </ul> 

  [h2 "Writing to an SQLite Database File" writing_database]

  <p>

    When writing to an SQLite database, the database representation on disk


    must be modified to reflect the new, modified, database image. Exactly
    how this is done in terms of raw IO operations depends on the 
    characteristics of the file-system in which the database is stored and
    the degree to which the application is required to handle failures within
    the system. A failure may be an application crash, an operating system
    crash, a power failure or other unexpected event that terminates 
    processing. For example, SQLite itself runs in several different modes
    with various levels of guarantees on how failures are handled as follows:

  <ul>
    <li> <b>In-memory journal mode</b> (PRAGMA journal_mode=memory). In this
      mode any failure may cause database file-system corruption, including an
      application crash or unexpected exit.
    <li> <b>Non-synchronous mode</b> (PRAGMA sychronous=off). In this mode 
      an application crash or unexpected exit may not cause database 
      corruption, however an operating system crash or power failure may.
    <li> <b>Synchronous mode</b> (PRAGMA sychronous=full). In this mode 
      neither an application crash, operating system crash or power failure 
      may cause database file-system corruption.


  </ul>

  <p>
    If a process attempts to modify a database so as to replace database
    image A with database image B and a failure occurs while doing so, 


    then following recovery the file-system must contain a database image 
    equivalent to A or B. Otherwise, the database file-system is considered 
    corrupt.
    
  <p>
    Two database images are considered to be equivalent if each of the 
    following are true:

    <ul>
      <li> <p> The two database images have the same page-size.
      <li> <p> The two database images have the same number of pages.
................................................................................
    different blob stored on a free-list leaf page and still be considered
    equivalent. This concept can sometimes be exploited to more efficiently
    update an SQLite database file-system representation.

    [fileformat_import_requirement2 H32290]

  <p>
    The following requirement constrains the way in which a database 
    file-system representation may be updated. In many ways, it is
    equivalent to "do not corrupt the database file-system representation
    under those conditions where the file-system should not be corrupted".
    The definition of "handled failure" depends on the mode that SQLite
    is running in (or on the requirements of the external system accessing
    the database file-system representation).

    [fileformat_import_requirement2 H32300]


  <p>
    The following two sections, <cite>rollback_journal_method</cite>
    and <cite>atomic_write_method</cite>, are somewhat advisory in nature.
    They contain descriptions of two different methods used by SQLite to
    modify a database image within a database file-system representation in
    accordance with the above requirements. These are not the only methods
    that could be used. So long as the above requirements (and

    those in sections <cite>locking_protocol</cite> and 
    <cite>database_header_cookies_protocol</cite>) are honoured, any method may
    be used by an SQLite database writer to update the database file-system


    representation. Sections <cite>rollback_journal_method</cite> and 

    <cite>atomic_write_method</cite> do not contain formal requirements. Formal
    requirements governing the way in which SQLite safely updates database
    file-system representations may be found in <span class=todo>ref</span>.

  <p class=todo> Refer to webpage "How SQLite Implements Atomic Commit" too?

  [h3 "The Rollback-Journal Method" rollback_journal_method]

    <p>
      This section describes the method usually used by SQLite to update a database 
      image within a database file-system representation. This is one way
      to modify a database image in accordance with the requirements in the
      parent and other sections. When overwriting database image A with database 
      image B using this method, assuming that to begin with database image A is 
      entirely contained within the database file and that the page-size of
      database image B is the same as that of database image A, the following 
      steps are taken:

    <ol>
      <li> <p>The start of the journal file is populated with data that is not
           a valid journal header.
................................................................................
    [Figure filesystem2.gif figure_filesystem2 "Interim file-system state used to atomically overwrite database image ABCD with AEC"]

  <p>
    The procedure described above can be onourous to implement, as it requires
    that the data for all modified pages of database image B be available
    (presumably in main memory) at the same time, when step 4 is performed.
    For transactions that write to a large number of database pages, this 
    may be undesirable. A solution is to create a journal-file containing
    two or more journal headers. If, while modifying a database image within
    main-memory, a client wishes to reduce the amount of data held in memory,
    it may perform steps 3 and 4 of the above procedure in order to write
    modified content out to the file-system. Once the modified pages have been
    written into the database file, the in-memory copies may be discarded.
    The writer process may then continue accumulating changes in memory. When
    it is ready to write these changes out to the file-system, either to free
    up main-memory or because all changes associated with the transaction have
    been prepared, it adds a second (or subsequent) journal header to the 
    journal file, followed by journal records containing the original data
    for pages about to be modified. It may then write the changes accumulated
    in-memory to the database file, as described in step 4 above.

  <p>
    This technique can also be modified to support atomic modification of 
    multiple databases. In this case the first 4 steps of the procedure outlined
    above are followed for each individual database. Following this a
    master-journal file is created somewhere within the file-system and a
    master-journal pointer added to each individual journal file. Since
    a journal-file that contains a master-journal pointer to a master-journal
    file that does not exist is considered invalid (requirement H32000),
    all journal-files may be simultaneously invalidated by deleting the
    master-journal file from the file-system. This delete operation takes the
    place of step 5 of the procedure as outlined above.




  [h3 "The Atomic-Write Method" atomic_write_method]


    <p>
      On some systems, SQLite is able to overwrite a single page of the
      database file as an atomic operation. If, while updating the page,
      a failure occurs, the system guarantees that following recovery, the 
      page will be found to have been correctly and completely updated or 
      not modified at all. When running in such an environment, if SQLite
      is required to update a database image so that only a single page
      is modified, it can do so simply by overwriting the page.

    <p>
      Assuming the database page being updated is not page 1, if requirement 
      H33040 requires that the database header change counter be updated, then 
      the database image modification is no longer confined to a single page.
      In this case it can be split in two: SQLite first atomically updates
      page 1 of the database file to increment the database header change 
      counter, then updates the page that it is actually required to update
      using a second atomic write operation. If a failure occurs some time
      between the two write operations, following recovery the database 
      image may be found to be in its original state except for the value
      of the database header change counter <span class=todo>It would be good
      to have some requirement to say that that is Ok. Some modification to
      the definition of equivalent databases perhaps.</span>

<!--

  <p>
    The following requirements require that the journal header at the start of
    a journal file is set to contain the original database page-size and 
    page-count and written to non-volatile storage before the size of the
    database file is modified. And that once the size of the database file has
    been modified, the journal header does not become unstable and the page-size 
................................................................................
  <p>
    Journal before truncate:

       [fileformat_import_requirement2 H32360]
       [fileformat_import_requirement2 H32370]
      

  h4 "Multiple Database Transactions" multi_db_transactions



  <p>
    SQLite is required 
    to do make all modifications associated with the transaction such that 
    the database image is modified atomically. If an application, OS or 
    power failure occurs while SQLite is updating the database, upon recovery 
    the contents of the database must reflect either that all modifications 
................................................................................
    operation can be used to effect all required changes to the logical
    database contents.

  <p>
    The following two sub-sections describe the specific ways in which 
    SQLite achieves this for single and multiple database transactions.

  h4 "Single Database Transactions" single_db_transactions

  <p>
    In order to atomically modify the database image stored in the 
    file-system from database image A to database image B, the file-system must
    first be manipulated to a state where it contains the database image A,
    but can by a single atomic operation be modified to contain database 
    image B. A file-system state that has the following properties satisfies
................................................................................
    <li> Truncating the journal file to zero bytes in size, or
    <li> Overwriting some or all of the first 8 bytes in the journal file
         so that the journal file no longer contains a well-formed journal
         header.
  </ul>


  h4 "Multiple Database Transactions" multi_db_transactions

  <p class=todo>
    Deleting the master-journal is used as the atomic operation.
-->
















































  [h2 "SQLite Locking Protocol" locking_protocol]

    <p>
      An SQLite database client may hold at any time one of four different types 
      of locks on a database file-system representation. This document does not
      describe how these locks are to be implemented. Possible implementation
      techniques include mapping the four SQLite locks to operating system file

Changes to req/hlr30000.txt.

850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
....
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
....
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
corresponding pointer-map entry is set to the value 0x05 and the parent
page number field is set to the page number of the parent node in the
B-Tree structure.



HLR H32000
If a <i>journal file</i> contains a well-formed <i>master-journal 
pointer</i>, and the named <i>master-journal file</i> either does
not exist or does not contain the name of the <i>journal file</i>,
then the <i>journal file</i> shall be considered invalid.

HLR H32010
If the first 28 bytes of a <i>journal file</i> do not contain a well-formed
<i>journal header</i>, then the <i>journal file</i> shall be considered
invalid.

HLR H32020
................................................................................
HLR H32290
Two database images shall be considered to be equivalent if they (a) have the
same page size, (b) contain the same number of pages and (c) the content of
each page of the first database image that is not a free-list leaf page is
the same as the content of the corresponding page in the second database image.

HLR H32300
When writing to an SQLite database file-system representation in order to 
replace database image A with database image B, the file-system representation
shall at all times contain a database image equivalent to either A or B.

HLR H32310
If, while writing to an SQLite database file-system representation in 
order to replace database image A with database image B, an operating
system or power failure should occur, then following recovery the database
file-system representation shall contain a database image equivalent to
either A or B.



HLR H32320
When using the rollback-journal method to modify the file-system representation
................................................................................
If another database client holds either a RESERVED or PENDING lock on the
database file-system representation, then any journal file that exists within
the file system shall be considered invalid.


HLR H33040
A database writer shall increment the value of the database header change
counter field (H30100) either as part of the first database image modification 
that it performs after obtaining an EXCLUSIVE lock.

HLR H33050
A database writer shall increment the value of the database schema version 
field (H30110) as part of the first database image modification that includes
a schema change that it performs after obtaining an EXCLUSIVE lock.








|
|
<
|







 







<
<
<
<
<

|
|







 







|







850
851
852
853
854
855
856
857
858

859
860
861
862
863
864
865
866
....
1023
1024
1025
1026
1027
1028
1029





1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
....
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
corresponding pointer-map entry is set to the value 0x05 and the parent
page number field is set to the page number of the parent node in the
B-Tree structure.



HLR H32000
If a journal file contains a well-formed master-journal pointer and the 
named master-journal file does not exist then the journal file shall be 

considered invalid.

HLR H32010
If the first 28 bytes of a <i>journal file</i> do not contain a well-formed
<i>journal header</i>, then the <i>journal file</i> shall be considered
invalid.

HLR H32020
................................................................................
HLR H32290
Two database images shall be considered to be equivalent if they (a) have the
same page size, (b) contain the same number of pages and (c) the content of
each page of the first database image that is not a free-list leaf page is
the same as the content of the corresponding page in the second database image.

HLR H32300





If, while writing to an SQLite database file-system representation in 
order to replace database image A with database image B, a failure that
should be handled gracefully occurs, then following recovery the database
file-system representation shall contain a database image equivalent to
either A or B.



HLR H32320
When using the rollback-journal method to modify the file-system representation
................................................................................
If another database client holds either a RESERVED or PENDING lock on the
database file-system representation, then any journal file that exists within
the file system shall be considered invalid.


HLR H33040
A database writer shall increment the value of the database header change
counter field (H30100) as part of the first database image modification 
that it performs after obtaining an EXCLUSIVE lock.

HLR H33050
A database writer shall increment the value of the database schema version 
field (H30110) as part of the first database image modification that includes
a schema change that it performs after obtaining an EXCLUSIVE lock.