Documentation Source Text

Check-in [3a3fd02722]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add some more to fileio.html.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 3a3fd027228fcf4538f8834be9cc3067cfaabc43
User & Date: dan 2008-09-23 18:06:48.000
Context
2008-09-24
17:49
Fix a bug in the CREATE TRIGGER documentation. (check-in: 2f8b2b9ead user: drh tags: trunk)
2008-09-23
18:06
Add some more to fileio.html. (check-in: 3a3fd02722 user: dan tags: trunk)
2008-09-22
12:42
Add the observation that the unary "+" operator destroys type affinity. (check-in: b46eae7d62 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to images/fileformat/rtdocs.css.
57
58
59
60
61
62
63

64


  .todo:before { content: 'TODO:' }
  p.todo       { border: solid #AA3333 1px; padding: 1ex }

  cite         { font-style: normal; font-weight: normal }
  cite a       { color: inherit; text-decoration: none }
  :link:hover,:visited:hover { background: wheat }


  img {display:block}









>
|
>
>
57
58
59
60
61
62
63
64
65
66
67
  .todo:before { content: 'TODO:' }
  p.todo       { border: solid #AA3333 1px; padding: 1ex }

  cite         { font-style: normal; font-weight: normal }
  cite a       { color: inherit; text-decoration: none }
  :link:hover,:visited:hover { background: wheat }

  img {
    display:block;
    page-break-inside: avoid;
  }
Changes to pages/fileio.in.
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
    Although it would be easy to design a system that uses the <i>VFS</i>
    interface to read and update the content of a database file stored
    within a file-system, there are several complicated issues that need
    to be addressed by such a system:

  <ol>
    <li><p>SQLite is required to <b>implement atomic and durable
	transactions</b> (the 'A' and 'D' from the ACID acronym), even if an
	application, operating system or power failure occurs midway through or
        shortly after updating a database file.

	<p>To implement atomic transactions in the face of potential 
	application, operating system or power failures, database writers write
	a copy of those portions of the database file that they are going to
	modify into a second file, the <i>journal file</i>, before writing
        to the database file. If a failure does occur while modifying the 
        database file, SQLite can reconstruct the original database 
        (before the modifications were attempted) based on the contents of 
        the <i>journal file</i>.

    <li><p>SQLite is required to <b>implement isolated transactions</b> (the 'I'
        from the ACID acronym). 

	<p>This is done by using the file locking facililities provided by the
	VFS adaptor to serialize writers (write transactions) and preventing
	readers (read transactions) from accessing database files while writers
        are midway through updating them.

    <li><p>For performance reasons, it is advantageous to <b>minimize the 
        quantity of data read and written</b> to and from the file-system.

        <p>As one might expect, the amount of data read from the database 
        file is minimized by caching portions of the database file in main 







|
|


|
|
|
|








|
|
|







84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
    Although it would be easy to design a system that uses the <i>VFS</i>
    interface to read and update the content of a database file stored
    within a file-system, there are several complicated issues that need
    to be addressed by such a system:

  <ol>
    <li><p>SQLite is required to <b>implement atomic and durable
        transactions</b> (the 'A' and 'D' from the ACID acronym), even if an
        application, operating system or power failure occurs midway through or
        shortly after updating a database file.

        <p>To implement atomic transactions in the face of potential 
        application, operating system or power failures, database writers write
        a copy of those portions of the database file that they are going to
        modify into a second file, the <i>journal file</i>, before writing
        to the database file. If a failure does occur while modifying the 
        database file, SQLite can reconstruct the original database 
        (before the modifications were attempted) based on the contents of 
        the <i>journal file</i>.

    <li><p>SQLite is required to <b>implement isolated transactions</b> (the 'I'
        from the ACID acronym). 

        <p>This is done by using the file locking facililities provided by the
        VFS adaptor to serialize writers (write transactions) and preventing
        readers (read transactions) from accessing database files while writers
        are midway through updating them.

    <li><p>For performance reasons, it is advantageous to <b>minimize the 
        quantity of data read and written</b> to and from the file-system.

        <p>As one might expect, the amount of data read from the database 
        file is minimized by caching portions of the database file in main 
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
      of the following four types of operation:

    <ul>
      <li> <b>Create file</b> operations. SQLite may create new files
           within the file-system by invoking the xOpen() method of
           the sqlite3_io_methods object.
      <li> <b>Delete file</b> operations. SQLite may remove files from the
	   file system by calling the xDelete() method of the
           sqlite3_io_methods object.
      <li> <b>Truncate file</b> operations. SQLite may truncate existing 
           files by invoking the xTruncate() method of the sqlite3_file 
           object.
      <li> <b>Write file</b> operations. SQLite may modify the contents
           and increase the size of a file by files by invoking the xWrite() 
           method of the sqlite3_file object.







|







355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
      of the following four types of operation:

    <ul>
      <li> <b>Create file</b> operations. SQLite may create new files
           within the file-system by invoking the xOpen() method of
           the sqlite3_io_methods object.
      <li> <b>Delete file</b> operations. SQLite may remove files from the
           file system by calling the xDelete() method of the
           sqlite3_io_methods object.
      <li> <b>Truncate file</b> operations. SQLite may truncate existing 
           files by invoking the xTruncate() method of the sqlite3_file 
           object.
      <li> <b>Write file</b> operations. SQLite may modify the contents
           and increase the size of a file by files by invoking the xWrite() 
           method of the sqlite3_file object.
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659

    <p>
      Opening a new database connection is a two-step process:

    <ol>
      <li> A file-handle is opened on the database file.
      <li> If step 1 was successful, an attempt is made to read the 
	   <i>database file header</i> from the database file using the 
           new file-handle.
    </ol>

    <p>
      In step 2 of the procedure above, the database file is not locked
      before it is read from. This is the only exception to the locking 
      rules described in section <cite>reading_data</cite>.







|







645
646
647
648
649
650
651
652
653
654
655
656
657
658
659

    <p>
      Opening a new database connection is a two-step process:

    <ol>
      <li> A file-handle is opened on the database file.
      <li> If step 1 was successful, an attempt is made to read the 
           <i>database file header</i> from the database file using the 
           new file-handle.
    </ol>

    <p>
      In step 2 of the procedure above, the database file is not locked
      before it is read from. This is the only exception to the locking 
      rules described in section <cite>reading_data</cite>.
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
    <i>read/write transaction</i>) on the database before data may be 
    read from the database file. 

  <p>
    The two exceptions are:
  <ul>
    <li> When an attempt is made to read the 100 byte <i>database file
	 header</i> immediately after opening the <i>database connection</i>
	 (see section <cite>open_new_connection</cite>). When this occurs
         no lock is held on the database file.
    <li> Data read while in the process of opening a read-only transaction
         (see section <cite>open_read_only_trans</cite>). These read 
	 operations occur after a <i>shared lock</i> is held on the database
	 file.
  </ul>

  <p>
    Once a transaction has been opened, reading data from a database 
    connection is a simple operation. Using the xRead() method of the 
    file-handle open on the database file, the required database file 
    pages are read one at a time. SQLite never reads partial pages and







|
|



|
|







726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
    <i>read/write transaction</i>) on the database before data may be 
    read from the database file. 

  <p>
    The two exceptions are:
  <ul>
    <li> When an attempt is made to read the 100 byte <i>database file
         header</i> immediately after opening the <i>database connection</i>
         (see section <cite>open_new_connection</cite>). When this occurs
         no lock is held on the database file.
    <li> Data read while in the process of opening a read-only transaction
         (see section <cite>open_read_only_trans</cite>). These read 
         operations occur after a <i>shared lock</i> is held on the database
         file.
  </ul>

  <p>
    Once a transaction has been opened, reading data from a database 
    connection is a simple operation. Using the xRead() method of the 
    file-handle open on the database file, the required database file 
    pages are read one at a time. SQLite never reads partial pages and
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
    <ol>
      <li>A <i>shared-lock</i> is obtained on the database file.
      <li>The connection checks if a <i>hot journal file</i> exists in the
          file-system. If one does, then it is rolled back before continuing.
      <li>The connection checks if the data in the <i>page cache</i> may 
          still be trusted. If not, all page cache data is discarded.
      <li>If the file-size is not zero bytes and the page cache does not
	  contain valid data for the first page of the database, then the
          data for the first page must be read from the database.
    </ol>

    <p>
      Of course, an error may occur while attempting any of the 4 steps
      enumerated above. If this happens, then the <i>shared-lock</i> is 
      released (if it was obtained) and an error returned to the user. 







|







814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
    <ol>
      <li>A <i>shared-lock</i> is obtained on the database file.
      <li>The connection checks if a <i>hot journal file</i> exists in the
          file-system. If one does, then it is rolled back before continuing.
      <li>The connection checks if the data in the <i>page cache</i> may 
          still be trusted. If not, all page cache data is discarded.
      <li>If the file-size is not zero bytes and the page cache does not
          contain valid data for the first page of the database, then the
          data for the first page must be read from the database.
    </ol>

    <p>
      Of course, an error may occur while attempting any of the 4 steps
      enumerated above. If this happens, then the <i>shared-lock</i> is 
      released (if it was obtained) and an error returned to the user. 
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959


960
961
962
963
964
965
966
      complex. The following steps take place:

      <ol class=spacedlist>
        <li>Using the VFS xAccess() method, SQLite queries the file-system 
            to see if the journal file associated with the database exists. 
            If it does not, then there is no hot-journal file.

	<li>By invoking the xCheckReservedLock() method of the file-handle
	    opened on the database file, SQLite checks if some other connection
	    holds a <i>reserved lock</i> or greater. If some other connection
	    does hold a <i>reserved lock</i>, this indicates that the other
	    connection is midway through a <i>read/write transaction</i> (see
	    section <cite>writing_data</cite>). In this case the 
            <i>journal file</i> is not a <i>hot-journal</i> and must not be 
            rolled back.

        <li>Using the xFileSize() method of the file-handle opened
            on the database file, SQLite checks if the database file is 
            0 bytes in size. If it is, the journal file is not considered
            to be a <i>hot journal</i> file. Instead of rolling back the
            journal file, in this case it is deleted from the file-system
            by calling the VFS xDelete() method. <span class=todo>Technically,
            there is a race condition here. This step should be moved to
            after the exclusive lock is held.</span>

        <li>An attempt is made to upgrade to an <i>exclusive lock</i> on the
            database file. If the attempt fails, then all locks, including 
            the recently obtained <i>shared lock</i> are dropped. The attempt
            to open a <i>read-only transaction</i> has failed. This occurs
	    when some other connection is also attempting to open a 
	    <i>read-only transaction</i> and the attempt to gain the
	    <i>exclusive lock</i> fails because the other connection is also
	    holding a <i>shared lock</i>. It is left to the other connection 
            to roll back the <i>hot journal</i>.
            <div style="margin-top:0.5em"></div>
            It is important that the file-handle lock is upgraded 
	    directly from <i>shared</i> to <i>exclusive</i> in this case,
            instead of first upgrading to <i>reserved</i> or </i>pending</i>
            locks as is required when obtaining an <i>exclusive lock</i> to
            write to the database file (section <cite>writing_data</cite>).
	    If SQLite were to first upgrade to a <i>reserved</i> or
	    <i>pending</i> lock in this scenario, then a second process also
            trying to open a <i>read-transaction</i> on the database file might
	    detect the <i>reserved</i> lock in step 2 of this process, 
            conclude that there was no <i>hot journal</i>, and commence
            reading data from the <i>database file</i>.

        <li>The xAccess() method is invoked again to detect if the journal 
            file is still in the file system. If it is, then it is a 
            hot-journal file and SQLite tries to roll it back (see section
            <cite>rollback</cite>).
      </ol>



    <p>
      The following requirements describe step 1 of the above procedure in
      more detail.

    REQ H21014
      When required to attempt to detect a <i>hot-journal file</i>, SQLite







|
|
|
|
|
|
















|
|
|
|



|



|
|

|








>
>







908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
      complex. The following steps take place:

      <ol class=spacedlist>
        <li>Using the VFS xAccess() method, SQLite queries the file-system 
            to see if the journal file associated with the database exists. 
            If it does not, then there is no hot-journal file.

        <li>By invoking the xCheckReservedLock() method of the file-handle
            opened on the database file, SQLite checks if some other connection
            holds a <i>reserved lock</i> or greater. If some other connection
            does hold a <i>reserved lock</i>, this indicates that the other
            connection is midway through a <i>read/write transaction</i> (see
            section <cite>writing_data</cite>). In this case the 
            <i>journal file</i> is not a <i>hot-journal</i> and must not be 
            rolled back.

        <li>Using the xFileSize() method of the file-handle opened
            on the database file, SQLite checks if the database file is 
            0 bytes in size. If it is, the journal file is not considered
            to be a <i>hot journal</i> file. Instead of rolling back the
            journal file, in this case it is deleted from the file-system
            by calling the VFS xDelete() method. <span class=todo>Technically,
            there is a race condition here. This step should be moved to
            after the exclusive lock is held.</span>

        <li>An attempt is made to upgrade to an <i>exclusive lock</i> on the
            database file. If the attempt fails, then all locks, including 
            the recently obtained <i>shared lock</i> are dropped. The attempt
            to open a <i>read-only transaction</i> has failed. This occurs
            when some other connection is also attempting to open a 
            <i>read-only transaction</i> and the attempt to gain the
            <i>exclusive lock</i> fails because the other connection is also
            holding a <i>shared lock</i>. It is left to the other connection 
            to roll back the <i>hot journal</i>.
            <div style="margin-top:0.5em"></div>
            It is important that the file-handle lock is upgraded 
            directly from <i>shared</i> to <i>exclusive</i> in this case,
            instead of first upgrading to <i>reserved</i> or </i>pending</i>
            locks as is required when obtaining an <i>exclusive lock</i> to
            write to the database file (section <cite>writing_data</cite>).
            If SQLite were to first upgrade to a <i>reserved</i> or
            <i>pending</i> lock in this scenario, then a second process also
            trying to open a <i>read-transaction</i> on the database file might
            detect the <i>reserved</i> lock in step 2 of this process, 
            conclude that there was no <i>hot journal</i>, and commence
            reading data from the <i>database file</i>.

        <li>The xAccess() method is invoked again to detect if the journal 
            file is still in the file system. If it is, then it is a 
            hot-journal file and SQLite tries to roll it back (see section
            <cite>rollback</cite>).
      </ol>

    <p class=todo> Master journal file pointers?

    <p>
      The following requirements describe step 1 of the above procedure in
      more detail.

    REQ H21014
      When required to attempt to detect a <i>hot-journal file</i>, SQLite
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
      a <i>database connection</i> unlocks the database file, it stores the
      current value of the <i>file change counter</i>. Later, while opening a
      new <i>read-only transaction</i>, SQLite checks the value of the <i>file
      change counter</i> stored in the database file. If the value has not
      changed since the database file was unlocked, then the <i>page cache</i>
      entries can be trusted. If the value has changed, then the <i>page
      cache</i> entries cannot be trusted and all entries associated with
      the current <i>database connection</i> is discarded.
   
    REQ H21018
      When a file-handle open on a database file is unlocked, if the
      <i>page cache</i> contains one or more entries belonging to the
      associated <i>database connection</i>, SQLite shall store the value 
      of the <i>file change counter</i> internally.








|







1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
      a <i>database connection</i> unlocks the database file, it stores the
      current value of the <i>file change counter</i>. Later, while opening a
      new <i>read-only transaction</i>, SQLite checks the value of the <i>file
      change counter</i> stored in the database file. If the value has not
      changed since the database file was unlocked, then the <i>page cache</i>
      entries can be trusted. If the value has changed, then the <i>page
      cache</i> entries cannot be trusted and all entries associated with
      the current <i>database connection</i> are discarded.
   
    REQ H21018
      When a file-handle open on a database file is unlocked, if the
      <i>page cache</i> contains one or more entries belonging to the
      associated <i>database connection</i>, SQLite shall store the value 
      of the <i>file change counter</i> internally.

1215
1216
1217
1218
1219
1220
1221

1222

1223
1224


1225
1226




1227
1228

1229


1230
1231
1232
1233
1234
1235
1236



1237
1238
1239
1240



1241


1242
1243

1244

1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258


1259
1260
1261
1262

1263

1264
1265
1266
1267
1268
1269
1270
1271









1272
1273
1274
1275
1276
1277
1278
1279


1280
1281
1282
1283
1284
1285
1286
      calling the xUnlock() method of the file-handle.

    <p>
      See also requirements H21018 and H21021 above.

<h1 id=writing_data>Writing Data</h1>
  <p>

    Safely writing data to a database file is also a complex procedure. The

    database file must be updated in such a way that if a power failure,
    operating system crash or application fault occurs while SQLite is midway


    through writing to the database file the database contents are still
    accessible and correct after system recovery.





  <p>

    Logically, an SQLite database file is modified using 


    <i>write transactions</i>. Each <i>write transaction</i> may contain any
    number of modifications to the database files content or size. From the
    point of view of an external observer (a second 
    <i>database connection</i>) an entire <i>write transaction</i> is applied
    to the database file atomically. If a failure of some sort occurs while
    SQLite is midway through applying a <i>write transaction</i> to a database
    file, then it must appear from the point of view of the next <i>database



    connection</i> that reads data from the <i>database file</i> that the 
    aborted transaction was not applied.

  <p>



    SQLite accomplishes these goals using two techniques:



  <ul class=spacedlist>

    <li>While modifying the content or size of a <i>database file</i> to

        apply a <i>write transaction</i>, SQLite maintains an <i>exclusive
        lock</i> on the <i>database file</i>. Because reading from the 
        <i>database file</i> requires a <i>shared lock</i> (see section 
        <cite>reading_data</cite>), and because holding an <i>exclusive lock</i>
        guarantees that no other <i>database connection</i> is holding
        or can obtain a <i>shared lock</i>, this ensures that no other 
        connection may read data from the <i>database file</i> at a point
        when a <i>write transaction</i> has been partially applied. This
        alone ensures that the absence of an application or system failure
        <i>write transactions</i> appear to be atomically applied from the
        point of view of a second <i>database connection</i>.

    <li>In almost all cases, before the contents of a page of the <i>database
	file</i> may be modified or deleted by file truncation, the original


	contents of that page is stored in the <i>journal file</i>.
        Similarly, before the size of the <i>database file</i> may be modifed
        (either by extending or truncating the file), then the original size
        of the database file is stored in the <i>journal file</i>. If an

        application or system failure occurs while updating the <i>database

        file</i>, then the database file content may be restored based on the 
        contents of the <i>journal file</i> before data is next read from it.
        Restoring the contents of a <i>database file</i> using the contents
        of the <i>journal file</i> after an application or system failure
        occurs is known as <i>hot-journal rollback</i> and is described in
        section <cite>hot_journal_rollback</cite>. The methods used by
        SQLite for detecting that <i>hot-journal rollback</i> is required is 
        detailed in section <cite>hot_journal_detection</cite>.









  </ul>

  <p>
    The <i>page cache</i> belonging to the <i>database connection</i> is 
    used to buffer writes before they are written to the <i>database file</i>.
    Often, all changes for an entire <i>write transaction</i> are accumulated
    within the <i>page cache</i>. In this case no write operations are 
    performed on the database file until the user commits the transaction.



  <p>
    Even if an application or system failure does not occur while a
    <i>write transaction</i> is in progress, a rollback operation to restore
    the database file to the state that it was in before the transaction
    started may be required. This may occur if the user explicitly requests
    transaction rollback (i.e. by issuing a "ROLLBACK" command), or 







>
|
>
|
<
>
>
|
<
>
>
>
>


>
|
>
>
|
|
<
|
|
|
|
>
>
>
|
|

|
>
>
>
|
>
>
|
|
>
|
>
|
<
<
|
|
|
<
|
<
|
|
|
|
<
>
>
|
|
<
|
>
|
>
|
<
|
|
|
|
<
|
>
>
>
>
>
>
>
>
>



|
|



>
>







1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227

1228
1229
1230

1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242

1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265


1266
1267
1268

1269

1270
1271
1272
1273

1274
1275
1276
1277

1278
1279
1280
1281
1282

1283
1284
1285
1286

1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
      calling the xUnlock() method of the file-handle.

    <p>
      See also requirements H21018 and H21021 above.

<h1 id=writing_data>Writing Data</h1>
  <p>
    Using DDL or DML SQL statements, SQLite users may modify the contents and
    size of a database file. From the point of view of the sub-systems
    described in this document, each DDL or DML statement executed results in 
    the contents of zero or more of the pages that make up the database file

    (see <cite>ff_sqlitert_requirements</cite>) being overwritten with new
    data. A DDL or DML statement may also append or truncate one or more pages
    to or from the end of the database file. One or more DDL and/or DML 

    statements are grouped together to make up a single <i>write
    transaction</i>. A <i>write transaction</i> is required to have the 
    special properties described in section <cite>overview</cite>: a <i>write
    transaction</i> must be isolated, durable and atomic.

  <p>
    SQLite accomplishes these goals using the following techniques:

  <ul>
    <li><p>
        To ensure that <i>write transactions</i> are <b>isolated</b>, before
        beginning to modify the contents of the database file to reflect the

        results of a <i>write transaction</i>, SQLite obtains an <i>exclusive
        lock</i> on the <i>database file</i>. The lock is not relinquished
        until the <i>write transaction</i> is concluded. Because reading from
        the <i>database file</i> requires a <i>shared lock</i> (see section
        <cite>reading_data</cite>), and because holding an <i>exclusive
        lock</i> guarantees that no other <i>database connection</i> is holding
        or can obtain a <i>shared lock</i>, this ensures that no other
        connection may read data from the <i>database file</i> at a point when
        a <i>write transaction</i> has been partially applied. 

    <li><p>
        So that <i>write transactions</i> are <b>durable</b> in the face of
        a system failure, SQLite executes a <i>sync file</i> operation on the
        database file before concluding the <i>write transaction</i>

    <li><p>Ensuring that <i>write transactions</i> are <b>atomic</b> is the most
        complex task required of the system. In this case, <i>atomic</i> means
        that even if a system failure occurs, an attempt to commit a <i>write
        transaction</i> to the database file either results in all changes
        that are a part of the transaction being successfully applied to the
        database file, or none of the changes are successfully applied. There
        is no chance that a subset of the changes only are applied. Hence from
        the point of view of an external observer, the <i>write transaction</i>


        appears to be an atomic event. 
        <p>
        Of course, it is usually not possible to atomically apply all the

        changes required by a <i>write transaction</i> to a database file

        within the file-system. For example, if a <i>write transaction</i>
        requires ten pages of a database file to be modified, and a power
        outage causes a system failure after sqlite has modified only five
        pages, then the database file will almost certainly be in an

        inconsistent state following system recovery.
        <p>
        SQLite solves this problem by using a <i>journal file</i>. In almost
        all cases, before the <i>database file</i> is modified in any way, 

        SQLite stores sufficient information in the <i>journal file</i> to
        allow the original the database file to be reconstructed if a system
        failure occurs while the database file is being updated to reflect
        the modifications made by the <i>write transaction</i>. Each time
        SQLite opens a database file, it checks if such a system failure has

        occured and, if so, reconstructs the database file based on the contents
        of the journal file. The procedure used to detect whether or not this
        process, coined <i>hot journal rollback</i>, is required is described
        in section <cite>hot_journal_detection</cite>. <i>Hot journal rollback

        </i> itself is described in section <cite>hot_journal_rollback</cite>.
        <p>
        The same technique ensures that an SQLite database file cannot be
        corrupted by a system failure that occurs at an inopportune moment.
        If a system failure does occur before SQLite has had a chance to
        execute sufficient <i>sync file</i> operations to ensure that the
        changes that make up a <i>write transaction</i> have made it safely
        to persistent storage, then the <i>journal file</i> will be used
        to restore the database to a known good state following system
        recovery.
  </ul>

  <p>
    The <i>page cache</i> is used to buffer modifications to the database
    file image before they are written to the <i>database file</i>.
    Often, all changes for an entire <i>write transaction</i> are accumulated
    within the <i>page cache</i>. In this case no write operations are 
    performed on the database file until the user commits the transaction.

<p class=todo> this section is suspect from this point on.

  <p>
    Even if an application or system failure does not occur while a
    <i>write transaction</i> is in progress, a rollback operation to restore
    the database file to the state that it was in before the transaction
    started may be required. This may occur if the user explicitly requests
    transaction rollback (i.e. by issuing a "ROLLBACK" command), or 
1343
1344
1345
1346
1347
1348
1349

1350


1351












1352
1353
1354
1355
1356
1357
1358



1359
1360
1361
1362
1363









































1364
1365






1366
1367
1368
1369


















1370
1371
1372
1373
1374
1375
1376
    entire <i>write transaction</i>. This is intended to be illustrative only,
    many operations are omitted.

    <center><img src="images/fileformat/write_transaction.gif">
    <p><i>Figure <span class=fig id=figure_write_transaction></span> - Progression of a Write Transaction</i>
      </center>
 




  <h2>Journal File Format</h2>












    <h3 id=journal_header_format>Journal Header Format</h3>

    <p>
      A <i>journal header</i> is <i>sector-size</i> bytes in size, where <i>
      sector-size</i> is the value returned by the xSectorSize method of
      the file handle opened on the database file.
     




    <center><img src="images/fileformat/journal_header.gif">
    <p><i>Figure <span class=fig id=figure_journal_record></span> - Journal Header Format</i>
      </center>










































  <h3 id=journal_record_format>Journal Record Format</h3>







    <center><img src="images/fileformat/journal_record.gif">
    <p><i>Figure <span class=fig id=figure_journal_record></span> - Journal Record Format</i>
      </center>



















  <h3>Master Journal Pointer</h3>

  <h2>Write Transactions</h2>
  <h3>Beginning a Write Transaction</h3>
    <p>
      Before any database pages may be modified within the <i>page cache</i>,
      the <i>database connection</i> must open a <i>write transaction</i>. 







>

>
>
|
>
>
>
>
>
>
>
>
>
>
>
>





|
|
>
>
>
|

|


>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


>
>
>
>
>
>




>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
    entire <i>write transaction</i>. This is intended to be illustrative only,
    many operations are omitted.

    <center><img src="images/fileformat/write_transaction.gif">
    <p><i>Figure <span class=fig id=figure_write_transaction></span> - Progression of a Write Transaction</i>
      </center>
 
  <h2>Journal File Format</h2>

    <p>
      This section describes the format used by an SQLite <i>journal file</i>.

    <p>
      A journal file consists of one or more <i>journal headers</i>, zero
      or more <i>journal records</i> and optionally a <i>master journal
      pointer</i>. Each journal file always begins with a
      <i>journal header</i>, followed by zero or more <i>journal records</i>.
      Following this may be a second <i>journal header</i> followed by a
      seconds set of zero or more <i>journal records</i> and so on. There
      is no limit to the number of <i>journal headers</i> a journal file
      may contain. Following the <i>journal headers</i> and their accompanying
      sets of <i>journal records</i> may be the optional <i>master journal
      pointer</i>. Or, the file may simply end.

    <h3 id=journal_header_format>Journal Header Format</h3>

    <p>
      A <i>journal header</i> is <i>sector-size</i> bytes in size, where <i>
      sector-size</i> is the value returned by the xSectorSize method of
      the file handle opened on the database file. Only the first 28 bytes
      of the <i>journal header</i> are used, the remainder may contain garbage
      data. The first 28 bytes of each <i>journal header</i> consists of an 
      eight byte block set to a well-known value, followed by five big-endian 
      32-bit unsigned integer fields.
     
    <center><img src="images/fileformat/journal_header.gif">
    <p><i>Figure <span class=fig id=figure_journal_header></span> - Journal Header Format</i>
      </center>

    <p>
      Figure <cite>figure_journal_header</cite> graphically depicts the layout
      of a <i>journal header</i>. The individual fields are described in
      the following table. The offsets in the 'byte offset' column of the
      table are relative to the start of the <i>journal header</i>.

    <table class=striped>
      <tr><th>Byte offset<th>Size in bytes<th width=100%>Description
      <tr><td>0<td>8<td>The <b>journal magic</b> field always contains a
                        well-known 8-byte string value used to identify SQLite
                        journal files. The well-known sequence of byte values
                        is:
                        <pre>0xd9 0xd5 0x05 0xf9 0x20 0xa1 0x63 0xd7</pre>
      <tr><td>8<td>4<td>This field, the <b>record count</b>, is set to the
                        number of <i>journal records</i> that follow this
			<i>journal header</i> in the <i>journal file</i>.
      <tr><td>12<td>4<td>This field, the <b>database page count</b>, is set
                         to the number of pages that the <i>database file</i>
			 contained before any modifications associated with
			 <i>write transaction</i> are applied.
      <tr><td>16<td>4<td>The <b>checksum initializer</b> field is set to a 
			 pseudo-random value. It is used as part of the
			 algorithm to calculate the checksum for all <i>journal
			 records</i> that follow this <i>journal header</i>.
      <tr><td>20<td>4<td>This field, the <b>sector size</b>, is set to the
                         <i>sector size</i> of the device on which the 
			 <i>journal file</i> was created, in bytes. This value
			 is required when reading the journal file to determine
			 the size of each <i>journal header</i>.
      <tr><td>24<td>4<td>The <b>page size</b> field contains the database page
			 size used by the corresponding <i>database file</i>
			 when the <i>journal file</i> was created, in bytes.
    </table>

    <p>
      All <i>journal headers</i> are positioned in the file so that they 
      start at a <i>sector size</i> aligned offset. To achieve this, unused
      space may be left between the start of the second and subsequent
      <i>journal headers</i> and the end of the <i>journal records</i>
      associated with the previous header.

  <h3 id=journal_record_format>Journal Record Format</h3>

    <p>
      Each <i>journal record</i> contains the original data for a database page
      modified by the <i>write transaction</i>. If a rollback is required, then
      this data may be used to restore the contents of the database page to the
      state it was in before the <i>write transaction</i> was started.

    <center><img src="images/fileformat/journal_record.gif">
    <p><i>Figure <span class=fig id=figure_journal_record></span> - Journal Record Format</i>
      </center>

    <p>
      A <i>journal record</i>, depicted graphically by figure
      <cite>figure_journal_record</cite>, contains three fields, as described
      in the following table. Byte offsets are relative to the start of the
      <i>journal record</i>.

    <table class=striped>
      <tr><th>Byte offset<th>Size in bytes<th width=100%>Description
      <tr><td>0<td>4<td>The page number of the database page associated with
			this <i>journal record</i>, stored as a 4 byte
			big-endian unsigned integer.
      <tr><td>4<td><i>page-size<td>
			This field contains the original data for the page,
			exactly as it appeared in the database file before the
			<i>write transaction</i> began.
      <tr><td style="white-space: nowrap">4 + <i>page-size</i><td>4<td>
    </table>

  <h3>Master Journal Pointer</h3>

  <h2>Write Transactions</h2>
  <h3>Beginning a Write Transaction</h3>
    <p>
      Before any database pages may be modified within the <i>page cache</i>,
      the <i>database connection</i> must open a <i>write transaction</i>.