Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Begin adding the "WAL-mode File Format" document. Much work left to be done. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
3f66c70302d1c85a139518a6c5b0a8da |
User & Date: | drh 2017-11-04 17:08:42.880 |
Context
2017-11-04
| ||
20:26 | Continuing work on the WAL file format document. (check-in: 0d151f472f user: drh tags: trunk) | |
17:08 | Begin adding the "WAL-mode File Format" document. Much work left to be done. (check-in: 3f66c70302 user: drh tags: trunk) | |
15:02 | Merge fixes from the 3.21.0 release branch. (check-in: 411a4d6dde user: drh tags: trunk) | |
Changes
Changes to pages/fileformat2.in.
︙ | ︙ | |||
167 168 169 170 171 172 173 174 175 176 177 178 179 180 | at offset 16 is 0x00 0x01. This value can be interpreted as a big-endian 1 and thought of is as a magic number to represent the 65536 page size. Or one can view the two-byte field as a little endian number and say that it represents the page size divided by 256. These two interpretations of the page-size field are equivalent.</p> <h3>File format version numbers</h3> <p>The file format write version and file format read version at offsets 18 and 19 are intended to allow for enhancements of the file format in future versions of SQLite. In current versions of SQLite, both of these values are 1 for rollback journalling modes and 2 for [WAL] journalling mode. If a version of SQLite coded to the current | > | 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 | at offset 16 is 0x00 0x01. This value can be interpreted as a big-endian 1 and thought of is as a magic number to represent the 65536 page size. Or one can view the two-byte field as a little endian number and say that it represents the page size divided by 256. These two interpretations of the page-size field are equivalent.</p> <tcl>hd_fragment vnums {file format version numbers}</tcl> <h3>File format version numbers</h3> <p>The file format write version and file format read version at offsets 18 and 19 are intended to allow for enhancements of the file format in future versions of SQLite. In current versions of SQLite, both of these values are 1 for rollback journalling modes and 2 for [WAL] journalling mode. If a version of SQLite coded to the current |
︙ | ︙ |
Added pages/walformat.in.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 | <title>WAL-mode File Format</title> <tcl>hd_keywords {wal-mode file format}</tcl> <table_of_contents> <p>This document describes low-level details on how [WAL mode] is implemented on unix and windows. <p>The separate [file format] description provides details on the structure of a database file and of the write-head log file for [WAL mode]. But details of the locking protocol and of the format of the [WAL-index format] are deliberately omitted since those details are left to descretion of individual [VFS] implementations. This document fills in the missing details for the unix and windows [VFSes]. <p>For completeness, some of the higher level formatting information contains in the [file format] document and elsewhere is replicated here, when it pertains to WAL mode processing. <h1>Files On Disk</h1> <p>When in active use, the state of a WAL mode database is described by three separate files: <ol> <li> The main database file with an arbitrary name "X". <li> The write-ahead log file, usually named "X-wal". <li> The wal-index file, usually named "X-shm". </ol> <p>The format of the main database file is as described in the [file format] document. The [file format version numbers] at offsets 18 and 19 into the main database must both be 2 to indicate that the database is in WAL mode. The main database may have an arbitrary name allowed by the underlying filesystem. No special file suffixes are required, though ".db", ".sqlite", and ".sqlite3" seem to be popular choices. <p>The write-ahead log or "wal" file is a roll-forward style log file that records transactions that have been committed but not yet applied to the main database. Details on the format of the wal file are describe in the [WAL format] subsection of the main [file format] document. The wal file is named by appending the four characters "-wal" to the end of the name of the main database file. Except on 8+3 filesystems, such names are not allowed, and in that case the file suffix is changed to ".WAL". But as 8+3 filesystems are increasingly rare, that exceptional case can usually be ignored. <p>The wal-index file or "shm" file is not actually used as a file. Rather, individual database clients mmap the shm file and use it as shared memory for coordinating access to the database and as a cache for quickly locating frame within the wal file. The name of the shm file is the main database file name with the four characters "-shm" appended. Or, for 8+3 filesystems, the shm file is the main database file with the suffix changed to ".SHM". <p>The shm does not contain any database content and is not required to recover the database following a crash. For that reason, the first client to connect to a quiescent database will normally truncate the shm file if it exists. Since the content of the shm file does not need to be preserved across a crash, the shm file is never fsync()-ed to disk. In fact, if there were a mechanism by which SQLite could tell the operating system to never persist the shm file to disk but always hold it in cache memory, SQLite would use that mechanism to avoid any unnecessary disk I/O associated with the shm file. However, no such mechanism exists in standard posix. <p>When a WAL mode database is in active use, all three of the above files exist. <p>If the last client using the database shuts down cleanly by calling [sqlite3_close()], then a [checkpoint] is run automatically in order to transfer all information from the wal file over into the main database, then both the shm file and the wal file are unlinked. Thus, when the database is not in use by any client, it is usually the case that only the main database file exists on disk. However, if the last client did not call [sqlite3_close()] before it shut down, then the final cleanup operation does not occur and the shm and wal files may still exist on disk even when the database is not in use. <h2>Variations</h2> <p>When [PRAGMA locking_mode=EXCLUSIVE] is set, that means that only a single client is allowed to have the database open at one time. Since only a single client can use the database, the shm file is omitted. The single client uses a buffer in heap memory as a substitute for the memory-mapped shm file. <h1>The WAL-Index File Format</h1> <p> The WAL-index or "shm" file is used to coordinate access to the database by multiple clients, and as a cache to help clients quickly locate frames within the wal file. <p> Because the shm file is not involved in recovery, the shm file does not need to be machine byte-order independent. Hence, numeric values in the shm file are written in the native byte order of the host computer, rather than being converted into a specific cross-platform byte order as is done with the main database file and the wal file. <p> The shm file consists of one or more hash tables, where each hash table is 32768 bytes in size. Except, a 136-byte header is carved out of the front of the very first hash table, so the first hash table is only 32632 bytes in size. The total size of the shm file is always a multiple of 32768. In most cases, the total size of the shm file is exactly 32768 bytes. The shm file only needs to grow beyond a single hash table if when the wal file grows very large (more than 4079 frames). Since the default [sqlite3_wal_autocheckpoint|automatic checkpoint threshold] is 1000, WAL files rare reach the 4079 threshold needed to make the shm file grow. <h2>The WAL-Index Header</h2> <p>The first 136 bytes of the shm file is a header. The shm header has three main divisions as follows: <center> <i>WAL-Index Header Divisions</i><br> <table width="80%" border=1> <tr><th>Bytes<th>Description <tr><td valign=top>0..47</td><td>First copy of the WAL Index Information</td> <tr><td valign=top>48..95</td><td>Second copy of the WAL Index Information</td> <tr><td valign=top>96..135</td><td>Checkpoint Information and Locks</td> </table> </center> <p>Individual fields of the shm header are all unsigned integers in the native byte-order of the host machine. The size of integers may be 8, 16, 32, or 64 bits. A detailed breakout of the individual fields of the shm header follows: <center> <i>WAL-Index Header Details</i><br> <table width="80%" border=1> <tr><th>Bytes</th><th>Meaning</th></tr> <tr> <td>0..3</td> <td>The WAL-index format version number. Always 3007000.</td> </tr> <tr> <td>4..7</td> <td>Unused padding space. Must be zero. </tr> <tr> <td>8..11</td> <td>Unsigned integer counter, incremented with each transaction </tr> <tr> <td>12</td> <td>The "isInit" flag. 1 when the shm file has been initialized. </tr> <tr> <td>13</td> <td>True if the WAL file uses big-ending checksums. 0 if the WAL uses little-endian checksums. </tr> <tr> <td>14..15</td> <td>The database page size in bytes, or 1 if the page size is 65536. </tr> <tr> <td>16..19</td> <td>Index of the last valid frame in the WAL file. </tr> <tr> <td>20..23</td> <td>Size of the database file in pages. </tr> <tr> <td>24..31</td> <td>Checksum of the last frame in the WAL file. </tr> <tr> <td>32..39</td> <td>The two salt value copied from the WAL file header. </tr> <tr> <td>40..47</td> <td>A checksum over bytes 0 through 39 of this header. </tr> <tr> <td>48..95</td> <td>A copy of bytes 0 through 47 of this header. </tr> <tr> <td>96..99</td> <td>Number of WAL frames that have already been backfilled into the database by prior checkpoints </tr> <tr> <td>100..119</td> <td>Five "read marks". Each read mark is a 32-bit unsigned integer (4 bytes). </tr> <tr> <td>120..127</td> <td>Unused space set aside for locking. </tr> <tr> <td>128..132</td> <td>Number of WAL frames that have attempted to be backfilled but which might not have been backfilled successfully. </tr> <tr> <td>132..136</td> <td>Unused space reserved for futher expansion. </tr> </table> </center> |