Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Continuing work on the WAL file format document. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
0d151f472f2cc37d569164e64bb476c5 |
User & Date: | drh 2017-11-04 20:26:33.130 |
Context
2017-11-08
| ||
15:43 | Updated VFS documentation. (check-in: 61fb0b3465 user: drh tags: trunk) | |
2017-11-04
| ||
20:26 | Continuing work on the WAL file format document. (check-in: 0d151f472f user: drh tags: trunk) | |
17:08 | Begin adding the "WAL-mode File Format" document. Much work left to be done. (check-in: 3f66c70302 user: drh tags: trunk) | |
Changes
Changes to pages/walformat.in.
︙ | ︙ | |||
208 209 210 211 212 213 214 | </tr> <tr> <td>132..136</td> <td>Unused space reserved for futher expansion. </tr> </table> </center> | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 | </tr> <tr> <td>132..136</td> <td>Unused space reserved for futher expansion. </tr> </table> </center> <p><i>TBD: More information about the header</i> <h2>WAL-Index Hash Tables</h2> <p>The hash tables in the shm file are designed to answer the following question quickly: <blockquote><i> FindFrame(P,M): Given a page number P and a maximum WAL frame index M, return the largest WAL frame index for page P that does not exceed M, or return NULL if there are no frames for page P that do not exceed M. </i></blockquote> <p> Let the datatypes "u8", "u16", and "u32" mean unsigned integers of length 8, 16, and 32 bits, respectively. Then, the first 32768-byte unit of the shm file is organized as follows: <blockquote><pre> u8 aWalIndexHeader[136]; u32 aPgno[4062]; u16 aHash[8192]; </pre></blockquote> <p>The second and all subsequent 32768-byte units of the shm file are like this: <blockquote><pre> u32 aPgno[4096]; u16 aHash[8192]; </pre></blockquote> <p>Collectively, the aPgno entries record the database page number stored in all frames of the WAL file. The aPgno[0] entry on the first hash table records the database page number stored in the very first frame in the WAL file. The aPgno[i] entry from the first hash table is the database page number for the i-th frame in the WAL file. The aPgno[k] entry for the second hash table is the database page number for the (k+4062)-th frame in the WAL file. The aPgno[k] entry for the n-th 32768-byte hash table in the shm file (for n>1) holds the database page number stored in the (k+4062+4096*(n-2))-th frame of the WAL file. <p>Here is a slightly different way to describe the aPgno values: If you think of all aPgno values as a contiguous array, then the database page number stored in the i-th frame of the WAL file is stored in aPgno[i]. Of course, aPgno is not a contiguous array. The first 4062 entries are on the first 32768-byte unit of the shm file and subsequent values are in 4096 entry chunks in later units of the shm file. <p>One way to compute FindFrame(P,M) would be to scan the aPgno array starting with the M-th entry and working backwards towards the beginning and return J where aPgno[J]==P. Such an algorithm would work, and it would be faster than searching the whole WAL file for the latest frame with page number P. But the search can be made much faster still by using the aHash structure. <p>A database page number P is mapped into a hash value using the following hash function: <blockquote> h = (P * 383)%8192 </blockquote> <p>This function maps every page number into an integer between 0 and 8191 inclusive. The aHash field of each 32768-byte shm file unit maps P values into indexes of the aPgno field of the same unit as follows: <ol> <li> Compute the hash value: h = P * 383 <li> Let X be the largest set of consecutive integers {h, h+1, h+2, ..., h+N} such that for every j in X, aPgno[j%8192]!=0. The X set will be empty if aPgno[h%8192]==0. The X set is easily computed by starting with the value h%8192, and adding h%8192 to X and incrementing h until encountering the first aPgno[h%8192] entry that is zero. <li> The set X contains the index in aPgno of every entry in the current 32768-byte unit of the shm file that might possible be a solution to the FindFrame(P,M) function. Each of these entries must be checked separately to ensure that the aPgno value is P and that the frame number does not exceed M. The largest frame number that passes those two tests is the answer. </ol> <p>Each entry in the aPgno array has a single corresponding entry in the aHash array. There are more available slots in aHash than there are in aPgno. The unused slots in aHash are filled with zero. And since there are guaranteed to be unused slots in aHash, that means the loop that computes X is guaranteed to terminate. The expected size of X is less than 2. The worst case is that X will be the same as the number of entries in aPgno, in which case the algorithm runs at about the same speed as a linear scan of aPgno. But that worst case performance is exceedingly rare. Usually, the size of X will be small and the use of the aHash array allows one to compute FindFrame(P,M) much faster. <p>Note that each 32768-byte unit of the shm file has its own aHash and aPgno arrays. The aHash array for a single unit is only helpful in finding aPgno entries in that same unit. The overall FindFrame(P,M) function needs to do hash lookups beginning with the latest unit and working backwards to the oldest unit until it finds an answer. |
Changes to wrap.tcl.
︙ | ︙ | |||
90 91 92 93 94 95 96 | regsub -all {<yyterm>} $text {<span class='yyterm'>} text regsub -all {</yyterm>} $text {</span>} text regsub -all {\[(.*?)\]} $text \ "\175; hd_resolve_one \173\\1\175; hd_puts \173" text eval "hd_puts \173$text\175" } proc hd_resolve_one {x} { | | | 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 | regsub -all {<yyterm>} $text {<span class='yyterm'>} text regsub -all {</yyterm>} $text {</span>} text regsub -all {\[(.*?)\]} $text \ "\175; hd_resolve_one \173\\1\175; hd_puts \173" text eval "hd_puts \173$text\175" } proc hd_resolve_one {x} { if {[string is integer $x] || [string length $x]==1} { hd_puts \[$x\] return } if {[string match {dateof:3.*} $x]} { set x [string range $x 7 end] if {[info exists ::dateofversion($x)]} { hd_puts $::dateofversion($x) |
︙ | ︙ |