Overview
Comment: | Start enumerating the assumptions sqlite makes related to the state of the file system following a power failure or OS crash. |
---|---|
SHA1: | ac2c5476d0b4c598aafd06383050394f |
User & Date: | dan 2008-07-28 15:01:48.000 |
Context
2008-07-29
16:42 | Add definitions for the "atomic-write" "safe-append" and "sequential-write" VFS properties to fileio.html. (check-in: 26da8bcd0b user: dan tags: trunk) |
2008-07-28
15:01 | Start enumerating the assumptions sqlite makes related to the state of the file system following a power failure or OS crash. (check-in: ac2c5476d0 user: dan tags: trunk) |
2008-07-23
15:38 | Updates to system requirements. (check-in: a1807b9496 user: drh tags: trunk) |
Changes
Changes to pages/fileio.in.
Old lines 1-4, new lines 1-41:

<tcl>

proc hd_assumption {id text} {
  hd_requirement $id $text
}

proc process {text} {
  set zOut ""
  set zSpecial ""

  foreach zLine [split $text "\n"] {
    switch -regexp $zLine {
      {^ *REQ *[^ ][^ ]* *$} {
        regexp { *REQ *([^ ]+) *} $zLine -> zRecid
        append zOut "<p class=req id=$zRecid>"
        set zSpecial hd_requirement
        set zRecText ""
      }
      {^ *ASSUMPTION *[^ ][^ ]* *$} {
        regexp { *ASSUMPTION *([^ ]+) *} $zLine -> zRecid
        append zOut "<p class=req id=$zRecid>"
        set zSpecial hd_assumption
        set zRecText ""
      }
      {^ *$} {
        if {$zSpecial ne ""} {
          $zSpecial $zRecid $zRecText
          set zSpecial ""
          append zOut </p>
        }
      }
      default {
        if {$zSpecial ne ""} {
          if {[regexp {^ *\. *$} $zLine]} {set zLine ""}
          append zRecText "$zLine\n"
        }
        append zOut "$zLine\n"
      }
    }
︙ | ︙ | |||
Old lines 96-102, new lines 107-369:

<li>xOpen
<li>xFullPathname
<li>xClose
</ul>

<h1>VFS Adaptor Related Assumptions</h1>

<h2>SQLite File-System Usage </h2>

<p>
SQLite uses the file-system service made available via the
<i>VFS adaptor</i> to create, read and modify three different types of
files, as follows:

<ul>
<li> database files,
<li> journal files, and
<li> master journal files.
</ul>

<p>
A database file is a file-system file used to store an SQLite database
image (see <cite>ff_sqlitert_requirements</cite>).

<h2 id=fs_characteristics>System Failure Related Assumptions</h2>

<p>
In the event of an operating system or power failure, the various
combinations of file-system and storage hardware available provide
varying levels of guarantee as to the integrity of the data written to
the file system just before or during the failure. The exact combination
of IO operations that SQLite is required to perform in order to safely
modify a database file depends on the exact characteristics of the
target platform.

<p>
This section describes the assumptions that SQLite makes about the
content of a file-system following a power or system failure. In other
words, it describes the extent of file and file-system corruption that
such an event may cause.

<p>
SQLite queries an implementation for file-system characteristics using
the xDeviceCharacteristics() and xSectorSize() methods of the database
file file-handle. These two methods are only ever called on file-handles
open on database files. They are not called for <i>journal files</i>,
<i>master-journal files</i> or <i>temporary database files</i>.

<p>
The file-system <i>sector size</i> value determined by calling the
xSectorSize() method is a power of 2 value between 512 and 32768,
inclusive <span class=todo>reference to exactly how this is
determined</span>. SQLite assumes that the underlying storage device
stores data in blocks of <i>sector-size</i> bytes each, called sectors.
It is also assumed that each aligned block of <i>sector-size</i> bytes of
each file is stored in a single device sector. If the file is not an
exact multiple of <i>sector-size</i> bytes in size, then the final device
sector is partially empty.

<p>
Normally, SQLite assumes that if a power failure occurs while updating
any portion of a sector then the contents of the entire device sector
are suspect following recovery.
After writing to any part of a sector within a file, it is assumed that
the modified sector contents are held in a volatile buffer somewhere
within the system (main memory, disk cache etc.). SQLite does not assume
that the updated data has reached the persistent storage media until
after it has successfully <i>synced</i> the corresponding file by
invoking the VFS xSync() method. <i>Syncing</i> a file causes all
modifications to the file up until that point to be committed to
persistent storage.

<p>
Based on the above, SQLite usually uses an internal model of the
file-system whereby any sector of a file written to is considered to be
in a transient state until after the file has been successfully
<i>synced</i>. Should a power or system failure occur while a sector is
in a transient state, it is impossible to predict its contents following
recovery. It may be written correctly, not written at all, overwritten
with random data, or any combination thereof.

<p>
For example, if the <i>sector-size</i> of a given file-system is 2048
bytes, and SQLite opens a file and writes a 1024 byte block of data to
offset 3072 of the file, then according to the model the second sector of
the file is in the transient state. If a power failure or operating
system crash occurs before or during the next call to xSync() on the
file handle, then following system recovery SQLite assumes that all file
data between byte offsets 2048 and 4095, inclusive, is invalid. It also
assumes that the first sector of the file, containing the data from byte
offset 0 to 2047 inclusive, is valid, since it was not in a transient
state when the crash occurred.

<p>
Assuming that any and all sectors in the transient state may be
corrupted following a power or system failure is a very pessimistic
approach. Some modern systems provide more sophisticated guarantees than
this.
SQLite allows the VFS implementation to specify at runtime that the
current platform supports one or more of the following properties:

<ul>
<li>The <b>safe-append</b> property. Details...
<li>The <b>sequential-write</b> property. Details...
<li>The <b>atomic-write</b> property. Details...
</ul>

<h3>Details</h3>

<p>
This section describes how the assumptions presented in the parent
section apply to the individual API functions and operations provided by
the VFS to SQLite for the purposes of modifying the contents of the
file-system.

<p>
SQLite manipulates the contents of the file-system using a combination
of the following four types of operation:

<ul>
<li> <b>Create file</b> operations. SQLite may create new files within
     the file-system by invoking the xOpen() method of the sqlite3_vfs
     object.
<li> <b>Delete file</b> operations. SQLite may remove files from the
     file system by calling the xDelete() method of the sqlite3_vfs
     object.
<li> <b>Truncate file</b> operations. SQLite may truncate existing files
     by invoking the xTruncate() method of the sqlite3_file object.
<li> <b>Write file</b> operations. SQLite may modify the contents and
     increase the size of a file by invoking the xWrite() method of the
     sqlite3_file object.
</ul>

<p>
Additionally, all VFS implementations are required to provide the
<i>sync file</i> operation, accessed via the xSync() method of the
sqlite3_file object, used to flush create, write and truncate operations
on a file to the persistent storage medium.

<p>
The formalized assumptions in this section refer to <i>system
failure</i> events. In this context, this should be interpreted as any
failure that causes the system to stop operating, for example a power
failure or operating system crash.

<p>
SQLite does not assume that a <b>create file</b> operation has actually
modified the file-system records within persistent storage until after
the file has been successfully <i>synced</i>.

ASSUMPTION A21001
If a system failure occurs during or after a "create file" operation,
but before the created file has been <i>synced</i>, then SQLite assumes
that it is possible that the created file may not exist following system
recovery.

<p>
Of course, it is also possible that it does exist following system
recovery.

ASSUMPTION A21002
If a "create file" operation is executed by SQLite, and then the created
file <i>synced</i>, then SQLite assumes that the file-system
modifications corresponding to the "create file" operation have been
committed to persistent media. It is assumed that if a system failure
occurs any time after the file has been successfully <i>synced</i>, then
the file is guaranteed to appear in the file-system following system
recovery.

<p>
A <b>delete file</b> operation (invoked by a call to the VFS xDelete()
method) is assumed to be an atomic and durable operation.
</p>

ASSUMPTION A21003
If a system failure occurs at any time after a "delete file" operation
(call to the VFS xDelete() method) returns successfully, it is assumed
that the file-system will not contain the deleted file following system
recovery.

ASSUMPTION A21004
If a system failure occurs during a "delete file" operation, it is
assumed that following system recovery the file-system will either
contain the file being deleted in the state it was in before the
operation was attempted, or not contain the file at all. It is assumed
that it is not possible for the file to have become corrupted purely as
a result of a failure occurring during a "delete file" operation.

<p>
The effects of a <b>truncate file</b> operation are not assumed to be
made persistent until after the corresponding file has been
<i>synced</i>.

ASSUMPTION A21005
If a system failure occurs during or after a "truncate file" operation,
but before the truncated file has been <i>synced</i>, then SQLite
assumes that the size of the truncated file is either as large as or
larger than the size that it was to be truncated to.

ASSUMPTION A21006
If a system failure occurs during or after a "truncate file" operation,
but before the truncated file has been <i>synced</i>, then it is assumed
that the contents of the file up to the size that the file was to be
truncated to are not corrupted.

<p>
The above two assumptions may be interpreted to mean that if a system
failure occurs after file truncation but before the truncated file is
<i>synced</i>, the contents of the file following the point at which it
was to be truncated may not be trusted. They may contain the original
file data, or may contain garbage.

ASSUMPTION A21007
If a "truncate file" operation is executed by SQLite, and then the
truncated file <i>synced</i>, then SQLite assumes that the file-system
modifications corresponding to the "truncate file" operation have been
committed to persistent media. It is assumed that if a system failure
occurs any time after the file has been successfully <i>synced</i>, then
the effects of the file truncation are guaranteed to appear in the file
system following recovery.

<p>
A <b>write file</b> operation modifies the contents of an existing file
within the file-system. It may also increase the size of the file. The
effects of a <i>write file</i> operation are not assumed to be made
persistent until after the corresponding file has been <i>synced</i>.

ASSUMPTION A21008
If a system failure occurs during or after a "write file" operation, but
before the corresponding file has been <i>synced</i>, then the content
of all sectors spanned by the <i>write file</i> operation is assumed to
be untrustworthy following system recovery. This includes regions of the
sectors that were not actually modified by the write file operation. It
is assumed that it is possible that the sector data was written
correctly, partially, or not at all, or that the sector has been
completely or partially filled with random data.

ASSUMPTION A21009
If a system failure occurs during or after a "write file" operation that
causes the file to grow, but before the corresponding file has been
<i>synced</i>, then it is assumed that the size of the file following
recovery is as large as or larger than it was before the "write file"
operation that, if successful, would cause the file to grow.

<!--
<p>
The return value of the xSectorSize() method, the <i>sector-size</i>, is
expected by SQLite to be a power of 2 value greater than or equal to 512.

<p class=todo>
What does it do if this is not the case? If the sector size is less than
512 then 512 is used instead. How about a non power-of-two value? UPDATE:
How this situation is handled should be described in the API
requirements. Here we can just refer to the other document.

<p>
︙ | ︙ | |||
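The sector model described in the hunk above (a written sector stays "transient" until the file is synced) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `FileModel` and the helper `sectors_spanned` are hypothetical names invented for this illustration and are not part of SQLite or its VFS interface. The sector-size check follows the text's statement that the value is a power of 2 between 512 and 32768.

```python
def sectors_spanned(offset, nbytes, sector_size):
    """Return the indexes of all sectors touched by a write (hypothetical helper)."""
    if offset < 0 or nbytes <= 0:
        raise ValueError("offset must be >= 0 and nbytes must be > 0")
    first = offset // sector_size
    last = (offset + nbytes - 1) // sector_size
    return range(first, last + 1)


class FileModel:
    """Hypothetical model of one file: tracks which sectors are transient."""

    def __init__(self, sector_size):
        # The text states the sector size is a power of 2 in [512, 32768].
        is_power_of_two = sector_size > 0 and (sector_size & (sector_size - 1)) == 0
        if not is_power_of_two or not (512 <= sector_size <= 32768):
            raise ValueError("sector size must be a power of 2 in 512..32768")
        self.sector_size = sector_size
        self.transient = set()

    def write(self, offset, nbytes):
        # Every sector the write spans becomes transient, including the
        # parts of those sectors that the write did not modify.
        self.transient.update(sectors_spanned(offset, nbytes, self.sector_size))

    def sync(self):
        # A successful sync commits all earlier writes to persistent storage.
        self.transient.clear()

    def suspect_byte_ranges(self):
        """Inclusive byte ranges assumed untrustworthy after a system failure."""
        ss = self.sector_size
        return [(s * ss, (s + 1) * ss - 1) for s in sorted(self.transient)]


model = FileModel(2048)
model.write(3072, 1024)             # the example from the text
print(model.suspect_byte_ranges())  # byte range of the second sector
model.sync()
print(model.suspect_byte_ranges())  # nothing is transient after a sync
```

With a 2048-byte sector size, the 1024-byte write at offset 3072 marks only the second sector (bytes 2048 to 4095) as suspect, which matches the worked example in the text. The first sector is never marked because the write does not touch it.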
Old lines 170-176, new lines 390-422:

occurred may be trusted.
</ul>

<p class=todo>
What do we assume about the other three file-system write operations -
xTruncate(), xDelete() and "create file"?

<p>
The xDeviceCharacteristics() method returns a set of flags, indicating
which of the following properties (if any) the file-system provides:

<ul>
<li>The <b><i>sequential IO</i></b> property. If a file-system has this
    property, then in the event of a crash at most a single sector may
    contain invalid data. The file-system guarantees
<li>The <b><i>safe-append</i></b> property.
<li>The <b><i>atomic write</i></b> property.
</ul>

<p class=todo>
Write an explanation as to how the file-system properties influence the
model used to predict file damage after a catastrophe.
-->

<h1>Database Connections</h1>

<p>
Within this document, the term <i>database connection</i> has a slightly
different meaning from that which one might assume. The handles returned
︙ | ︙ |
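The xDeviceCharacteristics() flags mentioned in the hunk above form a bitmask, so each property is tested independently. The sketch below shows one way to decode such a mask. The numeric values are an assumption: they are believed to mirror the `SQLITE_IOCAP_ATOMIC`, `SQLITE_IOCAP_SAFE_APPEND` and `SQLITE_IOCAP_SEQUENTIAL` constants in `sqlite3.h` and should be checked against the header in use. The function `describe_characteristics` is a hypothetical name for this illustration only.

```python
# Assumed to mirror the SQLITE_IOCAP_* constants in sqlite3.h; verify
# against your copy of the header before relying on these values.
SQLITE_IOCAP_ATOMIC = 0x00000001
SQLITE_IOCAP_SAFE_APPEND = 0x00000200
SQLITE_IOCAP_SEQUENTIAL = 0x00000400


def describe_characteristics(flags):
    """Return the names of the properties set in a device-characteristics mask.

    Hypothetical helper: it only decodes the three properties named in the
    text and ignores any other bits in the mask.
    """
    names = []
    if flags & SQLITE_IOCAP_SEQUENTIAL:
        names.append("sequential-write")
    if flags & SQLITE_IOCAP_SAFE_APPEND:
        names.append("safe-append")
    if flags & SQLITE_IOCAP_ATOMIC:
        names.append("atomic-write")
    return names


mask = SQLITE_IOCAP_SAFE_APPEND | SQLITE_IOCAP_ATOMIC
print(describe_characteristics(mask))
```

A mask of zero corresponds to the pessimistic default model, in which none of the three properties is assumed.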