Overview
Comment: | Documentation of the new pager locking mechanism. (CVS 1570) |
---|---|
SHA1: | 13cf1ba8256bf8cee0195dbaeac71a20 |
User & Date: | drh 2004-06-11 17:48:03.000 |
Context
2004-06-11
22:04 | Fix typos in the new locking document. (CVS 1571) (check-in: 022075517c user: drh tags: trunk) | |
17:48 | Documentation of the new pager locking mechanism. (CVS 1570) (check-in: 13cf1ba825 user: drh tags: trunk) | |
13:19 | Have the vdbe aggregator use a btree table instead of a hash table. (CVS 1569) (check-in: 8d56118f64 user: danielk1977 tags: trunk) | |
Changes
Changes to main.mk.
︙ | ︙ | |||
 index.html: $(TOP)/www/index.tcl last_change
 	tclsh $(TOP)/www/index.tcl >index.html
 
 lang.html: $(TOP)/www/lang.tcl
 	tclsh $(TOP)/www/lang.tcl >lang.html
 
+lockingv3.html: $(TOP)/www/lockingv3.tcl
+	tclsh $(TOP)/www/lockingv3.tcl >lockingv3.html
+
 omitted.html: $(TOP)/www/omitted.tcl
 	tclsh $(TOP)/www/omitted.tcl >omitted.html
 
 opcode.html: $(TOP)/www/opcode.tcl $(TOP)/src/vdbe.c
 	tclsh $(TOP)/www/opcode.tcl $(TOP)/src/vdbe.c >opcode.html
 
 mingw.html: $(TOP)/www/mingw.tcl
︙ | ︙ | |||
   docs.html \
   download.html \
   faq.html \
   fileformat.html \
   formatchng.html \
   index.html \
   lang.html \
+  lockingv3.html \
   mingw.html \
   nulls.html \
   omitted.html \
   opcode.html \
   quickstart.html \
   speed.html \
   sqlite.gif \
︙ | ︙ |
Changes to src/sqlite.h.in.
︙ | ︙ | |||
8 9 10 11 12 13 14 | ** May you find forgiveness for yourself and forgive others. ** May you share freely, never taking more than you give. ** ************************************************************************* ** This header file defines the interface that the SQLite library ** presents to client programs. ** | | | 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | ** May you find forgiveness for yourself and forgive others. ** May you share freely, never taking more than you give. ** ************************************************************************* ** This header file defines the interface that the SQLite library ** presents to client programs. ** ** @(#) $Id: sqlite.h.in,v 1.98 2004/06/11 17:48:03 drh Exp $ */ #ifndef _SQLITE_H_ #define _SQLITE_H_ #include <stdarg.h> /* Needed for the definition of va_list */ /* ** Make sure we can call this stuff from C++. |
︙ | ︙ | |||
959 960 961 962 963 964 965 966 | void sqlite3_result_int(sqlite3_context*, int); void sqlite3_result_int64(sqlite3_context*, long long int); void sqlite3_result_null(sqlite3_context*); void sqlite3_result_text(sqlite3_context*, const char*, int n, int eCopy); void sqlite3_result_text16(sqlite3_context*, const void*, int n, int eCopy); void sqlite3_result_value(sqlite3_context*, sqlite3_value*); #define SQLITE_UTF8 1 | > > > > > | | > | 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 | void sqlite3_result_int(sqlite3_context*, int); void sqlite3_result_int64(sqlite3_context*, long long int); void sqlite3_result_null(sqlite3_context*); void sqlite3_result_text(sqlite3_context*, const char*, int n, int eCopy); void sqlite3_result_text16(sqlite3_context*, const void*, int n, int eCopy); void sqlite3_result_value(sqlite3_context*, sqlite3_value*); /* ** These are the allowed values for the eTextRep argument to ** sqlite3_create_collation and sqlite3_create_function. */ #define SQLITE_UTF8 1 #define SQLITE_UTF16 2 /* Use native byte order */ #define SQLITE_UTF16LE 3 #define SQLITE_UTF16BE 4 #define SQLITE_ANY 5 /* sqlite3_create_function only */ /* ** These two functions are used to add new collation sequences to the ** sqlite3 handle specified as the first argument. ** ** The name of the new collation sequence is specified as a UTF-8 string ** for sqlite3_create_collation() and a UTF-16 string for |
︙ | ︙ |
Changes to www/capi3ref.tcl.
|
| | | 1 2 3 4 5 6 7 8 | set rcsid {$Id: capi3ref.tcl,v 1.2 2004/06/11 17:48:04 drh Exp $} source common.tcl header {C/C++ Interface For SQLite Version 3} puts { <h2>C/C++ Interface For SQLite Version 3</h2> } proc api {name prototype desc {notused x}} { |
︙ | ︙ | |||
65 66 67 68 69 70 71 | } { The next routine returns the number of calls to xStep for a particular aggregate function instance. The current call to xStep counts so this routine always returns at least 1. } api {} { | | | | > > | | > | > > | | | | | | 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 | } { The next routine returns the number of calls to xStep for a particular aggregate function instance. The current call to xStep counts so this routine always returns at least 1. } api {} { int sqlite3_bind_blob(sqlite3_stmt*, int, const void*, int n, void(*)(void*)); int sqlite3_bind_double(sqlite3_stmt*, int, double); int sqlite3_bind_int(sqlite3_stmt*, int, int); int sqlite3_bind_int64(sqlite3_stmt*, int, long long int); int sqlite3_bind_null(sqlite3_stmt*, int); int sqlite3_bind_text(sqlite3_stmt*, int, const char*, int n, void(*)(void*)); int sqlite3_bind_text16(sqlite3_stmt*, int, const void*, int n, void(*)(void*)); #define SQLITE_STATIC ((void*)0) #define SQLITE_EPHEMERAL ((void*)8) } { In the SQL strings input to sqlite3_prepare() and sqlite3_prepare16(), one or more literals can be replace by a wildcard "?" or ":N:" where N is an integer. The value of these wildcard literals can be set using these routines. The first parameter is a pointer to the sqlite3_stmt structure returned from sqlite3_prepare(). The second parameter is the index of the wildcard. The first "?" has an index of 1. ":N:" wildcards use the index N. The fifth parameter to sqlite3_bind_blob(), sqlite3_bind_text(), and sqlite3_bind_text16() is a destructor used to dispose of the BLOB or text after SQLite has finished with it. If the fifth argument is the special value SQLITE_STATIC, then the library assumes that the information is in static, unmanaged space and does not need to be freed. 
If the fifth argument has the value SQLITE_EPHEMERAL, then SQLite makes its on private copy of the data. The sqlite3_bind_*() routine must be called after sqlite3_prepare() or sqlite3_reset() and before sqlite3_step(). Bindings are not reset by the sqlite3_reset() routine. Unbound wildcards are interpreted as NULL. } api {} { void sqlite3_busy_handler(sqlite*, int(*)(void*,int), void*); } { This routine identifies a callback function that is invoked whenever an attempt is made to open a database table that is currently locked by another process or thread. If the busy callback is NULL, then sqlite3_exec() returns SQLITE_BUSY immediately if it finds a locked table. If the busy callback is not NULL, then sqlite3_exec() invokes the callback with two arguments. The second argument is the number of prior calls to the busy callback for the same lock. If the busy callback returns 0, then sqlite3_exec() immediately returns SQLITE_BUSY. If the callback returns non-zero, then sqlite3_exec() tries to open the table again and the cycle repeats. The default busy callback is NULL. Sqlite is re-entrant, so the busy handler may start a new query. |
︙ | ︙ | |||
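The busy-callback contract described in the hunk above (the second argument counts prior calls for the same lock; returning 0 gives up, returning non-zero asks for another attempt) can be illustrated with a small callback. This is a minimal sketch, not code from the check-in: `RetryBudget` and `retry_busy_handler` are hypothetical names, and it assumes the callback's first argument is the `void*` supplied as the last parameter of `sqlite3_busy_handler()`.

```c
#include <stddef.h>

/* Hypothetical context object: how many retries the caller will tolerate. */
typedef struct RetryBudget {
  int max_retries;
} RetryBudget;

/*
** Callback with the int(*)(void*,int) shape shown in the prototype.
** prior_calls is the number of earlier invocations for the same lock.
** Returns non-zero to request another attempt, or 0 to give up
** (at which point the caller would see SQLITE_BUSY).
** Assumption: ctx is the void* that was registered with the handler.
*/
int retry_busy_handler(void *ctx, int prior_calls) {
  const RetryBudget *budget = (const RetryBudget *)ctx;
  if (budget == NULL) {
    return 0;  /* no budget supplied: behave like a NULL busy callback */
  }
  return prior_calls < budget->max_retries ? 1 : 0;
}
```

Going by the prototype in the hunk, registration would presumably look like `sqlite3_busy_handler(db, retry_busy_handler, &budget);`. A real handler would normally also sleep briefly between attempts.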
304 305 306 307 308 309 310 311 312 313 314 315 316 317 | The parameter must be a nul-terminated UTF-8 string for sqlite3_complete() and a nul-terminated UTF-16 string for sqlite3_complete16(). The algorithm is simple. If the last token other than spaces and comments is a semicolon, then return true. otherwise return false. } {} api {} { int sqlite3_create_function( sqlite3 *, const char *zFunctionName, int nArg, int eTextRep, | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 | The parameter must be a nul-terminated UTF-8 string for sqlite3_complete() and a nul-terminated UTF-16 string for sqlite3_complete16(). The algorithm is simple. If the last token other than spaces and comments is a semicolon, then return true. otherwise return false. } {} api {} { int sqlite3_create_collation( sqlite3*, const char *zName, int pref16, void*, int(*xCompare)(void*,int,const void*,int,const void*) ); int sqlite3_create_collation16( sqlite3*, const char *zName, int pref16, void*, int(*xCompare)(void*,int,const void*,int,const void*) ); #define SQLITE_UTF8 1 #define SQLITE_UTF16 2 #define SQLITE_UTF16BE 3 #define SQLITE_UTF16LE 4 } { These two functions are used to add new collation sequences to the sqlite3 handle specified as the first argument. The name of the new collation sequence is specified as a UTF-8 string for sqlite3_create_collation() and a UTF-16 string for sqlite3_create_collation16(). In both cases the name is passed as the second function argument. 
The third argument must be one of the constants SQLITE_UTF8, SQLITE_UTF16LE or SQLITE_UTF16BE, indicating that the user-supplied routine expects to be passed pointers to strings encoded using UTF-8, UTF-16 little-endian or UTF-16 big-endian respectively. A pointer to the user supplied routine must be passed as the fifth argument. If it is NULL, this is the same as deleting the collation sequence (so that SQLite cannot call it anymore). Each time the user supplied function is invoked, it is passed a copy of the void* passed as the fourth argument to sqlite3_create_collation() or sqlite3_create_collation16() as its first parameter. The remaining arguments to the user-supplied routine are two strings, each represented by a [length, data] pair and encoded in the encoding that was passed as the third argument when the collation sequence was registered. The user routine should return negative, zero or positive if the first string is less than, equal to, or greater than the second string. i.e. (STRING1 - STRING2). } api {} { int sqlite3_collation_needed( sqlite3*, void*, void(*)(void*,sqlite3*,int eTextRep,const char*) ); int sqlite3_collation_needed16( sqlite3*, void*, void(*)(void*,sqlite3*,int eTextRep,const void*) ); } { To avoid having to register all collation sequences before a database can be used, a single callback function may be registered with the database handle to be called whenever an undefined collation sequence is required. If the function is registered using the sqlite3_collation_needed() API, then it is passed the names of undefined collation sequences as strings encoded in UTF-8. If sqlite3_collation_needed16() is used, the names are passed as UTF-16 in machine native byte order. A call to either function replaces any existing callback. When the user-function is invoked, the first argument passed is a copy of the second argument to sqlite3_collation_needed() or sqlite3_collation_needed16(). The second argument is the database handle. 
The third argument is one of SQLITE_UTF8, SQLITE_UTF16BE or SQLITE_UTF16LE, indicating the most desirable form of the collation sequence function required. The fourth parameter is the name of the required collation sequence. The collation sequence is returned to SQLite by a collation-needed callback using the sqlite3_create_collation() or sqlite3_create_collation16() APIs, described above. } api {} { int sqlite3_create_function( sqlite3 *, const char *zFunctionName, int nArg, int eTextRep, |
︙ | ︙ | |||
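The comparison routine described in the collation hunk above receives two strings as [length, data] pairs and returns a negative, zero, or positive value. A minimal sketch of such a routine follows, assuming it would be registered for `SQLITE_UTF8` text; the name `nocase_ascii_compare` is hypothetical, and the case folding covers ASCII letters only.

```c
#include <stddef.h>

/* Fold an ASCII upper-case letter to lower case; leave other bytes alone. */
static int fold_ascii(unsigned char c) {
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/*
** Comparison routine with the shape
**   int(*xCompare)(void*,int,const void*,int,const void*)
** The strings are NOT assumed to be nul-terminated; only the given
** lengths are used. Returns <0, 0 or >0 as the first string is less
** than, equal to, or greater than the second, ignoring ASCII case.
*/
int nocase_ascii_compare(void *unused, int n1, const void *s1,
                         int n2, const void *s2) {
  const unsigned char *a = (const unsigned char *)s1;
  const unsigned char *b = (const unsigned char *)s2;
  int n = n1 < n2 ? n1 : n2;
  int i;
  (void)unused;  /* the user-data pointer is not needed here */
  for (i = 0; i < n; i++) {
    int ca = fold_ascii(a[i]);
    int cb = fold_ascii(b[i]);
    if (ca != cb) return ca - cb;
  }
  /* Common prefix is equal: the shorter string sorts first. */
  return n1 - n2;
}
```

Because the routine depends only on its arguments, it can be unit-tested without opening a database.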
328 329 330 331 332 333 334 | int eTextRep, int iCollateArg, void*, void (*xFunc)(sqlite3_context*,int,sqlite3_value**), void (*xStep)(sqlite3_context*,int,sqlite3_value**), void (*xFinal)(sqlite3_context*) ); | | | | | > | 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 | int eTextRep, int iCollateArg, void*, void (*xFunc)(sqlite3_context*,int,sqlite3_value**), void (*xStep)(sqlite3_context*,int,sqlite3_value**), void (*xFinal)(sqlite3_context*) ); #define SQLITE_UTF8 1 #define SQLITE_UTF16 2 #define SQLITE_UTF16BE 3 #define SQLITE_UTF16LE 4 #define SQLITE_ANY 5 } { These two functions are used to add user functions or aggregates implemented in C to the SQL langauge interpreted by SQLite. The difference only between the two is that the second parameter, the name of the (scalar) function or aggregate, is encoded in UTF-8 for sqlite3_create_function() and UTF-16 for sqlite3_create_function16(). |
︙ | ︙ | |||
616 617 618 619 620 621 622 | should always use %q instead of %s when inserting text into a string literal. } {} api {} { int sqlite3_open( const char *filename, /* Database filename (UTF-8) */ | | < | < | > | | 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 | should always use %q instead of %s when inserting text into a string literal. } {} api {} { int sqlite3_open( const char *filename, /* Database filename (UTF-8) */ sqlite3 **ppDb /* OUT: SQLite db handle */ ); int sqlite3_open16( const void *filename, /* Database filename (UTF-16) */ sqlite3 **ppDb /* OUT: SQLite db handle */ ); } { Open the sqlite database file "filename". The "filename" is UTF-8 encoded for sqlite3_open() and UTF-16 encoded in the native byte order for sqlite3_open16(). An sqlite3* handle is returned in *ppDb, even if an error occurs. If the database is opened (or created) successfully, then SQLITE_OK is returned. Otherwise an error code is returned. The sqlite3_errmsg() or sqlite3_errmsg16() routines can be used to obtain an English language description of the error. If the database file does not exist, then a new database will be created as needed. The encoding for the database will be UTF-8 if sqlite3_open() is called and UTF-16 if sqlite3_open16 is used. Whether or not an error occurs when it is opened, resources associated with the sqlite3* handle should be released by passing it to sqlite3_close() when it is no longer required. } |
︙ | ︙ | |||
725 726 727 728 729 730 731 | statement obtained by a previous call to sqlite3_prepare() or sqlite3_prepare16() back to it's initial state, ready to be re-executed. Any SQL statement variables that had values bound to them using the sqlite3_bind_*() API retain their values. } api {} { | | | | > > | 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 | statement obtained by a previous call to sqlite3_prepare() or sqlite3_prepare16() back to it's initial state, ready to be re-executed. Any SQL statement variables that had values bound to them using the sqlite3_bind_*() API retain their values. } api {} { void sqlite3_result_blob(sqlite3_context*, const void*, int n, void(*)(void*)); void sqlite3_result_double(sqlite3_context*, double); void sqlite3_result_error(sqlite3_context*, const char*, int); void sqlite3_result_error16(sqlite3_context*, const void*, int); void sqlite3_result_int(sqlite3_context*, int); void sqlite3_result_int64(sqlite3_context*, long long int); void sqlite3_result_null(sqlite3_context*); void sqlite3_result_text(sqlite3_context*, const char*, int n, void(*)(void*)); void sqlite3_result_text16(sqlite3_context*, const void*, int n, void(*)(void*)); void sqlite3_result_text16be(sqlite3_context*, const void*, int n, void(*)(void*)); void sqlite3_result_text16le(sqlite3_context*, const void*, int n, void(*)(void*)); void sqlite3_result_value(sqlite3_context*, sqlite3_value*); } { User-defined functions invoke the following routines in order to set their return value. The sqlite3_result_value() routine is used to return an exact copy of one of the parameters to the function. } |
︙ | ︙ | |||
860 861 862 863 864 865 866 867 868 869 870 871 872 873 | int sqlite3_value_bytes(sqlite3_value*); int sqlite3_value_bytes16(sqlite3_value*); double sqlite3_value_double(sqlite3_value*); int sqlite3_value_int(sqlite3_value*); long long int sqlite3_value_int64(sqlite3_value*); const unsigned char *sqlite3_value_text(sqlite3_value*); const void *sqlite3_value_text16(sqlite3_value*); int sqlite3_value_type(sqlite3_value*); } { This group of routines returns information about parameters to a user-defined function. Function implementations use these routines to access their parameters. These routines are the same as the sqlite3_column_* routines except that these routines take a single sqlite3_value* pointer instead of an sqlite3_stmt* and an integer | > > | 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 | int sqlite3_value_bytes(sqlite3_value*); int sqlite3_value_bytes16(sqlite3_value*); double sqlite3_value_double(sqlite3_value*); int sqlite3_value_int(sqlite3_value*); long long int sqlite3_value_int64(sqlite3_value*); const unsigned char *sqlite3_value_text(sqlite3_value*); const void *sqlite3_value_text16(sqlite3_value*); const void *sqlite3_value_text16be(sqlite3_value*); const void *sqlite3_value_text16le(sqlite3_value*); int sqlite3_value_type(sqlite3_value*); } { This group of routines returns information about parameters to a user-defined function. Function implementations use these routines to access their parameters. These routines are the same as the sqlite3_column_* routines except that these routines take a single sqlite3_value* pointer instead of an sqlite3_stmt* and an integer |
︙ | ︙ |
Changes to www/docs.tcl.
1 2 3 | # This script generates the "docs.html" page that describes various # sources of documentation available for SQLite. # | | | 1 2 3 4 5 6 7 8 9 10 11 | # This script generates the "docs.html" page that describes various # sources of documentation available for SQLite. # set rcsid {$Id: docs.tcl,v 1.4 2004/06/11 17:48:04 drh Exp $} source common.tcl header {SQLite Documentation} puts { <h2>Available Documentation</h2> <table width="100%" cellpadding="5"> } |
︙ | ︙ | |||
 doc {Version 3 C/C++ API<br>Reference} {capi3ref.html} {
   This document describes each API function separately.
 }
 
 doc {Tcl API} {tclsqlite.html} {
   A description of the TCL interface bindings for SQLite.
 }
 
+doc {Locking And Concurrency<br>In SQLite Version 3} {lockingv3.html} {
+  A description of how the new locking code in version 3 increases
+  concurrency and decreases the problem of writer starvation.
+}
+
 doc {Version 2 DataTypes } {datatypes.html} {
   A description of how SQLite version 2 handles SQL datatypes.
 }
 
 doc {Version 3 DataTypes } {datatype3.html} {
   SQLite version 3 introduces the concept of manifest typing, where the
   type of a value is associated with the value itself, not the column that
︙ | ︙ |
Added www/lockingv3.tcl.
#
# Run this script to generate a lockingv3.html output file
#
set rcsid {$Id: }
source common.tcl
header {File Locking And Concurrency In SQLite Version 3}

proc HEADING {level title} {
  global pnum
  incr pnum($level)
  foreach i [array names pnum] {
    if {$i>$level} {set pnum($i) 0}
  }
  set h [expr {$level+1}]
  if {$h>6} {set h 6}
  set n $pnum(1).$pnum(2)
  for {set i 3} {$i<=$level} {incr i} {
    append n .$pnum($i)
  }
  puts "<h$h>$n $title</h$h>"
}
set pnum(1) 0
set pnum(2) 0
set pnum(3) 0
set pnum(4) 0
set pnum(5) 0
set pnum(6) 0
set pnum(7) 0
set pnum(8) 0

HEADING 1 {File Locking And Concurrency In SQLite Version 3}

puts {
<p>Version 3 of SQLite introduces a more sophisticated locking mechanism
designed to improve concurrency and reduce the writer starvation problem.
This document describes the new locking mechanism.
The intended audience is programmers who want to understand and/or modify
the pager code and reviewers working to verify the design
of SQLite version 3.
</p>
}

HEADING 1 {Overview}

puts {
<p>
Locking and concurrency control are handled by the
<a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/pager.c">
pager module</a>.
The pager module is responsible for making SQLite "ACID" (Atomic,
Consistent, Isolated, and Durable). The pager module makes sure changes
happen all at once, that either all changes occur or none of them do,
that two or more threads or processes do not try to access the database
in incompatible ways at the same time, and that once changes have been
written they persist until explicitly deleted. The pager also provides
a memory cache of some of the contents of the disk file.</p>

<p>The pager is unconcerned
with the details of B-Trees, text encodings, indices, and so forth.
From the point of view of the pager, the database consists of
a single file of uniform-sized blocks. Each block is called a
"page" and is usually 1024 bytes in size. The pages are numbered
beginning with 1.
So the first 1024 bytes of the database are called "page 1" and the
second 1024 bytes are called "page 2" and so forth. All other encoding
details are handled by higher layers of the library.
The pager communicates with the operating system using one of several
modules
(Examples:
<a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/os_unix.c">
os_unix.c</a>,
<a href="http://www.sqlite.org/cvstrac/getfile/sqlite/src/os_win.c">
os_win.c</a>)
that provide a uniform abstraction for operating system services.
</p>
}

HEADING 1 {Locking}

puts {
<p>
From the point of view of a single thread or process, a database file
can be in one of five locking states:
</p>

<p>
<table cellpadding="20">
<tr><td valign="top">UNLOCKED</td>
<td valign="top">
No locks are held on the database. The database may be neither read nor
written. Any internally cached data is considered suspect and subject to
verification against the database file before being used. Other threads
and processes can read or write the database as their own locking states
permit. This is the default state.
</td></tr>

<tr><td valign="top">SHARED</td>
<td valign="top">
The database may be read but not written. Any number of threads or
processes can hold SHARED locks at the same time, hence there can be
many simultaneous readers. But no other thread or process is allowed
to write to the database file while one or more SHARED locks are active.
</td></tr>

<tr><td valign="top">RESERVED</td>
<td valign="top">
A RESERVED lock means that the process is planning on writing to the
database file at some point in the future but that it is currently just
reading from the file. Only a single RESERVED lock may be active at one
time, though multiple SHARED locks can coexist with a single RESERVED lock.
RESERVED differs from PENDING in that new SHARED locks can be acquired
while there is a RESERVED lock.
</td></tr>

<tr><td valign="top">PENDING</td>
<td valign="top">
A PENDING lock means that the process holding the lock wants to write
to the database as soon as possible and is just waiting on all current
SHARED locks to clear so that it can get an EXCLUSIVE lock. No new
SHARED locks are permitted against the database if
a PENDING lock is active, though existing SHARED locks are allowed to
continue.
</td></tr>

<tr><td valign="top">EXCLUSIVE</td>
<td valign="top">
An EXCLUSIVE lock is needed in order to write to the database file.
Only one EXCLUSIVE lock is allowed on the file and no other locks of
any kind are allowed to coexist with an EXCLUSIVE lock. In order to
maximize concurrency, SQLite works to minimize the amount of time that
EXCLUSIVE locks are held.
</td></tr>
</table>
</p>

<p>
The operating system interface layer understands and tracks all five
locking states described above. (It has to, since it is responsible
for implementing the locks.) But the pager module only tracks four of
the five locking states. A PENDING lock is always just a temporary
stepping stone on the path to an EXCLUSIVE lock and so the pager module
does not track PENDING locks.
</p>
}

HEADING 1 {The Rollback Journal}

puts {
<p>Any time a process wants to make a change to a database file, it
first records enough information in the <em>rollback journal</em> to
restore the database file back to its initial condition. Thus, before
altering any page of the database, the original contents of that page
must be written into the journal. The journal also records the initial
size of the database so that if the database file grows it can be truncated
back to its original size on a rollback.</p>

<p>The rollback journal is an ordinary disk file that has the same name as
the database file with the suffix "<tt>-journal</tt>" added.</p>

<p>If SQLite is working with multiple databases at the same time
(using the ATTACH command) then each database has its own journal.
But there is also a separate aggregate journal called the "master journal".
The master journal does not contain page data used for rolling back
changes. Instead the master journal contains the names of the
individual file journals for each of the ATTACHed databases. Each of
the individual file journals also contains the name of the master journal.
If there are no ATTACHed databases (or if none of the ATTACHed databases
is participating in the current transaction) no master journal is
created and the normal rollback journal contains an empty string
in the place normally reserved for recording the name of the master
journal.</p>

<p>An individual file journal is said to be "hot" if it needs to be rolled back
in order to restore the integrity of its database.
A hot journal is created when a process is in the middle of a database
update and a program or operating system crash or power failure prevents
the update from completing.
Hot journals are an exception condition.
Hot journals exist to facilitate recovery from crashes and power failures.
If everything is working correctly
(that is, if there are no crashes or power failures)
you will never get a hot journal.
</p>

<p>
If no master journal is involved, then
a journal is hot if it exists and its corresponding database file
does not have a RESERVED lock.
If a master journal is named in the file journal, then the file journal
is hot if its master journal exists and there is no RESERVED
lock on the corresponding database file.
</p>
}

HEADING 2 {Dealing with hot journals}

puts {
<p>
Before reading from a database file, SQLite always checks to see if that
file has a hot journal. If the file does have a hot journal, then the
journal is rolled back before the file is read. In this way, we ensure
that the database file is in a consistent state before it is read.
</p>

<p>When a process wants to read from a database file, it follows
the following sequence of steps:
</p>

<ol>
<li>Open the database file and obtain a SHARED lock.
    If the SHARED lock cannot be obtained, fail immediately and return
    SQLITE_BUSY.</li>
<li>Check to see if the database file has a hot journal. If the file
    does not have a hot journal, we are done. Return immediately.
    If there is a hot journal, that journal must be rolled back by
    the subsequent steps of this algorithm.</li>
<li>Acquire a PENDING then an EXCLUSIVE lock on the database file.
    (Note: do not acquire a RESERVED lock because that would make
    other processes think the journal was no longer hot.) If we
    fail to acquire this lock it means another process or thread
    is already trying to do the rollback. In that case,
    drop all locks, close the database, and return SQLITE_BUSY. </li>
<li>Read the journal file and roll back the changes.</li>
<li>Wait for the rolled back changes to be written onto
    the surface of the disk. This protects the integrity of the database
    in case another power failure or crash occurs.</li>
<li>Delete the journal file.</li>
<li>Delete the master journal file if it is safe to do so.
    This step is optional. It is here only to prevent stale
    master journals from cluttering up the disk drive.
    See the discussion below for details.</li>
<li>Drop the EXCLUSIVE and PENDING locks but retain the SHARED lock.</li>
</ol>

<p>After the algorithm above completes successfully, it is safe to
read from the database file. Once all reading has completed, the
SHARED lock is dropped.</p>
}

HEADING 2 {Deleting stale master journals}

puts {
<p>A stale master journal is a master journal that is no longer being
used for anything. There is no requirement that stale master journals
be deleted. The only reason for doing so is to free up disk space.</p>

<p>A master journal is stale if no individual file journals are pointing
to it. To figure out if a master journal is stale, we first read the
master journal to obtain the names of all of its file journals. Then
we check each of those file journals.
If any of the file journals named in the master journal exists and points
back to the master journal, then the master journal is not stale.
If all file journals are either missing or refer to other master journals
or no master journal at all, then the master journal we are testing is
stale and can be safely deleted.</p>
}

HEADING 2 {Writing to a database file}

puts {
<p>To write to a database, a process must first acquire a SHARED lock
as described above (possibly rolling back incomplete changes if there
is a hot journal).
After a SHARED lock is obtained, a RESERVED lock must be acquired.
The RESERVED lock signals that the process intends to write to the
database at some point in the future. Only one process at a time
can hold a RESERVED lock. But other processes can continue to read
the database while the RESERVED lock is held.
</p>

<p>If the process that wants to write is unable to obtain a RESERVED
lock, it must mean that another process already has a RESERVED lock.
In that case, the write attempt fails and returns SQLITE_BUSY.</p>

<p>After obtaining a RESERVED lock, the process that wants to write
creates a rollback journal. The header of the journal is initialized
with the original size of the database file. Space in the journal header
is also reserved for a master journal name, though the master journal
name is initially empty.</p>

<p>Before making changes to any page of the database, the process writes
the original value of that page into the rollback journal. Changes
to pages are held in memory at first and are not written to the disk.
The original database file remains unaltered, which means that other
processes can continue to read the database.</p>

<p>Eventually, the writing process will want to update the database
file, either because its memory cache has filled up or because it is
ready to commit its changes.
Before this happens, the writer must make sure no other process is reading the database and that the rollback journal data is safely on the disk surface so that it can be used to roll back incomplete changes in the event of a power failure. The steps are as follows:</p> <ol> <li>Make sure all rollback journal data has actually been written to the surface of the disk (and is not just being held in the operating system's or disk controller's cache) so that if a power failure occurs the data will still be there after power is restored.</li> <li>Obtain a PENDING lock and then an EXCLUSIVE lock on the database file. If other processes still have SHARED locks, the writer might have to wait until those SHARED locks clear before it is able to obtain an EXCLUSIVE lock.</li> <li>Write all page modifications currently held in memory out to the original database disk file.</li> </ol> <p> If the database file is being written because the memory cache was full, then the writer will not commit right away. Instead, the writer might continue to make changes to other pages. Before subsequent changes are written to the database file, the rollback journal must be flushed to disk again. Note also that the EXCLUSIVE lock that the writer obtained in order to write to the database initially must be held until all changes are committed. That means that from the time the memory cache first spills to disk up until the transaction commits, no other processes are able to access the database. </p> <p> When a writer is ready to commit its changes, it executes the following steps: </p> <ol> <li value="4"> Obtain an EXCLUSIVE lock on the database file and make sure all memory changes have been written to the database file using the algorithm of steps 1-3 above.</li> <li>Flush all database file changes to the disk. Wait for those changes to actually be written onto the disk surface.</li> <li>Delete the journal file. This is the instant when the changes are committed. 
Prior to deleting the journal file, if a power failure or crash occurs, the next process to open the database will see that it has a hot journal and will roll the changes back. After the journal is deleted, there will no longer be a hot journal and the changes will persist. </li> <li>Drop the EXCLUSIVE and PENDING locks from the database file. </li> </ol> <p>As soon as the PENDING lock is released from the database file, other processes can begin reading the database again. In the current implementation, the RESERVED lock is also released, but that is not essential. Future versions of SQLite might provide a "CHECKPOINT" SQL command that will commit all changes made so far within a transaction but retain the RESERVED lock so that additional changes can be made without giving any other process an opportunity to write.</p> <p>If a transaction involves multiple databases, then a more complex commit sequence is used, as follows:</p> <ol> <li value="4"> Make sure all individual database files have an EXCLUSIVE lock and a valid journal. <li>Create a master journal. The name of the master journal is arbitrary. (The current implementation appends random suffixes to the name of the main database file until it finds a name that does not already exist.) Fill the master journal with the names of all the individual journals and flush its contents to disk. <li>Write the name of the master journal into all individual journals (in space set aside for that purpose in the headers of the individual journals), flush the contents of the individual journals to disk, and wait for those changes to reach the disk surface. <li>Flush all database file changes to the disk. Wait for those changes to actually be written onto the disk surface.</li> <li>Delete the master journal file. This is the instant when the changes are committed. 
Prior to deleting the master journal file, if a power failure or crash occurs, the individual file journals will be considered hot and will be rolled back by the next process that attempts to read them. After the master journal has been deleted, the file journals will no longer be considered hot and the changes will persist. </li> <li>Delete all individual journal files. <li>Drop the EXCLUSIVE and PENDING locks from all database files. </li> </ol> } HEADING 1 {How To Corrupt Your Database Files} puts { <p>The pager module is robust but it is not completely failsafe. It can be subverted. This section attempts to identify and explain the risks.</p> <p> Clearly, a hardware or operating system fault that introduces incorrect data into the middle of the database file or journal will cause problems. Likewise, if a rogue process opens a database file or journal and writes malformed data into the middle of it, then the database will become corrupt. There is not much that can be done about these kinds of problems, so they are given no further attention. </p> <p> SQLite uses POSIX advisory locks to implement locking on Unix. On Windows it uses the LockFile(), LockFileEx(), and UnlockFile() system calls. SQLite assumes that these system calls all work as advertised. If that is not the case, then database corruption can result. One should note that POSIX advisory locking is known to be buggy or even unimplemented on many NFS implementations (including recent versions of Mac OS X) and that there are persistent reports of locking problems for network filesystems under Windows. Your best defense is to not use SQLite for files on a network filesystem. </p> <p> SQLite uses the fsync() system call to flush data to the disk under Unix and it uses FlushFileBuffers() to do the same under Windows. Once again, SQLite assumes that these operating system services function as advertised. 
But it has been reported that fsync() and FlushFileBuffers() do not always work correctly, especially with inexpensive IDE disks. Apparently some manufacturers of IDE disks have defective controller chips that report that data has reached the disk surface when in fact the data is still in volatile cache memory in the disk drive electronics. There are also reports that Windows sometimes chooses to ignore FlushFileBuffers() for unspecified reasons. The author cannot verify any of these reports. But if they are true, it means that database corruption is a possibility following an unexpected power loss. These are hardware and/or operating system bugs that SQLite is unable to defend against. </p> <p> If a crash or power failure occurs and results in a hot journal, but that journal is deleted, the next process to open the database will not know that it contains changes that need to be rolled back. The rollback will not occur and the database will be left in an inconsistent state. Rollback journals might be deleted for any number of reasons: </p> <ul> <li>An administrator might be cleaning up after an OS crash or power failure, see the journal file, think it is junk, and delete it.</li> <li>Someone (or some process) might rename the database file but fail to also rename its associated journal.</li> <li>If the database file has aliases (hard or soft links) and the file is opened by a different alias than the one used to create the journal, then the journal will not be found. To avoid this problem, you should not create links to SQLite database files.</li> <li>Filesystem corruption following a power failure might cause the journal to be renamed or deleted.</li> </ul> <p> The last (fourth) bullet above merits additional comment. When SQLite creates a journal file on Unix, it opens the directory that contains that file and calls fsync() on the directory, in an effort to push the directory information to disk. 
But suppose some other process is adding or removing unrelated files in the directory that contains the database and journal at the moment of a power failure. The supposedly unrelated actions of this other process might result in the journal file being dropped from the directory and moved into "lost+found". This is an unlikely scenario, but it could happen. The best defenses are to use a journaling filesystem or to keep the database and journal in a directory by themselves. </p> <p> For a commit involving multiple databases and a master journal, if the various databases were on different disk volumes and a power failure occurs during the commit, then when the machine comes back up the disks might be remounted with different names. Or some disks might not be mounted at all. When this happens the individual file journals and the master journal might not be able to find each other. The worst outcome from this scenario is that the commit ceases to be atomic. Some databases might be rolled back and others might not. All databases will continue to be self-consistent. To defend against this problem, keep all databases on the same disk volume and/or remount disks using exactly the same names after a power failure. </p> } HEADING 1 {Transaction Control At The SQL Level} puts { <p> The changes to locking and concurrency control in SQLite version 3 also introduce some subtle changes in the way transactions work at the SQL language level. By default, SQLite version 3 operates in "autocommit" mode. In autocommit mode, all changes to the database are committed as soon as all operations associated with the current database connection complete.</p> <p>The SQL command "BEGIN TRANSACTION" (the TRANSACTION keyword is optional) is used to take SQLite out of autocommit mode. Note that the BEGIN command does not acquire any locks on the database. After a BEGIN command, a SHARED lock will be acquired when the first SELECT statement is executed. 
A RESERVED lock will be acquired when the first INSERT, UPDATE, or DELETE statement is executed. No EXCLUSIVE lock is acquired until either the memory cache fills up and must be spilled to disk or until the transaction commits. In this way, the system delays blocking read access to the file until the last possible moment. </p> <p>The SQL command "COMMIT" does not actually commit the changes to disk. It just turns autocommit back on. Then, at the conclusion of the command, the regular autocommit logic takes over and causes the actual commit to disk to occur. The SQL command "ROLLBACK" also operates by turning autocommit back on, but it also sets a flag that tells the autocommit logic to roll back rather than commit.</p> <p>If the SQL COMMIT command turns autocommit on and the autocommit logic then tries to commit the changes but fails because some other process is holding a SHARED lock, then autocommit is turned back off automatically. This allows the user to retry the COMMIT at a later time after the SHARED lock has had an opportunity to clear.</p> } footer $rcsid |