/ Check-in [cf8a0c71]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Merge documentation changes from branch begin-concurrent-wal2 into this branch.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | begin-concurrent-pnu-wal2
Files: files | file ages | folders
SHA3-256: cf8a0c71cf1032bd4271cf889304def0fab82a76eac255d734652008fff6c9b6
User & Date: dan 2019-01-11 15:26:13
Wiki:begin-concurrent-pnu-wal2
Context
2019-03-08
16:06
Merge the latest enhancements and fixes from trunk. check-in: 543ecb30 user: drh tags: begin-concurrent-pnu-wal2
2019-01-11
15:26
Merge documentation changes from branch begin-concurrent-wal2 into this branch. check-in: cf8a0c71 user: dan tags: begin-concurrent-pnu-wal2
15:22
Merge documentation changes from branch "begin-concurrent" into this branch. Leaf check-in: 41e742bd user: dan tags: begin-concurrent-wal2
15:00
Merge latest begin-concurrent-wal2 changes (documentation only). check-in: d0ab56a3 user: dan tags: begin-concurrent-pnu-wal2
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Added doc/begin_concurrent.md.























































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107

Begin Concurrent
================

## Overview

Usually, SQLite allows at most one writer to proceed concurrently. The
BEGIN CONCURRENT enhancement allows multiple writers to process write
transactions simultanously if the database is in "wal" or "wal2" mode,
although the system still serializes COMMIT commands.

When a write-transaction is opened with "BEGIN CONCURRENT", actually 
locking the database is deferred until a COMMIT is executed. This means
that any number of transactions started with BEGIN CONCURRENT may proceed
concurrently. The system uses optimistic page-level-locking to prevent
conflicting concurrent transactions from being committed.

When a BEGIN CONCURRENT transaction is committed, the system checks whether 
or not any of the database pages that the transaction has read have been
modified since the BEGIN CONCURRENT was opened. In other words - it asks 
if the transaction being committed operates on a different set of data than
all other concurrently executing transactions. If the answer is "yes, this
transaction did not read or modify any data modified by any concurrent
transaction", then the transaction is committed as normal. Otherwise, if the
transaction does conflict, it cannot be committed and an SQLITE_BUSY_SNAPSHOT
error is returned. At this point, all the client can do is ROLLBACK the
transaction.

If SQLITE_BUSY_SNAPSHOT is returned, messages are output via the sqlite3_log
mechanism indicating the page and table or index on which the conflict
occurred. This can be useful when optimizing concurrency.

## Application Programming Notes

In order to serialize COMMIT processing, SQLite takes a lock on the database
as part of each COMMIT command and releases it before returning. At most one
writer may hold this lock at any one time. If a writer cannot obtain the lock,
it uses SQLite's busy-handler to pause and retry for a while:

  <a href=https://www.sqlite.org/c3ref/busy_handler.html>
      https://www.sqlite.org/c3ref/busy_handler.html
  </a>

If there is significant contention for the writer lock, this mechanism can be
inefficient. In this case it is better for the application to use a mutex or
some other mechanism that supports blocking to ensure that at most one writer
is attempting to COMMIT a BEGIN CONCURRENT transaction at a time. This is
usually easier if all writers are part of the same operating system process.

If all database clients (readers and writers) are located in the same OS
process, and if that OS is a Unix variant, then it can be more efficient to
the built-in VFS "unix-excl" instead of the default "unix". This is because it
uses more efficient locking primitives.

The key to maximizing concurrency using BEGIN CONCURRENT is to ensure that
there are a large number of non-conflicting transactions. In SQLite, each
table and each index is stored as a separate b-tree, each of which is
distributed over a discrete set of database pages. This means that:

  * Two transactions that write to different sets of tables never 
    conflict, and that

  * Two transactions that write to the same tables or indexes only 
    conflict if the values of the keys (either primary keys or indexed 
    rows) are fairly close together. For example, given a large 
    table with the schema:

      <pre>     CREATE TABLE t1(a INTEGER PRIMARY KEY, b BLOB);</pre>

    writing two rows with adjacent values for "a" probably will cause a
    conflict (as the two keys are stored on the same page), but writing two
    rows with vastly different values for "a" will not (as the keys will likly
    be stored on different pages).

Note that, in SQLite, if values are not explicitly supplied for an INTEGER
PRIMARY KEY, as for example in:

>
     INSERT INTO t1(b) VALUES(&lt;blob-value>);

then monotonically increasing values are assigned automatically. This is
terrible for concurrency, as it all but ensures that all new rows are 
added to the same database page. In such situations, it is better to
explicitly assign random values to INTEGER PRIMARY KEY fields.

This problem also comes up for non-WITHOUT ROWID tables that do not have an
explicit INTEGER PRIMARY KEY column. In these cases each table has an implicit
INTEGER PRIMARY KEY column that is assigned increasing values, leading to the
same problem as omitting to assign a value to an explicit INTEGER PRIMARY KEY
column.

For both explicit and implicit INTEGER PRIMARY KEYs, it is possible to have
SQLite assign values at random (instead of the monotonically increasing
values) by writing a row with a rowid equal to the largest possible signed
64-bit integer to the table. For example:

     INSERT INTO t1(a) VALUES(9223372036854775807);

Applications should take care not to malfunction due to the presence of such
rows.

The nature of some types of indexes, for example indexes on timestamp fields,
can also cause problems (as concurrent transactions may assign similar
timestamps that will be stored on the same db page to new records). In these
cases the database schema may need to be rethought to increase the concurrency
provided by page-level-locking.

Added doc/wal2.md.





































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98

Wal2 Mode Notes
===============

## Activating/Deactivating Wal2 Mode

"Wal2" mode is very similar to "wal" mode. To change a database to wal2 mode,
use the command:

>
     PRAGMA journal_mode = wal2;

It is not possible to change a database directly from "wal" mode to "wal2"
mode. Instead, it must first be changed to rollback mode. So, to change a wal
mode database to wal2 mode, the following two commands may be used:

>
     PRAGMA journal_mode = delete;
     PRAGMA journal_mode = wal2;

A database in wal2 mode may only be accessed by versions of SQLite compiled
from this branch. Attempting to use any other version of SQLite results in an
SQLITE&#95;NOTADB error. A wal2 mode database may be changed back to rollback mode
(making it accessible by all versions of SQLite) using:

>
     PRAGMA journal_mode = delete;

## The Advantage of Wal2 Mode

In legacy wal mode, when a writer writes data to the database, it doesn't
modify the database file directly. Instead, it appends new data to the
"&lt;database>-wal" file. Readers read data from both the original database
file and the "&lt;database>-wal" file. At some point, data is copied from the
"&lt;database>-wal" file into the database file, after which the wal file can
be deleted or overwritten. Copying data from the wal file into the database
file is called a "checkpoint", and may be done explictly (either by "PRAGMA
wal&#95;checkpoint" or sqlite3&#95;wal&#95;checkpoint&#95;v2()), or
automatically (by configuring "PRAGMA wal&#95;autocheckpoint" - this is the
default).

Checkpointers do not block writers, and writers do not block checkpointers.
However, if a writer writes to the database while a checkpoint is ongoing,
then the new data is appended to the end of the wal file. This means that,
even following the checkpoint, the wal file cannot be overwritten or deleted,
and so all subsequent transactions must also be appended to the wal file. The
work of the checkpointer is not wasted - SQLite remembers which parts of the
wal file have already been copied into the db file so that the next checkpoint
does not have to do so again - but it does mean that the wal file may grow
indefinitely if the checkpointer never gets a chance to finish without a
writer appending to the wal file. There are also circumstances in which
long-running readers may prevent a checkpointer from checkpointing the entire
wal file - also causing the wal file to grow indefinitely in a busy system.

Wal2 mode does not have this problem. In wal2 mode, wal files do not grow
indefinitely even if the checkpointer never has a chance to finish
uninterrupted.

In wal2 mode, the system uses two wal files instead of one. The files are named
"&lt;database>-wal" and "&lt;database>-wal2", where "&lt;database>" is of
course the name of the database file. When data is written to the database, the
writer begins by appending the new data to the first wal file. Once the first
wal file has grown large enough, writers switch to appending data to the second
wal file. At this point the first wal file can be checkpointed (after which it
can be overwritten). Then, once the second wal file has grown large enough and
the first wal file has been checkpointed, writers switch back to the first wal
file. And so on.

## Application Programming

From the point of view of the user, the main differences between wal and 
wal2 mode are to do with checkpointing:

  * In wal mode, a checkpoint may be attempted at any time. In wal2 
    mode, the checkpointer has to wait until writers have switched 
    to the "other" wal file before a checkpoint can take place.

  * In wal mode, the wal-hook (callback registered using
    sqlite3&#95;wal&#95;hook()) is invoked after a transaction is committed
    with the total number of pages in the wal file as an argument. In wal2
    mode, the argument is either the total number of uncheckpointed pages in
    both wal files, or - if the "other" wal file is empty or already
    checkpointed - 0.

Clients are recommended to use the same strategies for checkpointing wal2 mode
databases as for wal databases - by registering a wal-hook using
sqlite3&#95;wal&#95;hook() and attempting a checkpoint when the parameter
exceeds a certain threshold.

However, it should be noted that although the wal-hook is invoked after each
transaction is committed to disk and database locks released, it is still
invoked from within the sqlite3&#95;step() call used to execute the "COMMIT"
command. In BEGIN CONCURRENT systems, where the "COMMIT" is often protected by
an application mutex, this may reduce concurrency. In such systems, instead of
executing a checkpoint from within the wal-hook, a thread might defer this
action until after the application mutex has been released.