Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Add documentation file ./README-server-edition.html. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | server-process-edition |
Files: | files | file ages | folders |
SHA3-256: |
fbc4f4ad259cfe04b28d698ef025f179 |
User & Date: | dan 2017-07-31 19:55:56.996 |
Context
2017-08-07
| ||
14:06 | Fix a bug on this branch involving mmap mode and readonly transactions. (check-in: 355c594e82 user: dan tags: server-process-edition) | |
2017-07-31
| ||
19:55 | Add documentation file ./README-server-edition.html. (check-in: fbc4f4ad25 user: dan tags: server-process-edition) | |
2017-07-29
| ||
17:10 | Merge latest trunk changes with this branch. (check-in: b42c87790e user: dan tags: server-process-edition) | |
Changes
Added README-server-edition.html.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 | <html> <center> <h1> The "server-process-edition" Branch</h1> </center> <p> The "server-process-edition" branch contains two modifications to stock SQLite that work together to provide concurrent read/write transactions using page-level-locking provided that: <ul> <li><p> All clients are in the same process, and <li><p> The application uses "PRAGMA synchronous=off". </ul> <p> Up to 16 simultaneous read/write transactions controlled by page-level-locking are possible. Additionally, there may be any number of read-only transactions started using "BEGIN READONLY" commands. Read-only transactions do not block read-write transactions, and read-write transactions do not block read-only transactions. <p> The two features on this branch are: <ol> <li><p> An alternative layout for the database free-page list. This is intended to reduce contention between writers when allocating new database pages, either from the free-list or by extending the database file. <li><p> The "server-mode" extension, which provides the read/write page-level-locking and read-only MVCC concurrency mentioned above. </ol> <h2> Alternative Free-List Format </h2> <p> The alternative free-list format is very similar to the current format. It differs in the following respects: <ul> <li><p>The "total number of free pages" field in the db header is not maintained. It is always set to zero. <li><p> Instead of pointing to the first free-list trunk page, the free-list pointer in the db header points to a page known as the "free-list node". <li><p> The free-list node page contains N pointers to free-lists stored in the legacy format (i.e. a linked list of trunk pages each containing pointers to many free leaf pages). </ul> <p> This effectively means that a database has N free-lists instead of just one. To allocate a free page, a writer only needs to lock one such free-list, and so up to N transactions that allocate new pages may proceed concurrently. <p> Allocating pages from the end of the db file still locks out all other read/write transactions (because it involves writing to page 1, which every transaction needs to read). To minimize the frequency with which this occurs, when a page must be allocated from the end of the database file, the file is extended by 2048 pages. These are distributed equally between 16 free-lists (children of the free-list node page). Additionally, the first trunk page in each free list is never reused. Doing so would require writing to the free-list node page - effectively an exclusive lock on the entire page-allocation system. <p> The current format used for the free-list can be modified or queried using a new pragma: <pre> PRAGMA [database.]freelist_format; PRAGMA [database.]freelist_format = 1|2; </pre> <p> At present, the free-list format may only be modified when the free-list is completely empty. Which, as the implementation ensures that a free-list that uses the alternative format is never completely emptied, effectively precludes changing the format from 2 (alternative) to 1 (legacy). <h2> Page level locking - "Server Mode" </h2> <p> A database client automatically enters "server mode" if (a) it is using a VFS that takes a process-wide exclusive lock on the db file (like "unix-excl" does), and (b) there exists a directory named "<database>-journal" in the file system alongside the database file "<database>" There is currently no provision for creating this directory, although it could be safely done for a database in rollback mode using something like: <pre> PRAGMA journal_mode = off; BEGIN EXCLUSIVE; <create directory> END; </pre> <p> To check the status of these two conditions, a new file-control is added - SQLITE_FCNTL_SERVER_MODE. SQLite invokes this file-control as part of the procedure for detecting a hot journal (after it has established that there is a file-system entry named <database>-journal and that no other process holds a RESERVED lock). If the VFS does support an exclusive process-wide lock and if the directory is present, the VFS indicates that the client should enter server mode. If the VFS does not indicate this, or if it returns SQLITE_NOTFOUND, then SQLite proceeds with the hot-journal rollback. <p> There is also a new file-control named SQLITE_FCNTL_FILEID, which requests a 128-bit value that uniquely identifies an open file on disk from the VFS. This is used to ensure that all connections to the same database from within a process use the same shared state, even if they connect to the db using different file-system paths. <p> The heap-memory data structure shared between all connections to the same database is protected by a mutex. Clients take and release the mutex each time a transaction is opened or closed, and each time a read or write lock is taken on a specific database page. Ordinary read/write transactions lock each page that they access - each page can support any number of concurrent read locks or a single write lock. <p> Write transactions use a journal file stored in the <database>-journal directory. Journal files are named "<id>-journal", where <id> is an integer value betwen 0 and 15, inclusive. A client may use multiple different journal files throughout its lifetime. <p>Before database pages are overwritten in server-mode, entries are added to an in-memory hash table containing the old page content. These entries are used by read-only transactions to ensure that they access a consistent snapshot of the database. Hash table entries are automatically removed when they are no longer required. <p> It is not difficult to extend the kind of page level locking used by read/write transactions to clients in multiple processes. It might be more difficult to extend the read-only MVCC capability though. <h2> Performance Test </h2> <p> The test uses a single table with the following schema: <pre> CREATE TABLE t1(a INTEGER PRIMARY KEY, b BLOB(16), c BLOB(16), d BLOB(400)); CREATE INDEX i1 ON t1(b); CREATE INDEX i2 ON t1(c); </pre> <p> The database initially contains 5,000,000 rows. Values for column "a" are between 1 and 5,000,000, inclusive. Other columns are populated with randomly generated blob values, each 16, 16, and 400 bytes in size, respectively. <p> Read/write transactions used by the test take the following form. Each such transaction modifies approximately 25 pages (5 in the main table and 10 in each index), not accounting for tree rebalancing operations. <pre> BEGIN; REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400)); REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400)); REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400)); REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400)); REPLACE INTO t1 VALUES(abs(random() % 5000000), randomblob(16), randomblob(16), randomblob(400)); COMMIT; </pre> <p> Read-only transactions are as follows: <pre> BEGIN READONLY; SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10; SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10; SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10; SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10; SELECT * FROM t1 WHERE a>abs((random()%5000000)) LIMIT 10; END; </pre> <p> The performance test features one or more clients executing read/write transactions as fast as possible, and zero or more clients executing read-only transactions, also as fast as possible. All tests use the "unix-excl" VFS and all clients execute in a separate thread within the same process. The database is located on a tmpfs file-system. <p> In the table below "rw:" refers to the number of read-write clients, and "ro:" the number of read-only clients used by the test. The TPS values in brackets are the number of read-only transactions per second. All other values are read-write transactions per second. <p> The collision rate (percentage of attempted transactions that failed due to a page-level locking conflict) in all tests was between 1 and 2%. Failed transactions are not included in the TPS counts below. <p> <table border=1 width=90% align=center> <!-- 320 jm=persist, rw:1 139675, 135797 320 jm=wal, rw:1 120995, 118889 begin-concurrent, rw:1 119438, 117580 begin-concurrent, rw:2 166923, 166904 begin-concurrent, rw:3 180432, 172825 begin-concurrent, rw:2,ro:1 150427(347319),152087(351601) server-mode, rw:1 126592, 126742 server-mode, rw:2 228317, 227155 server-mode, rw:3 309712, 306218 server-mode, rw:2,ro:1 213303(576032),210994(556005) --> <tr><th> Configuration <th>TPS per client <th> TPS total <tr><td> 3.20.0, journal_mode=persist, rw:1, ro:0 <td>6886 <td> 6886 <tr><td> 3.20.0, journal_mode=wal, rw:1, ro:0 <td>5997 <td> 5997 <tr><td> begin-concurrent, rw:1, ro:0<td>5925<td> 5925 <tr><td> begin-concurrent, rw:2, ro:0<td>4172 <td> 8345 <tr><td> begin-concurrent, rw:3, ro:0<td>2943 <td> 8831 <tr><td> begin-concurrent, rw:2, ro:1<td>3781 <td> 7562 (17473) <tr><td> server-mode, rw:1, ro:0 <td>6333 <td> 6333 <tr><td> server-mode, rw:2, ro:0<td> 5693 <td> 11386 <tr><td> server-mode, rw:3, ro:0<td>5132 <td> 15398 <tr><td> server-mode, rw:2, ro:1<td>5303 <td> 10607 (28300) </table> |