Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Updates to the 35% faster document. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
ac27ab7f6f0995ed596eecbcdd57b40c |
User & Date: | drh 2017-05-05 16:55:30.256 |
Context
2017-05-10
| ||
16:34 | Add the SQLITE_READ authorizer change to the change log for 3.19.0. (check-in: 84e21ebf22 user: drh tags: trunk) | |
2017-05-05
| ||
16:55 | Updates to the 35% faster document. (check-in: ac27ab7f6f user: drh tags: trunk) | |
2017-05-04
| ||
20:23 | More tweaks to the 35% faster document. (check-in: 3563b95114 user: drh tags: trunk) | |
Changes
Changes to pages/fasterthanfs.in.
︙ | ︙ | |||
11 12 13 14 15 16 17 | <p>Furthermore, a single SQLite database holding many thousands of thumbnail-sized blobs uses about 20% less disk space than storing the blobs in individual files. <p>The performance difference arises (we believe) because when reading from an SQLite database, the open() and close() system calls | | | | | | 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | <p>Furthermore, a single SQLite database holding many thousands of thumbnail-sized blobs uses about 20% less disk space than storing the blobs in individual files. <p>The performance difference arises (we believe) because when reading from an SQLite database, the open() and close() system calls are invoked only once, whereas open() and close() are invoked once for each blob when reading the blobs from individual files. It appears that the overhead of calling open() and close() is greater than the overhead of using the database. The size reduction arises from the fact that individual files are padded out to the next multiple of the filesystem block size, whereas the blobs are packed more tightly into an SQLite database. <h1>How These Measurements Are Made</h1> <p>The performance comparison is accomplished using the [https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program found in the SQLite source tree. To compile the test program, first gather the kvtest.c source file into a directory with the [amalgamation|SQLite amalgamation] source files "sqlite3.c" and "sqlite3.h". Then on unix, run a command like |
︙ | ︙ | |||
114 115 116 117 118 119 120 | option, a separate SQL statement is run to read each blob and the performance of reading from the database is approximately the same as the performance from reading directly from files. This is still a significant finding, since few people would expect a [full-featured SQL] database to run as fast as direct file reads, and yet SQLite does. | | | > > > > > > > > | > > > > > > > | | | > > > > > > | 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 | option, a separate SQL statement is run to read each blob and the performance of reading from the database is approximately the same as the performance from reading directly from files. This is still a significant finding, since few people would expect a [full-featured SQL] database to run as fast as direct file reads, and yet SQLite does. <p>The --random option on the "run" command causes the blobs to be read in a random order. This causes the performance of database reads to decrease. The reason is that the blobs are tightly packed in the database, rather than being padded out to the next block size as when they are stored in the filesystem. Some pages contain parts of adjacent blobs. When the blobs are read sequentially, those pages are only read into memory once and cached and then used to reconstruct adjacent blobs, but when blobs are read in a random order, those pages that share parts of two or more blobs tend to be read multiple times, leading to decreased performance. <p>The "--mmap SIZE" option on the "run" command causes the database file to be accessed using mmap() instead of via read(). The SIZE argument is the size of the memory mapped region, and should be the size of the database file for maximum performance. Using "--mmap 1G" causes the database reads to be almost twice as fast as disk reads even when the --random option is used. <p>When --random is used and both --blob-api and --mmap are omitted, reading directly from files on disk is generally a little faster, but reads from the database are still competitive. <h1>Other Considerations</h1> <p>Some other SQL database engines advise developers to store blobs in separate files and then store the filename in the database. In that case, where the database must first be consulted to find the filename before opening and reading the file, simply storing the entire blob in the database is gives much faster read performance with SQLite. See the [Internal Versus External BLOBs] article for more information. <p>This report only looks at the performance of reads, not writes. Because SQLite implements [atomic commit|power-safe ACID transactions] we expect that write performance into SQLite will be slower than writing directly to individual files. However, if ACID transactions are disabled via [PRAGMA journal_mode|PRAGMA journal_mode=OFF] (thus putting SQLite on equal footing with the filesystem) and the [sqlite3_blob_write()] interface is used, SQLite might well be competitive or even faster than writes to separate files on disk. That is an experiment we have not yet run. <p>Remember that the relative performance of database reads and reads from the filesystem will depend on both the hardware and the operating system. Please try the tests above on your own system. If you encounter cases there database reads do not perform favorably in comparison to filesystem reads, please report your findings in the [mailing lists|SQLite mailing list]. |