Documentation Source Text

Check-in [e5a09f2ea2]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Refinements to the faster-than-filesystem article based on peer review.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: e5a09f2ea27eae5e572c0bbe752ff77a9144944d4c199e43ab77639504782c7c
User & Date: drh 2017-06-08 19:31:53.189
Context
2017-06-09
12:23
Clarify the rules for referencing tables in triggers. (check-in: e0f55595d4 user: drh tags: trunk)
2017-06-08
19:31
Refinements to the faster-than-filesystem article based on peer review. (check-in: e5a09f2ea2 user: drh tags: trunk)
14:34
Version 3.19.3 (check-in: 9a562cf207 user: drh tags: trunk, release, version-3.19.3)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fasterthanfs.in.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33







34

35
36
37
38
39
40
41
42
















43
44
45
46
47
48
49
<title>35% Faster Than The Filesystem</title>
<tcl>hd_keywords {faster than the filesystem} \
   {35% Faster Than The Filesystem}</tcl>

<table_of_contents>

<h1>Summary</h1>

<center><b><font color="red">
This report is under development.  Check back 
in a few days for updates.
</font></b></center>

<p>Small blobs (for example, thumbnail images)
can be read from and written to an SQLite database about 
<a href="#approx">35% faster&sup1;</a>
than they can be read from or written to individual files on disk.


<p>Furthermore, a single SQLite database holding
10-kilobyte blobs uses about 20% less disk space than
storing the blobs in individual files.

<p>The performance difference arises (we believe) because when
working from an SQLite database, the open() and close() system calls
are invoked only once, whereas
open() and close() are invoked once for each blob
when using blobs stored in individual files.  It appears that the
overhead of calling open() and close() is greater than the overhead
of using the database.  The size reduction arises from the fact that
individual files are padded out to the next multiple of the filesystem
block size, whereas the blobs are packed more tightly into an SQLite
database.








<a name="approx"><p>

&sup1;The 35% figure above is approximate.  Actual timings vary
depending on hardware, operating system, and the
details of the experiment, and due to random performance fluctuations
on real-world hardware.  See the text below for more detail.
Try the experiments yourself.  Report significant deviations to
the [mailing lists].
</p></a>


















<h1>How These Measurements Are Made</h1>

<p>I/O performance is measured using the
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program
from the SQLite source tree.
To compile this test program, first gather the kvtest.c source file








<
<
<
<
<
|
<
|
|
>
















>
>
>
>
>
>
>
|
>






|

>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







1
2
3
4
5
6
7
8





9

10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
<title>35% Faster Than The Filesystem</title>
<tcl>hd_keywords {faster than the filesystem} \
   {35% Faster Than The Filesystem}</tcl>

<table_of_contents>

<h1>Summary</h1>






<p>SQLite reads small blobs (for example, thumbnail images)

<a href="#approx">35% faster&sup1;</a> than the same blobs
can be read from or written to individual files on disk using
fread() or fwrite().

<p>Furthermore, a single SQLite database holding
10-kilobyte blobs uses about 20% less disk space than
storing the blobs in individual files.

<p>The performance difference arises (we believe) because when
working from an SQLite database, the open() and close() system calls
are invoked only once, whereas
open() and close() are invoked once for each blob
when using blobs stored in individual files.  It appears that the
overhead of calling open() and close() is greater than the overhead
of using the database.  The size reduction arises from the fact that
individual files are padded out to the next multiple of the filesystem
block size, whereas the blobs are packed more tightly into an SQLite
database.

<p>
The measurements in this article were made during the week of 2017-06-05
using a version of SQLite in between 3.19.2 and 3.20.0.  You may expect
future versions of SQLite to perform even better.

<h2>Caveats</h2>

<a name="approx"></a>
<p>
&sup1;The 35% figure above is approximate.  Actual timings vary
depending on hardware, operating system, and the
details of the experiment, and due to random performance fluctuations
on real-world hardware.  See the text below for more detail.
Try the experiments yourself.  Report significant deviations to
the [mailing lists].
</p>

<p>
The 35% figure is based on running tests on every machine
that the author has easily at hand.
Some reviewers of this article report that SQLite has higher 
latency than direct I/O on their systems.  We do not yet understand
the difference.  We also see indications that SQLite does not
perform as well as direct I/O when experiments are run using
a cold filesystem cache.

<p>
So let your take-away be this: read/write latency for
SQLite is competitive with read/write latency of individual files on
disk.  Often SQLite is faster.  Sometimes SQLite is almost
as fast.  Either way, this article disproves the common
assumption that a relational database must be slower than direct
filesystem I/O.

<h1>How These Measurements Are Made</h1>

<p>I/O performance is measured using the
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program
from the SQLite source tree.
To compile this test program, first gather the kvtest.c source file
58
59
60
61
62
63
64



65
66
67

68
69
70
71
72
73
74

<p>Or on Windows with MSVC:

<codeblock>
cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c
</codeblock>




<p>
Use the resulting "kvtest" program to
generate a test database with 100,000 random blobs, each with a random

size between 8,000 and 12,000 bytes
using a command like this:

<codeblock>
./kvtest init test1.db --count 100k --size 10k --variance 2k
</codeblock>








>
>
>


|
>







77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97

<p>Or on Windows with MSVC:

<codeblock>
cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c
</codeblock>

<p>Instructions for compiling for Android
are <a href="#compile-android">shown below</a>.

<p>
Use the resulting "kvtest" program to
generate a test database with 100,000 random uncompressible
blobs, each with a random
size between 8,000 and 12,000 bytes
using a command like this:

<codeblock>
./kvtest init test1.db --count 100k --size 10k --variance 2k
</codeblock>

165
166
167
168
169
170
171









172
173
174
175
176
177
178
179
180
181
182
183

184











185
186
187
188
189
190
191




192
193
194
195
196

197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226

227
228
229
230
231
232
233
234
235
236
237
238
239
240
241

242
243
244
245
246
247
248
249
[power-safe transactions] whereas the direct-to-disk writing is not.
To put the tests on a more equal footing, add either the --nosync
option to the SQLite writes to disable calling fsync() or
FlushFileBuffers() to force content to disk, or using the --fsync option
for the direct-to-disk tests to force them to invoke fsync() or
FlushFileBuffers() when updating disk files.










<p>
There are many other testing options, which can be seen by running
the command:

<codeblock>
./kvtest help
</codeblock>

<h2>Read Performance Measurements</h2>

<p>The chart below shows data collected using 
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] on five different

systems:  An old Dell laptop running Windows7, a new Lenovo laptop running











Windows10, a Mac-Pro, an Ubuntu desktop machine, and an older Android
phone (a Galaxy S3). All machines use SSD except the Dell which has a
hard-drive. The test database is 100K BLOBs uniformly
distributed between 8K and 12K in size, for a total of about 1 gigabyte
of content.  The database page size
is 4KiB.  The -DSQLITE_DIRECT_OVERFLOW_READ compile-time option was
used for all of these tests.




The chart shows average time to read a BLOB from the SQLite database,
in a random order, relative to the time needed to read the same BLOB
from a file in the filesystem.  Smaller is better.  The actual timings
vary considerably from one system to another (the Ubuntu desktop is much
faster than the Galaxy S3 phone).  This chart shows the ratio of the

times needed to read blobs from SQLite divided by the times needed to
read the same blob directly from disk.  The left-most column in the chart
is the time needed to read from disk divided by itself, and thus always
has a value of 1.00.  That column is included for visual reference only.

<p>
In this chart, an SQL statement ("SELECT v FROM kv WHERE k=?1") 
is prepared once.  Then for each blob, the blob key value is bound 
to the ?1 parameter and the statement is evaluated to extract the
blob content.

<center>
<div class='imgcontainer'>
<img src="images/faster-read-sql.jpg">
</div>
<br>
Chart 1:  100K BLOBs, avg size 10KB, random order
using SQL
</center>

<p>
The performance can be improved slightly by bypassing the SQL layer
and reading the blob content directly using the
[sqlite3_blob_read()] interface, as shown in the next chart:

<center>
<div class='imgcontainer'>
<img src="images/faster-read-blobapi.jpg">
</div>
<br>

Chart 2:  100K BLOBs, avg size 10KB, random order<br>
using sqlite3_blob_read()
</center>

<p>
Futher performance improves can be made by using the
[memory-mapped I/O] feature of SQLite.  In the next chart, the
entire 1GB database file is memory mapped and blobs are read
(in random order) using the [sqlite3_blob_read()] interface:

<center>
<div class='imgcontainer'>
<img src="images/faster-read-mmap.jpg">
</div>
<br>

Chart 3:  100K BLOBs, avg size 10KB, random order<br>
using sqlite3_blob_read() from a memory-mapped database.
</center>

<p>
The third chart shows that reading blob content out of SQLite can be
twice as fast as reading from individual files on disk for Mac and
Android, and an amazing ten times faster for Windows.







>
>
>
>
>
>
>
>
>












>
|
>
>
>
>
>
>
>
>
>
>
>
|
|
|
|



>
>
>
>
|
|


|
>
















|
|












>
|
|



|









>
|







188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
[power-safe transactions] whereas the direct-to-disk writing is not.
To put the tests on a more equal footing, add either the --nosync
option to the SQLite writes to disable calling fsync() or
FlushFileBuffers() to force content to disk, or using the --fsync option
for the direct-to-disk tests to force them to invoke fsync() or
FlushFileBuffers() when updating disk files.

<p>
By default, kvtest runs the database I/O measurements all within
a single transaction.  Use the --multitrans option to run each blob
read or write in a separate transaction.  The --multitrans option makes
SQLite much slower, and uncompetitive with direct disk I/O.  This
option proves, yet again, that to get the most performance out of
SQLite, you should group as much database interaction as possible within
a single transaction.

<p>
There are many other testing options, which can be seen by running
the command:

<codeblock>
./kvtest help
</codeblock>

<h2>Read Performance Measurements</h2>

<p>The chart below shows data collected using 
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] on five different
systems:

<ul>
<li><b>Win7</b>: An circa-2009 Dell Inspiron laptop, Pentium dual-core
    at 2.30GHz, 4GiB RAM, Windows7.
<li><b>Win10</b>: A 2016 Lenovo YOGA 910, Intel i7-7500 at 2.70GHz,
    16GiB RAM, Windows10.
<li><b>Mac</b>: A 2015 MacBook Pro, 3.1GHz intel Core i7, 16GiB RAM,
    MacOS 10.12.5
<li><b>Ubuntu</b>: Desktop built from Intel i7-4770K at 3.50GHz, 32GiB RAM,
    Ubuntu 16.04.2 LTS
<li><b>Android</b>: Galaxy S3, ARMv7, 2GiB RAM
</ul>

<p>All machines use SSD except Win7 which has a
hard-drive. The test database is 100K blobs with sizes uniformly
distributed between 8K and 12K, for a total of about 1 gigabyte
of content.  The database page size
is 4KiB.  The -DSQLITE_DIRECT_OVERFLOW_READ compile-time option was
used for all of these tests.
Tests were run multiple times.
The first run was used to warm up the cache and its timings were discarded.

<p>
The chart below shows average time to read a blob from the SQLite database,
relative to the time needed to read the same blob
from a file in the filesystem.  Smaller is better.  The actual timings
vary considerably from one system to another (the Ubuntu desktop is much
faster than the Galaxy S3 phone, for example).  
This chart shows the ratio of the
times needed to read blobs from SQLite divided by the times needed to
read the same blob directly from disk.  The left-most column in the chart
is the time needed to read from disk divided by itself, and thus always
has a value of 1.00.  That column is included for visual reference only.

<p>
In this chart, an SQL statement ("SELECT v FROM kv WHERE k=?1") 
is prepared once.  Then for each blob, the blob key value is bound 
to the ?1 parameter and the statement is evaluated to extract the
blob content.

<center>
<div class='imgcontainer'>
<img src="images/faster-read-sql.jpg">
</div>
<br>
Chart 1:  SQLite read latency relative to direct filesystem reads.<br>
100K blobs, avg 10KB each, random order using SQL
</center>

<p>
The performance can be improved slightly by bypassing the SQL layer
and reading the blob content directly using the
[sqlite3_blob_read()] interface, as shown in the next chart:

<center>
<div class='imgcontainer'>
<img src="images/faster-read-blobapi.jpg">
</div>
<br>
Chart 2:  SQLite read latency relative to direct filesystem reads.<br>
100K blobs, avg size 10KB, random order<br>
using sqlite3_blob_read().
</center>

<p>
Further performance improves can be made by using the
[memory-mapped I/O] feature of SQLite.  In the next chart, the
entire 1GB database file is memory mapped and blobs are read
(in random order) using the [sqlite3_blob_read()] interface:

<center>
<div class='imgcontainer'>
<img src="images/faster-read-mmap.jpg">
</div>
<br>
Chart 3:  SQLite read latency relative to direct filesystem reads.<br>
100K blobs, avg size 10KB, random order<br>
using sqlite3_blob_read() from a memory-mapped database.
</center>

<p>
The third chart shows that reading blob content out of SQLite can be
twice as fast as reading from individual files on disk for Mac and
Android, and an amazing ten times faster for Windows.
285
286
287
288
289
290
291

292
293
294
295
296
297
298
299
300
301
setting is NORMAL instead of FULL, the transactions are not durable.

<center>
<div class='imgcontainer'>
<img src="images/faster-write-safe.jpg">
</div>
<br>

Chart 4:  10K BLOBs, avg size 10KB, random order<br>
in WAL mode with synchronous NORMAL,<br>
exclusive of checkpoint time.
</center>

<p>
The android performance numbers for the write experiments are omitted
because the performance tests on the Galaxy S3 are so random.  Two
consecutive runs of the exact same experiment would give wildly different
times.  And, to be fair, the performance of SQLite on android is slightly







>
|
|
|







336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
setting is NORMAL instead of FULL, the transactions are not durable.

<center>
<div class='imgcontainer'>
<img src="images/faster-write-safe.jpg">
</div>
<br>
Chart 4:  SQLite write latency relative to direct filesystem writes.<br>
10K blobs, avg size 10KB, random order,<br>
WAL mode with synchronous NORMAL,<br>
exclusive of checkpoint time
</center>

<p>
The android performance numbers for the write experiments are omitted
because the performance tests on the Galaxy S3 are so random.  Two
consecutive runs of the exact same experiment would give wildly different
times.  And, to be fair, the performance of SQLite on android is slightly
309
310
311
312
313
314
315

316
317
318
319
320
321
322
323
324
data prone to corruption due to system crashes and power failures.

<center>
<div class='imgcontainer'>
<img src="images/faster-write-unsafe.jpg">
</div>
<br>

Chart 5:  10K BLOBs, avg size 10KB, random order<br>
in journaling disabled, synchronous OFF.
</center>

<p>
In all of the write tests, it is important to disable anti-virus software
prior to running the direct-to-disk performance tests.  We found that
anti-virus software slows down direct-to-disk by an order of magnitude
whereas it impacts SQLite writes very little.  This is probably due to the







>
|
|







361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
data prone to corruption due to system crashes and power failures.

<center>
<div class='imgcontainer'>
<img src="images/faster-write-unsafe.jpg">
</div>
<br>
Chart 5:  SQLite write latency relative to direct filesystem writes.<br>
10K blobs, avg size 10KB, random order,<br>
journaling disabled, synchronous OFF.
</center>

<p>
In all of the write tests, it is important to disable anti-virus software
prior to running the direct-to-disk performance tests.  We found that
anti-virus software slows down direct-to-disk by an order of magnitude
whereas it impacts SQLite writes very little.  This is probably due to the
350
351
352
353
354
355
356






357
358
359
360
361
362
363
364
365
366
367
368
369
370
371








































<ol type="A">
<li>
<p>SQLite is competitive with, and usually faster than, blobs stored in
separate files on disk, for both reading and writing.

<li>






<p>Reading is about an order of magnitude faster than writing, for both
SQLite and direct-to-disk I/O.

<li>
<p>I/O performance varies widely depending on operating system and hardware.
Make your own measurements before drawing conclusions.

<li>
<p>Some other SQL database engines advise developers to store blobs in separate
files and then store the filename in the database.  In that case, where
the database must first be consulted to find the filename before opening
and reading the file, simply storing the entire blob in the database
gives much faster read and write performance with SQLite.
See the [Internal Versus External BLOBs] article for more information.
</ol>














































>
>
>
>
>
>
|
|













>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469

<ol type="A">
<li>
<p>SQLite is competitive with, and usually faster than, blobs stored in
separate files on disk, for both reading and writing.

<li>
<p>SQLite is much faster than direct writes to disk on Windows
when anti-virus protection is turned on.  Since anti-virus software
is and should be on by default in Windows, that means that SQLite
is generally much faster than direct disk writes on Windows.

<li>
<p>Reading is about an order of magnitude faster than writing, for all
systems and for both SQLite and direct-to-disk I/O.

<li>
<p>I/O performance varies widely depending on operating system and hardware.
Make your own measurements before drawing conclusions.

<li>
<p>Some other SQL database engines advise developers to store blobs in separate
files and then store the filename in the database.  In that case, where
the database must first be consulted to find the filename before opening
and reading the file, simply storing the entire blob in the database
gives much faster read and write performance with SQLite.
See the [Internal Versus External BLOBs] article for more information.
</ol>


<h1>Additional Notes</h1>

<a name="compile-android"></a>
<h2>Compiling And Testing on Android</h2>

<p>
The kvtest program is compiled and run on Android as follows.
First install the Android SDK and NDK.  Then prepare a script
named "android-gcc" that looks approximately like this:

<codeblock>
#!/bin/sh
#
NDK=/home/drh/Android/Sdk/ndk-bundle
SYSROOT=$NDK/platforms/android-16/arch-arm
ABIN=$NDK/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin
GCC=$ABIN/arm-linux-androideabi-gcc
$GCC --sysroot=$SYSROOT -fPIC -pie $*
</codeblock>

<p>Make that script executable and put it on your $PATH.  Then
compile the kvtest program as follows:

<codeblock>
android-gcc -Os -I. kvtest.c sqlite3.c -o kvtest-android
</codeblock>

<p>Next, move the resulting kvtest-android executable to the Android
device:

<codeblock>
adb push kvtest-android /data/local/tmp
</codeblock>

<p>Finally use "adb shell" to get a shell prompt on the Android device,
cd into the /data/local/tmp directory, and begin running the tests
as with any other unix host.