Documentation Source Text

Check-in [4764f16dc0]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix typos and make improvements to the 35% faster document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 4764f16dc04e1db9cf675fb8b841e30058d9cfb7bd2777d8cf5684d431972970
User & Date: drh 2017-05-04 20:14:45.970
Context
2017-05-04
20:23
More tweaks to the 35% faster document. (check-in: 3563b95114 user: drh tags: trunk)
20:14
Fix typos and make improvements to the 35% faster document. (check-in: 4764f16dc0 user: drh tags: trunk)
19:44
Add the "35% Faster Than The Filesystem" document. (check-in: fd906f77ce user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fasterthanfs.in.
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41






42

43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58








59
60
61
62
63
64
65
when reading the blobs from individual files.  In other words, the
overhead of calling open() and close() is greater than the overhead
of using the database.  The size reduction arises from the fact that
individual files are padded out to the next multiple of the filesystem
block size, whereas the blobs are packed more tightly into an SQLite
database.

<h1>How Performance Was Measuremented</h1>

<p>The performance comparison is accomplished using the
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program
found in the SQLite source tree.
To compile the test program, first gather the kvtest.c source file
into a directory with the [amalgamation|SQLite amalgamation] source
files "sqlite3.c" and "sqlite3.h".  Then on unix, run a command like
the following:

<codeblock>
gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c \
    -o kvtest -ldl -lpthread
</codeblock>







<p>

Generate a test database with 100,000 random blobs, each 10,000 bytes in
size using a command like this:

<codeblock>
./kvtest init test1.db --count 100k --size 10k
</codeblock>

<p>
Next, make copies of all the blobs into individual files in a directory
using commands like this:

<codeblock>
mkdir test1.dir
./kvtest export test1.db test1.dir
</codeblock>









<p>
Measure the performance for reading blobs from the database and from
individual files using these commands:

<codeblock>
./kvtest run test1.db --count 100k --blob-api
./kvtest run test1.dir --count 100k







|














>
>
>
>
>
>

>
|















>
>
>
>
>
>
>
>







20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
when reading the blobs from individual files.  In other words, the
overhead of calling open() and close() is greater than the overhead
of using the database.  The size reduction arises from the fact that
individual files are padded out to the next multiple of the filesystem
block size, whereas the blobs are packed more tightly into an SQLite
database.

<h1>How Performance Was Measured</h1>

<p>The performance comparison is accomplished using the
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program
found in the SQLite source tree.
To compile the test program, first gather the kvtest.c source file
into a directory with the [amalgamation|SQLite amalgamation] source
files "sqlite3.c" and "sqlite3.h".  Then on unix, run a command like
the following:

<codeblock>
gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c \
    -o kvtest -ldl -lpthread
</codeblock>

<p>Or on Windows with MSVC:

<codeblock>
cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c
</codeblock>

<p>
Use the resulting "kvtest" program to
generate a test database with 100,000 random blobs, each 10,000 bytes in
size using a command like this:

<codeblock>
./kvtest init test1.db --count 100k --size 10k
</codeblock>

<p>
Next, make copies of all the blobs into individual files in a directory
using commands like this:

<codeblock>
mkdir test1.dir
./kvtest export test1.db test1.dir
</codeblock>

<p>
At this point, you can measure the amount of disk space used by
the test1.db database and the space used by the test1.dir directory
and all of its content.  On a standard Ubuntu Linux desktop, the
database file will be 1,024,512,000 bytes in size and the test1.dir
directory will use 1,228,800,000 bytes of space (according to "du -k"),
about 20% more than the database.

<p>
Measure the performance for reading blobs from the database and from
individual files using these commands:

<codeblock>
./kvtest run test1.db --count 100k --blob-api
./kvtest run test1.dir --count 100k
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
will be amortized over more bytes transferred using read().  The break-even
point, the point where it becomes faster to read directly from the filesystem,
will vary from one system to another.

<p>The --blob-api option causes database reads to occur using the
[sqlite3_blob_open()], [sqlite3_blob_reopen()], and [sqlite3_blob_read()]
interfaces instead of using SQL statements.  Without the --blob-api
option, a separate SQL statement is run for reading each blob and
the performance of reading from the database is approximately
the same as the performance from reading directly from files.
This is still a significant finding, however, since few people would
expect an SQL database to run as fast as a direct read from a file
(much less faster) and yet it does.

<p>If the --random option is added to the "run" command, that causes the
blobs to be read in a random order.  This causes a noticable decrease in
the performance of database reads, since it forces more movement of B-Tree
cursors.  When --random is used and --blob-api is omitted, reading directly
from files on disk is generally a little faster, but reads from the database
are still competitive.

<h1>Other Considerations</h1>

<p>Other SQL database engines advise applications to store blobs in separate
files and then store the filename in the database.  In that case, where
the database must first be consulted to find the filename before opening
and reading the file, simply storing the entire blob in the database is
gives much faster read performance with SQLite.
See the [Internal Versus External BLOBs] article for more information.

<p>This report only looks at the performance of reads, not writes.







|


|
|
|










|







104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
will be amortized over more bytes transferred using read().  The break-even
point, the point where it becomes faster to read directly from the filesystem,
will vary from one system to another.

<p>The --blob-api option causes database reads to occur using the
[sqlite3_blob_open()], [sqlite3_blob_reopen()], and [sqlite3_blob_read()]
interfaces instead of using SQL statements.  Without the --blob-api
option, a separate SQL statement is run to read each blob and
the performance of reading from the database is approximately
the same as the performance from reading directly from files.
This is still a significant finding, since few people would
expect a [full-featured SQL] database to run as fast as direct file reads,
and yet SQLite does.

<p>If the --random option is added to the "run" command, that causes the
blobs to be read in a random order.  This causes a noticable decrease in
the performance of database reads, since it forces more movement of B-Tree
cursors.  When --random is used and --blob-api is omitted, reading directly
from files on disk is generally a little faster, but reads from the database
are still competitive.

<h1>Other Considerations</h1>

<p>Some other SQL database engines advise developers to store blobs in separate
files and then store the filename in the database.  In that case, where
the database must first be consulted to find the filename before opening
and reading the file, simply storing the entire blob in the database is
gives much faster read performance with SQLite.
See the [Internal Versus External BLOBs] article for more information.

<p>This report only looks at the performance of reads, not writes.