Documentation Source Text

Check-in [b37cb6bc60]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add a reference to the Jim Gray paper to the faster-than-filesystem article.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: b37cb6bc6053501d116fcdf1022b35510c48b6b61625d784857a3ee0fbad3c57
User & Date: drh 2017-06-17 13:51:31.423
Context
2017-06-26
14:43
Correction to the schema for the DBSTAT virtual table. (check-in: a30aad4d5c user: drh tags: trunk)
2017-06-17
13:51
Add a reference to the Jim Gray paper to the faster-than-filesystem article. (check-in: b37cb6bc60 user: drh tags: trunk)
10:14
Add change log, news, and chronology entries for the 3.18.2 backpatch release. (check-in: d7d183be5b user: drh tags: trunk)
Changes
Side-by-Side Diff Ignore Whitespace Patch
Changes to pages/fasterthanfs.in.
55
56
57
58
59
60
61


























62
63
64
65
66
67
68
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94







+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+







<p>
So let your take-away be this: read/write latency for
SQLite is competitive with read/write latency of individual files on
disk.  Often SQLite is faster.  Sometimes SQLite is almost
as fast.  Either way, this article disproves the common
assumption that a relational database must be slower than direct
filesystem I/O.

<h2>Related Studies</h2>

<p>
[https://www.microsoft.com/en-us/research/people/gray/|Jim Gray]
and others studied the read performance of BLOBs
versus file I/O for Microsoft SQL Server and found that reading BLOBs 
out of the 
database was faster for BLOB sizes less than between 250KiB and 1MiB.
([https://www.microsoft.com/en-us/research/publication/to-blob-or-not-to-blob-large-object-storage-in-a-database-or-a-filesystem/|Paper]).
In that study, the database still stores the filename of the content even
if the content is held in a separate file.  So the database is consulted
for every BLOB, even if it is only to extract the filename.  In this
article, the key for the BLOB is the filename, so no preliminary database
access is required.  Because the database is never used at all when
reading content from individual files in this article, the threshold
at which direct file I/O becomes faster is smaller than it is in Gray's
paper.

<p>
The [Internal Versus External BLOBs] article on this website is an
earlier investigation (circa 2011) that uses the same approach as the
Jim Gray paper &mdash; storing the blob filenames as entries in the
database &mdash but for SQLite instead of SQL Server.



<h1>How These Measurements Are Made</h1>

<p>I/O performance is measured using the
[https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program
from the SQLite source tree.
To compile this test program, first gather the kvtest.c source file