Documentation Source Text

Check-in [4764f16dc0]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix typos and make improvements to the 35% faster document.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 4764f16dc04e1db9cf675fb8b841e30058d9cfb7bd2777d8cf5684d431972970
User & Date: drh 2017-05-04 20:14:45
Context
2017-05-04
20:23
More tweaks to the 35% faster document. check-in: 3563b95114 user: drh tags: trunk
20:14
Fix typos and make improvements to the 35% faster document. check-in: 4764f16dc0 user: drh tags: trunk
19:44
Add the "35% Faster Than The Filesystem" document. check-in: fd906f77ce user: drh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to pages/fasterthanfs.in.

    20     20   when reading the blobs from individual files.  In other words, the
    21     21   overhead of calling open() and close() is greater than the overhead
    22     22   of using the database.  The size reduction arises from the fact that
    23     23   individual files are padded out to the next multiple of the filesystem
    24     24   block size, whereas the blobs are packed more tightly into an SQLite
    25     25   database.
    26     26   
    27         -<h1>How Performance Was Measuremented</h1>
           27  +<h1>How Performance Was Measured</h1>
    28     28   
    29     29   <p>The performance comparison is accomplished using the
    30     30   [https://www.sqlite.org/src/file/test/kvtest.c|kvtest.c] program
    31     31   found in the SQLite source tree.
    32     32   To compile the test program, first gather the kvtest.c source file
    33     33   into a directory with the [amalgamation|SQLite amalgamation] source
    34     34   files "sqlite3.c" and "sqlite3.h".  Then on unix, run a command like
................................................................................
    35     35   the following:
    36     36   
    37     37   <codeblock>
    38     38   gcc -Os -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c \
    39     39       -o kvtest -ldl -lpthread
    40     40   </codeblock>
    41     41   
           42  +<p>Or on Windows with MSVC:
           43  +
           44  +<codeblock>
           45  +cl -I. -DSQLITE_DIRECT_OVERFLOW_READ kvtest.c sqlite3.c
           46  +</codeblock>
           47  +
    42     48   <p>
    43         -Generate a test database with 100,000 random blobs, each 10,000 bytes in
           49  +Use the resulting "kvtest" program to
           50  +generate a test database with 100,000 random blobs, each 10,000 bytes in
    44     51   size using a command like this:
    45     52   
    46     53   <codeblock>
    47     54   ./kvtest init test1.db --count 100k --size 10k
    48     55   </codeblock>
    49     56   
    50     57   <p>
................................................................................
    52     59   using commands like this:
    53     60   
    54     61   <codeblock>
    55     62   mkdir test1.dir
    56     63   ./kvtest export test1.db test1.dir
    57     64   </codeblock>
    58     65   
           66  +<p>
           67  +At this point, you can measure the amount of disk space used by
           68  +the test1.db database and the space used by the test1.dir directory
           69  +and all of its content.  On a standard Ubuntu Linux desktop, the
           70  +database file will be 1,024,512,000 bytes in size and the test1.dir
           71  +directory will use 1,228,800,000 bytes of space (according to "du -k"),
           72  +about 20% more than the database.
           73  +
    59     74   <p>
    60     75   Measure the performance for reading blobs from the database and from
    61     76   individual files using these commands:
    62     77   
    63     78   <codeblock>
    64     79   ./kvtest run test1.db --count 100k --blob-api
    65     80   ./kvtest run test1.dir --count 100k
................................................................................
    89    104   will be amortized over more bytes transferred using read().  The break-even
    90    105   point, the point where it becomes faster to read directly from the filesystem,
    91    106   will vary from one system to another.
    92    107   
    93    108   <p>The --blob-api option causes database reads to occur using the
    94    109   [sqlite3_blob_open()], [sqlite3_blob_reopen()], and [sqlite3_blob_read()]
    95    110   interfaces instead of using SQL statements.  Without the --blob-api
    96         -option, a separate SQL statement is run for reading each blob and
          111  +option, a separate SQL statement is run to read each blob and
    97    112   the performance of reading from the database is approximately
    98    113   the same as the performance from reading directly from files.
    99         -This is still a significant finding, however, since few people would
   100         -expect an SQL database to run as fast as a direct read from a file
   101         -(much less faster) and yet it does.
          114  +This is still a significant finding, since few people would
          115  +expect a [full-featured SQL] database to run as fast as direct file reads,
          116  +and yet SQLite does.
   102    117   
   103    118   <p>If the --random option is added to the "run" command, that causes the
   104    119   blobs to be read in a random order.  This causes a noticable decrease in
   105    120   the performance of database reads, since it forces more movement of B-Tree
   106    121   cursors.  When --random is used and --blob-api is omitted, reading directly
   107    122   from files on disk is generally a little faster, but reads from the database
   108    123   are still competitive.
   109    124   
   110    125   <h1>Other Considerations</h1>
   111    126   
   112         -<p>Other SQL database engines advise applications to store blobs in separate
          127  +<p>Some other SQL database engines advise developers to store blobs in separate
   113    128   files and then store the filename in the database.  In that case, where
   114    129   the database must first be consulted to find the filename before opening
   115    130   and reading the file, simply storing the entire blob in the database is
   116    131   gives much faster read performance with SQLite.
   117    132   See the [Internal Versus External BLOBs] article for more information.
   118    133   
   119    134   <p>This report only looks at the performance of reads, not writes.