Documentation Source Text

Artifact [1d85a76adf]

Artifact 1d85a76adf3be84dfb6ad7f578ba0bc79ed7c635:

<title>SQLite As An Application File Format</title>

<h2>What Is An Application File Format?</h2>

An "application file format" is the schema and/or low-level byte 
format used to persist appliation state to disk or to exchange
information between programs.
Some examples of application file formats include:

<li>"*.doc" - Word Perfect and Microsoft Office documents
<li>"*.dwg" - AutoCAD drawings
<li>"*.pdf" - Portable Document Format from Adobe
<li>"*.xls" - Microsoft Excel Spreadsheet
<li>".git" - Git source code repository
<li>"*.epub" - The Eletronic Publication format used by non-Kindle eBooks
<li>"*.odt" - The Open Document format used by OpenOffice and others
<li>"*.ppt" - Microsoft PowerPoint presentations
<li>"*.odp" - The Open Document presentation format used by OpenOffice and others

<p>The application file formats listed above can be divided into
several categories:

<li><p><b>Fully Custom Formats.</b>
Custom formats are specifically designed for a single application.
DOC, DWG, PDF, XLS, and PPT are examples of custom formats.  Custom
formats are usually contained within a single file, for ease of transport.
They are also usually binary, though the DWG format is a notable exception.
Custom file formats require specialized application code
to read and write and are not normally accessible from commonly
available tools such as unix command-line programs.  Generally speaking,
to access the content of a custom application file format, you have to
have a tool that is specifically engineered to read and/or write that

<li><p><b>Pile-of-Files Formats.</b>
Sometimes the application state is stored in a directory hierarchy of
files.  Git is a prime example of this, though the phenomenon occurs
frequently one-off and bespoke applications.  A pile-of-files format
essentially uses the filesystem as a key/value database and small
chunks of information into separate files.  This gives the
advantage of making the content more accessible to common utility
programs.  But even if many of the files in a pile-of-files format
are easily readable, there are usually some files that have their
own custom format, such as Git "Packfiles", and are hence not readable
or writable without specialized tools.  It is much less convenient
to move a pile-of-files from one place or machine to another, than
it is to move a single file.  It is hard to make a pile-of-files
document into an email attachment, for example.  Finally, a pile-of-files
format breaks the "document metaphor":
there is no one file that a user can point to
that is the "document".

<li><p><b>ZIP-ed Pile-of-Files Formats.</b>
Some applications use a Pile-of-Files that is then encapsulated into
a ZIP archive.  EPUB, ODT,and ODP are examples of this approach.
An EPUB book is really just a ZIP archive that contains various
XML files for the text of book chapters, GIF and JPEG images for
the artwork, and a specialized catalog file that tells the eBook
reader how all the XML and image files fit together.  OpenOffice
documents (ODT and ODP) are also ZIP archives containing XML and
images that represent their content as well as "catalog" files that
show the interrelationships between the component parts.
A ZIP-ed pile-of-files format
is not quite as accessible as a pure pile-of-files format, since
the component parts are not immediately visible.  But a knowledgeable
user can easily get to the component parts using any ZIP archiver.
And ZIP-ed pile-of-files formats are not as easy to edit, since
one most normally rewrite the entire file just to change one small
component part.
On the other hand, ZIP-ed pile-of-files formats have the advantage
of being contained in a single disk file, which makes them easier
to copy and attach and gives a more convenient metaphor for the

<h2>SQLite As The Application File Format</h2>

Designers of new applications are encouraged to consider using an
SQLite database as their application file format.  The use of SQLite
as the application file format has a number of advantages over
custom file formats, pile-of-file formats, and ZIP-ed pile-of-file

<p><b>Ease Of Development.</b>
No code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can save having to
write, debug, and maintain thousands and thousands of lines of code.

<p><b>Single-File Documents.</b>
An SQLite database is contained in a single file, which is easily
copied or moved or attached.  The "document" metaphor is preserved.
furthermore, SQLite does not have any database file naming requirements
and so the application can use any custom file suffix that it wants.

<p><b>High-level Query Language.</b>
SQLite is a complete relational database engine, which means that the
application can access content using high-level queries.  Application
developers need not spend time thinking about "how" to retrieve the
information they need from a document.  Developers write SQL that
expresses "what" information they want and leave it to the database engine
to figure out how to best retrieve that content.  This helps developers
to operate "heads up" and remain focused on solving the user's problem,
and avoid time spent "heads down" fiddling with low-level file
formatting details.

<p><b>Accessible Content.</b>
Information held in an SQLite database file is accessible using 
commonly available command-line tools - tools that 
are installed by default on Mac and Linux systems and that are 
freely available as a self-contained EXE file on Windows.
Unlike custom file formats, application-specific programs are
<u>not</u> required to read or write content in an SQLite database.
An SQLite database is uses a [file format | well-defined and well-documented]
file format that is in widespread use by literally hundreds of 
thousands of applications and
is backwards compatible to its inception in 2003 and which promises
to continue to be compatible in years to come.  The longevity of
SQLite database files is particularly important to bespoke applications,
since it allows the document content to be accessed years or decades in the
future, long after all traces of the original application have been lost.

SQLite database files are portable between 32-bit and 64-bit machines and
between big-endian and little-endian architectures and between windows
and unix in all of their various flavors and incarnations.  The application need
not worry over the byte-order of integers or floating point numbers.
Text content can be read or written as UTF-8, UTF-16LE, or UTF-16BE and 
SQLite will automatically perform any necessary translations on-the-fly.

<p><b>Atomic Updates.</b>
Writes to an SQLite database are atomic.  They either happen completely
or not at all, even in the face of system crashes or power failures.  So
there is no danger of corrupting a document just because the power happened
to go out at the same instant that a change was being written to disk.

<p><b>Incremental Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage of SQLite over custom
and ZIP-ed pile-of-files formats, both of which must completely
rewritten the entire document in order to change a single byte.  
Simple pile-of-files formats can also
do incremental updates to some extent, though granularity of writes is larger
with pile-of-file formats (a single file) than with SQLite (a single page).

As an application grows and evolves, new features can be added to an
SQLite application file format simply by adding new tables to the schema
or by adding new columns to existing tables.  With a modicum of care,
backwards compatibility of the application is preserved.  It is also possible
to create an extensible custom or pile-of-files format too, of course,
but doing so requires more work and more design discipline.

In many cases, an SQLite application file format can be faster than a
custom or pile-of-files format.  In the case of a custom format, SQLite
often dramatically improves start-up times because instead of having to
read and parse the entire document into memory, the application can
do queries to extract only the information needed for the initial screen
display.  In the case of a pile-of-files format, it might be surprising,
but SQLite can read and write smaller BLOBs from its database faster than
they can be read or written as separate files.  (See
[Internal Versus External BLOBs] for further information.)

<p><b>Safe For Concurrent Use By Multiple Processes.</b>
SQLite automatically coordinates concurrent access to the same
document from multiple threads and/or processes.  Two or more
applications can connect and read from the same document at the
same time.  Writes are serialized, but as writes normally only
take milliseconds, usually applications simply take turns writing.
SQLite automatically ensures that the low-level format of the
document is uncorrupted.  Accomplishing the same with a custom
or pile-of-files format, in contrast, requires extensive support
in the application.  And the application code used to coordinate 
concurrent is a notoriously bug-magnet.

<p><b>Works With Multiple Programming Languages.</b>
Though SQLite is itself written in ANSI-C, interfaces exist for
just about every other programming language you can think of:
C++, C#, Objective-C, Java, Tcl, Perl, Python, Ruby, Erlang,
JavaScript, and so forth.  So programmers can develop in whatever
language they are most comfortable with and which best matches
the application.  An SQLite application file format is a great 
choice in cases where there is a collection or "federation" of
separate programs, often written in different langauges and by
different development teams, that need to cooperate through the
use of a common document format.

<p><b>Built-In Support For Application Identification.</b>
SQLite database file contain a 4-byte [Application ID] in
their headers that can be set to an application-defined value
and then used to identify the "type" of the file for utility
programs such as [ | file(1)].

<h2>Case Studies</h2>