Documentation Source Text

Check-in [9e12f0cedd]

Overview
Comment: Improvements to the application file format document.
SHA1: 9e12f0ceddbc3af91e56f4213084db6bdc4fb29a
User & Date: drh 2014-03-13 15:38:15.208
Context
2014-03-14 16:35  Further tuning of the application file format document. (check-in: 1b422ce8de user: drh tags: trunk)
2014-03-13 15:38  Improvements to the application file format document. (check-in: 9e12f0cedd user: drh tags: trunk)
2014-03-13 00:43  First complete draft of the new application file format document. Integrate with the rest of the documentation via hyperlinks. (check-in: 6d257b8d92 user: drh tags: trunk)
Changes
Changes to pages/about.in.
files.  A complete SQL database with multiple tables, indices,
triggers, and views, is contained in a single disk file.
The database file format is cross-platform - you can freely copy a database
between 32-bit and 64-bit systems or between 
[http://en.wikipedia.org/wiki/Endianness | big-endian] and
[http://en.wikipedia.org/wiki/Endianness | little-endian]
architectures.  These features make SQLite a popular choice as
an <a href="whentouse.html#appfileformat">Application File Format</a>.
Think of SQLite not as a replacement for 
[http://www.oracle.com/database/index.html|Oracle] but
as a replacement for [http://man.he.net/man3/fopen|fopen()].</p>

<p>SQLite is a compact library.
With all features enabled, the [library size] can be less than 500KiB,
depending on the target platform and compiler optimization settings.

files.  A complete SQL database with multiple tables, indices,
triggers, and views, is contained in a single disk file.
The database file format is cross-platform - you can freely copy a database
between 32-bit and 64-bit systems or between 
[http://en.wikipedia.org/wiki/Endianness | big-endian] and
[http://en.wikipedia.org/wiki/Endianness | little-endian]
architectures.  These features make SQLite a popular choice as
an [Application File Format].
Think of SQLite not as a replacement for 
[http://www.oracle.com/database/index.html|Oracle] but
as a replacement for [http://man.he.net/man3/fopen|fopen()].</p>

<p>SQLite is a compact library.
With all features enabled, the [library size] can be less than 500KiB,
depending on the target platform and compiler optimization settings.
Changes to pages/appfileformat.in.
<tcl>hd_keywords *appformat {application file-format}</tcl>

<title>SQLite As An Application File Format</title>

<h1 align="center">
SQLite As An Application File Format
</h1>

<h2>Executive Summary</h2>

<p>This essay advocates for the use of SQLite as the file format in new
applications.  SQLite is often a better choice for an application file
format than other techniques in common use.  These are some of the
reasons to prefer SQLite:

<ol>
<li> Ease Of Development
<li> Single-File Documents
<li> High-Level Query Language
<li> Accessible Content
<li> Cross-Platform
<li> Atomic Transactions
<li> Incremental And Continuous Updates
<li> Easily Extensible
<li> Performance
<li> Concurrent Use By Multiple Processes
<li> Multiple Programming Languages
<li> Better Applications
</ol>

<p>Each of these reasons will be described in more detail following
a brief discussion of what exactly this article means by "application
file format".

<h2>What Is An Application File Format?</h2>

<p>
An "application file format" is the file format
used to persist application state to disk or to exchange
information between programs.

<tcl>hd_keywords *appformat {application file-format} \
     {Application File Format}</tcl>
<title>SQLite As An Application File Format</title>

<h1 align="center">
SQLite As An Application File Format
</h1>

<h2>Executive Summary</h2>

<p>An SQLite database file with a defined schema
often makes an excellent application file format.
Here are a dozen reasons why this is so:


<ol>
<li> Simplified Application Development
<li> Single-File Documents
<li> High-Level Query Language
<li> Accessible Content
<li> Cross-Platform
<li> Atomic Transactions
<li> Incremental And Continuous Updates
<li> Easily Extensible
<li> Performance
<li> Concurrent Use By Multiple Processes
<li> Multiple Programming Languages
<li> Better Applications
</ol>

<p>Each of these points will be described in more detail below,
after first considering more closely what this article means by
"application file format".

<h2>What Is An Application File Format?</h2>

<p>
An "application file format" is the file format
used to persist application state to disk or to exchange
information between programs.

<li>GIT - Git source code repository
<li>EPUB - The Electronic Publication format used by non-Kindle eBooks
<li>ODT - The Open Document format used by OpenOffice and others
<li>PPT - Microsoft PowerPoint presentations
<li>ODP - The Open Document presentation format used by OpenOffice and others
</ul>

<p>Many application file formats fit into one of these three categories:

<ol>
<li><p><b>Fully Custom Formats.</b>
Custom formats are specifically designed for a single application.
DOC, DWG, PDF, XLS, and PPT are examples of custom formats.  Custom
formats are usually contained within a single file, for ease of transport.
They are also usually binary, though the DWG format is a notable exception.
Custom file formats require specialized application code
to read and write and are not normally accessible from commonly
available tools such as unix command-line programs and text editors.
In other words, custom formats are often "opaque blobs".  Generally speaking,
to access the content of a custom application file format, you have to
have a tool that is specifically engineered to read and/or write that
format.

<li><p><b>Pile-of-Files Formats.</b>
Sometimes the application state is stored as a hierarchy of
files.  Git is a prime example of this, though the phenomenon occurs
frequently in one-off and bespoke applications.  A pile-of-files format
essentially uses the filesystem as a key/value database, storing small
chunks of information into separate files.  This gives the
advantage of making the content more accessible to common utility
programs such as text editors or "awk" or "grep".  But even if many 
of the files in a pile-of-files format
are easily readable, there are usually some files that have their
own custom format (example: Git "Packfiles") and are hence
"opaque blobs" that are not readable
or writable without specialized tools.  It is also much less convenient
to move a pile-of-files from one place or machine to another, than
it is to move a single file.  And it is hard to make a pile-of-files
document into an email attachment, for example.  Finally, a pile-of-files
format breaks the "document metaphor":
there is no one file that a user can point to
that is the "document".

<li><p><b>ZIP-ed Pile-of-Files Formats.</b>
Some applications use a Pile-of-Files that is then encapsulated into
a ZIP archive.  EPUB, ODT, and ODP are examples of this approach.
An EPUB book is really just a ZIP archive that contains various
XHTML files for the text of book chapters, GIF and JPEG images for
the artwork, and a specialized catalog file that tells the eBook
reader how all the XML and image files fit together.  OpenOffice
documents (ODT and ODP) are also ZIP archives containing XML and
images that represent their content as well as "catalog" files that
show the interrelationships between the component parts.

<p>A ZIP-ed pile-of-files format is a compromise between a full
custom file format and a pure pile-of-files format.
A ZIP-ed pile-of-files format is not an opaque blob in the same sense
as a custom file format, since the component parts can still be accessed
using any common ZIP archiver, but the format is not quite as accessible
as a pure pile-of-files format because one does still need the ZIP 
archiver, and one cannot normally use command-line tools like "find"
on the file hierarchy without first un-zipping it.  On the other
hand, a ZIP-ed pile-of-files format does preserve the document
metaphor by putting all content into a single disk file.  And
because it is compressed, the ZIP-ed pile-of-files format tends to
be more compact.

<p>As with custom file formats, and unlike pure pile-of-files formats,
a ZIP-ed pile-of-files format is not as easy to edit, since
one must normally rewrite the entire file to change any
component part.
</ol>

<p>The purpose of this document is to argue in favor of a fourth
new category of application file format: An SQLite database file.

<h2>SQLite As The Application File Format</h2>

<p>
For many applications, the use of SQLite
as the application file format has a dozen or more compelling advantages over
custom file formats, pile-of-files formats, and ZIP-ed pile-of-files
formats.  To wit:
</p>

<ol>
<li><p><b>Ease Of Development.</b>
No code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can reduce application code size by many thousands of
lines, with corresponding savings in development and maintenance costs.

<li>GIT - Git source code repository
<li>EPUB - The Electronic Publication format used by non-Kindle eBooks
<li>ODT - The Open Document format used by OpenOffice and others
<li>PPT - Microsoft PowerPoint presentations
<li>ODP - The Open Document presentation format used by OpenOffice and others
</ul>

<p>We make a distinction between a "file format" and an "application format".
A file format is used to store a single object.  So, for example, a GIF or
JPEG file stores a single image, and an XHTML file stores text,
so those are "file formats" and not "application formats".  An EPUB file,
in contrast, stores both text and images (as contained XHTML and GIF/JPEG
files) and so it is considered an "application format".  This article is
about "application formats".

<p>The boundary between a file format and an application format is fuzzy.
This article calls JPEG a file format, but for an image editor, JPEG 
might be considered the application format.  Much depends on context.
For this article, let us say that a file format stores a single object
and an application format stores many different objects and their relationships
to one another.

<p>Most application formats fit into one of these three categories:

<ol>
<li><p><b>Fully Custom Formats.</b>
Custom formats are specifically designed for a single application.
DOC, DWG, PDF, XLS, and PPT are examples of custom formats.  Custom
formats are usually contained within a single file, for ease of transport.
They are also usually binary, though the DWG format is a notable exception.
Custom file formats require specialized application code
to read and write and are not normally accessible from commonly
available tools such as unix command-line programs and text editors.
In other words, custom formats are usually "opaque blobs".
To access the content of a custom application file format, one needs
a tool specifically engineered to read and/or write that format.


<li><p><b>Pile-of-Files Formats.</b>
Sometimes the application state is stored as a hierarchy of
files.  Git is a prime example of this, though the phenomenon occurs
frequently in one-off and bespoke applications.  A pile-of-files format
essentially uses the filesystem as a key/value database, storing small
chunks of information into separate files.  This gives the
advantage of making the content more accessible to common utility
programs such as text editors or "awk" or "grep".  But even if many 
of the files in a pile-of-files format
are easily readable, there are usually some files that have their
own custom format (example: Git "Packfiles") and are hence
"opaque blobs" that are not readable
or writable without specialized tools.  It is also much less convenient
to move a pile-of-files from one place or machine to another, than
it is to move a single file.  And it is hard to make a pile-of-files
document into an email attachment, for example.  Finally, a pile-of-files
format breaks the "document metaphor":
there is no one file that a user can point to
that is "the document".

<li><p><b>Wrapped Pile-of-Files Formats.</b>
Some applications use a Pile-of-Files that is then encapsulated into
some kind of single-file container, usually a ZIP archive.  
EPUB, ODT, and ODP are examples of this approach.
An EPUB book is really just a ZIP archive that contains various
XHTML files for the text of book chapters, GIF and JPEG images for
the artwork, and a specialized catalog file that tells the eBook
reader how all the XML and image files fit together.  OpenOffice
documents (ODT and ODP) are also ZIP archives containing XML and
images that represent their content as well as "catalog" files that
show the interrelationships between the component parts.

<p>A wrapped pile-of-files format is a compromise between a full
custom file format and a pure pile-of-files format.
A wrapped pile-of-files format is not an opaque blob in the same sense
as a custom file format, since the component parts can still be accessed
using any common ZIP archiver, but the format is not quite as accessible
as a pure pile-of-files format because one does still need the ZIP 
archiver, and one cannot normally use command-line tools like "find"
on the file hierarchy without first un-zipping it.  On the other
hand, a wrapped pile-of-files format does preserve the document
metaphor by putting all content into a single disk file.  And
because it is compressed, the wrapped pile-of-files format tends to
be more compact.

<p>As with custom file formats, and unlike pure pile-of-files formats,
a wrapped pile-of-files format is not as easy to edit, since
one must normally rewrite the entire file to change any
component part.
</ol>
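The rewrite cost described for wrapped pile-of-files formats is easy to see in code. Below is a minimal Python sketch using only the standard `zipfile` module; the helper name `replace_member` is hypothetical and is not taken from any EPUB or OpenOffice implementation. The sketch assumes what `zipfile` offers: members can be appended, but an existing member cannot be overwritten in place (writing the same name again only adds a duplicate entry), so replacing one component means copying every other component into a new archive.

```python
import os
import tempfile
import zipfile


def replace_member(zip_path: str, name: str, data: bytes) -> None:
    """Replace one member of a ZIP archive by rewriting the whole archive.

    Hypothetical helper for illustration only.  Every member other than
    `name` is read back and written out again to a temporary archive,
    which is then moved over the original file.
    """
    dir_name = os.path.dirname(os.path.abspath(zip_path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=dir_name)
    os.close(fd)  # zipfile reopens the path itself
    try:
        with zipfile.ZipFile(zip_path) as src, \
             zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename == name:
                    continue  # drop the stale copy of the member being replaced
                # Copy every unchanged member byte-for-byte into the new archive.
                dst.writestr(info, src.read(info.filename))
            dst.writestr(name, data)  # add the new version of the member
        os.replace(tmp_path, zip_path)  # swap the rewritten archive into place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial archive on failure
        raise
```

Even when only a few bytes of one member change, this approach reads and rewrites every other member of the document.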

<p>The purpose of this document is to argue in favor of a fourth
new category of application file format: An SQLite database file.

<h2>SQLite As The Application File Format</h2>

<p>
An SQLite database file makes an excellent alternative to a
custom or pile-of-files application format.  In its simplest form,
an SQLite database with a single key/value table like
<blockquote><pre>
CREATE TABLE files(filename TEXT PRIMARY KEY, content BLOB);
</pre></blockquote>
could serve as a direct replacement for a wrapped pile-of-files format.
If the content is compressed, then such an SQLite database is only
slightly larger than an equivalent ZIP archive, and it has the advantage
of being able to write individual "files" without having to rewrite
the entire document.
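A minimal sketch of the single-table approach in Python, using the standard `sqlite3` and `zlib` modules. The `files` schema is the one shown above; the helper names `open_document`, `put_file`, and `get_file`, and the use of zlib for per-file compression, are assumptions made for illustration. Each `put_file` call replaces one row in its own transaction, so updating one "file" does not require rewriting the others.

```python
import sqlite3
import zlib
from typing import Optional

# Same schema as the CREATE TABLE statement above.
SCHEMA = "CREATE TABLE IF NOT EXISTS files(filename TEXT PRIMARY KEY, content BLOB)"


def open_document(path: str) -> sqlite3.Connection:
    """Open (or create) a document stored as a single key/value table."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def put_file(db: sqlite3.Connection, filename: str, data: bytes) -> None:
    """Store or overwrite one "file"; zlib compression is an assumption."""
    with db:  # commit on success, roll back on error
        db.execute(
            "REPLACE INTO files(filename, content) VALUES (?, ?)",
            (filename, zlib.compress(data)),
        )


def get_file(db: sqlite3.Connection, filename: str) -> Optional[bytes]:
    """Return the decompressed content, or None if the file is absent."""
    row = db.execute(
        "SELECT content FROM files WHERE filename = ?", (filename,)
    ).fetchone()
    return None if row is None else zlib.decompress(row[0])
```

For example, calling `put_file(db, "chapter1.xhtml", new_bytes)` replaces only that row; the other rows in the `files` table are left untouched.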

<p>
But an SQLite database is not limited to a simple key/value structure
like a pile-of-files database.  An SQLite database can have dozens
or hundreds or thousands of different tables, with dozens or
hundreds or thousands of fields per table, each with different datatypes and
particular meanings, all cross-referencing each other, and all stored
efficiently and compactly in a single disk file.
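As a small illustration of tables that cross-reference each other, here is a hypothetical schema for a slide-presentation document, driven from Python's built-in `sqlite3` module. The table and column names are invented for this sketch and are not a recommended design. Note that SQLite enforces `REFERENCES` constraints only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

# Hypothetical schema for a slide-presentation document.
SCHEMA = """
CREATE TABLE image(
  id      INTEGER PRIMARY KEY,
  mime    TEXT NOT NULL,
  content BLOB NOT NULL
);
CREATE TABLE slide(
  id    INTEGER PRIMARY KEY,
  seq   INTEGER NOT NULL UNIQUE,   -- display order
  title TEXT
);
CREATE TABLE shape(
  id       INTEGER PRIMARY KEY,
  slide_id INTEGER NOT NULL REFERENCES slide(id) ON DELETE CASCADE,
  kind     TEXT NOT NULL CHECK(kind IN ('text', 'image')),
  x REAL, y REAL, w REAL, h REAL,
  text     TEXT,
  image_id INTEGER REFERENCES image(id)
);
"""


def new_document(path: str) -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    db.executescript(SCHEMA)
    return db


def slide_outline(db: sqlite3.Connection) -> list:
    """Return (slide title, number of shapes) pairs in display order."""
    return db.execute(
        """SELECT slide.title, count(shape.id)
             FROM slide LEFT JOIN shape ON shape.slide_id = slide.id
            GROUP BY slide.id
            ORDER BY slide.seq"""
    ).fetchall()
```

A single SQL query such as the one in `slide_outline` can follow the relationships between tables, which a pile-of-files layout would have to reconstruct in application code.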

<p>
Compared to other approaches, the use of
an SQLite database as an application file format has
compelling advantages:
</p>

<ol>
<li><p><b>Simplified Application Development.</b>
No code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can reduce application code size by many thousands of
lines, with corresponding savings in development and maintenance costs.

to verify that no repository history has been lost prior to each change
to the repository.

<li><p><b>Incremental And Continuous Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage over custom
and ZIP-ed pile-of-files formats, both of which must completely
rewrite the entire document in order to change a single byte.  
Pure pile-of-files formats can also
do incremental updates to some extent, though the granularity of writes is 
usually larger with pile-of-files formats (a single file) than with SQLite
(a single page).

<p>A desktop application built on SQLite can also do continuous update.

to verify that no repository history has been lost prior to each change
to the repository.

<li><p><b>Incremental And Continuous Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage over custom
and wrapped pile-of-files formats, both of which must completely
rewrite the entire document in order to change a single byte.  
Pure pile-of-files formats can also
do incremental updates to some extent, though the granularity of writes is 
usually larger with pile-of-files formats (a single file) than with SQLite
(a single page).

<p>A desktop application built on SQLite can also do continuous update.
</ol>

<h2>Conclusion</h2>

<p>
SQLite is not the perfect application file format for every situation.
But in many cases, SQLite is a far better choice than a custom
file format, a pile-of-files, or a ZIP-ed pile-of-files.
SQLite is a high-level, stable, reliable, cross-platform, widely-deployed,
extensible, performant, accessible, concurrent file format.  It deserves
your consideration as the standard file format on your next application
design.

</ol>
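The incremental and continuous-update points above suggest a design where an editor commits each change as it happens, instead of rewriting the whole document on an explicit Save. Here is a minimal Python sketch of that idea; the `Document` class and the `paragraph` table are hypothetical, and a real application would likely add undo, batching of rapid edits, and error handling.

```python
import sqlite3


class Document:
    """Hypothetical editor model: every edit is committed as it happens,
    so there is no separate File/Save step and the file on disk always
    reflects the last completed edit."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS paragraph("
            "id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
        )

    def add_paragraph(self, body: str) -> int:
        with self.db:  # one small transaction per edit
            cur = self.db.execute(
                "INSERT INTO paragraph(body) VALUES (?)", (body,)
            )
        return cur.lastrowid

    def edit_paragraph(self, pid: int, body: str) -> None:
        # Typically only the database page(s) holding this row change,
        # rather than the whole file.
        with self.db:
            self.db.execute(
                "UPDATE paragraph SET body = ? WHERE id = ?", (body, pid)
            )

    def close(self) -> None:
        self.db.close()
```

Because each edit is committed as soon as it is made, another process that opens the same file sees the latest state without the editor ever performing a "save".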

<h2>Conclusion</h2>

<p>
SQLite is not the perfect application file format for every situation.
But in many cases, SQLite is a far better choice than a custom
file format, a pile-of-files, or a wrapped pile-of-files.
SQLite is a high-level, stable, reliable, cross-platform, widely-deployed,
extensible, performant, accessible, concurrent file format.  It deserves
your consideration as the standard file format on your next application
design.