Documentation Source Text

Check-in [6d257b8d92]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:First complete draft of the new application file format document. Integrate with the rest of the documentation via hyperlinks.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 6d257b8d928cfd003a0a10eb10c0e4d5978b787f
User & Date: drh 2014-03-13 00:43:56
Context
2014-03-13
15:38
Improvements to the application file format document. (check-in: 9e12f0cedd user: drh tags: trunk)
00:43
First complete draft of the new application file format document. Integrate with the rest of the documentation via hyperlinks. (check-in: 6d257b8d92 user: drh tags: trunk)
2014-03-12
18:12
First draft of a document that advocates for using SQLite as an application file format. (check-in: aa63d85794 user: drh tags: trunk)
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/appfileformat.in.


1
2






























3
4
5
6
7
8
9

10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34

35
36
37
38
39
40
41
42
43
44
45

46
47
48

49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66



67


68





69
70

71
72
73
74
75
76
77
78
79



80
81
82
83
84
85
86
87
88
89

90
91
92
93
94
95

96





97
98
99
100
101
102

103






104
105
106
107
108
109
110
111
112
113
114







115
116
117
118
119
120
121






122
123
124
125
126
127
128
129

130
131
132
133
134


135
136
137
138
139
140

141
142
143
144











145
146
147
148
149
150
151
152
153

154









155
156
157
158


159

160
161




162
163
164
165
166
167
168




169


170
171
172



173









174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191


192
193
194
195








196
197
198
199








200

201




















202







203
204
205
206








<title>SQLite As An Application File Format</title>































<h2>What Is An Application File Format?</h2>

<p>
An "application file format" is the schema and/or low-level byte 
format used to persist appliation state to disk or to exchange
information between programs.
Some examples of application file formats include:


<ul>
<li>"*.doc" - Word Perfect and Microsoft Office documents
<li>"*.dwg" - AutoCAD drawings
<li>"*.pdf" - Portable Document Format from Adobe
<li>"*.xls" - Microsoft Excel Spreadsheet
<li>".git" - Git source code repository
<li>"*.epub" - The Eletronic Publication format used by non-Kindle eBooks
<li>"*.odt" - The Open Document format used by OpenOffice and others
<li>"*.ppt" - Microsoft PowerPoint presentations
<li>"*.odp" - The Open Document presentation format used by OpenOffice and others
</ul>

<p>The application file formats listed above can be divided into
several categories:

<ol>
<li><p><b>Fully Custom Formats.</b>
Custom formats are specifically designed for a single application.
DOC, DWG, PDF, XLS, and PPT are examples of custom formats.  Custom
formats are usually contained within a single file, for ease of transport.
They are also usually binary, though the DWG format is a notable exception.
Custom file formats require specialized application code
to read and write and are not normally accessible from commonly
available tools such as unix command-line programs.  Generally speaking,

to access the content of a custom application file format, you have to
have a tool that is specifically engineered to read and/or write that
format.

<li><p><b>Pile-of-Files Formats.</b>
Sometimes the application state is stored in a directory hierarchy of
files.  Git is a prime example of this, though the phenomenon occurs
frequently one-off and bespoke applications.  A pile-of-files format
essentially uses the filesystem as a key/value database and small
chunks of information into separate files.  This gives the
advantage of making the content more accessible to common utility

programs.  But even if many of the files in a pile-of-files format
are easily readable, there are usually some files that have their
own custom format, such as Git "Packfiles", and are hence not readable

or writable without specialized tools.  It is much less convenient
to move a pile-of-files from one place or machine to another, than
it is to move a single file.  It is hard to make a pile-of-files
document into an email attachment, for example.  Finally, a pile-of-files
format breaks the "document metaphor":
there is no one file that a user can point to
that is the "document".

<li><p><b>ZIP-ed Pile-of-Files Formats.</b>
Some applications use a Pile-of-Files that is then encapsulated into
a ZIP archive.  EPUB, ODT,and ODP are examples of this approach.
An EPUB book is really just a ZIP archive that contains various
XML files for the text of book chapters, GIF and JPEG images for
the artwork, and a specialized catalog file that tells the eBook
reader how all the XML and image files fit together.  OpenOffice
documents (ODT and ODP) are also ZIP archives containing XML and
images that represent their content as well as "catalog" files that
show the interrelationships between the component parts.



A ZIP-ed pile-of-files format


is not quite as accessible as a pure pile-of-files format, since





the component parts are not immediately visible.  But a knowledgeable
user can easily get to the component parts using any ZIP archiver.

And ZIP-ed pile-of-files formats are not as easy to edit, since
one most normally rewrite the entire file just to change one small
component part.
On the other hand, ZIP-ed pile-of-files formats have the advantage
of being contained in a single disk file, which makes them easier
to copy and attach and gives a more convenient metaphor for the
"document".
</ol>




<h2>SQLite As The Application File Format</h2>

<p>
Designers of new applications are encouraged to consider using an
SQLite database as their application file format.  The use of SQLite
as the application file format has a number of advantages over
custom file formats, pile-of-file formats, and ZIP-ed pile-of-file
formats.
</p>


<p><b>Ease Of Development.</b>
No code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can save having to

write, debug, and maintain thousands and thousands of lines of code.






<p><b>Single-File Documents.</b>
An SQLite database is contained in a single file, which is easily
copied or moved or attached.  The "document" metaphor is preserved.
furthermore, SQLite does not have any database file naming requirements
and so the application can use any custom file suffix that it wants.








<p><b>High-level Query Language.</b>
SQLite is a complete relational database engine, which means that the
application can access content using high-level queries.  Application
developers need not spend time thinking about "how" to retrieve the
information they need from a document.  Developers write SQL that
expresses "what" information they want and leave it to the database engine
to figure out how to best retrieve that content.  This helps developers
to operate "heads up" and remain focused on solving the user's problem,
and avoid time spent "heads down" fiddling with low-level file
formatting details.








<p><b>Accessible Content.</b>
Information held in an SQLite database file is accessible using 
commonly available command-line tools - tools that 
are installed by default on Mac and Linux systems and that are 
freely available as a self-contained EXE file on Windows.
Unlike custom file formats, application-specific programs are
<u>not</u> required to read or write content in an SQLite database.






An SQLite database is uses a [file format | well-defined and well-documented]
file format that is in widespread use by literally hundreds of 
thousands of applications and
is backwards compatible to its inception in 2003 and which promises
to continue to be compatible in years to come.  The longevity of
SQLite database files is particularly important to bespoke applications,
since it allows the document content to be accessed years or decades in the
future, long after all traces of the original application have been lost.


<p><b>Cross-Platform.</b>
SQLite database files are portable between 32-bit and 64-bit machines and
between big-endian and little-endian architectures and between windows
and unix in all of their various flavors and incarnations.  The application need


not worry over the byte-order of integers or floating point numbers.
Text content can be read or written as UTF-8, UTF-16LE, or UTF-16BE and 
SQLite will automatically perform any necessary translations on-the-fly.

<p><b>Atomic Updates.</b>
Writes to an SQLite database are atomic.  They either happen completely

or not at all, even in the face of system crashes or power failures.  So
there is no danger of corrupting a document just because the power happened
to go out at the same instant that a change was being written to disk.












<p><b>Incremental Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage of SQLite over custom
and ZIP-ed pile-of-files formats, both of which must completely
rewritten the entire document in order to change a single byte.  
Simple pile-of-files formats can also
do incremental updates to some extent, though granularity of writes is larger
with pile-of-file formats (a single file) than with SQLite (a single page).











<p><b>Extensible.</b>
As an application grows and evolves, new features can be added to an
SQLite application file format simply by adding new tables to the schema
or by adding new columns to existing tables.  With a modicum of care,


backwards compatibility of the application is preserved.  It is also possible

to create an extensible custom or pile-of-files format too, of course,
but doing so requires more work and more design discipline.





<p><b>Performance.</b>
In many cases, an SQLite application file format can be faster than a
custom or pile-of-files format.  In the case of a custom format, SQLite
often dramatically improves start-up times because instead of having to
read and parse the entire document into memory, the application can
do queries to extract only the information needed for the initial screen




display.  In the case of a pile-of-files format, it might be surprising,


but SQLite can read and write smaller BLOBs from its database faster than
they can be read or written as separate files.  (See
[Internal Versus External BLOBs] for further information.)













<p><b>Safe For Concurrent Use By Multiple Processes.</b>
SQLite automatically coordinates concurrent access to the same
document from multiple threads and/or processes.  Two or more
applications can connect and read from the same document at the
same time.  Writes are serialized, but as writes normally only
take milliseconds, usually applications simply take turns writing.
SQLite automatically ensures that the low-level format of the
document is uncorrupted.  Accomplishing the same with a custom
or pile-of-files format, in contrast, requires extensive support
in the application.  And the application code used to coordinate 
concurrent is a notoriously bug-magnet.

<p><b>Works With Multiple Programming Languages.</b>
Though SQLite is itself written in ANSI-C, interfaces exist for
just about every other programming language you can think of:
C++, C#, Objective-C, Java, Tcl, Perl, Python, Ruby, Erlang,
JavaScript, and so forth.  So programmers can develop in whatever
language they are most comfortable with and which best matches


the application.  An SQLite application file format is a great 
choice in cases where there is a collection or "federation" of
separate programs, often written in different langauges and by
different development teams, that need to cooperate through the








use of a common document format.

<p><b>Built-In Support For Application Identification.</b>
SQLite database file contain a 4-byte [Application ID] in








their headers that can be set to an application-defined value

and then used to identify the "type" of the file for utility




















programs such as [http://linux.die.net/man/1/file | file(1)].








<h2>Case Studies</h2>

<h2>Summary</h2>







>


>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



|
|

|
>


|
|
|
|
|
|
|
|
|


|
<









|
>





|

|
|


>
|

|
>
|

|









|





>
>
>
|
>
>
|
>
>
>
>
>
|
|
>
|
|

<
<
<
<


>
>
>



<
|
|

|


>
|




|
>
|
>
>
>
>
>

|


|
|
>

>
>
>
>
>
>
|




|

|



>
>
>
>
>
>
>
|

|




>
>
>
>
>
>
|


|




>

|

|
|
>
>
|



|
|
>
|



>
>
>
>
>
>
>
>
>
>
>
|


|

|
|
|
|
>

>
>
>
>
>
>
>
>
>
|
|

|
>
>
|
>
|
|
>
>
>
>

|
|



|
>
>
>
>
|
>
>
|
|
|
>
>
>

>
>
>
>
>
>
>
>
>
|




|



|
|

|





>
>
|

|
|
>
>
>
>
>
>
>
>
|

|
|
>
>
>
>
>
>
>
>
|
>
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>

|

|
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55

56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118




119
120
121
122
123
124
125
126

127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
<tcl>hd_keywords *appformat {application file-format}</tcl>
<title>SQLite As An Application File Format</title>

<h1 align="center">
SQLite As An Application File Format
</h1>

<h2>Executive Summary</h2>

<p>This essay advocates for the use of SQLite as the file format in new
applications.  SQLite is often a better choice for an application file
format than other techniques in common use.  These are some of the
reasons to prefer SQLite:

<ol>
<li> Ease Of Development
<li> Single-File Documents
<li> High-Level Query Language
<li> Accessible Content
<li> Cross-Platform
<li> Atomic Transactions
<li> Incremental And Continuous Updates
<li> Easily Extensible
<li> Performance
<li> Concurrent Use By Multiple Processes
<li> Multiple Programming Languages
<li> Better Applications
</ol>

<p>Each of these reasons will be described in more detail following
a brief discussion of what exactly this article means by "application
file format".

<h2>What Is An Application File Format?</h2>

<p>
An "application file format" is the file format
used to persist application state to disk or to exchange
information between programs.
There are thousands of application file formats in use today.
Here are a few examples:

<ul>
<li>DOC - Word Perfect and Microsoft Office documents
<li>DWG - AutoCAD drawings
<li>PDF - Portable Document Format from Adobe
<li>XLS - Microsoft Excel Spreadsheet
<li>GIT - Git source code repository
<li>EPUB - The Electronic Publication format used by non-Kindle eBooks
<li>ODT - The Open Document format used by OpenOffice and others
<li>PPT - Microsoft PowerPoint presentations
<li>ODP - The Open Document presentation format used by OpenOffice and others
</ul>

<p>Many application file formats fit into one of these three categories:


<ol>
<li><p><b>Fully Custom Formats.</b>
Custom formats are specifically designed for a single application.
DOC, DWG, PDF, XLS, and PPT are examples of custom formats.  Custom
formats are usually contained within a single file, for ease of transport.
They are also usually binary, though the DWG format is a notable exception.
Custom file formats require specialized application code
to read and write and are not normally accessible from commonly
available tools such as unix command-line programs and text editors.
In other words, custom formats are often "opaque blobs".  Generally speaking,
to access the content of a custom application file format, you have to
have a tool that is specifically engineered to read and/or write that
format.

<li><p><b>Pile-of-Files Formats.</b>
Sometimes the application state is stored as a hierarchy of
files.  Git is a prime example of this, though the phenomenon occurs
frequently in one-off and bespoke applications.  A pile-of-files format
essentially uses the filesystem as a key/value database, storing small
chunks of information into separate files.  This gives the
advantage of making the content more accessible to common utility
programs such as text editors or "awk" or "grep".  But even if many 
of the files in a pile-of-files format
are easily readable, there are usually some files that have their
own custom format (example: Git "Packfiles") and are hence
"opaque blobs" that are not readable
or writable without specialized tools.  It is also much less convenient
to move a pile-of-files from one place or machine to another, than
it is to move a single file.  And it is hard to make a pile-of-files
document into an email attachment, for example.  Finally, a pile-of-files
format breaks the "document metaphor":
there is no one file that a user can point to
that is the "document".

<li><p><b>ZIP-ed Pile-of-Files Formats.</b>
Some applications use a Pile-of-Files that is then encapsulated into
a ZIP archive.  EPUB, ODT,and ODP are examples of this approach.
An EPUB book is really just a ZIP archive that contains various
XHTML files for the text of book chapters, GIF and JPEG images for
the artwork, and a specialized catalog file that tells the eBook
reader how all the XML and image files fit together.  OpenOffice
documents (ODT and ODP) are also ZIP archives containing XML and
images that represent their content as well as "catalog" files that
show the interrelationships between the component parts.

<p>A ZIP-ed pile-of-files format is a compromise between a full
custom file format and a pure pile-of-files format.
A ZIP-ed pile-of-files format is not an opaque blob in the same sense
as a custom file format, since the component parts can still be accessed
using any common ZIP archiver, but the format is not quite as accessible
as a pure pile-of-files format because one does still need the ZIP 
archiver, and one cannot normally use command-line tools like "find"
on the file hierarchy without first un-zipping it.  On the other
hand, a ZIP-ed pile-of-files format does preserve the document
metaphor by putting all content into a single disk file.  And
because it is compressed, the ZIP-ed pile-of-files format tends to
be more compact.

<p>As with custom file formats, and unlike pure pile-of-file formats,
a ZIP-ed pile-of-files format is not as easy to edit, since
one most normally rewrite the entire file to change any
component part.




</ol>

<p>The purpose of this document is to argue in favor of a fourth
new catagory of application file format: An SQLite database file.

<h2>SQLite As The Application File Format</h2>

<p>

For many applications, the use of SQLite
as the application file format has a dozen or more compelling advantages over
custom file formats, pile-of-file formats, and ZIP-ed pile-of-file
formats.  To wit:
</p>

<ol>
<li><p><b>Ease Of Development.</b>
No code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can reduce application code size by many thousands of
lines, with corresponding saving in development and maintenance costs.

<p>SQLite is widely used and stable.  There are literally billions of
SQLite database files in use daily, on smartphones and in desktop applications.
SQLite is [tested | carefully tested] and proven reliable.  It is not
a component that needs much tuning or debugging, allowing developers
to stay focused on application logic.

<li><p><b>Single-File Documents.</b>
An SQLite database is contained in a single file, which is easily
copied or moved or attached.  The "document" metaphor is preserved.
SQLite does not have any file naming requirements
and so the application can use any custom file suffix that it wants
to help identify the file as "belonging" to the application.

<p>SQLite database files contain a 4-byte [Application ID] in
their headers that can be set to an application-defined value
and then used to identify the "type" of the document for utility
programs such as [http://linux.die.net/man/1/file | file(1)].


<li><p><b>High-Level Query Language.</b>
SQLite is a complete relational database engine, which means that the
application can access content using high-level queries.  Application
developers need not spend time thinking about "how" to retrieve the
information they need from a document.  Developers write SQL that
expresses "what" information they want and let the database engine
to figure out how to best retrieve that content.  This helps developers
operate "heads up" and remain focused on solving the user's problem,
and avoid time spent "heads down" fiddling with low-level file
formatting details.

<p>A pile-of-files format can viewed as a key/value database.  
A key/value database is better than no database at all.
But without transactions or indices or a high-level query language or
a proper schema,
it much harder and more error prone to use a key/value database than
a relational database.

<li><p><b>Accessible Content.</b>
Information held in an SQLite database file is accessible using 
commonly available open-source command-line tools - tools that 
are installed by default on Mac and Linux systems and that are 
freely available as a self-contained EXE file on Windows.
Unlike custom file formats, application-specific programs are
<u>not</u> required to read or write content in an SQLite database.
An SQLite database file is <u>not</u> an opaque blob.  It is true
that command-line tools such as text editors or "grep" or "awk" are
not useful on an SQLite database, but the SQL query language is a much
more powerful and convenient way for examining the the content, so the
inability to use "grep" and "awk" and the like is not seen as a loss.

<p>An SQLite database is a [file format | well-defined and well-documented]
file format that is in widespread use by literally hundreds of 
thousands of applications and
is backwards compatible to its inception in 2004 and which promises
to continue to be compatible in years to come.  The longevity of
SQLite database files is particularly important to bespoke applications,
since it allows the document content to be accessed years or decades in the
future, long after all traces of the original application have been lost.
Data lives longer than code.

<li><p><b>Cross-Platform.</b>
SQLite database files are portable between 32-bit and 64-bit machines and
between big-endian and little-endian architectures and between any of the
various flavors of windows and unix-like operating systems.
The application using an SQLite application file format can store
binary numeric data without having to worry about the byte-order of
integers or floating point numbers.
Text content can be read or written as UTF-8, UTF-16LE, or UTF-16BE and 
SQLite will automatically perform any necessary translations on-the-fly.

<li><p><b>Atomic Transactions.</b>
Writes to an SQLite database are [atomic commit | atomic].  
They either happen completely
or not at all, even during system crashes or power failures.  So
there is no danger of corrupting a document just because the power happened
to go out at the same instant that a change was being written to disk.

<p>SQLite is transactional, meaning that multiple changes can be grouped
together such that either all or none of them occur, and so that the
changes can be rolled back if a problem is found prior to to commit.
This allows an application to make a change incrementally, then run
various sanity and consistency checks on the resulting data prior to
committing the changes to disk.  The
[http://www.fossil-scm.org/ | Fossil] DVCS 
[http://www.fossil-scm.org/fossil/doc/tip/www/selfcheck.wiki|uses this technique]
to verify that no repository history has been lost prior to each change
to the repository.

<li><p><b>Incremental And Continuous Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage over custom
and ZIP-ed pile-of-files formats, both of which must completely
rewrite the entire document in order to change a single byte.  
Pure pile-of-files formats can also
do incremental updates to some extent, though the granularity of writes is 
usually larger with pile-of-file formats (a single file) than with SQLite
(a single page).

<p>A desktop application built on SQLite can also do continuous update.
In other words, instead of collecting changes in memory and then writing
them to disk only on a File/Save action, changes can be written back to
the disk as they occur.  The avoids loss of work on a system crash or
power failure.  The undo/redo stack can also be kept in the on-disk 
database, meaning that undo/redo can occur across session boundaries.
And maintenance of the undo/redo stack can often be automated using
triggers.

<li><p><b>Easily Extensible.</b>
As an application grows, new features can be added to an
SQLite application file format simply by adding new tables to the schema
or by adding new columns to existing tables.  Adding columns or tables
does not change the meaning of prior queries against the database, so with a 
modicum of care to ensuring that the meaning of the columns and
tables are preserved, backwards compatibility is maintained.

<p>It is possible to extend custom or pile-of-files formats too, of course,
but doing is often much harder.  If indices are added, then all application
code that changes the corresponding tables must be located and modified to
keep those indices up-to-date.  If columns are added, then all application
code that accesses the corresponding table must be located and modified to 
take into account the new columns.

<li><p><b>Performance.</b>
In many cases, an SQLite application file format will be faster than a
custom or pile-of-files format.  In the case of a custom format, SQLite
often dramatically improves start-up times because instead of having to
read and parse the entire document into memory, the application can
do queries to extract only the information needed for the initial screen.
As the application progresses, it only needs to load as much material as
is needed to draw the next screen, and can discard information from
prior screens that is no longer in use.  This helps keep the memory
footprint of the application under control.

<p>A pile-of-files format can be read incrementally just like SQLite.
But many developers are surprised to learn that SQLite can read and 
write smaller BLOBs (less than about 100KB in size) from its database 
faster than those same blobs can be read or written as separate files 
from the filesystem.  (See [Internal Versus External BLOBs] for further 
information.)  There is overhead associated with operating a relational
database engine, however one should not assume that direct file I/O
is faster than SQLite database I/O, as often it is not.

<p>In either case, if performance problems do arise in an SQLite application 
those problems can often be resolved by adding one or two [CREATE INDEX]
statements to the schema or perhaps running [ANALYZE] one time
and without having to touch a single line of
application code.  But if a performance problem comes up in a custom or 
pile-of-files format, the the fix will often require extensive changes
to application code to add and maintain new indices or to extract 
information using different algorithms.

<li><p><b>Concurrent Use By Multiple Processes.</b>
SQLite automatically coordinates concurrent access to the same
document from multiple threads and/or processes.  Two or more
applications can connect and read from the same document at the
same time.  Writes are serialized, but as writes normally only
take milliseconds, applications simply take turns writing.
SQLite automatically ensures that the low-level format of the
document is uncorrupted.  Accomplishing the same with a custom
or pile-of-files format, in contrast, requires extensive support
in the application.  And the application logic needed to support
concurrency is a notorious bug-magnet.

<li><p><b>Multiple Programming Languages.</b>
Though SQLite is itself written in ANSI-C, interfaces exist for
just about every other programming language you can think of:
C++, C#, Objective-C, Java, Tcl, Perl, Python, Ruby, Erlang,
JavaScript, and so forth.  So programmers can develop in whatever
language they are most comfortable with and which best matches
the needs of the application.

<p>An SQLite application file format is a great 
choice in cases where there is a collection or "federation" of
separate programs, often written in different languages and by
different development teams.
This comes up commonly in research or laboratory
environments where one team is responsible for data acquisition
and one or more different teams are responsible for analysis of
the data.  Each team can use whatever hardware, operating system,
programming language and development methodology that they are 
most comfortable with, and as long as all programs use an SQLite 
database with a common schema as their application file format, 
they can all interoperate.


<li><p><b>Better Applications.</b>
If the application file format is an SQLite database, the complete
documentation for that file format consists of the database schema,
with perhaps a few extra words about what each table and column
represent.  The description of a custom file format,
on the other hand, typically runs on for hundreds of 
pages.  A pile-of-files format, while much simpler and easier to
describe than a fully custom format, still tends to be much larger
and more complex than an SQL schema dump, since the names and format
for the individual files must still be described.

<p>This is not a trivial point.  A clear, concise, and easy to understand
file format is a crucial part of any application design.
Fred Brooks, in his all-time best-selling computer science text,
<i>The Mythical Man-Month</i> says:
<blockquote><i>Representation is the
essence of computer programming.<br />...<br />
Show me your flowcharts and conceal your tables, and I shall 
continue to be mystified.  Show me your tables, and I won't usually
need your flowcharts; they'll be obvious.</i></blockquote>
<p>Rob Pike, in his
<i>Rules of Programming</i> expresses the same idea this way:
<blockquote>
<i>Data dominates.  If you've chosen the right data structures
and organized things well, the algorithms will almost always
be self-evident.  Data structures, not algorithms, are central
to programming.</i></blockquote>
<p> Linus Torvalds used different words to say
much the same thing on the Git mailing list on 2006-06-27: 
<blockquote>
<i>Bad programmers worry about the code.  Good programmers worry
about data structures and their relationships.</i>
</blockquote>

<p>The point is this: an SQL database schema almost always does 
a far better job of defining and organizing the tables and 
data structures and their relationships.
And having clear, concise, and well-defined representation
almost always results in an application that performs better,
has fewer problems, and is easier to develop and maintain.
</ol>

<h2>Conclusion</h2>

<p>
SQLite is not the perfect application file format for every situation.
But in many cases, SQLite is a far better choice than either a custom
file format, a pile-of-files, or a ZIP-ed pile-of-files.
SQLite is a high-level, stable, reliable, cross-platform, widely-deployed,
extensible, performant, accessible, concurrent file format.  It deserves
your consideration as the standard file format on your next application
design.

Changes to pages/whentouse.in.

52
53
54
55
56
57
58
59
60


61
62
63
64
65
66
67
It is designed to replace [http://man.he.net/man3/fopen | fopen()].
</p>

<h2>Situations Where SQLite Works Well</h2>

<ul>

<tcl>hd_fragment appfileformat {application file-format}</tcl>
<li><p><b>Application File Format</b></p>



<p>
SQLite has been used with great success as the on-disk file format
for desktop applications such as version control systems,
financial analysis tools, media cataloging and editing suites, CAD
packages, record keeping programs, and so forth.  The traditional
File/Open operation calls sqlite3_open() to attach to the database







|

>
>







52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
It is designed to replace [http://man.he.net/man3/fopen | fopen()].
</p>

<h2>Situations Where SQLite Works Well</h2>

<ul>

<tcl>hd_fragment appfileformat</tcl>
<li><p><b>Application File Format</b></p>

<p>(Further details [application file-format | here].)</p>

<p>
SQLite has been used with great success as the on-disk file format
for desktop applications such as version control systems,
financial analysis tools, media cataloging and editing suites, CAD
packages, record keeping programs, and so forth.  The traditional
File/Open operation calls sqlite3_open() to attach to the database