Documentation Source Text
Check-in [1b422ce8de]
Not logged in

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
SHA1 Hash:1b422ce8de0ff259acb0d76bb56ed31c6056ff4b
Date: 2014-03-14 16:35:01
User: drh
Comment:Further tuning of the application file format document.
Tags And Properties
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/appfileformat.in

5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
..
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
...
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149

150
151
152
153
154
155
156
157
158
159

160


161













162
163
164
165

166
167
168
169
170
171
172
173
174
175
176
177
...
179
180
181
182
183
184
185

186
187
188
189
190
191
192
193

194
195
196
197
198
199
200
...
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
...
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
...
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
<h1 align="center">
SQLite As An Application File Format
</h1>

<h2>Executive Summary</h2>

<p>An SQLite database file with a defined schema
often make an excellent application file format.
Here are a dozen reaons why this is so:

<ol>
<li> Simplified Application Development
<li> Single-File Documents
<li> High-Level Query Language
<li> Accessible Content
<li> Cross-Platform
................................................................................
<li> Performance
<li> Concurrent Use By Multiple Processes
<li> Multiple Programming Languages
<li> Better Applications
</ol>

<p>Each of these points will be described in more detail below,
after first considering more closely what this article means by
"application file format".

<h2>What Is An Application File Format?</h2>

<p>
An "application file format" is the file format
used to persist application state to disk or to exchange
information between programs.
There are thousands of application file formats in use today.
Here are a few examples:

<ul>
<li>DOC - Word Perfect and Microsoft Office documents
<li>DWG - AutoCAD drawings
<li>PDF - Portable Document Format from Adobe
<li>XLS - Microsoft Excel Spreadsheet
<li>GIT - Git source code repository
................................................................................

<p>The purpose of this document is to argue in favor of a fourth
new catagory of application file format: An SQLite database file.

<h2>SQLite As The Application File Format</h2>

<p>
An SQLite database file makes an excellent alternative to a
custom or pile-of-files application format.  In its simplest form,
an SQLite database with a single key/value table like
<blockquote><pre>
CREATE TABLE files(filename TEXT PRIMARY KEY, content BLOB);
</pre></blockquote>
could serve as a direct replacement for a wrapped pile-of-files format.
If the content is compressed, then such an SQLite database is only

slightly larger than an equivalent ZIP archive, and it has the advantage
of being able to write individual "files" without having to rewrite
the entire document.

<p>
But an SQLite database is not limited to a simple key/value structure
like a pile-of-files database.  An SQLite database can have dozens
or hundreds or thousands of different of tables, with dozens or
hundreds or thousands of fields per table, each with different datatypes and
particular meanings, all cross-referencing each other, and all stored

efficiently and compactly in a single disk file.
















<p>
Compared to other approaches, the use of
an SQLite database as an application file format has
compelling advantages:

</p>

<ol>
<li><p><b>Simplified Application Development.</b>
No code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can reduce application code size by many thousands of
lines, with corresponding saving in development and maintenance costs.

<p>SQLite is widely used and stable.  There are literally billions of
................................................................................
SQLite is [tested | carefully tested] and proven reliable.  It is not
a component that needs much tuning or debugging, allowing developers
to stay focused on application logic.

<li><p><b>Single-File Documents.</b>
An SQLite database is contained in a single file, which is easily
copied or moved or attached.  The "document" metaphor is preserved.

SQLite does not have any file naming requirements
and so the application can use any custom file suffix that it wants
to help identify the file as "belonging" to the application.

<p>SQLite database files contain a 4-byte [Application ID] in
their headers that can be set to an application-defined value
and then used to identify the "type" of the document for utility
programs such as [http://linux.die.net/man/1/file | file(1)].



<li><p><b>High-Level Query Language.</b>
SQLite is a complete relational database engine, which means that the
application can access content using high-level queries.  Application
developers need not spend time thinking about "how" to retrieve the
information they need from a document.  Developers write SQL that
................................................................................

<li><p><b>Accessible Content.</b>
Information held in an SQLite database file is accessible using 
commonly available open-source command-line tools - tools that 
are installed by default on Mac and Linux systems and that are 
freely available as a self-contained EXE file on Windows.
Unlike custom file formats, application-specific programs are
<u>not</u> required to read or write content in an SQLite database.
An SQLite database file is <u>not</u> an opaque blob.  It is true
that command-line tools such as text editors or "grep" or "awk" are
not useful on an SQLite database, but the SQL query language is a much
more powerful and convenient way for examining the the content, so the
inability to use "grep" and "awk" and the like is not seen as a loss.

<p>An SQLite database is a [file format | well-defined and well-documented]
file format that is in widespread use by literally hundreds of 
................................................................................
together such that either all or none of them occur, and so that the
changes can be rolled back if a problem is found prior to to commit.
This allows an application to make a change incrementally, then run
various sanity and consistency checks on the resulting data prior to
committing the changes to disk.  The
[http://www.fossil-scm.org/ | Fossil] DVCS 
[http://www.fossil-scm.org/fossil/doc/tip/www/selfcheck.wiki|uses this technique]
to verify that no repository history has been lost prior to each change
to the repository.

<li><p><b>Incremental And Continuous Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage over custom
and wrapped pile-of-files formats, both of which must completely
rewrite the entire document in order to change a single byte.  
Pure pile-of-files formats can also
do incremental updates to some extent, though the granularity of writes is 
usually larger with pile-of-file formats (a single file) than with SQLite
(a single page).

<p>A desktop application built on SQLite can also do continuous update.
In other words, instead of collecting changes in memory and then writing
them to disk only on a File/Save action, changes can be written back to
the disk as they occur.  The avoids loss of work on a system crash or
power failure.  The undo/redo stack can also be kept in the on-disk 
database, meaning that undo/redo can occur across session boundaries.
And maintenance of the undo/redo stack can often be automated using
triggers.

<li><p><b>Easily Extensible.</b>
As an application grows, new features can be added to an
SQLite application file format simply by adding new tables to the schema
or by adding new columns to existing tables.  Adding columns or tables
does not change the meaning of prior queries against the database, so with a 
modicum of care to ensuring that the meaning of the columns and
tables are preserved, backwards compatibility is maintained.

<p>It is possible to extend custom or pile-of-files formats too, of course,
but doing is often much harder.  If indices are added, then all application
code that changes the corresponding tables must be located and modified to
keep those indices up-to-date.  If columns are added, then all application
code that accesses the corresponding table must be located and modified to 
................................................................................

<p>An SQLite application file format is a great 
choice in cases where there is a collection or "federation" of
separate programs, often written in different languages and by
different development teams.
This comes up commonly in research or laboratory
environments where one team is responsible for data acquisition
and one or more different teams are responsible for analysis of
the data.  Each team can use whatever hardware, operating system,
programming language and development methodology that they are 
most comfortable with, and as long as all programs use an SQLite 
database with a common schema as their application file format, 
they can all interoperate.


<li><p><b>Better Applications.</b>
If the application file format is an SQLite database, the complete
documentation for that file format consists of the database schema,
with perhaps a few extra words about what each table and column
represent.  The description of a custom file format,
on the other hand, typically runs on for hundreds of 
pages.  A pile-of-files format, while much simpler and easier to
describe than a fully custom format, still tends to be much larger
and more complex than an SQL schema dump, since the names and format
for the individual files must still be described.

<p>This is not a trivial point.  A clear, concise, and easy to understand







|
|







 







|









|







 







|
|
|



<

>
|
|







|
>
|
>
>

>
>
>
>
>
>
>
>
>
>
>
>
>

|

|
>




|







 







>
|


<
|


|
>







 







|
|







 







|
<





|
|











|






|
|







 







|
|










|







5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
..
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
...
135
136
137
138
139
140
141
142
143
144
145
146
147

148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
...
196
197
198
199
200
201
202
203
204
205
206

207
208
209
210
211
212
213
214
215
216
217
218
...
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
...
273
274
275
276
277
278
279
280

281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
...
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
<h1 align="center">
SQLite As An Application File Format
</h1>

<h2>Executive Summary</h2>

<p>An SQLite database file with a defined schema
often makes an excellent application file format.
Here are a dozen reasons why this is so:

<ol>
<li> Simplified Application Development
<li> Single-File Documents
<li> High-Level Query Language
<li> Accessible Content
<li> Cross-Platform
................................................................................
<li> Performance
<li> Concurrent Use By Multiple Processes
<li> Multiple Programming Languages
<li> Better Applications
</ol>

<p>Each of these points will be described in more detail below,
after first considering more closely the meaning of
"application file format".

<h2>What Is An Application File Format?</h2>

<p>
An "application file format" is the file format
used to persist application state to disk or to exchange
information between programs.
There are thousands of application file formats in use today.
Here are just a few examples:

<ul>
<li>DOC - Word Perfect and Microsoft Office documents
<li>DWG - AutoCAD drawings
<li>PDF - Portable Document Format from Adobe
<li>XLS - Microsoft Excel Spreadsheet
<li>GIT - Git source code repository
................................................................................

<p>The purpose of this document is to argue in favor of a fourth
new catagory of application file format: An SQLite database file.

<h2>SQLite As The Application File Format</h2>

<p>
Any application state that can be recorded in a pile-of-files can
also be recorded in an SQLite database with a simple key/value schema
like this:
<blockquote><pre>
CREATE TABLE files(filename TEXT PRIMARY KEY, content BLOB);
</pre></blockquote>

If the content is compressed, then such an SQLite database is only
slightly larger (less than 2% larger)
than an equivalent ZIP archive, and it has the advantage
of being able to update individual "files" without rewrite
the entire document.

<p>
But an SQLite database is not limited to a simple key/value structure
like a pile-of-files database.  An SQLite database can have dozens
or hundreds or thousands of different of tables, with dozens or
hundreds or thousands of fields per table, each with different datatypes and
and constraints and particular meanings, all cross-referencing each other,
appropriately and automatically indexed for rapid retrieval,
and all stored efficiently and compactly in a single disk file.
And all of this structure is succinctly documented for humans
by the SQL schema.

<p>In other words, an SQLite database can do everything that a 
pile-of-files or wrapped pile-of-files format can do, plus much more,
and with greater lucidity.
An SQLite database is a more versatile container than key/value
filesystem or a ZIP archive.

<p>The power of an SQLite database could, in theory, be achieved using
a custom file format.  But any custom file format that is as expressive
as a relational database would likely require an enormous design specification 
and many tens or hundreds of thousands of lines of code to 
implement.  And the end result would be an "opaque blob" that is
inaccessible without specialized tools.

<p>
Hence, in comparison to other approaches, the use of
an SQLite database as an application file format has
compelling advantages.  Here are a few of these advantages,
enumerated and expounded:
</p>

<ol>
<li><p><b>Simplified Application Development.</b>
No new code is needed for reading or writing the application file.
One has merely to link against the SQLite library, or include the 
[amalgamation | single "sqlite3.c" source file] with the rest of the
application C code, and SQLite will take care of all of the application
file I/O.  This can reduce application code size by many thousands of
lines, with corresponding saving in development and maintenance costs.

<p>SQLite is widely used and stable.  There are literally billions of
................................................................................
SQLite is [tested | carefully tested] and proven reliable.  It is not
a component that needs much tuning or debugging, allowing developers
to stay focused on application logic.

<li><p><b>Single-File Documents.</b>
An SQLite database is contained in a single file, which is easily
copied or moved or attached.  The "document" metaphor is preserved.

<p>SQLite does not have any file naming requirements
and so the application can use any custom file suffix that it wants
to help identify the file as "belonging" to the application.

SQLite database files contain a 4-byte [Application ID] in
their headers that can be set to an application-defined value
and then used to identify the "type" of the document for utility
programs such as [http://linux.die.net/man/1/file | file(1)], futher
enhancing the document metaphor.


<li><p><b>High-Level Query Language.</b>
SQLite is a complete relational database engine, which means that the
application can access content using high-level queries.  Application
developers need not spend time thinking about "how" to retrieve the
information they need from a document.  Developers write SQL that
................................................................................

<li><p><b>Accessible Content.</b>
Information held in an SQLite database file is accessible using 
commonly available open-source command-line tools - tools that 
are installed by default on Mac and Linux systems and that are 
freely available as a self-contained EXE file on Windows.
Unlike custom file formats, application-specific programs are
not required to read or write content in an SQLite database.
An SQLite database file is not an opaque blob.  It is true
that command-line tools such as text editors or "grep" or "awk" are
not useful on an SQLite database, but the SQL query language is a much
more powerful and convenient way for examining the the content, so the
inability to use "grep" and "awk" and the like is not seen as a loss.

<p>An SQLite database is a [file format | well-defined and well-documented]
file format that is in widespread use by literally hundreds of 
................................................................................
together such that either all or none of them occur, and so that the
changes can be rolled back if a problem is found prior to to commit.
This allows an application to make a change incrementally, then run
various sanity and consistency checks on the resulting data prior to
committing the changes to disk.  The
[http://www.fossil-scm.org/ | Fossil] DVCS 
[http://www.fossil-scm.org/fossil/doc/tip/www/selfcheck.wiki|uses this technique]
to verify that no repository history has been lost prior to each change.


<li><p><b>Incremental And Continuous Updates.</b>
When writing to an SQLite database file, only those parts of the file that
actually change are written out to disk.  This makes the writing happen faster
and saves wear on SSDs.  This is an enormous advantage over custom
and wrapped pile-of-files formats, both of which usually require a
rewrite of the entire document in order to change a single byte.  
Pure pile-of-files formats can also
do incremental updates to some extent, though the granularity of writes is 
usually larger with pile-of-file formats (a single file) than with SQLite
(a single page).

<p>A desktop application built on SQLite can also do continuous update.
In other words, instead of collecting changes in memory and then writing
them to disk only on a File/Save action, changes can be written back to
the disk as they occur.  The avoids loss of work on a system crash or
power failure.  The undo/redo stack can also be kept in the on-disk 
database, meaning that undo/redo can occur across session boundaries.
Maintenance of the undo/redo stack can often be automated using SQL
triggers.

<li><p><b>Easily Extensible.</b>
As an application grows, new features can be added to an
SQLite application file format simply by adding new tables to the schema
or by adding new columns to existing tables.  Adding columns or tables
does not change the meaning of prior queries, so with a 
modicum of care to ensuring that the meaning of legacy columns and
tables are preserved, backwards compatibility is maintained.

<p>It is possible to extend custom or pile-of-files formats too, of course,
but doing is often much harder.  If indices are added, then all application
code that changes the corresponding tables must be located and modified to
keep those indices up-to-date.  If columns are added, then all application
code that accesses the corresponding table must be located and modified to 
................................................................................

<p>An SQLite application file format is a great 
choice in cases where there is a collection or "federation" of
separate programs, often written in different languages and by
different development teams.
This comes up commonly in research or laboratory
environments where one team is responsible for data acquisition
and other teams are responsible for various stages of analysis.
Each team can use whatever hardware, operating system,
programming language and development methodology that they are 
most comfortable with, and as long as all programs use an SQLite 
database with a common schema as their application file format, 
they can all interoperate.


<li><p><b>Better Applications.</b>
If the application file format is an SQLite database, the complete
documentation for that file format consists of the database schema,
with perhaps a few extra words about what each table and column
represents.  The description of a custom file format,
on the other hand, typically runs on for hundreds of 
pages.  A pile-of-files format, while much simpler and easier to
describe than a fully custom format, still tends to be much larger
and more complex than an SQL schema dump, since the names and format
for the individual files must still be described.

<p>This is not a trivial point.  A clear, concise, and easy to understand