Documentation Source Text

Check-in [a39452f2a5]

Overview
Comment:Add the OpenDocument as SQLite case study. (Cherrypicked from trunk.)
SHA1: a39452f2a59f64e47b719bf9cd720a33279250ea
User & Date: drh 2014-09-08 12:11:55
Context
2014-09-08
12:34
Fix typos in the OpenDocument case study. check-in: 6c66d1ee77 user: drh tags: version-3.8.6
12:11
Add the OpenDocument as SQLite case study. (Cherrypicked from trunk.) check-in: a39452f2a5 user: drh tags: version-3.8.6
12:09
Add the OpenDocument as SQLite case study. check-in: 2c57c78315 user: drh tags: trunk
2014-08-27
15:25
Add the Aditya and Karn book on SQLite in Android. check-in: 9def66ffaf user: drh tags: trunk
Changes

Added pages/affcase1.in.

<tcl>hd_keywords {What If OpenOffice Used SQLite} \
{OpenOffice case study}</tcl>
<title>What If OpenDocument Used SQLite?</title>

<h1 align="center">
What If OpenDocument Used SQLite?</h1>

<h2>Introduction</h2>

<p>Suppose the
[http://en.wikipedia.org/wiki/OpenDocument|OpenDocument] file format,
and specifically the "ODP" OpenDocument Presentation format, were
built around SQLite.  Benefits would include:
<ul>
<li>Smaller documents
<li>Faster File/Save times
<li>Faster startup times
<li>Less memory used
<li>Document versioning
<li>A better user experience
</ul>

<p>
Note that this is only a thought experiment.
We are not suggesting that OpenDocument be changed.
Nor is this article a criticism of the current OpenDocument
design.  The point of this essay is to suggest ways to improve
future file format designs.

<h2>About OpenDocument And OpenDocument Presentation</h2>

<p>
The OpenDocument file format is used for office applications:
word processors, spreadsheets, and presentations.  It was originally
designed for the OpenOffice suite but has since been incorporated into
other desktop application suites.  The OpenOffice application has been
forked and renamed a few times.  This author's primary use for OpenDocument is
building slide presentations with either 
[https://www.neooffice.org/neojava/en/index.php|NeoOffice] on Mac, or
[http://www.libreoffice.org/|LibreOffice] on Linux and Windows.

<p>
An OpenDocument Presentation or "ODP" file is a
[http://en.wikipedia.org/wiki/Zip_%28file_format%29|ZIP archive] containing
XML files describing presentation slides and separate image files for the
various images that are included as part of the presentation.
(OpenDocument word processor and spreadsheet files are similarly
structured but are not considered by this article.) The reader can
easily see the content of an ODP file by using the "unzip -l" command.
For example, the following is the "unzip -l" output from a 49-slide presentation
about SQLite that the author gave at the 2014
[http://southeastlinuxfest.org/|SouthEast LinuxFest]:

<blockquote><pre>
Archive:  self2014.odp
  Length      Date    Time    Name
---------  ---------- -----   ----
       47  2014-06-21 12:34   mimetype
        0  2014-06-21 12:34   Configurations2/statusbar/
        0  2014-06-21 12:34   Configurations2/accelerator/current.xml
        0  2014-06-21 12:34   Configurations2/floater/
        0  2014-06-21 12:34   Configurations2/popupmenu/
        0  2014-06-21 12:34   Configurations2/progressbar/
        0  2014-06-21 12:34   Configurations2/menubar/
        0  2014-06-21 12:34   Configurations2/toolbar/
        0  2014-06-21 12:34   Configurations2/images/Bitmaps/
    54702  2014-06-21 12:34   Pictures/10000000000001F40000018C595A5A3D.png
    46269  2014-06-21 12:34   Pictures/100000000000012C000000A8ED96BFD9.png
<i>... 58 other pictures omitted...</i>
    13013  2014-06-21 12:34   Pictures/10000000000000EE0000004765E03BA8.png
  1005059  2014-06-21 12:34   Pictures/10000000000004760000034223EACEFD.png
   211831  2014-06-21 12:34   content.xml
    46169  2014-06-21 12:34   styles.xml
     1001  2014-06-21 12:34   meta.xml
     9291  2014-06-21 12:34   Thumbnails/thumbnail.png
    38705  2014-06-21 12:34   Thumbnails/thumbnail.pdf
     9664  2014-06-21 12:34   settings.xml
     9704  2014-06-21 12:34   META-INF/manifest.xml
---------                     -------
 10961006                     78 files
</pre></blockquote>

<p>
The ODP ZIP archive contains four different XML files:
content.xml, styles.xml, meta.xml, and settings.xml.  Those four files
define the slide layout, text content, and styling.  This particular
presentation contains 62 images, ranging from full-screen pictures to
tiny icons, each stored as a separate file in the Pictures
folder.  The "mimetype" file contains a single line of text that says:

<blockquote><pre>
application/vnd.oasis.opendocument.presentation
</pre></blockquote>

<p>The purpose of the other files and folders is presently 
unknown to the author but is probably not difficult to figure out.
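
<p>The archive listing shown earlier in this section can also be produced
programmatically.  The following is a minimal sketch, assuming Python's
standard zipfile module.  The helper names (list_archive, make_demo_archive)
and the tiny in-memory archive are illustrative only and are not part of any
OpenDocument tooling.

```python
import io
import zipfile

def list_archive(source):
    """Return (name, uncompressed_size) pairs for every entry in a ZIP archive.

    `source` may be a filename or a file-like object, for example an ODP file.
    """
    with zipfile.ZipFile(source) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]

def make_demo_archive():
    """Build a tiny stand-in for an ODP file in memory (hypothetical content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("mimetype",
                         "application/vnd.oasis.opendocument.presentation")
        archive.writestr("content.xml", "<office:document-content/>")
    buffer.seek(0)
    return buffer
```

<p>Calling list_archive("self2014.odp") on a real presentation would return
one pair per entry, in the order the entries appear in the archive.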

<h2>Limitations Of The OpenDocument Presentation Format</h2>

<p>
The use of a ZIP archive to encapsulate XML files plus resources is an
elegant approach to an application file format.
It is clearly superior to a custom binary file format.
But using an SQLite database as the
container, instead of ZIP, would be more elegant still.

<p>A ZIP archive is basically a key/value database, optimized for
the case of write-once/read-many and for a relatively small number
of distinct keys (a few hundred to a few thousand) each with a large BLOB
as its value.  A ZIP archive can be viewed as a "pile-of-files"
database.  This works, but it has some shortcomings relative to an
SQLite database, as follows:

<ol>
<li><p><b>Incremental update is hard.</b>
<p>
It is difficult to update individual entries in a ZIP archive.
It is especially difficult to update individual entries in a ZIP
archive in a way that does not destroy
the entire document if the computer loses power and/or crashes
in the middle of the update.  It is not impossible to do this, but
it is sufficiently difficult that nobody actually does it.  Instead, whenever
the user selects "File/Save", the entire ZIP archive is rewritten.  
Hence, "File/Save" takes longer than it ought, especially on
older hardware.  Newer machines are faster, but it is still bothersome
that changing a single character in a 50 megabyte presentation causes one
to burn through 50 megabytes of the finite write life on the SSD.

<li><p><b>Startup is slow.</b>
<p>
In keeping with the pile-of-files theme, OpenDocument stores all slide 
content in a single big XML file named "content.xml".  
LibreOffice reads and parses this entire file just to display
the first slide.
LibreOffice also seems to
read all images into memory as well, which makes sense seeing as when
the user does "File/Save" it is going to have to write them all back out
again, even though none of them changed.  The net effect is that
start-up is slow.  Double-clicking an OpenDocument file brings up a
progress bar rather than the first slide.
This results in a bad user experience.
The situation grows ever more annoying as
the document size increases.

<li><p><b>More memory is required.</b>
<p>
Because ZIP archives are optimized for storing big chunks of content, they
encourage a style of programming where the entire document is read into
memory at startup, all editing occurs in memory, then the entire document
is written to disk during "File/Save".  OpenOffice and its descendants
embrace that pattern.

<p>
One might argue that it is ok, in this era of multi-gigabyte desktops, to
read the entire document into memory.
But it is not ok.
For one, the amount of memory used far exceeds the (compressed) file size
on disk.  So a 50MB presentation might take 200MB or more RAM.  
That still is not a problem if one only edits a single document at a time.  
But when working on a talk, this author will typically have 10 or 15 different 
presentations up all at the same
time (to facilitate copy/paste of slides from past presentations) and so
gigabytes of memory are required.
Add in an open web browser or two and a few other 
desktop apps, and suddenly the disk is whirling and the machine is swapping.
And even having just a single document is a problem when working
on an inexpensive Chromebook retrofitted with Ubuntu.
Using less memory is always better.
</p>

<li><p><b>Crash recovery is difficult.</b>
<p>
The descendants of OpenOffice tend to segfault more often than commercial
competitors.  Perhaps for this reason, the OpenOffice forks make
periodic backups of their in-memory documents so that users do not lose
all pending edits when the inevitable application crash does occur.
This causes frustrating pauses in the application for the few seconds
while each backup is being made.
After restarting from a crash, the user is presented with a dialog box
that walks them through the recovery process.  Managing the crash
recovery this way involves lots of extra application logic and is
generally an annoyance to the user.

<li><p><b>Content is inaccessible.</b>
<p>
One cannot easily view, change, or extract the content of an 
OpenDocument presentation using generic tools.
The only reasonable way to view or edit an OpenDocument document is to open
it up using an application that is specifically designed to read or write
OpenDocument (read: LibreOffice or one of its cousins).  The situation
could be worse.  One can extract and view individual images (say) from
a presentation using just the "zip" archiver tool.  But it is not reasonable
to try to extract the text from a slide.  Remember that all content is stored
in a single "content.xml" file.  That file is XML, so it is a text file.
But it is not a text file that can be managed with an ordinary text
editor.  For the example presentation above, the content.xml file
consists of exactly two lines. The first line of the file is just:

<blockquote><pre>
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
</pre></blockquote>

<p>The second line of the file contains 211792 characters of
impenetrable XML.  Yes, 211792 characters all on one line.
This file is a good stress-test for a text editor.
Thankfully, the file is not some obscure
binary format, but in terms of accessibility, it might as well be
written in Sanskrit.
</ol>
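
<p>The first limitation in the list above can be made concrete.  The
following is a minimal sketch, assuming Python's standard zipfile module;
the function names are illustrative.  Changing one entry is done here by
copying every entry into a fresh archive, which mirrors the whole-archive
rewrite on "File/Save" described above.  It is not a description of how
LibreOffice or NeoOffice are actually implemented.

```python
import io
import zipfile

def make_archive(entries):
    """Build a ZIP archive in memory from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    return buffer.getvalue()

def rewrite_with_change(old_bytes, name, new_data):
    """Return a new archive (as bytes) in which entry `name` holds `new_data`.

    Every other entry is read and written back unchanged, so the cost of
    the save grows with the size of the whole document, not the edit.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(old_bytes)) as src:
        with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                if info.filename == name:
                    data = new_data
                else:
                    data = src.read(info.filename)
                dst.writestr(info.filename, data)
    return out.getvalue()

def read_entries(archive_bytes):
    """Return the archive content as a {name: bytes} mapping."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```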

<h2>First Improvement:  Replace ZIP with SQLite</h2>

<p>
Let us suppose that instead of using a ZIP archive to store its files,
OpenDocument used a very simple SQLite database with the following
single-table schema:

<blockquote><pre>
CREATE TABLE OpenDocTree(
  filename TEXT PRIMARY KEY,  -- Name of file
  filesize BIGINT,            -- Size of file after decompression
  content BLOB                -- Compressed file content
);
</pre></blockquote>

<p>
For this first experiment, nothing else about the file format is changed.
The OpenDocument is still a pile-of-files, only now each file is a row
in an SQLite database rather than an entry in a ZIP archive.
This simple change does not use the power of a relational
database.  Even so, this simple change shows some improvements.
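
<p>As a rough illustration of the single-table schema above, the following
sketch stores and retrieves "files" using Python's standard sqlite3 and zlib
modules.  The helper names and the choice of zlib compression are assumptions
made for this example; the text above does not say which compression method
would be used.

```python
import sqlite3
import zlib

SCHEMA = """
CREATE TABLE IF NOT EXISTS OpenDocTree(
  filename TEXT PRIMARY KEY,  -- Name of file
  filesize BIGINT,            -- Size of file after decompression
  content BLOB                -- Compressed file content
);
"""

def open_document(path=":memory:"):
    """Open (or create) a document database and make sure the table exists."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

def store_file(db, filename, data):
    """Insert or replace one "file" row, compressing its content with zlib."""
    with db:  # one transaction per stored file
        db.execute(
            "INSERT OR REPLACE INTO OpenDocTree(filename, filesize, content) "
            "VALUES (?, ?, ?)",
            (filename, len(data), zlib.compress(data)),
        )

def load_file(db, filename):
    """Return the decompressed content of one "file", or None if absent."""
    row = db.execute(
        "SELECT filesize, content FROM OpenDocTree WHERE filename = ?",
        (filename,),
    ).fetchone()
    if row is None:
        return None
    data = zlib.decompress(row[1])
    if len(data) != row[0]:  # filesize records the uncompressed length
        raise ValueError("size mismatch for " + filename)
    return data
```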

<tcl>hd_fragment smaller {SQLAR smaller than ZIP}</tcl>
<p>
Surprisingly, using SQLite in place of ZIP makes the presentation
file smaller.  Really.  One would think that a relational database file
would be larger than a ZIP archive, but at least in the case of NeoOffice
that is not so.  The following is an actual screen-scrape showing
the sizes of the same NeoOffice presentation, both in its original 
ZIP archive format as generated by NeoOffice (self2014.odp), and 
as repacked as an SQLite database using the 
[http://www.sqlite.org/sqlar/doc/trunk/README.md|SQLAR] utility:

<blockquote><pre>
-rw-r--r--  1 drh  staff  10514994 Jun  8 14:32 self2014.odp
-rw-r--r--  1 drh  staff  10464256 Jun  8 14:37 self2014.sqlar
-rw-r--r--  1 drh  staff  10416644 Jun  8 14:40 zip.odp
</pre></blockquote>

<p>
The SQLite database file ("self2014.sqlar") is about a
half percent smaller than the equivalent ODP file!  How can this be?
Apparently the ZIP archive generator logic in NeoOffice
is not as efficient as it could be, because when the same pile-of-files
is recompressed using the command-line "zip" utility, one gets a file
("zip.odp") that is smaller still, by another half percent, as seen
in the third line above.  So, a well-written ZIP archive
can be slightly smaller than the equivalent SQLite database, as one would
expect.  But the difference is slight.  The key take-away is that an
SQLite database is size-competitive with a ZIP archive.

<p>
The other advantage to using SQLite in place of
ZIP is that the document can now be updated incrementally, without risk
of corrupting the document if a power loss or other crash occurs in the
middle of the update.  (Remember that writes to 
[atomic commit|SQLite databases are atomic].)   True, all the
content is still kept in a single big XML file ("content.xml") which must
be completely rewritten if so much as a single character changes.  But
with SQLite, only that one file needs to change.  The other 77 files in the
repository can remain unaltered.  They do not all have to be rewritten,
which in turn makes "File/Save" run much faster and saves wear on SSDs.
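
<p>A minimal sketch of such an incremental save follows, using Python's
standard sqlite3 module.  The helper names are hypothetical, compression is
omitted for brevity, and the failure is simulated with an ordinary exception
rather than a real power loss.  The point is only that a save which fails
part-way leaves the previously committed content in place.

```python
import sqlite3

def make_demo_document():
    """Create an in-memory OpenDocTree table holding two small "files"."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE OpenDocTree("
        "filename TEXT PRIMARY KEY, filesize BIGINT, content BLOB)"
    )
    with db:
        db.executemany(
            "INSERT INTO OpenDocTree VALUES (?, ?, ?)",
            [("content.xml", 3, b"old"), ("styles.xml", 3, b"css")],
        )
    return db

def save_entries(db, changed):
    """Write only the changed "files" inside a single transaction.

    `changed` maps filename -> new content (bytes).  Either every row is
    updated or, if an exception occurs, none of them is.
    """
    with db:  # commits on success, rolls back on exception
        for filename, content in changed.items():
            db.execute(
                "UPDATE OpenDocTree SET filesize = ?, content = ? "
                "WHERE filename = ?",
                (len(content), content, filename),
            )

def read_entry(db, filename):
    row = db.execute(
        "SELECT content FROM OpenDocTree WHERE filename = ?", (filename,)
    ).fetchone()
    return row[0] if row else None

def demo():
    """Return content.xml after a good save and after a failed save."""
    db = make_demo_document()
    save_entries(db, {"content.xml": b"new text"})
    after_save = read_entry(db, "content.xml")
    try:
        # The second value is invalid on purpose, so the save fails part-way.
        save_entries(db, {"content.xml": b"half-written", "styles.xml": None})
    except TypeError:
        pass
    after_failed_save = read_entry(db, "content.xml")
    return after_save, after_failed_save
```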

<h2>Second Improvement:  Split content into smaller pieces</h2>

<p>
A pile-of-files encourages content to be stored in a few large chunks.
In the case of ODP, there are just four XML files that define the layout
of all slides in a presentation.  An SQLite database allows storing
information in a few large chunks, but SQLite is also adept and efficient
at storing information in numerous smaller pieces.

<p>
So then, instead of storing all content for all slides in a single
oversized XML file ("content.xml"), suppose there was a separate table
for storing the content of each slide separately.  The table schema
might look something like this:

<blockquote><pre>
CREATE TABLE slide(
  pageNumber INTEGER,   -- The slide page number
  slideContent TEXT     -- Slide content as XML or JSON
);
CREATE INDEX slide_pgnum ON slide(pageNumber); -- Optional
</pre></blockquote>

<p>The content of each slide could still be stored as compressed XML.
But now each page is stored separately.  So when opening a new document,
the application could simply run:

<blockquote><pre>
SELECT slideContent FROM slide WHERE pageNumber=1;
</pre></blockquote>

<p>This query will quickly and efficiently return the content of the first
slide, which could then be speedily parsed and displayed to the user.
Only one page needs to be read and parsed in order to render the first screen,
which means that the first screen appears much faster and
there is no longer a need for an annoying progress bar.
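
<p>The following sketch shows the per-slide table and the first-slide query
being used from Python's standard sqlite3 module.  The function names and the
sample slide text are made up for illustration, and the content is stored
uncompressed to keep the example short.

```python
import sqlite3

def create_presentation(slides):
    """Create an in-memory presentation with one row per slide."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE slide(
          pageNumber INTEGER,   -- The slide page number
          slideContent TEXT     -- Slide content as XML or JSON
        );
        CREATE INDEX slide_pgnum ON slide(pageNumber); -- Optional
    """)
    with db:
        db.executemany(
            "INSERT INTO slide(pageNumber, slideContent) VALUES (?, ?)",
            list(enumerate(slides, start=1)),
        )
    return db

def read_slide(db, page_number):
    """Return the content of a single slide, or None if there is no such page."""
    row = db.execute(
        "SELECT slideContent FROM slide WHERE pageNumber=?", (page_number,)
    ).fetchone()
    return row[0] if row else None
```

<p>An application would call read_slide(db, 1) at startup and could defer
reading every other page until it is needed.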

<p>If the application wanted
to keep all content in memory, it could continue reading and parsing the
other pages using a background thread after drawing the first page.  Or,
since reading from SQLite is so efficient, the application might 
instead choose to reduce its memory footprint and only keep a single
slide in memory at a time.  Or maybe it keeps the current slide and the
next slide in memory, to facilitate rapid transitions to the next slide.

<p>
Notice that dividing up the content into smaller pieces using an SQLite
table gives flexibility to the implementation.  The application can choose
to read all content into memory at startup.  Or it can read just a
few pages into memory and keep the rest on disk.  Or it can read just
a single page into memory at a time.  And different versions of the application
can make different choices without having to make any changes to the
file format.  Such options are not available when all content is in
a single big XML file in a ZIP archive.
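
<p>One of the choices described above, keeping only the current slide and
the next slide in memory, might be sketched as follows.  The SlideWindow
class and the make_demo_db helper are hypothetical names invented for this
example; a real application would have to choose its own caching policy.

```python
import sqlite3

def make_demo_db(slide_count):
    """Create an in-memory slide table with placeholder content."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE slide(pageNumber INTEGER, slideContent TEXT)")
    with db:
        db.executemany(
            "INSERT INTO slide VALUES (?, ?)",
            [(n, "slide %d" % n) for n in range(1, slide_count + 1)],
        )
    return db

class SlideWindow:
    """Keep only the current slide and the one after it in memory."""

    def __init__(self, db):
        self.db = db
        self.cache = {}  # pageNumber -> slideContent

    def _fetch(self, page_number):
        row = self.db.execute(
            "SELECT slideContent FROM slide WHERE pageNumber=?",
            (page_number,),
        ).fetchone()
        return row[0] if row else None

    def show(self, page_number):
        """Return the requested slide and prefetch its successor."""
        wanted = (page_number, page_number + 1)
        # Drop everything that falls outside the two-slide window.
        self.cache = {n: c for n, c in self.cache.items() if n in wanted}
        for n in wanted:
            if n not in self.cache:
                content = self._fetch(n)
                if content is not None:
                    self.cache[n] = content
        return self.cache.get(page_number)
```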

<p>
Splitting content into smaller pieces also helps File/Save operations
to go faster.  Instead of having to write back the content of all pages
when doing a File/Save, the application only has to write back those
pages that have actually changed.

<p>
One minor downside of splitting content into smaller pieces is that
compression does not work as well on shorter texts and so the size of
the document might increase.  But as the bulk of the document space 
is used to store images, a small reduction in the compression efficiency 
of the text content will hardly be noticeable, and is a small price 
to pay for an improved user experience.
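
<p>The compression trade-off can be checked with a small experiment.  The
sketch below uses Python's standard zlib module and a made-up set of 50
nearly identical slide fragments; the markup is hypothetical and the exact
byte counts will differ for real presentations.

```python
import zlib

def compare_compression(pieces):
    """Return (whole, split) compressed sizes in bytes.

    `whole` is the size when all pieces are joined and compressed together.
    `split` is the total size when each piece is compressed on its own.
    """
    whole = len(zlib.compress("".join(pieces).encode("utf-8")))
    split = sum(len(zlib.compress(p.encode("utf-8"))) for p in pieces)
    return whole, split

# Hypothetical slide markup: 50 slides that share most of their boilerplate.
demo_slides = [
    '<draw:page draw:name="page%d"><text:p>Bullet point number %d</text:p>'
    '</draw:page>' % (i, i)
    for i in range(1, 51)
]
```

<p>For input like this, compressing the slides one at a time yields a larger
total than compressing them together, because each small piece pays its own
overhead and cannot share repeated markup with its neighbors.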

<h2>Third Improvement:  Versioning</h2>

<p>
Once one is comfortable with the concept of storing each slide separately,
it is a small step to support versioning of the presentation.  Consider
the following schema:

<blockquote><pre>
CREATE TABLE slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT     -- XML or JSON or whatever
);
CREATE TABLE version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,   -- When this version was saved
  comment TEXT,           -- Description of this version
  manifest TEXT           -- List of integer slideIds
);
</pre></blockquote>

<p>
In this schema, instead of each slide having a page number that determines
its order within the presentation, each slide has a unique
integer identifier that is unrelated to where it occurs in sequence.
The order of slides in the presentation is determined by a list of
slideIds, stored as a text string in the MANIFEST column of the VERSION
table.
Since multiple entries are allowed in the VERSION table, that means that
multiple presentations can be stored in the same document.

<p>
On startup, the application first decides which version it
wants to display.  Since the versionId will naturally increase in time
and one would normally want to see the latest version, an appropriate
query might be:

<blockquote><pre>
SELECT manifest, versionId FROM version ORDER BY versionId DESC LIMIT 1;
</pre></blockquote>

<p>
Or perhaps the application would rather use the
most recent checkinTime:

<blockquote><pre>
SELECT manifest, versionId, max(checkinTime) FROM version;
</pre></blockquote>

<p>
Using a single query such as the above, the application obtains a list
of the slideIds for all slides in the presentation.  The application then
queries for the content of the first slide, and parses and displays that
content, as before.

<p>(Aside:  Yes, that second query above that uses "max(checkinTime)"
really does work and really does return a well-defined answer in SQLite.
Such a query either returns an undefined answer or generates an error
in many other SQL database engines, but in SQLite it does what you would 
expect: it returns the manifest and versionId of the entry that has the
maximum checkinTime.)
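
<p>The startup sequence described above might look like the following
sketch, which uses Python's standard sqlite3 module.  The sample rows, the
helper names, and the convention that the manifest is a space-separated list
of slideIds are all assumptions made for this example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE slide(
  slideId INTEGER PRIMARY KEY,
  derivedFrom INTEGER REFERENCES slide,
  content TEXT     -- XML or JSON or whatever
);
CREATE TABLE version(
  versionId INTEGER PRIMARY KEY,
  priorVersion INTEGER REFERENCES version,
  checkinTime DATETIME,   -- When this version was saved
  comment TEXT,           -- Description of this version
  manifest TEXT           -- List of integer slideIds
);
"""

def make_demo_db():
    """Create a document with two versions; version 2 revises the agenda."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    with db:
        db.executemany(
            "INSERT INTO slide(slideId, derivedFrom, content) VALUES (?, ?, ?)",
            [(1, None, "Title"), (2, None, "Agenda"), (3, 2, "Agenda, revised")],
        )
        db.executemany(
            "INSERT INTO version VALUES (?, ?, ?, ?, ?)",
            [(1, None, "2014-09-01 10:00:00", "first draft", "1 2"),
             (2, 1, "2014-09-02 09:30:00", "revise agenda", "1 3")],
        )
    return db

def latest_slide_ids(db):
    """Return the ordered slideIds of the most recent version."""
    row = db.execute(
        "SELECT manifest, versionId FROM version "
        "ORDER BY versionId DESC LIMIT 1"
    ).fetchone()
    if row is None:
        return []
    # Assumed manifest format: slideIds separated by whitespace.
    return [int(token) for token in row[0].split()]

def slide_content(db, slide_id):
    row = db.execute(
        "SELECT content FROM slide WHERE slideId = ?", (slide_id,)
    ).fetchone()
    return row[0] if row else None
```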

<p>When the user does a "File/Save", instead of overwriting the modified
slides, the application can now make new entries in the SLIDE table for
just those slides that have been added or altered.  Then it creates a
new entry in the VERSION table containing the revised manifest.
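
<p>A "File/Save" along these lines could be sketched as shown below, again
with Python's standard sqlite3 module.  The function names, the
space-separated manifest format, and the use of datetime('now') for
checkinTime are assumptions of this example, not requirements of the schema.

```python
import sqlite3

def new_document(slides):
    """Create a document whose first version contains the given slides."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE slide(
          slideId INTEGER PRIMARY KEY,
          derivedFrom INTEGER REFERENCES slide,
          content TEXT
        );
        CREATE TABLE version(
          versionId INTEGER PRIMARY KEY,
          priorVersion INTEGER REFERENCES version,
          checkinTime DATETIME,
          comment TEXT,
          manifest TEXT
        );
    """)
    with db:
        ids = [
            db.execute("INSERT INTO slide(content) VALUES (?)", (text,)).lastrowid
            for text in slides
        ]
        db.execute(
            "INSERT INTO version(checkinTime, comment, manifest) "
            "VALUES (datetime('now'), ?, ?)",
            ("initial version", " ".join(str(i) for i in ids)),
        )
    return db

def save_version(db, prior_version, slide_ids, changed, comment=""):
    """Record a new version without overwriting any existing slide.

    `slide_ids` is the ordered manifest of the version being edited.
    `changed` maps a position in that list to the new slide content.
    Returns the versionId of the newly created version.
    """
    new_ids = list(slide_ids)
    with db:  # new slides and the new version are committed together
        for position, content in changed.items():
            cursor = db.execute(
                "INSERT INTO slide(derivedFrom, content) VALUES (?, ?)",
                (slide_ids[position], content),
            )
            new_ids[position] = cursor.lastrowid
        cursor = db.execute(
            "INSERT INTO version(priorVersion, checkinTime, comment, manifest) "
            "VALUES (?, datetime('now'), ?, ?)",
            (prior_version, comment, " ".join(str(i) for i in new_ids)),
        )
        return cursor.lastrowid

def manifest_of(db, version_id):
    """Return the ordered slideIds recorded for one version."""
    row = db.execute(
        "SELECT manifest FROM version WHERE versionId = ?", (version_id,)
    ).fetchone()
    return [int(token) for token in row[0].split()] if row else []
```

<p>Only the altered slide is written, and the earlier version remains
readable because its manifest still points at the original slide rows.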

<p>The VERSION table shown above has columns to record a check-in comment
(presumably supplied by the user) and the time and date at which the File/Save
action occurred.  It also records the parent version to record the history
of changes.  Perhaps the manifest could be stored as a delta from the
parent version, though typically the manifest will be small enough that
storing a delta might be more trouble than it is worth.  The SLIDE table
also contains a derivedFrom column which could be used for delta encoding
if it is determined that saving the slide content as a delta from its
previous version is a worthwhile optimization.

<p>So with this simple change, the ODP file now stores not just the most
recent edit to the presentation, but a history of all prior edits.  The
user would normally want to see just the most recent edition of the
presentation, but if desired, the user can now go backwards in time to 
see historical versions of the same presentation.

<p>Or, multiple presentations could be stored within the same document.

<p>With such a schema, the application would no longer need to make
periodic backups of the unsaved changes to a separate file to avoid lost
work in the event of a crash.  Instead, a special "pending" version could
be allocated and unsaved changes could be written into the pending version.
Because only changes would need to be written, not the entire document,
saving the pending changes would only involve writing a few kilobytes of
content, not multiple megabytes, and would take milliseconds instead of
seconds, and so it could be done frequently and silently in the background.
Then when a crash occurs and the user reboots, all (or almost all)
of their work is retained.  If the user decides to discard unsaved changes, 
they simply go back to the previous version.

<p>
There are details to fill in here.
Perhaps a screen can be provided that displays a history of changes
(perhaps with a graph) allowing the user to select which version they
want to view or edit.  Perhaps some facility can be provided to merge
forks that might occur in the version history.  And perhaps the
application should provide a means to purge old and unwanted versions.
The key point is that using an SQLite database to store the content,
rather than a ZIP archive, makes all of these features much, much easier
to implement, which increases the possibility that they will eventually
get implemented.

<h2>And So Forth...</h2>

<p>
In the previous sections, we have seen how moving from a key/value
store implemented as a ZIP archive to a simple SQLite database
with just three tables can add significant capabilities to an application
file format.
We could continue to enhance the schema with new tables, with indexes
added for performance, with triggers and views for programming convenience,
and constraints to enforce consistency of content even in the face of
programming errors.  Further enhancement ideas include:
<ul>
<li> Store the undo/redo stack in a database table so that
     Undo could go back into prior edit sessions.
<li> Add [FTS4|full text search] capabilities to the slide deck, or across
     multiple slide decks.
<li> Decompose the "settings.xml" file into an SQL table that
     is more easily viewed and edited by separate applications.
<li> Break out the "Presenter Notes" from each slide into a separate
     table, for easier access from third-party applications and/or scripts.
<li> Enhance the presentation concept beyond the simple linear sequence of
     slides to allow for side-tracks and excursions to be taken depending on
     how the audience is responding.
</ul>

<p>
An SQLite database has a lot of capability, which
this essay has only begun to touch upon.  But hopefully this quick glimpse
has convinced some readers that using an SQL database as an application
file format is worth a second look.

<p>
Some readers might resist using SQLite as an application
file format due to prior exposure to enterprise SQL databases and
the caveats and limitations of those other systems.  
For example, many enterprise database
engines advise against storing large strings or BLOBs in the database
and instead suggest that large strings and BLOBs be stored as separate
files and the filename stored in the database.  But SQLite 
is not like that.  Any column of an SQLite database can hold
a string or BLOB up to about a gigabyte in size.  And for strings and
BLOBs of 100 kilobytes or less, 
[BLOB I/O performance|I/O performance is better] than using separate
files.

<p>
Some readers might be reluctant to consider SQLite as an application
file format because they have been inculcated with the idea that all
SQL database schemas must be factored into third normal form and store
only small primitive data types such as strings and integers.  Certainly
relational theory is important and designers should strive to understand
it.  But, as demonstrated above, it is often quite acceptable to store
complex information as XML or JSON in text fields of a database.
Do what works, not what your database professor said you ought to do.

<h2>Review Of The Benefits Of Using SQLite</h2>

<p>
In summary,
the claim of this essay is that using SQLite as a container for an application
file format like OpenDocument
and storing lots of smaller objects in that container
works out much better than using a ZIP archive holding a few larger objects.
To wit:

<ol>
<li><p>
An SQLite database file is approximately the same size as, and in some cases
smaller than, a ZIP archive holding the same information.

<li><p>
The [atomic commit|atomic update capabilities]
of SQLite allow small incremental changes
to be safely written into the document.  This reduces total disk I/O
and improves File/Save performance, enhancing the user experience.

<li><p>
Startup time is reduced by allowing the application to read in only the
content shown for the initial screen.  This largely eliminates the
need to show a progress bar when opening a new document.  The document
just pops up immediately, further enhancing the user experience.

<li><p>
The memory footprint of the application can be dramatically reduced by
only loading content that is relevant to the current display and keeping
the bulk of the content on disk.  The fast query capability of SQLite
makes this a viable alternative to keeping all content in memory at all times.
And when applications use less memory, it makes the entire computer more
responsive, further enhancing the user experience.

<li><p>
The schema of an SQL database is able to represent information more directly
and succinctly than a key/value database such as a ZIP archive.  This makes
the document content more accessible to third-party applications and scripts
and facilitates advanced features such as built-in document versioning, and
incremental saving of work in progress for recovery after a crash.
</ol>

<p>
These are just a few of the benefits of using SQLite as an application file
format &mdash; the benefits that seem most likely to improve the user
experience for applications like OpenOffice.  Other applications might
benefit from SQLite in different ways. See the [Application File Format]
document for additional ideas.

<p>
Finally, let us reiterate that this essay is a thought experiment.
The OpenDocument format is well-established and already well-designed.
Nobody really believes that OpenDocument should be changed to use SQLite
as its container instead of ZIP.  Nor is this article a criticism of
OpenDocument for not choosing SQLite as its container since OpenDocument
predates SQLite.  Rather, the point of this article is to use OpenDocument
as a concrete example of how SQLite can be used to build better
application file formats for future projects.

Changes to pages/appfileformat.in.

Old version (lines 142-156 and 162-176):
Any application state that can be recorded in a pile-of-files can
also be recorded in an SQLite database with a simple key/value schema
like this:
<blockquote><pre>
CREATE TABLE files(filename TEXT PRIMARY KEY, content BLOB);
</pre></blockquote>
If the content is compressed, then such an SQLite database is only
slightly larger (less than 2% larger)
than an equivalent ZIP archive, and it has the advantage
of being able to update individual "files" without rewriting
the entire document.

<p>
But an SQLite database is not limited to a simple key/value structure
like a pile-of-files database.  An SQLite database can have dozens
................................................................................
And all of this structure is succinctly documented for humans
by the SQL schema.

<p>In other words, an SQLite database can do everything that a 
pile-of-files or wrapped pile-of-files format can do, plus much more,
and with greater lucidity.
An SQLite database is a more versatile container than a key/value
filesystem or a ZIP archive.


<p>The power of an SQLite database could, in theory, be achieved using
a custom file format.  But any custom file format that is as expressive
as a relational database would likely require an enormous design specification 
and many tens or hundreds of thousands of lines of code to 
implement.  And the end result would be an "opaque blob" that is
inaccessible without specialized tools.

New version (lines 142-156 and 162-177):
Any application state that can be recorded in a pile-of-files can
also be recorded in an SQLite database with a simple key/value schema
like this:
<blockquote><pre>
CREATE TABLE files(filename TEXT PRIMARY KEY, content BLOB);
</pre></blockquote>
If the content is compressed, then such an SQLite database is only
slightly larger, and [SQLAR smaller than ZIP|sometimes smaller] 
than an equivalent ZIP archive, and it has the advantage
of being able to update individual "files" without rewriting
the entire document.

<p>
But an SQLite database is not limited to a simple key/value structure
like a pile-of-files database.  An SQLite database can have dozens
................................................................................
And all of this structure is succinctly documented for humans
by the SQL schema.

<p>In other words, an SQLite database can do everything that a 
pile-of-files or wrapped pile-of-files format can do, plus much more,
and with greater lucidity.
An SQLite database is a more versatile container than a key/value
filesystem or a ZIP archive.  (For a detailed example, see the
[OpenOffice case study] essay.)

<p>The power of an SQLite database could, in theory, be achieved using
a custom file format.  But any custom file format that is as expressive
as a relational database would likely require an enormous design specification 
and many tens or hundreds of thousands of lines of code to 
implement.  And the end result would be an "opaque blob" that is
inaccessible without specialized tools.