Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | First draft of a document that advocates for using SQLite as an application file format. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
aa63d8579423f4fea9afeeaf1e2b3985 |
User & Date: | drh 2014-03-12 18:12:24.817 |
Context
2014-03-13
| ||
00:43 | First complete draft of the new application file format document. Integrate with the rest of the documentation via hyperlinks. (check-in: 6d257b8d92 user: drh tags: trunk) | |
2014-03-12
| ||
18:12 | First draft of a document that advocates for using SQLite as an application file format. (check-in: aa63d85794 user: drh tags: trunk) | |
2014-03-11
| ||
15:30 | Version 3.8.4.1 (check-in: 103145ba79 user: drh tags: trunk, release, version-3.8.4.1) | |
Changes
Added pages/appfileformat.in.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 | <title>SQLite As An Application File Format</title> <h2>What Is An Application File Format?</h2> <p> An "application file format" is the schema and/or low-level byte format used to persist appliation state to disk or to exchange information between programs. Some examples of application file formats include: <ul> <li>"*.doc" - Word Perfect and Microsoft Office documents <li>"*.dwg" - AutoCAD drawings <li>"*.pdf" - Portable Document Format from Adobe <li>"*.xls" - Microsoft Excel Spreadsheet <li>".git" - Git source code repository <li>"*.epub" - The Eletronic Publication format used by non-Kindle eBooks <li>"*.odt" - The Open Document format used by OpenOffice and others <li>"*.ppt" - Microsoft PowerPoint presentations <li>"*.odp" - The Open Document presentation format used by OpenOffice and others </ul> <p>The application file formats listed above can be divided into several categories: <ol> <li><p><b>Fully Custom Formats.</b> Custom formats are specifically designed for a single application. DOC, DWG, PDF, XLS, and PPT are examples of custom formats. Custom formats are usually contained within a single file, for ease of transport. They are also usually binary, though the DWG format is a notable exception. Custom file formats require specialized application code to read and write and are not normally accessible from commonly available tools such as unix command-line programs. Generally speaking, to access the content of a custom application file format, you have to have a tool that is specifically engineered to read and/or write that format. <li><p><b>Pile-of-Files Formats.</b> Sometimes the application state is stored in a directory hierarchy of files. Git is a prime example of this, though the phenomenon occurs frequently one-off and bespoke applications. A pile-of-files format essentially uses the filesystem as a key/value database and small chunks of information into separate files. This gives the advantage of making the content more accessible to common utility programs. But even if many of the files in a pile-of-files format are easily readable, there are usually some files that have their own custom format, such as Git "Packfiles", and are hence not readable or writable without specialized tools. It is much less convenient to move a pile-of-files from one place or machine to another, than it is to move a single file. It is hard to make a pile-of-files document into an email attachment, for example. Finally, a pile-of-files format breaks the "document metaphor": there is no one file that a user can point to that is the "document". <li><p><b>ZIP-ed Pile-of-Files Formats.</b> Some applications use a Pile-of-Files that is then encapsulated into a ZIP archive. EPUB, ODT,and ODP are examples of this approach. An EPUB book is really just a ZIP archive that contains various XML files for the text of book chapters, GIF and JPEG images for the artwork, and a specialized catalog file that tells the eBook reader how all the XML and image files fit together. OpenOffice documents (ODT and ODP) are also ZIP archives containing XML and images that represent their content as well as "catalog" files that show the interrelationships between the component parts. A ZIP-ed pile-of-files format is not quite as accessible as a pure pile-of-files format, since the component parts are not immediately visible. But a knowledgeable user can easily get to the component parts using any ZIP archiver. And ZIP-ed pile-of-files formats are not as easy to edit, since one most normally rewrite the entire file just to change one small component part. On the other hand, ZIP-ed pile-of-files formats have the advantage of being contained in a single disk file, which makes them easier to copy and attach and gives a more convenient metaphor for the "document". </ol> <h2>SQLite As The Application File Format</h2> <p> Designers of new applications are encouraged to consider using an SQLite database as their application file format. The use of SQLite as the application file format has a number of advantages over custom file formats, pile-of-file formats, and ZIP-ed pile-of-file formats. </p> <p><b>Ease Of Development.</b> No code is needed for reading or writing the application file. One has merely to link against the SQLite library, or include the [amalgamation | single "sqlite3.c" source file] with the rest of the application C code, and SQLite will take care of all of the application file I/O. This can save having to write, debug, and maintain thousands and thousands of lines of code. <p><b>Single-File Documents.</b> An SQLite database is contained in a single file, which is easily copied or moved or attached. The "document" metaphor is preserved. furthermore, SQLite does not have any database file naming requirements and so the application can use any custom file suffix that it wants. <p><b>High-level Query Language.</b> SQLite is a complete relational database engine, which means that the application can access content using high-level queries. Application developers need not spend time thinking about "how" to retrieve the information they need from a document. Developers write SQL that expresses "what" information they want and leave it to the database engine to figure out how to best retrieve that content. This helps developers to operate "heads up" and remain focused on solving the user's problem, and avoid time spent "heads down" fiddling with low-level file formatting details. <p><b>Accessible Content.</b> Information held in an SQLite database file is accessible using commonly available command-line tools - tools that are installed by default on Mac and Linux systems and that are freely available as a self-contained EXE file on Windows. Unlike custom file formats, application-specific programs are <u>not</u> required to read or write content in an SQLite database. An SQLite database is uses a [file format | well-defined and well-documented] file format that is in widespread use by literally hundreds of thousands of applications and is backwards compatible to its inception in 2003 and which promises to continue to be compatible in years to come. The longevity of SQLite database files is particularly important to bespoke applications, since it allows the document content to be accessed years or decades in the future, long after all traces of the original application have been lost. <p><b>Cross-Platform.</b> SQLite database files are portable between 32-bit and 64-bit machines and between big-endian and little-endian architectures and between windows and unix in all of their various flavors and incarnations. The application need not worry over the byte-order of integers or floating point numbers. Text content can be read or written as UTF-8, UTF-16LE, or UTF-16BE and SQLite will automatically perform any necessary translations on-the-fly. <p><b>Atomic Updates.</b> Writes to an SQLite database are atomic. They either happen completely or not at all, even in the face of system crashes or power failures. So there is no danger of corrupting a document just because the power happened to go out at the same instant that a change was being written to disk. <p><b>Incremental Updates.</b> When writing to an SQLite database file, only those parts of the file that actually change are written out to disk. This makes the writing happen faster and saves wear on SSDs. This is an enormous advantage of SQLite over custom and ZIP-ed pile-of-files formats, both of which must completely rewritten the entire document in order to change a single byte. Simple pile-of-files formats can also do incremental updates to some extent, though granularity of writes is larger with pile-of-file formats (a single file) than with SQLite (a single page). <p><b>Extensible.</b> As an application grows and evolves, new features can be added to an SQLite application file format simply by adding new tables to the schema or by adding new columns to existing tables. With a modicum of care, backwards compatibility of the application is preserved. It is also possible to create an extensible custom or pile-of-files format too, of course, but doing so requires more work and more design discipline. <p><b>Performance.</b> In many cases, an SQLite application file format can be faster than a custom or pile-of-files format. In the case of a custom format, SQLite often dramatically improves start-up times because instead of having to read and parse the entire document into memory, the application can do queries to extract only the information needed for the initial screen display. In the case of a pile-of-files format, it might be surprising, but SQLite can read and write smaller BLOBs from its database faster than they can be read or written as separate files. (See [Internal Versus External BLOBs] for further information.) <p><b>Safe For Concurrent Use By Multiple Processes.</b> SQLite automatically coordinates concurrent access to the same document from multiple threads and/or processes. Two or more applications can connect and read from the same document at the same time. Writes are serialized, but as writes normally only take milliseconds, usually applications simply take turns writing. SQLite automatically ensures that the low-level format of the document is uncorrupted. Accomplishing the same with a custom or pile-of-files format, in contrast, requires extensive support in the application. And the application code used to coordinate concurrent is a notoriously bug-magnet. <p><b>Works With Multiple Programming Languages.</b> Though SQLite is itself written in ANSI-C, interfaces exist for just about every other programming language you can think of: C++, C#, Objective-C, Java, Tcl, Perl, Python, Ruby, Erlang, JavaScript, and so forth. So programmers can develop in whatever language they are most comfortable with and which best matches the application. An SQLite application file format is a great choice in cases where there is a collection or "federation" of separate programs, often written in different langauges and by different development teams, that need to cooperate through the use of a common document format. <p><b>Built-In Support For Application Identification.</b> SQLite database file contain a 4-byte [Application ID] in their headers that can be set to an application-defined value and then used to identify the "type" of the file for utility programs such as [http://linux.die.net/man/1/file | file(1)]. <h2>Case Studies</h2> <h2>Summary</h2> |