Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Further tuning of the application file format document. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
1b422ce8de0ff259acb0d76bb56ed31c |
User & Date: | drh 2014-03-14 16:35:01.257 |
Context
2014-03-14
| ||
22:27 | More edits to the application-file-format document. (check-in: c0ac5630f2 user: drh tags: trunk) | |
16:35 | Further tuning of the application file format document. (check-in: 1b422ce8de user: drh tags: trunk) | |
2014-03-13
| ||
15:38 | Improvements to the application file format document. (check-in: 9e12f0cedd user: drh tags: trunk) | |
Changes
Changes to pages/appfileformat.in.
1 2 3 4 5 6 7 8 9 10 11 | <tcl>hd_keywords *appformat {application file-format} \ {Application File Format}</tcl> <title>SQLite As An Application File Format</title> <h1 align="center"> SQLite As An Application File Format </h1> <h2>Executive Summary</h2> <p>An SQLite database file with a defined schema | | | | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | <tcl>hd_keywords *appformat {application file-format} \ {Application File Format}</tcl> <title>SQLite As An Application File Format</title> <h1 align="center"> SQLite As An Application File Format </h1> <h2>Executive Summary</h2> <p>An SQLite database file with a defined schema often makes an excellent application file format. Here are a dozen reasons why this is so: <ol> <li> Simplified Application Development <li> Single-File Documents <li> High-Level Query Language <li> Accessible Content <li> Cross-Platform <li> Atomic Transactions <li> Incremental And Continuous Updates <li> Easily Extensible <li> Performance <li> Concurrent Use By Multiple Processes <li> Multiple Programming Languages <li> Better Applications </ol> <p>Each of these points will be described in more detail below, after first considering more closely the meaning of "application file format". <h2>What Is An Application File Format?</h2> <p> An "application file format" is the file format used to persist application state to disk or to exchange information between programs. There are thousands of application file formats in use today. Here are just a few examples: <ul> <li>DOC - Word Perfect and Microsoft Office documents <li>DWG - AutoCAD drawings <li>PDF - Portable Document Format from Adobe <li>XLS - Microsoft Excel Spreadsheet <li>GIT - Git source code repository |
︙ | ︙ | |||
135 136 137 138 139 140 141 | <p>The purpose of this document is to argue in favor of a fourth new catagory of application file format: An SQLite database file. <h2>SQLite As The Application File Format</h2> <p> | < | | > < > | | | > | > > > > > > > > > > > > > > > | | > | > | < | | > | 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 | <p>The purpose of this document is to argue in favor of a fourth new catagory of application file format: An SQLite database file. <h2>SQLite As The Application File Format</h2> <p> Any application state that can be recorded in a pile-of-files can also be recorded in an SQLite database with a simple key/value schema like this: <blockquote><pre> CREATE TABLE files(filename TEXT PRIMARY KEY, content BLOB); </pre></blockquote> If the content is compressed, then such an SQLite database is only slightly larger (less than 2% larger) than an equivalent ZIP archive, and it has the advantage of being able to update individual "files" without rewrite the entire document. <p> But an SQLite database is not limited to a simple key/value structure like a pile-of-files database. An SQLite database can have dozens or hundreds or thousands of different of tables, with dozens or hundreds or thousands of fields per table, each with different datatypes and and constraints and particular meanings, all cross-referencing each other, appropriately and automatically indexed for rapid retrieval, and all stored efficiently and compactly in a single disk file. And all of this structure is succinctly documented for humans by the SQL schema. <p>In other words, an SQLite database can do everything that a pile-of-files or wrapped pile-of-files format can do, plus much more, and with greater lucidity. An SQLite database is a more versatile container than key/value filesystem or a ZIP archive. <p>The power of an SQLite database could, in theory, be achieved using a custom file format. But any custom file format that is as expressive as a relational database would likely require an enormous design specification and many tens or hundreds of thousands of lines of code to implement. And the end result would be an "opaque blob" that is inaccessible without specialized tools. <p> Hence, in comparison to other approaches, the use of an SQLite database as an application file format has compelling advantages. Here are a few of these advantages, enumerated and expounded: </p> <ol> <li><p><b>Simplified Application Development.</b> No new code is needed for reading or writing the application file. One has merely to link against the SQLite library, or include the [amalgamation | single "sqlite3.c" source file] with the rest of the application C code, and SQLite will take care of all of the application file I/O. This can reduce application code size by many thousands of lines, with corresponding saving in development and maintenance costs. <p>SQLite is widely used and stable. There are literally billions of SQLite database files in use daily, on smartphones and in desktop applications. SQLite is [tested | carefully tested] and proven reliable. It is not a component that needs much tuning or debugging, allowing developers to stay focused on application logic. <li><p><b>Single-File Documents.</b> An SQLite database is contained in a single file, which is easily copied or moved or attached. The "document" metaphor is preserved. <p>SQLite does not have any file naming requirements and so the application can use any custom file suffix that it wants to help identify the file as "belonging" to the application. SQLite database files contain a 4-byte [Application ID] in their headers that can be set to an application-defined value and then used to identify the "type" of the document for utility programs such as [http://linux.die.net/man/1/file | file(1)], futher enhancing the document metaphor. <li><p><b>High-Level Query Language.</b> SQLite is a complete relational database engine, which means that the application can access content using high-level queries. Application developers need not spend time thinking about "how" to retrieve the information they need from a document. Developers write SQL that |
︙ | ︙ | |||
213 214 215 216 217 218 219 | <li><p><b>Accessible Content.</b> Information held in an SQLite database file is accessible using commonly available open-source command-line tools - tools that are installed by default on Mac and Linux systems and that are freely available as a self-contained EXE file on Windows. Unlike custom file formats, application-specific programs are | | | | 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 | <li><p><b>Accessible Content.</b> Information held in an SQLite database file is accessible using commonly available open-source command-line tools - tools that are installed by default on Mac and Linux systems and that are freely available as a self-contained EXE file on Windows. Unlike custom file formats, application-specific programs are not required to read or write content in an SQLite database. An SQLite database file is not an opaque blob. It is true that command-line tools such as text editors or "grep" or "awk" are not useful on an SQLite database, but the SQL query language is a much more powerful and convenient way for examining the the content, so the inability to use "grep" and "awk" and the like is not seen as a loss. <p>An SQLite database is a [file format | well-defined and well-documented] file format that is in widespread use by literally hundreds of |
︙ | ︙ | |||
255 256 257 258 259 260 261 | together such that either all or none of them occur, and so that the changes can be rolled back if a problem is found prior to to commit. This allows an application to make a change incrementally, then run various sanity and consistency checks on the resulting data prior to committing the changes to disk. The [http://www.fossil-scm.org/ | Fossil] DVCS [http://www.fossil-scm.org/fossil/doc/tip/www/selfcheck.wiki|uses this technique] | | < | | | | | | 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 | together such that either all or none of them occur, and so that the changes can be rolled back if a problem is found prior to to commit. This allows an application to make a change incrementally, then run various sanity and consistency checks on the resulting data prior to committing the changes to disk. The [http://www.fossil-scm.org/ | Fossil] DVCS [http://www.fossil-scm.org/fossil/doc/tip/www/selfcheck.wiki|uses this technique] to verify that no repository history has been lost prior to each change. <li><p><b>Incremental And Continuous Updates.</b> When writing to an SQLite database file, only those parts of the file that actually change are written out to disk. This makes the writing happen faster and saves wear on SSDs. This is an enormous advantage over custom and wrapped pile-of-files formats, both of which usually require a rewrite of the entire document in order to change a single byte. Pure pile-of-files formats can also do incremental updates to some extent, though the granularity of writes is usually larger with pile-of-file formats (a single file) than with SQLite (a single page). <p>A desktop application built on SQLite can also do continuous update. In other words, instead of collecting changes in memory and then writing them to disk only on a File/Save action, changes can be written back to the disk as they occur. The avoids loss of work on a system crash or power failure. The undo/redo stack can also be kept in the on-disk database, meaning that undo/redo can occur across session boundaries. Maintenance of the undo/redo stack can often be automated using SQL triggers. <li><p><b>Easily Extensible.</b> As an application grows, new features can be added to an SQLite application file format simply by adding new tables to the schema or by adding new columns to existing tables. Adding columns or tables does not change the meaning of prior queries, so with a modicum of care to ensuring that the meaning of legacy columns and tables are preserved, backwards compatibility is maintained. <p>It is possible to extend custom or pile-of-files formats too, of course, but doing is often much harder. If indices are added, then all application code that changes the corresponding tables must be located and modified to keep those indices up-to-date. If columns are added, then all application code that accesses the corresponding table must be located and modified to |
︙ | ︙ | |||
348 349 350 351 352 353 354 | <p>An SQLite application file format is a great choice in cases where there is a collection or "federation" of separate programs, often written in different languages and by different development teams. This comes up commonly in research or laboratory environments where one team is responsible for data acquisition | | | | | 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 | <p>An SQLite application file format is a great choice in cases where there is a collection or "federation" of separate programs, often written in different languages and by different development teams. This comes up commonly in research or laboratory environments where one team is responsible for data acquisition and other teams are responsible for various stages of analysis. Each team can use whatever hardware, operating system, programming language and development methodology that they are most comfortable with, and as long as all programs use an SQLite database with a common schema as their application file format, they can all interoperate. <li><p><b>Better Applications.</b> If the application file format is an SQLite database, the complete documentation for that file format consists of the database schema, with perhaps a few extra words about what each table and column represents. The description of a custom file format, on the other hand, typically runs on for hundreds of pages. A pile-of-files format, while much simpler and easier to describe than a fully custom format, still tends to be much larger and more complex than an SQL schema dump, since the names and format for the individual files must still be described. <p>This is not a trivial point. A clear, concise, and easy to understand |
︙ | ︙ |