Documentation Source Text: Artifact [f99fc1c46f]

Artifact f99fc1c46f490c7c616067d11bdfc456a763ee446bc21455e582538390a53354:

File pages/sqlar.in — part of check-in [45753832a8] at 2018-05-18 18:02:52 on branch trunk — Merge fixes from the 3.23 branch. (user: drh size: 11233) [more...]
<title>SQLite Archive Files</title>
<tcl>hd_keywords {SQLAR} {SQLite Archive} {SQL Archive} \
     {SQLite Archive format} {SQLite Archive files} </tcl>
<table_of_contents>

<h1>Introduction</h1>

<p>
An "SQLite Archive" is a file container similar to a 
[https://en.wikipedia.org/wiki/Zip_(file_format)|ZIP archive] or
[https://en.wikipedia.org/wiki/Tar_(computing)|Tarball] but
based on an SQLite database.

<p>
An SQLite Archive is an ordinary SQLite database file that contains the
following table as part of its schema:

<codeblock>
CREATE TABLE sqlar(
  name TEXT PRIMARY KEY,  -- name of the file
  mode INT,               -- access permissions
  mtime INT,              -- last modification time
  sz INT,                 -- original file size
  data BLOB               -- compressed content
);
</codeblock>

<p>
Each row of the SQLAR table holds the content of a single file.
The filename (the full pathname relative to the root of the archive)
is in the "name" field.
The "mode" field is an integer which is the unix-style access permissions
for the file.  "mtime" is the modification time of the file in seconds
since 1970.  "sz" is the original uncompressed size of the file.
The "data" field contains the file content.  The content is usually
compressed using [http://zlib.net/|Deflate], though not always.  If the
"sz" field is equal to the size of the "data" field, then the content
is stored uncompressed.

<tcl>hd_fragment dbasobj {database as object} \
     {database as container object}</tcl>
<h2>Database As Container Object</h2>

<p>
An SQLite Archive is one example of a more general
idea that an SQLite database can behave as a container object holding
lots of smaller data components.

<p>
With client/server databases like PostgreSQL or Oracle, users and
developers tend to think of the database as a service or a "node", not
as an object.  This is because the database content is spread out across
multiple files on the server, or possibly across multiple servers in a
service cluster.  One cannot point to a single file or even a single
directory and say "this is the database".

<p>
SQLite, in contrast, stores all content in a [file format|single file on disk].
That single file is something you can point to and say
"this is the database".  It behaves as an object.
An SQLite database file can be copied, renamed, sent as an
email attachment, passed as the argument a POST HTTP request,
or otherwise treated as other data object such as an image,
document, or media file.

<p>
Studies show that many applications already use
SQLite as a container object.  For example,
[https://odin.cse.buffalo.edu/papers/2015/TPCTC-sqlite-final.pdf|Kennedy]
(no relation to the [Dan Kennedy|SQLite developer]) reports that 14% of
Android applications never write to their SQLite databases.  It is
believed that these applications are downloading entire databases
from the cloud and then using the information locally as needed.  In other
words, the applications are using SQLite not so much as a database but as
a queryable wire-transfer format.

<h2>Applications Using SQLite Archives</h2>

<p>
The [https://fossil-scm.org/|Fossil Distributed Version Control] system
provides users with the option to download check-ins as either Tarballs,
ZIP Archives, or SQLite Archives.
<h1>Advantages Of SQLite Archives</h1>

<ol>
<li><p>
An SQLite Archive is flexible.
ZIP Archives and Tarballs are limited to storing only files.  An
SQLite Archive stores files plus whatever other tabular
and/or relational data seems useful to the application.

<li><p>
An SQLite Archive is transactional.
Updates are atomic and durable, even if there are crashes
or power losses in the middle of the update.
Readers see a consistent and unchanging version of the content even
is some other process is simultaneously updating the archive.

<li><p>
An SQLite Archive can be updated incrementally.
Individual files can be added or removed or replaced without having
to rewrite the entire archive.

<li><p>
An SQLite Archive can be queried using a high-level query language (SQL).
Some examples:
<ul>
<li> What is the total size of all files in the archive whose names
     end in ".h" or ".cpp"?
<li> What percentage of the files are compressed by less than 25%?
<li> How many executable files are in the archive?
</ul>
Questions like these (and countless others) can be answered without
having to uncompress or extract any content.

<li><p>
Applications that already use SQLite for other purposes can easily
add support for SQLite Archives using a small extension
([https://sqlite.org/src/file/ext/misc/sqlar.c]) to handle the compression
and decompression of content.  Even this tiny extension can be omitted
if the files in the archive are uncompressed.  In contrast, supporting
ZIP Archives and/or Tarballs requires either separate libraries or
lots of extra custom code, or sometimes both.

<li><p>
An SQLite Archive can work around firewall-imposed censorship.
For example, certain file types that are considered "dangerous" 
(examples: DLLs) will be
[https://support.google.com/mail/answer/6590|blocked by Gmail]
and probably many other email services and firewall, even if those
files are wrapped inside a ZIP Archive or Tarball.
But these firewall usually do not care about SQLite Archives and
so content can be put inside an SQLite Archive to evade censorship.
</ol>

<h1>Disadvantages Of SQLite Archives</h1>

<ol>
<li><p>
The SQLite Archive is a relatively new format.  It was first described in
in 2014.  ZIP Archives and Tarballs, on the other hand, have been around
for decades and are well-entrenched as standard formats.  Most programmers
know what a ZIP Archive or Tarball is, but if you say "SQLite Archive" you
are more likely to get a reply of "What?"  Tooling to process ZIP Archives
and Tarballs is more likely to be installed on stock computers.

<li><p>
Since an SQLite database is a more general format (it is designed to do
much more than simply store a bunch of files) it is not as compact as either
the ZIP Archive or Tarball formats.  An SQLite Archive is usually about 1%
larger than the equivalent ZIP Archive.  Tarballs are compressed as a single
unit rather than compressing each file separately as is done by both
SQLite and ZIP Archives.  For these reason, Tarballs tend to be smaller
than either ZIP or SQLite Archives.
<p>
As an example, the following table show the relative sizes for an
SQLite Archive, a ZIP Archive, and a Tarball of the 1,743 files
in the SQLite 3.22.0 source tree:
<table striped=1>
  <tr><td>SQLite Archive<td>10,754,048
  <tr><td>ZIP Archive (using Info-ZIP 3.0)<td>10,662,365
  <tr><td>ZIP Archive (using [zipfile])<td>10,390,215
  <tr><td>Tarball<td>&#x2007;9,781,109
</table>

<li><p>
An SQLite Archive supports only the [https://zlib.net/|Deflate] compression
method.  Tarballs and ZIP Archive support a wider assortment of
compression methods.
</ol>

<tcl>hd_fragment cltools {managing SQLite Archives from the command-line}</tcl>
<h1>Managing An SQLite Archive From The Command-Line</h1>

<p>
The recommended way of creating, updating, listing, and extracting
an SQLite Archive is to use the [sqlite3.exe command-line shell] 
for SQLite [version 3.23.0] ([dateof:3.23.0]) or later.  This CLI
supports the -A command-line option that allows easy management
of SQLite Archives.
The CLI for SQLite [version 3.22.0] ([dateof:3.22.0]) has the
[.archive command] for managing SQLite Archives, but that requires
interacting with the shell.

<p>
To list all of the files in an SQLite Archive named "example.sqlar"
using one of these commands:

<codeblock>
sqlite3 example.sqlar -At
sqlite3 example.sqlar -Atv
</codeblock>

<p>
To extract all files from an SQLite Archive named "example.sqlar":

<codeblock>
sqlite3 example.sqlar -Ax
</codeblock>

<p>
To create a new SQLite Archive named "alltxt.sqlar" containing all *.txt
files in the current directory:

<codeblock>
sqlite3 alltxt.sqlar -Ac *.txt
</codeblock>

<p>
To add or update files in an existing SQLite Archive:

<codeblock>
sqlite3 example.sqlar -Au *.md
</codeblock>

<p>
For usage hints and a summary of all options, simply give the [CLI]
the -A option with no additional arguments:

<codeblock>
sqlite3 -A
</codeblock>

<p>
All of these commands work the same way if the filename argument is
is a ZIP Archive instead of an SQLite database.

<h2>Other command-line tools</h2>

<p>
Just as there is the "zip" program to manage ZIP Archives, and the
"tar" program to manage Tarballs, the 
[https://sqlite.org/sqlar|"sqlar" program] exists to manage SQL Archives.
The "sqlar" program is able to create a new SQLite Archive, list the
content of an existing archive, add or remove files from the archive,
and/or extract files from the archive.
A separate "sqlarfs" program is able to mount the SQLite Archive as
a [https://github.com/libfuse/libfuse|Fuse Filesystem].

<h1>Managing SQLite Archives From Application Code</h1>

<p>
Applications can easily read or write SQLite Archives by linking against
SQLite and including the 
[https://sqlite.org/src/file/ext/misc/sqlar.c|ext/misc/sqlar.c] extension
to handle the compression and decompression.  The sqlar.c extension
creates two new SQL functions.

<dl>
<dt><b>sqlar_compress(X)</b></dt>
<dd><p>
The sqlar_compress(X) function attempts to compress a copy of the
string or blob X using the [https://zlib.net/|Default] algorithm and
returns the result as a blob.  If the input X is incompressible, then
a copy of X is returned.  This routine is used when inserting content
into an SQLite Archive.
<dt><b>sqlar_uncompress(Y,SZ)</b></dt>
<dd><p>
The sqlar_uncompress(Y,SZ) function will undo the compression accomplished
by sqlar_compress(X).  The Y parameter is the compressed content (the output
from a prior call to sqlar_compress()) and SZ is the original uncompressed
size of the input X that generated Y.  If SZ is less than or equal to the
size of Y, that indicates that no compression occurred, and so
sqlar_uncompress(Y,SZ) returns a copy of Y.  Otherwise, sqlar_uncompress(Y,SZ)
runs the Inflate algorithm on Y to uncompress it and restore it to its
original form and returns the uncompressed content.
This routine is used when extracting content from an SQLite Archive.
</dl>

<p>
Using the two routines above, it is simple for applications to insert
new records into or extract existing records from an SQLite Archive.
Insert a new into an SQLite Archive using code like this:

<codeblock>
INSERT INTO sqlar(name,mode,mtime,sz,data)
 VALUES ($name,$mode,strftime('%s',$mtime),
         length($content),sqlar_compress($content));
</codeblock>

<p>
Extract an entry from the SQLite Archive using code like this:

<codeblock>
SELECT name, mode, datetime(mtime,'unixepoch'), sqlar_uncompress(data,sz)
  FROM sqlar
 WHERE ...;
</codeblock>

<p>
The code above is for the general case.  For the special case of an
SQLite Archive that only stores uncompressed or uncompressible content
(this might come up, for example, in an SQLite Archive that stores only
JPEG, GIF, and/or PNG images) then the content can be inserted into
and extracted from the database without using the sqlar_compress()
and sqlar_uncompress() functions, and the sqlar.c extension is not
required.