Documentation Source Text

Artifact Content
Login

Artifact ae3e37f28521a657a927cd808437f07b509e2368:


<title>Temporary Files Used By SQLite</title>
<tcl>hd_keywords {temporary disk files}</tcl>

<h1 align="center">SQLite's Use Of Temporary Disk Files</h1>

<h2>1.0 Introduction</h2>

<p>
One of the <a href="different.html">distinctive features</a> of
SQLite is that a database consists of a single disk file.
This simplifies the use of SQLite since moving or backing up a
database is a simple as copying a single file.  It also makes
SQLite appropriate for use as an
<a href="whentouse.html#appfileformat">application file format</a>.
But while a complete database is held in a single disk file,
SQLite does make use of many temporary files during the
course of processing a database.
</p>

<p>
This article describes the various temporary files that SQLite
creates and uses.  It describes when the files are created, when
they are deleted, what they are used for, why they are important,
and how to avoid them on systems where creating temporary files is
expensive.
</p>

<p>
The manner in which SQLite uses temporary files is not considered
part of the contract that SQLite makes with applications.  The
information in this document is a correct description of how
SQLite operates at the time that this document was written or last
updated.  But there is no guarantee that future versions of SQLite
will use temporary files in the same way.  New kinds of temporary
files might be employed  and some of
the current temporary file uses might be discontinued
in future releases of SQLite.
</p>

<tcl>hd_fragment types</tcl>
<h2>2.0 Nine Kinds Of Temporary Files</h2>

<p>
SQLite currently uses nine distinct types of temporary files:
</p>

<ol>
<li>Rollback journals</li>
<li>Master journals</li>
<li>Write-ahead Log (WAL) files</li>
<li>Shared-memory files</li>
<li>Statement journals</li>
<li>TEMP databases</li>
<li>Materializations of views and subqueries</li>
<li>Transient indices</li>
<li>Transient databases used by VACUUM</li>
</ol>

<p>
Additional information about each of these temporary file types
is in the sequel.
</p>

<tcl>hd_fragment rollbackjrnl</tcl>
<h3>2.1 Rollback Journals</h3>

<p>
A rollback journal is a temporary file used to implement
atomic commit and rollback capabilities in SQLite.
(For a detailed discussion of how this works, see
the separate document titled
<a href="atomiccommit.html">Atomic Commit In SQLite</a>.)
The rollback journal is always located in the same directory
as the database file and has the same name as the database
file except with the 8 characters "<b>-journal</b>" appended.
The rollback journal is usually created when a transaction
is first started and is usually deleted when a transaction
commits or rolls back.
The rollback journal file is essential for implementing the
atomic commit and rollback capabilities of SQLite.  Without
a rollback journal, SQLite would be unable to rollback an
incomplete transaction, and if a crash or power loss occurred
in the middle of a transaction the entire database would likely
go corrupt without a rollback journal.
</p>

<p>
The rollback journal is <i>usually</i> created and destroyed at the
start and end of a transaction, respectively.  But there are exceptions
to this rule.
</p>

<p>
If a crash or power loss occurs in the middle of a transaction,
then the rollback journal file is left on disk.  The next time
another application attempts to open the database file, it notices
the presence of the abandoned rollback journal (we call it a "hot
journal" in this circumstance) and uses the information in the
journal to restore the database to its state prior to the start
of the incomplete transaction.  This is how SQLite implements
atomic commit.
</p>

<p>
If an application puts SQLite in 
[PRAGMA locking_mode | exclusive locking mode] using
the pragma:
</p>

<blockquote><pre>
PRAGMA locking_mode=EXCLUSIVE;
</pre></blockquote>

<p>
SQLite creates a new rollback journal at the start of the first
transaction within an exclusive locking mode session.  But at the
conclusion of the transaction, it does not delete the rollback
journal.  The rollback journal might be truncated, or its header
might be zeroed (depending on what version of SQLite you are using)
but the rollback journal is not deleted.  The rollback journal is
not deleted until exclusive access mode is exited.</p>

<p>
Rollback journal creation and deletion is also changed by the
[journal_mode pragma].
The default journaling mode is DELETE, which is the default behavior
of deleting the rollback journal file at the end of each transaction,
as described above.  The PERSIST journal mode foregoes the deletion of
the journal file and instead overwrites the rollback journal header
with zeros, which prevents other processes from rolling back the
journal and thus has the same effect as deleting the journal file, though
without the expense of actually removing the file from disk.  In other
words, journal mode PERSIST exhibits the same behavior as is seen
in EXCLUSIVE locking mode. The
OFF journal mode causes SQLite to omit the rollback journal, completely.
In other words, no rollback journal is ever written if journal mode is
set to OFF.
The OFF journal mode disables the atomic
commit and rollback capabilities of SQLite.  The ROLLBACK command
is not available when OFF journal mode is set.  And if a crash or power
loss occurs in the middle of a transaction that uses the OFF journal
mode, no recovery is possible and the database file will likely
go corrupt.
The MEMORY journal mode causes the rollback journal to be stored in
memory rather than on disk.  The ROLLBACK command still works when
the journal mode is MEMORY, but because no file exists on disks for
recovery, a crash or power loss in the middle of a transaction that uses
the MEMORY journal mode will likely result in a corrupt database.
</p>

<tcl>hd_fragment walfile</tcl>
<h3>2.2 Write-Ahead Log (WAL) Files</h3>

<p>
A write-ahead log or WAL file is used in place of a rollback journal
when SQLite is operating in [WAL mode].  As with the rollback journal,
the purpose of the WAL file is to implement atomic commit and rollback.
The WAL file is always located in the same directory
as the database file and has the same name as the database
file except with the 4 characters "<b>-wal</b>" appended.
The WAL file is created when the first connection to the
database is opened and is normally removed when the last
connection to the database closes.  However, if the last connection
does not shutdown cleanly, the WAL file will remain in the filesystem
and will be automatically cleaned up the next time the database is
opened.
</p>

<tcl>hd_fragment shmfile</tcl>
<h3>2.3 Shared-Memory Files</h3>

<p>
When operating in [WAL mode], all SQLite database connections associated
with the same database file need to share some memory that is used as an
index for the WAL file.  In most implementations, this shared memory is
implemented by calling mmap() on a file created for this sole purpose:
the shared-memory file.  The shared-memory file, if it exists, is located
in the same directory as the database file and has the same name as the
database file except with the 4 characters "<b>-shm</b>" appended.
Shared memory files only exist while running in WAL mode.
</p>

<p>
The shared-memory file contains no persistent content.  The only purpose
of the shared-memory file is to provide a block of shared memory for use
by multiple processes all accessing the same database in WAL mode.
If the [VFS] is able to provide an alternative method for accessing shared
memory, then that alternative method might be used rather than the
shared-memory file.  For example, if [PRAGMA locking_mode] is set to
EXCLUSIVE (meaning that only one process is able to access the database
file) then the shared memory will be allocated from heap rather than out
of the shared-memory file, and the shared-memory file will never be
created.
</p>

<p>
The shared-memory file has the same lifetime as its associated WAL file.
The shared-memory file is created when the WAL file is created and is
deleted when the WAL file is deleted.  During WAL file recovery, the
shared memory file is recreated from scratch based on the contents of
the WAL file being recovered.
</p>

<tcl>hd_fragment masterjrnl {master journal}</tcl>
<h3>2.4 Master Journal Files</h3>

<p>
The master journal file is used as part of the atomic commit
process when a single transaction makes changes to multiple
databases that have been added to a single [database connection]
using the [ATTACH] statement.  The master journal file is always
located in the same directory as the main database file
(the main database file is the database that is identified
in the original [sqlite3_open()], [sqlite3_open16()], or
[sqlite3_open_v2()] call that created the [database connection])
with a randomized suffix.  The master journal file contains
the names of all of the various attached auxiliary databases
that were changed during the transaction.  The multi-database
transaction commits when the master journal file is deleted.
See the documentation titled
<a href="atomiccommit.html">Atomic Commit In SQLite</a> for
additional detail.
</p>

<p>
Without the master journal, the transaction commit on a multi-database
transaction would be atomic for each database individually, but it
would not be atomic across all databases.  In other words, if the
commit were interrupted in the middle by a crash or power loss, then
the changes to one of the databases might complete while the changes
to another database might roll back.  The master journal causes all
changes in all databases to either rollback or commit together.
</p>

</p>
The master journal file is only created for [COMMIT] operations that
involve multiple database files where at least two of the databases 
meet all of the following requirements:
<ol>
<li>The database is modified by the transaction
<li>The [PRAGMA synchronous] setting is not OFF
<li>The [PRAGMA journal_mode] is not OFF, MEMORY, or WAL
</ol>
This means that SQLite transactions are not atomic
across multiple database files on a power-loss when the database 
files have synchronous turned off or when they are using journal 
modes of OFF, MEMORY, or WAL.  For synchronous OFF and for
journal_modes OFF and MEMORY, database will usually corrupt if
a transaction commit is interrupted by a power loss.  For 
[WAL mode], individual database files are updated atomically
across a power-loss, but in the case of a multi-file transactions,
some files might rollback while others roll forward after
power is restored.
</p>


<tcl>hd_fragment stmtjrnl {statement journal} {statement journals} \
   {Statement journals}</tcl>
<h3>2.5 Statement Journal Files</h3>

<p>
A statement journal file is used to rollback partial results of
a single statement within a larger transaction.  For example, suppose
an UPDATE statement will attempt to modify 100 rows in the database.
But after modifying the first 50 rows, the UPDATE hits
a constraint violation which should block the entire statement.
The statement journal is used to undo the first 50 row changes
so that the database is restored to the state it was in at the start
of the statement.
</p>

<p>
A statement journal is only created for an UPDATE or INSERT statement
that might change multiple rows of a database and which might hit a
constraint or a RAISE exception within a trigger and thus need to
undo partial results.
If the UPDATE or INSERT is not contained within BEGIN...COMMIT and if
there are no other active statements on the same database connection then
no statement journal is created since the ordinary
rollback journal can be used instead.
The statement journal is also omitted if an alternative
[ON CONFLICT | conflict resolution algorithm] is
used.  For example:
</p>

<blockquote><pre>
UPDATE OR FAIL ...
UPDATE OR IGNORE ...
UPDATE OR REPLACE ...
UPDATE OR ROLLBACK ...
INSERT OR FAIL ...
INSERT OR IGNORE ...
INSERT OR REPLACE ...
INSERT OR ROLLBACK ...
REPLACE INTO ....
</pre></blockquote>

<p>
The statement journal is given a randomized name, not necessarily
in the same directory as the main database, and is automatically
deleted at the conclusion of the transaction.  The size of the
statement journal is proportional to the size of the change implemented
by the UPDATE or INSERT statement that caused the statement journal
to be created.
</p>

<tcl>hd_fragment tempdb</tcl>
<h3>2.6 TEMP Databases</h3>

<p>Tables created using the "CREATE TEMP TABLE" syntax are only
visible to the [database connection] in which the "CREATE TEMP TABLE"
statement is originally evaluated.  These TEMP tables, together
with any associated indices, triggers, and views, are collectively
stored in a separate temporary database file that is created as
soon as the first "CREATE TEMP TABLE" statement is seen.
This separate temporary database file also has an associated
rollback journal.
The temporary database file used to store TEMP tables is deleted
automatically when the [database connection] is closed
using [sqlite3_close()].
</p>

<p>
The TEMP database file is very similar to auxiliary database
files added using the [ATTACH] statement, though with a few
special properties.
The TEMP database is always automatically deleted when the
[database connection] is closed.
The TEMP database always uses the
[synchronous=OFF] and [journal_mode=PERSIST]
PRAGMA settings.
And, the TEMP database cannot be used with [DETACH] nor can
another process [ATTACH] the TEMP database.
</p>

<p>
The temporary files associated with the TEMP database and its
rollback journal are only created if the application makes use
of the "CREATE TEMP TABLE" statement.
</p>

<tcl>hd_fragment views</tcl>
<h3>2.7 Materializations Of Views And Subqueries</h3>

<p>Queries that contain subqueries must sometime evaluate
the subqueries separately and store the results in a temporary
table, then use the content of the temporary table to evaluate
the outer query.
We call this "materializing" the subquery.
The query optimizer in SQLite attempts to avoid materializing,
but sometimes it is not easily avoidable.
The temporary tables created by materialization are each stored
in their own separate temporary file, which is automatically
deleted at the conclusion of the query.
The size of these temporary tables depends on the amount of
data in the materialization of the subquery, of course.
</p>

<p>
A subquery on the right-hand side of IN operator must often
be materialized.  For example:
</p>

<blockquote><pre>
SELECT * FROM ex1 WHERE ex1.a IN (SELECT b FROM ex2);
</pre></blockquote>

<p>
In the query above, the subquery "SELECT b FROM ex2" is evaluated
and its results are stored in a temporary table (actually a temporary
index) that allows one to determine whether or not a value ex2.b
exists using a simple binary search.  Once this table is constructed,
the outer query is run and for each prospective result row a check
is made to see if ex1.a is contained within the temporary table.
The row is output only if the check is true.
</p>

<p>
To avoid creating the temporary table, the query might be rewritten
as follows:
</p>

<blockquote><pre>
SELECT * FROM ex1 WHERE EXISTS(SELECT 1 FROM ex2 WHERE ex2.b=ex1.a);
</pre></blockquote>

<p>
Recent versions of SQLite (version 3.5.4 and later)
will do this rewrite automatically
if an index exists on the column ex2.b.
</p>

<p>
If the right-hand side of an IN operator can be list of values
as in the following:
</p>
<blockquote><pre>
SELECT * FROM ex1 WHERE a IN (1,2,3);
</pre></blockquote>
<p>
List values on the right-hand side of IN are treated as a 
subquery that must be materialized.  In other words, the
previous statement acts as if it were:
</p>
<blockquote><pre>
SELECT * FROM ex1 WHERE a IN (SELECT 1 UNION ALL
                              SELECT 2 UNION ALL
                              SELECT 3);
</pre></blockquote>
<p>
A temporary index is always used to hold the values of the
right-hand side of an IN operator when that right-hand side
is a list of values.
</p>

<p>
Subqueries might also need to be materialized when they appear
in the FROM clause of a SELECT statement.  For example:
</p>

<blockquote><pre>
SELECT * FROM ex1 JOIN (SELECT b FROM ex2) AS t ON t.b=ex1.a;
</pre></blockquote>

<p>
Depending on the query, SQLite might need to materialize the 
"(SELECT b FROM ex2)" subquery into a temporary table, then
perform the join between ex1 and the temporary table.  The
query optimizer tries to avoid this by "flattening" the
query.  In the previous example the query can be flattened,
and SQLite will automatically transform the query into
</p>

<blockquote><pre>
SELECT ex1.*, ex2.b FROM ex1 JOIN ex2 ON ex2.b=ex1.a;
</blockquote></pre>

<p>
More complex queries may or may not be able to employ query
flattening to avoid the temporary table.  Whether or not
the query can be flattened depends on such factors as whether
or not the subquery or outer query contain aggregate functions,
ORDER BY or GROUP BY clauses, LIMIT clauses, and so forth.
The rules for when a query can and cannot be flattened are
very complex and are beyond the scope of this document.
</p>

<tcl>hd_fragment transidx</tcl>
<h3>2.8 Transient Indices</h3>

<p>
SQLite may make use of transient indices to
implement SQL language features such as:
</p>

<ul>
<li>An ORDER BY or GROUP BY clause</li>
<li>The DISTINCT keyword in an aggregate query</li>
<li>Compound SELECT statements joined by UNION, EXCEPT, or INTERSECT</li>
</ul>

<p>
Each transient index is stored in its own temporary file.
The temporary file for a transient index is automatically deleted
at the end of the statement that uses it.
</p>

<p>
SQLite strives to implement ORDER BY clauses using a preexisting
index.  If an appropriate index already exists, SQLite will walk
the index, rather than the underlying table, to extract the 
requested information, and thus cause the rows to come out in
the desired order.  But if SQLite cannot find an appropriate index
it will evaluate the query and store each row in a transient index
whose data is the row data and whose key is the ORDER BY terms.
After the query is evaluated, SQLite goes back and walks the
transient index from beginning to end in order to output the
rows in the desired order.
</p>

<p>
SQLite implements GROUP BY by ordering the output rows in the
order suggested by the GROUP BY terms.  Each output row is
compared to the previous to see if it starts a new "group".
The ordering by GROUP BY terms is done in exactly the same way
as the ordering by ORDER BY terms.  A preexisting index is used
if possible, but if no suitable index is available, a transient
index is created.
</p>

<p>
The DISTINCT keyword on an aggregate query is implemented by
creating a transient index in a temporary file and storing
each result row in that index.  As new result rows are computed
a check is made to see if they already exist in the transient
index and if they do the new result row is discarded.
</p>

<p>
The UNION operator for compound queries is implemented by creating
a transient index in a temporary file and storing the results
of the left and right subquery in the transient index, discarding
duplicates.  After both subqueries have been evaluated, the
transient index is walked from beginning to end to generate the final output.
</p>

<p>
The EXCEPT operator for compound queries is implemented by creating
a transient index in a temporary file, storing the results of the
left subquery in this transient index, then removing the result 
from right subquery from the transient index, and finally walking
the index from beginning to end to obtain the final output.
</p>

<p>
The INTERSECT operator for compound queries is implemented by
creating two separate transient indices, each in a separate
temporary file.  The left and right subqueries are evaluated
each into a separate transient index.  Then the two indices
are walked together and entries that appear in both indices
are output.
</p>

<p>
Note that the UNION ALL operator for compound queries does not
use transient indices by itself (though of course the right
and left subqueries of the UNION ALL might use transient indices
depending on how they are composed.)

<tcl>hd_fragment vacuumdb</tcl>
<h3>2.9 Transient Database Used By [VACUUM]</h3>

<p>
The [VACUUM] command works by creating a temporary file
and then rebuilding the entire database into that temporary
file.  Then the content of the temporary file is copied back
into the original database file and the temporary file is
deleted.
</p>

<p>
The temporary file created by the [VACUUM] command exists only
for the duration of the command itself.  The size of the temporary
file will be no larger than the original database.
</p>

<tcl>hd_fragment tempstore *tempstore</tcl>
<h2>3.0 The SQLITE_TEMP_STORE Compile-Time Parameter and Pragma</h2>

<p>
The temporary files associated with transaction control, namely
the rollback journal, master journal, write-ahead log (WAL) files,
and shared-memory files, are always written to disk.
But the other kinds of temporary files might be stored in memory
only and never written to disk.
Whether or not temporary files other than the rollback,
master, and statement journals are written to disk or stored only in memory
depends on the [SQLITE_TEMP_STORE] compile-time parameter, the
[temp_store pragma],
and on the size of the temporary file.
</p>

<p>
The [SQLITE_TEMP_STORE] compile-time parameter is a #define whose value is
an integer between 0 and 3, inclusive.  The meaning of the
[SQLITE_TEMP_STORE] compile-time parameter is as follows:
</p>

<ol type="1">
<li value="0">
Temporary files are always stored on disk regardless of the setting
of the [temp_store pragma].
</li>
<li value="1">
Temporary files are stored on disk by default but this can be
overridden by the [temp_store pragma].
</li>
<li value="2">
Temporary files are stored in memory by default but this can be
overridden by the [temp_store pragma].
</li>
<li value="3">
Temporary files are always stored in memory regardless of the setting
of the [temp_store pragma].
</li>
</ol>

<p>
The default value of the [SQLITE_TEMP_STORE] compile-time parameter is 1,
which means to store temporary files on disk but provide the option
of overriding the behavior using the [temp_store pragma].
</p>

<p>
The [temp_store pragma] has
an integer value which also influences the decision of where to store
temporary files.  The values of the temp_store pragma have the
following meanings:
</p>

<ol type="1">
<li value="0">
Use either disk or memory storage for temporary files as determined
by the [SQLITE_TEMP_STORE] compile-time parameter.
</li>
<li value="1">
If the [SQLITE_TEMP_STORE] compile-time parameter specifies memory storage for
temporary files, then override that decision and use disk storage instead.
Otherwise follow the recommendation of the [SQLITE_TEMP_STORE] compile-time
parameter.
</li>
<li value="2">
If the [SQLITE_TEMP_STORE] compile-time parameter specifies disk storage for
temporary files, then override that decision and use memory storage instead.
Otherwise follow the recommendation of the [SQLITE_TEMP_STORE] compile-time
parameter.
</li>
</ol>

<p>
The default setting for the [temp_store pragma] is 0,
which means to following the recommendation of [SQLITE_TEMP_STORE] compile-time
parameter.
</p>

<p>
To reiterate, the [SQLITE_TEMP_STORE] compile-time parameter and the 
[temp_store pragma] only
influence the temporary files other than the rollback journal
and the master journal.  The rollback journal and the master
journal are always written to disk regardless of the settings of
the [SQLITE_TEMP_STORE] compile-time parameter and the
[temp_store pragma].
</p>

<tcl>hd_fragment otheropt</tcl>
<h2>4.0 Other Temporary File Optimizations</h2>

<p>
SQLite uses a page cache of recently read and written database
pages.  This page cache is used not just for the main database
file but also for transient indices and tables stored in temporary
files.  If SQLite needs to use a temporary index or table and
the [SQLITE_TEMP_STORE] compile-time parameter and the
[temp_store pragma] are
set to store temporary tables and index on disk, the information
is still initially stored in memory in the page cache.  The 
temporary file is not opened and the information is not truly
written to disk until the page cache is full.
</p>

<p>
This means that for many common cases where the temporary tables
and indices are small (small enough to fit into the page cache)
no temporary files are created and no disk I/O occurs.  Only
when the temporary data becomes too large to fit in RAM does
the information spill to disk.
</p>

<p>
Each temporary table and index is given its own page cache
which can store a maximum number of database pages determined
by the SQLITE_DEFAULT_TEMP_CACHE_SIZE compile-time parameter.
(The default value is 500 pages.)
The maximum number of database pages in the page cache is the
same for every temporary table and index.  The value cannot
be changed at run-time or on a per-table or per-index basis.
Each temporary file gets its own private page cache with its
own SQLITE_DEFAULT_TEMP_CACHE_SIZE page limit.
</p>

<tcl>hd_fragment tempdir {temporary directory search algorithm}</tcl>
<h2>5.0 Temporary File Storage Locations</h2>

<p>
The directory or folder in which temporary files are created is
determined by the OS-specific [VFS].

<p>
On unix-like systems, directories are searched in the following order:
<ol>
<li>The directory set by [PRAGMA temp_store_directory] or by the
    [sqlite3_temp_directory] global variable
<li>The SQLITE_TMPDIR environment variable
<li>The TMPDIR environment variable
<li>/var/tmp
<li>/usr/tmp
<li>/tmp
<li>The current working directory (".")
</ol>
The first of the above that is found to exist and have the write and
execute bits set is used.  The final "." fallback is important for some
applications that use SQLite inside of chroot jails that do not have
the standard temporary file locations available.

<p>
On Windows systems, folders are searched in the following order:
<ol>
<li>The folder set by [PRAGMA temp_store_directory] or by the
    [sqlite3_temp_directory] global variable
<li>The folder returned by the GetTempPath() system interface.
</ol>
SQLite itself does not pay any attention to environment variables
in this case, though presumably the GetTempPath() system call does.
The search algorithm is different for CYGWIN builds.  Check the 
source code for details.