SQLite
View Ticket
Not logged in
Ticket UUID: ff5be73dee08608976bd32851fa07b60381609ca
Title: Database corruption following power-loss in WAL mode
Status: Fixed Type: Code_Defect
Severity: Severe Priority: Immediate
Subsystem: Unknown Resolution: Fixed
Last Modified: 2012-01-14 15:35:02
Version Found In: 3.7.9
Description:
A carefully timed power loss on a filesystem that does out-of-order writes
can cause database corruption when operating in WAL mode.  The setup for the
corruption scenario is as follows:

  1.  Write many small transactions
  2.  Checkpoint
  3.  Begin writing a large transaction
  4.  Power-loss with out-of-order write prior to the transaction fsync

If transaction fsyncs are disabled because PRAGMA synchronous is set to NORMAL,
then the power-loss can occur at any time after SQLite begins overwriting
the WAL file again from the beginning.

If out-of-order writing causes disk sectors after the first transaction
of the previous cycle of the WAL to be overwritten and earlier sectors to
remain unchanged, then after reboot the first
process to connect will find a valid WAL file containing one or more
transactions from the previous cycle of the WAL
and it will replay those transactions.  But subsequent transactions in that
same the WAL cycle
will be lost due to the overwrite and will not be replayed.  The later
transactions will already exist in the database file, however, meaning that
transactions will occur out-of-order.  This can easily result in database
corruption.

This problem has existed in all prior versions of SQLite that support WAL
mode.  This problem was detected during internal stress testing and has
not been observed in the wild.  We believe that the behavior of real-world
filesystems is such that this fault, while theoretically possible, is 
unlikely to occur in practice.

<hr>
Further notes:

In most filesystems, space to hold a file is allocated in sequential
sectors on disk.  Filesystems do this because it is usually faster to
read sectors sequentially rather than randomly.  There can be 
discontinuities in the sector sequence for larger files, but the beginning
of the file in particular is usually arranged contiguously on disk.

Similarly, when writing to disk, filesystems tend to order sector writes
sequentially.  (See, for example, the
[http://en.wikipedia.org/wiki/Elevator_algorithm | elevator algorithm].)

Disk controllers might also reorder writes, but they too seem likely
to perform writes in sector order when possible.

When you consider the three factors above, it seems that the common case
behavior when overwriting a file sequentially will be that
content will reach disk oxide sequentially as well.  And when that 
happens, the failure scenario described by this ticket will not occur.

Combine this with the fact that database damage also requires a well-timed 
power loss or hard reset (which we also hope is a rare event) and one
begins to understand that the failure scenario described by this
ticket is a very unlikely thing to happen.

The problem is still worth fixing, to be sure.  But in consideration of
the above factors, we have downgraded the severity from Critical to Severe.
We will push forward the next release of SQLite in order to get a fix 
into circulation sooner rather than later.  But we do not feel
compelled to rush out a patch release nor take heroic measures to make
the 3.7.10 release during the Christmas holiday.