SQLite

View Ticket
Login
Ticket Hash: 343634942dd54ab57b702411597dc8872c1241ff
Title: Index on expression goes corrupt from valid SQL
Status: Fixed Type: Code_Defect
Severity: Severe Priority: Immediate
Subsystem: Unknown Resolution: Fixed
Last Modified: 2018-02-01 16:17:02
7.38 years ago
Created: 2018-02-01 00:08:37
7.39 years ago
Version Found In: 3.22.0
User Comments:
drh added on 2018-02-01 00:08:37: (text/x-fossil-wiki)
The integrity_check at the end of the following block of SQL reports
index corruption.

<blockquote><verbatim>
CREATE TABLE t1 (a INTEGER);
CREATE INDEX idx1 ON t1 (lower(a));
INSERT INTO t1 VALUES('0001234');
PRAGMA integrity_check;
</verbatim></blockquote>

Running REINDEX clears the problem.

This problem appears to have been in the code since indexes on expressions
were first introduced with version 3.9.0 (2015-10-14).

This problem was discovered in the wild and reported to the SQLite developers
through private channels.

drh added on 2018-02-01 16:17:02: (text/x-fossil-wiki)
Here is deeper discussion of the problem:

Internally, SQLite tracks values as Mem objects.  (The sqlite3_value object
in the public interface is just a renamed Mem object.)  Each Mem object is
able to hold both a numeric value and its equivalent string representation
at the same time.  Hence sqlite3_value_int() and 
sqlite3_value_text() can both be answered for the same object without having
to do any conversions.

The mapping between string representations and numeric representations is
many to one.  There are many different strings that map to the same numeric
value.  For example, '123', '0123', and '+123' all map into numeric 123.
Every numeric value has a canonical string representation, which is what is
generated when SQLite converts numeric to string.  So a numeric-to-string
conversion is one-to-one.  But a string-to-numeric conversion is many-to-one.

If you have situation where a table is storing a numeric value, and an
index is based on the string representation of that numeric value, it is
critical that the string representation used by the index be the canonical
representation of the numeric value, and not one of the other strings that
happen to map into that numeric value.  Otherwise, the index will be
inconsistent with the table.

This bug arose when a non-canonical string representation was used to build
the index.

The fix was to add logic to SQLite that ensures that when the same Mem object
holds both a string representation and a numeric representation, that the
string rep is the canonical string rep.