|Title:||Index on expression goes corrupt from valid SQL|
|Last Modified:||2018-02-01 16:17:02|
|Version Found In:||3.22.0|
drh added on 2018-02-01 00:08:37:
The integrity_check at the end of the following block of SQL reports index corruption.
CREATE TABLE t1 (a INTEGER); CREATE INDEX idx1 ON t1 (lower(a)); INSERT INTO t1 VALUES('0001234'); PRAGMA integrity_check;
Running REINDEX clears the problem.
This problem appears to have been in the code since indexes on expressions were first introduced with version 3.9.0 (2015-10-14).
This problem was discovered in the wild and reported to the SQLite developers through private channels.
drh added on 2018-02-01 16:17:02:
Here is deeper discussion of the problem:
Internally, SQLite tracks values as Mem objects. (The sqlite3_value object in the public interface is just a renamed Mem object.) Each Mem object is able to hold both a numeric value and its equivalent string representation at the same time. Hence sqlite3_value_int() and sqlite3_value_text() can both be answered for the same object without having to do any conversions.
The mapping between string representations and numeric representations is many to one. There are many different strings that map to the same numeric value. For example, '123', '0123', and '+123' all map into numeric 123. Every numeric value has a canonical string representation, which is what is generated when SQLite converts numeric to string. So a numeric-to-string conversion is one-to-one. But a string-to-numeric conversion is many-to-one.
If you have situation where a table is storing a numeric value, and an index is based on the string representation of that numeric value, it is critical that the string representation used by the index be the canonical representation of the numeric value, and not one of the other strings that happen to map into that numeric value. Otherwise, the index will be inconsistent with the table.
This bug arose when a non-canonical string representation was used to build the index.
The fix was to add logic to SQLite that ensures that when the same Mem object holds both a string representation and a numeric representation, that the string rep is the canonical string rep.