This document is a collection of notes describing the proposed SQLite v3 type affinity and collation sequence features.
Version 2 of SQLite stores all column values as ASCII text. Version 3 enhances this by providing the ability to store integer and real numbers in a more compact format and the capability to store BLOB data.
Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
NULL. The value is a NULL value.
INTEGER. The value is a signed integer, stored in 1, 2, 4 or 8 bytes depending on the magnitude of the value.
REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16-LE).
BLOB. The value is a blob of data, stored exactly as it was input.
As in SQLite v2, normally any SQLite v3 column except an INTEGER PRIMARY KEY may be used to store any type of value. The exception to this rule is described below under 'Strict Affinity Mode'.
All values supplied to SQLite, whether as literals embedded in SQL statements, values bound to pre-compiled SQL statements or data read using the COPY command are assigned a storage class before the SQL statement is executed. Under circumstances described below, the database engine may convert values between numeric storage classes (INTEGER and REAL) and TEXT during query execution.
Storage classes are initially assigned as follows:
Values read using the COPY command are assigned the storage class TEXT or NULL.
Values specified as literals as part of SQL statements are assigned storage class TEXT if they are enclosed by single or double quotes, INTEGER if the literal is specified as an unquoted number with no decimal point or exponent, REAL if the literal is an unquoted number with a decimal point or exponent and NULL if the value is a NULL.
Values supplied using the sqlite3_bind_* APIs are assigned the storage class that most closely matches the native type bound (i.e. sqlite3_bind_blob() binds a value with storage class BLOB).
The storage class of a value that is the result of an SQL scalar operator depends on the outermost operator of the expression. User-defined functions may return values with any storage class. It is not generally possible to determine the storage class of the result of an expression at compile time.
Each column in an SQLite 3 database is assigned one of the following type affinities:
The affinity of a column determines the storage class used by values inserted into the column.
A column with TEXT affinity stores all data using storage classes NULL, TEXT or BLOB. If numerical data is inserted into a column with TEXT affinity it is converted to text form before being stored.
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, an attempt is made to convert it to an integer or real number before it is stored. If the conversion is successful, then the value is stored using the INTEGER or REAL storage class. If the conversion cannot be performed the value is stored using the TEXT storage class. No attempt is made to convert NULL or blob values.
A column that uses INTEGER affinity behaves in the same way as a column with NUMERIC affinity, except that if a real value with no floating point component (or text value that converts to such) is inserted it is converted to an integer and stored using the INTEGER storage class.
A column with affinity NONE makes no attempt to coerce data before it is inserted.
The type affinity of a column is determined by the declared type of the column, according to the following rules:
If the datatype of the column contains any of the strings "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity.
If the datatype contains the string "INT" then it is assigned INTEGER affinity.
If the datatype contains the string "BLOB" is is given an affinity of NONE.
Any column that does not matches the rules above, including columns that have no datatype specified, are given NUMERIC affinity.
If a table is created using a “CREATE TABLE <table> AS SELECT...” statement, then all columns have no datatype specified and they are given no affinity.
CREATE TABLE t1( t AFFINITY TEXT, nu AFFINITY NUMERIC, i AFFINITY INTEGER, no AFFINITY NONE ); -- Storage classes for the following row: -- TEXT, REAL, INTEGER, TEXT INSERT INTO t1 VALUES('500.0', '500.0', '500.0', '500.0'); -- Storage classes for the following row: -- TEXT, REAL, INTEGER, REAL INSERT INTO t1 VALUES(500.0, 500.0, 500.0, 500.0);
Like SQLite v2, v3 features the binary comparison operators '=', '<', '<=', '>=' and '!=', an operation to test for set membership, 'IN', and the ternary comparison operator 'BETWEEN'.
The results of a comparison depend on the storage classes of the two values being compared, according to the following rules:
A value with storage class NULL is considered less than any other value (including another value with storage class NULL).
An INTEGER or REAL value is less than any TEXT or BLOB value. When an INTEGER or REAL is compared to another INTEGER or REAL, a numerical comparison is performed.
A TEXT value is less than a BLOB value. When two TEXT values are compared, the C library function memcmp() is usually used to determine the result. However this can be overriden, as described under 'User-defined collation Sequences' below.
When two BLOB values are compared, the result is always determined using memcmp().
SQLite may attempt to convert values between the numeric storage classes (INTEGER and REAL) and TEXT before performing a comparison. For binary comparisons, this is done in the cases enumerated below. The term “expression” used in the bullet points below means any SQL scalar expression or literal other than a column value.
When a column value is compared to the result of an expression, the affinity of the column is applied to the result of the expression before the comparison takes place.
When two column values are compared, if one column has INTEGER or NUMERIC affinity and the other does not, the NUMERIC affinity is applied to any values with storage class TEXT extracted from the non-NUMERIC column.
When the results of two expressions are compared, the NUMERIC affinity is applied to both values before the comparison takes place.
CREATE TABLE t1( a AFFINITY TEXT, b AFFINITY NUMERIC, c AFFINITY NONE ); -- Storage classes for the following row: -- TEXT, REAL, TEXT INSERT INTO t1 VALUES('500', '500', '500'); -- 60 and 40 are converted to “60” and “40” and values are compared as TEXT. SELECT a < 60, a < 40 FROM t1; 1|0 -- Comparisons are numeric. No conversions are required. SELECT b < 60, b < 600 FROM t1; 0|1 -- Both 60 and 600 (storage class NUMERIC) are less than '500' (storage class TEXT). SELECT c < 60, c < 600 FROM t1; 0|0
In SQLite, the expression “a BETWEEN b AND c” is currently equivalent to “a >= b AND a <= c”. SQLite will continue to treat the two as exactly equivalent, even if this means that different affinities are applied to 'a' in each of the comparisons required to evaluate the expression.
Expressions of the type “a IN (SELECT b ....)” are handled by the three rules enumerated above for binary comparisons (e.g. in a similar manner to “a = b”). For example if 'b' is a column value and 'a' is an expression, then the affinity of 'b' is applied to 'a' before any comparisons take place.
SQLite currently treats the expression “a IN (x, y, z)” as equivalent to “a = z OR a = y OR a = z”. SQLite will continue to treat the two as exactly equivalent, even if this means that different affinities are applied to 'a' in each of the comparisons required to evaluate the expression.
All mathematical operators (which is to say, all operators other than the concatenation operator "||") apply NUMERIC affinity to all operands prior to being carried out. If one or both operands cannot be converted to NUMERIC then the result of the operation is NULL.
For the concatenation operator, TEXT affinity is applied to both operands. If either operand cannot be converted to TEXT (because it is NULL or a BLOB) then the result of the concatenation is NULL.
When values are sorted by an ORDER by clause, values with storage class NULL come first, followed by INTEGER and REAL values interspersed in numeric order, followed by TEXT values usually in memcmp() order, and finally BLOB values in memcmp() order. No storage class conversions occur before the sort.
When grouping values with the GROUP BY clause values with different storage classes are considered distinct, except for INTEGER and REAL values which are considered equal if they are numerically equal. No affinities are applied to any values as the result of a GROUP by clause.
The compound SELECT operators UNION, INTERSECT and EXCEPT perform implicit comparisons between values. Before these comparisons are performed an affinity may be applied to each value. The same affinity, if any, is applied to all values that may be returned in a single column of the compound SELECT result set. The affinity applied is the affinity of the column returned by the left most component SELECTs that has a column value (and not some other kind of expression) in that position. If for a given compound SELECT column none of the component SELECTs return a column value, no affinity is applied to the values from that column before they are compared.
The above sections describe the operation of the database engine in 'normal' affinity mode. SQLite v3 will feature two other affinity modes, as follows:
Strict affinity mode. In this mode if a conversion between storage classes is ever required, the database engine returns an error and the current statement is rolled back.
No affinity mode. In this mode no conversions between storage classes are ever performed. Comparisons between values of different storage classes (except for INTEGER and REAL) are always false.
By default, when SQLite compares two text values, the result of the comparison is determined using memcmp(), regardless of the encoding of the string. SQLite v3 provides the ability for users to supply arbitrary comparison functions, known as user-defined collation sequences, to be used instead of memcmp().