*** DRAFT ***

SQL As Understood By SQLite

[Top]

expression

expr:

syntax diagram expr

literal-value:

raise-function:

select-stmt:

type-name:

This section is different from the others. Most other sections of this document talks about a particular SQL command. This section does not talk about a standalone command but about "expressions" which are subcomponents of most other commands.

Operators

SQLite understands the following binary operators, in order from highest to lowest precedence:

||
*    /    %
+    -
<<   >>   &    |
<    <=   >    >=
=    ==   !=   <>   IS   IS NOT   IN   LIKE   GLOB   MATCH   REGEXP
AND   
OR

Supported unary prefix operators are these:

-    +    ~    NOT

The COLLATE operator is a unary postfix operator that assigns a collating sequence to an expression. The COLLATE operator has a higher precedence (binds more tightly) than any binary operator and any unary prefix operator except "~". (COLLATE and "~" are associative so their binding order does not matter.) The collating sequence set by the COLLATE operator overrides the collating sequence determined by the COLLATE clause in a table column definition. See the detailed discussion on collating sequences in the Datatype In SQLite3 document for additional information.

The unary operator + is a no-op. It can be applied to strings, numbers, blobs or NULL and it always returns a result with the same value as the operand.

Note that there are two variations of the equals and not equals operators. Equals can be either = or ==. The non-equals operator can be either != or <>. The || operator is "concatenate" - it joins together the two strings of its operands. The operator % outputs the value of its left operand modulo its right operand.

The result of any binary operator is either a numeric value or NULL, except for the || concatenation operator which always evaluates to either NULL or a text value.

The IS and IS NOT operators work like = and != except when one or both of the operands are NULL. In this case, if both operands are NULL, then the IS operator evaluates to 1 (true) and the IS NOT operator evaluates to 0 (false). If one operand is NULL and the other is not, then the IS operator evaluates to 0 (false) and the IS NOT operator is 1 (true). It is not possible for an IS or IS NOT expression to evaluate to NULL. Operators IS and IS NOT have the same precedence as =.

Literal Values (Constants)

A literal value represents a constant. Literal values may be integers, floating point numbers, strings, BLOBs, or NULLs.

The syntax for integer and floating point literals (collectively "numeric literals") is shown by the following diagram:

numeric-literal:

syntax diagram numeric-literal

If a numeric literal has a decimal point or an exponentiation clause or if its magnitude is less than -9223372036854775808 or greater than 9223372036854775807, then it is a floating point literal. Otherwise is it is an integer literal. The "E" character that begins the exponentiation clause of a floating point literal can be either upper or lower case. The "." character is always used as the decimal point even if the locale setting specifies "," for this role - the use of "," for the decimal point would result in syntactic ambiguity.

Hexadecimal integer literals follow the C-language notation of "0x" or "0X" followed by hexadecimal digits. For example, 0x1234 means the same as 4660 and 0x8000000000000000 means the same as -9223372036854775808. Hexadecimal integer literals are interpreted as 64-bit two's-complement integers and are thus limited to sixteen significant digits of precision. Support for hexadecimal integers was added to SQLite version 3.8.6. For backwards compatibility, the "0x" hexadecimal integer notation is only understood by the SQL language parser, not by the type conversions routines. String variables that contain text formatted like hexadecimal integers are not interpreted as hexadecimal integers when coercing the string value into an integer due to a CAST expression or for a column affinity transformation or prior to performing a numeric operation or for any other run-time conversions. When coercing a string value in the format of a hexadecimal integer into an integer value, the conversion process stops when the 'x' character is seen so the resulting integer value is always zero. SQLite only understands the hexadecimal integer notation when it appears in the SQL statement text, not when it appears as part of the content of the database.

A string constant is formed by enclosing the string in single quotes ('). A single quote within the string can be encoded by putting two single quotes in a row - as in Pascal. C-style escapes using the backslash character are not supported because they are not standard SQL.

BLOB literals are string literals containing hexadecimal data and preceded by a single "x" or "X" character. Example: X'53514C697465'

A literal value can also be the token "NULL".

Parameters

A "variable" or "parameter" token specifies a placeholder in the expression for a value that is filled in at runtime using the sqlite3_bind() family of C/C++ interfaces. Parameters can take several forms:

?NNN A question mark followed by a number NNN holds a spot for the NNN-th parameter. NNN must be between 1 and SQLITE_MAX_VARIABLE_NUMBER.
? A question mark that is not followed by a number creates a parameter with a number one greater than the largest parameter number already assigned. If this means the parameter number is greater than SQLITE_MAX_VARIABLE_NUMBER, it is an error. This parameter format is provided for compatibility with other database engines. But because it is easy to miscount the question marks, the use of this parameter format is discouraged. Programmers are encouraged to use one of the symbolic formats below or the ?NNN format above instead.
:AAAA A colon followed by an identifier name holds a spot for a named parameter with the name :AAAA. Named parameters are also numbered. The number assigned is one greater than the largest parameter number already assigned. If this means the parameter would be assigned a number greater than SQLITE_MAX_VARIABLE_NUMBER, it is an error. To avoid confusion, it is best to avoid mixing named and numbered parameters.
@AAAA An "at" sign works exactly like a colon, except that the name of the parameter created is @AAAA.
$AAAA A dollar-sign followed by an identifier name also holds a spot for a named parameter with the name $AAAA. The identifier name in this case can include one or more occurrences of "::" and a suffix enclosed in "(...)" containing any text at all. This syntax is the form of a variable name in the Tcl programming language. The presence of this syntax results from the fact that SQLite is really a Tcl extension that has escaped into the wild.

Parameters that are not assigned values using sqlite3_bind() are treated as NULL. The sqlite3_bind_parameter_index() interface can be used to translate a symbolic parameter name into its equivalent numeric index.

The maximum parameter number is set at compile-time by the SQLITE_MAX_VARIABLE_NUMBER macro. An individual database connection D can reduce its maximum parameter number below the compile-time maximum using the sqlite3_limit(D, SQLITE_LIMIT_VARIABLE_NUMBER,...) interface.

The LIKE and GLOB operators

The LIKE operator does a pattern matching comparison. The operand to the right of the LIKE operator contains the pattern and the left hand operand contains the string to match against the pattern. A percent symbol ("%") in the LIKE pattern matches any sequence of zero or more characters in the string. An underscore ("_") in the LIKE pattern matches any single character in the string. Any other character matches itself or its lower/upper case equivalent (i.e. case-insensitive matching). (A bug: SQLite only understands upper/lower case for ASCII characters by default. The LIKE operator is case sensitive by default for unicode characters that are beyond the ASCII range. For example, the expression 'a' LIKE 'A' is TRUE but 'æ' LIKE 'Æ' is FALSE.)

If the optional ESCAPE clause is present, then the expression following the ESCAPE keyword must evaluate to a string consisting of a single character. This character may be used in the LIKE pattern to include literal percent or underscore characters. The escape character followed by a percent symbol (%), underscore (_), or a second instance of the escape character itself matches a literal percent symbol, underscore, or a single escape character, respectively.

The infix LIKE operator is implemented by calling the application-defined SQL functions like(Y,X) or like(Y,X,Z).

The LIKE operator can be made case sensitive using the case_sensitive_like pragma.

The GLOB operator is similar to LIKE but uses the Unix file globbing syntax for its wildcards. Also, GLOB is case sensitive, unlike LIKE. Both GLOB and LIKE may be preceded by the NOT keyword to invert the sense of the test. The infix GLOB operator is implemented by calling the function glob(Y,X) and can be modified by overriding that function.

The REGEXP operator is a special syntax for the regexp() user function. No regexp() user function is defined by default and so use of the REGEXP operator will normally result in an error message. If an application-defined SQL function named "regexp" is added at run-time, then the "X REGEXP Y" operator will be implemented as a call to "regexp(Y,X)".

The MATCH operator is a special syntax for the match() application-defined function. The default match() function implementation raises an exception and is not really useful for anything. But extensions can override the match() function with more helpful logic.

The BETWEEN operator

The BETWEEN operator is logically equivalent to a pair of comparisons. "x BETWEEN y AND z" is equivalent to "x>=y AND x<=z" except that with BETWEEN, the x expression is only evaluated once. The precedence of the BETWEEN operator is the same as the precedence as operators == and != and LIKE and groups left to right.

The CASE expression

A CASE expression serves a role similar to IF-THEN-ELSE in other programming languages.

The optional expression that occurs in between the CASE keyword and the first WHEN keyword is called the "base" expression. There are two basic forms of the CASE expression: those with a base expression and those without.

In a CASE without a base expression, each WHEN expression is evaluated and the result treated as a boolean, starting with the leftmost and continuing to the right. The result of the CASE expression is the evaluation of the THEN expression that corresponds to the first WHEN expression that evaluates to true. Or, if none of the WHEN expressions evaluate to true, the result of evaluating the ELSE expression, if any. If there is no ELSE expression and none of the WHEN expressions are true, then the overall result is NULL.

A NULL result is considered untrue when evaluating WHEN terms.

In a CASE with a base expression, the base expression is evaluated just once and the result is compared against the evaluation of each WHEN expression from left to right. The result of the CASE expression is the evaluation of the THEN expression that corresponds to the first WHEN expression for which the comparison is true. Or, if none of the WHEN expressions evaluate to a value equal to the base expression, the result of evaluating the ELSE expression, if any. If there is no ELSE expression and none of the WHEN expressions produce a result equal to the base expression, the overall result is NULL.

When comparing a base expression against a WHEN expression, the same collating sequence, affinity, and NULL-handling rules apply as if the base expression and WHEN expression are respectively the left- and right-hand operands of an = operator.

If the base expression is NULL then the result of the CASE is always the result of evaluating the ELSE expression if it exists, or NULL if it does not.

Both forms of the CASE expression use lazy, or short-circuit, evaluation.

The only difference between the following two CASE expressions is that the x expression is evaluated exactly once in the first example but might be evaluated multiple times in the second:

The IN and NOT IN operators

The IN and NOT IN operators take a single scalar operand on the left and a vector operand on the right formed by an explicit list of zero or more scalars or by a single subquery. When the right operand of an IN or NOT IN operator is a subquery, the subquery must have a single result column. When the right operand is an empty set, the result of IN is false and the result of NOT IN is true, regardless of the left operand and even if the left operand is NULL. The result of an IN or NOT IN operator is determined by the following matrix:

Left operand
is NULL
Right operand
contains NULL
Right operand
is an empty set
Left operand found
within right operand
Result of
IN operator
Result of
NOT IN operator
no no no no false true
does not matter no yes no false true
no does not matter no yes true false
no yes no no NULL NULL
yes does not matter no does not matter NULL NULL

Note that SQLite allows the parenthesized list of scalar values on the right-hand side of an IN or NOT IN operator to be an empty list but most other SQL database database engines and the SQL92 standard require the list to contain at least one element.

The EXISTS operator

The EXISTS operator always evaluates to one of the integer values 0 and 1. If executing the SELECT statement specified as the right-hand operand of the EXISTS operator would return one or more rows, then the EXISTS operator evaluates to 1. If executing the SELECT would return no rows at all, then the EXISTS operator evaluates to 0.

The number of columns in each row returned by the SELECT statement (if any) and the specific values returned have no effect on the results of the EXISTS operator. In particular, rows containing NULL values are not handled any differently from rows without NULL values.

Scalar Subqueries

A SELECT statement enclosed in parentheses may appear as a scalar quantity. A SELECT used as a scalar quantity must return a result set with a single column. The result of the expression is the value of the only column in the first row returned by the SELECT statement. If the SELECT yields more than one result row, all rows after the first are ignored. If the SELECT yields no rows, then the value of the expression is NULL. The LIMIT of a scalar subquery is always 1. Any other LIMIT value given in the SQL text is ignored.

All types of SELECT statement, including aggregate and compound SELECT queries (queries with keywords like UNION or EXCEPT) are allowed as scalar subqueries.

Table Column Names

A column name can be any of the names defined in the CREATE TABLE statement or one of the following special identifiers: "ROWID", "OID", or "_ROWID_". The three special identifiers describe the unique integer key (the rowid) associated with every row of every table and so are not available on WITHOUT ROWID tables. The special identifiers only refer to the row key if the CREATE TABLE statement does not define a real column with the same name. The rowid can be used anywhere a regular column can be used.

A SELECT statement used as either a scalar subquery or as the right-hand operand of an IN, NOT IN or EXISTS expression may contain references to columns in the outer query. Such a subquery is known as a correlated subquery. A correlated subquery is reevaluated each time its result is required. An uncorrelated subquery is evaluated only once and the result reused as necessary.

CAST expressions

A CAST expression of the form "CAST(<expr> AS <type-name>)" is used to convert the value of <expr> to a different storage class specified by <type-name>. A CAST conversion is similar to the conversion that takes place when a column affinity is applied to a value except that with the CAST operator the conversion always takes place even if the conversion lossy and irreversible, whereas column affinity only changes the data type of a value if the change is lossless and reversible.

If the value of <expr> is NULL, then the result of the CAST expression is also NULL. Otherwise, the storage class of the result is determined by applying the rules for determining column affinity to the <type-name>.

Affinity of <type-name> Conversion Processing
NONE Casting a value to a <type-name> with no affinity causes the value to be converted into a BLOB. Casting to a BLOB consists of first casting the value to TEXT in the encoding of the database connection, then interpreting the resulting byte sequence as a BLOB instead of as TEXT.
TEXT To cast a BLOB value to TEXT, the sequence of bytes that make up the BLOB is interpreted as text encoded using the database encoding.

Casting an INTEGER or REAL value into TEXT renders the value as if via sqlite3_snprintf() except that the resulting TEXT uses the encoding of the database connection.

REAL When casting a BLOB value to a REAL, the value is first converted to TEXT.

When casting a TEXT value to REAL, the longest possible prefix of the value that can be interpreted as a real number is extracted from the TEXT value and the remainder ignored. Any leading spaces in the TEXT value are ignored when converging from TEXT to REAL. If there is no prefix that can be interpreted as a real number, the result of the conversion is 0.0.

INTEGER When casting a BLOB value to INTEGER, the value is first converted to TEXT.

When casting a TEXT value to INTEGER, the longest possible prefix of the value that can be interpreted as an integer number is extracted from the TEXT value and the remainder ignored. Any leading spaces in the TEXT value when converting from TEXT to INTEGER are ignored. If there is no prefix that can be interpreted as an integer number, the result of the conversion is 0. The CAST operator understands decimal integers only — conversion of hexadecimal integers stops at the "x" in the "0x" prefix of the hexadecimal integer string and thus result of the CAST is always zero.

A cast of a REAL value into an INTEGER results in the integer between the REAL value and zero that is closest to the REAL value. If a REAL is greater than the greatest possible signed integer (+9223372036854775807) then the result is the greatest possible signed integer and if the REAL is less than the least possible signed integer (-9223372036854775808) then the result is the least possible signed integer.

Prior to SQLite version 3.8.2, casting a REAL value greater than +9223372036854775807.0 into an integer resulted in the most negative integer, -9223372036854775808. This behavior was meant to emulate the behavior of x86/x64 hardware when doing the equivalent cast.

NUMERIC Casting a TEXT or BLOB value into NUMERIC first does a forced conversion into REAL but then further converts the result into INTEGER if and only if the conversion from REAL to INTEGER is lossless and reversible. This is the only context in SQLite where the NUMERIC and INTEGER affinities behave differently.

Casting a REAL or INTEGER value to NUMERIC is a no-op, even if a real value could be losslessly converted to an integer.

Note that the result from casting any non-BLOB value into a BLOB and the result from casting any BLOB value into a non-BLOB value may be different depending on whether the database encoding is UTF-8, UTF-16be, or UTF-16le.

Boolean Expressions

The SQL language features several contexts where an expression is evaluated and the result converted to a boolean (true or false) value. These contexts are:

To convert the results of an SQL expression to a boolean value, SQLite first casts the result to a NUMERIC value in the same way as a CAST expression. A numeric zero value (integer value 0 or real value 0.0) is considered to be false. A NULL value is still NULL. All other values are considered true.

For example, the values NULL, 0.0, 0, 'english' and '0' are all considered to be false. Values 1, 1.0, 0.1, -0.1 and '1english' are considered to be true.

Functions

SQLite supports many simple and aggregate SQL functions. For presentation purposes, simple functions are further subdivided into core functions and date-time functions. Applications can add new functions, written in C/C++, using the sqlite3_create_function() interface.

It is possible to have an aggregate function with the same name as a simple function, as long as the number of arguments for the two forms of the function are different. For example, the max() function with a single argument is an aggregate and the max() function with two or more arguments is a simple function.

*** DRAFT ***