By default, SQLite supports twenty-nine functions and two operators for dealing with JSON values. There are also two table-valued functions that can be used to decompose a JSON string.
There are 25 scalar functions and operators:
There are four aggregate SQL functions: json_group_array(), jsonb_group_array(), json_group_object(), and jsonb_group_object().
The two table-valued functions are json_each() and json_tree().
The JSON functions and operators are built into SQLite by default, as of SQLite version 3.38.0 (2022-02-22). They can be omitted by adding the -DSQLITE_OMIT_JSON compile-time option. Prior to version 3.38.0, the JSON functions were an extension that would only be included in builds if the -DSQLITE_ENABLE_JSON1 compile-time option was included. In other words, the JSON functions went from being opt-in with SQLite version 3.37.2 and earlier to opt-out with SQLite version 3.38.0 and later.
SQLite stores JSON as ordinary text. Backwards compatibility constraints mean that SQLite is only able to store values that are NULL, integers, floating-point numbers, text, and BLOBs. It is not possible to add a new "JSON" type.
For functions that accept JSON as their first argument, that argument can be a JSON object, array, number, string, or null. SQLite numeric values and NULL values are interpreted as JSON numbers and nulls, respectively. SQLite text values can be understood as JSON objects, arrays, or strings. If an SQLite text value that is not a well-formed JSON object, array, or string is passed into a JSON function, that function will usually throw an error. (Exceptions to this rule are json_valid(), json_quote(), and json_error_position().)
These routines understand all RFC-8259 JSON syntax and also JSON5 extensions. JSON text generated by these routines always strictly conforms to the canonical JSON definition and does not contain any JSON5 or other extensions. The ability to read and understand JSON5 was added in version 3.42.0 (2023-05-16). Prior versions of SQLite would only read canonical JSON.
Beginning with version 3.45.0 (2023-11-01), SQLite supports an alternative binary encoding of JSON which we call "JSONB". The JSONB format is stored as a BLOB. JSONB is analogous in structure to canonical RFC-8259 text JSON. JSONB just happens to be slightly more compact and much easier to parse, so it uses fewer CPU cycles to process.
Any SQL function parameter that accepts text JSON as an input will also accept a BLOB in the JSONB format. The function will operate the same in either case, except that it will run faster when the input is JSONB.
Most SQL functions that return JSON text have a corresponding function that returns the equivalent JSONB. The functions that return JSON in the text format begin with "json_" and functions that return the JSONB format begin with "jsonb_".
The core idea behind SQLite's JSONB is that each element begins with a header that includes the size and type of that element. This makes reading faster. For example, when reading a string literal, it is no longer necessary to search forward looking for the closing double-quote, reading byte by byte and taking care to avoid escaped double-quotes. The size of the literal is right there in the header, and so the process can jump ahead to the next element without having to scrutinize each intervening byte. Since the size and type of each element is identified in the header, punctuation characters such as string, object, and array delimiters and comma and colon separators can all be omitted. The payload for JSONB is the same as the corresponding text JSON. The only difference is that JSONB omits punctuation and replaces it with a header on each element.
The "JSONB" name is inspired by PostgreSQL, but the on-disk format for SQLite's JSONB is not the same as PostgreSQL's. The two formats have the same name, but are not binary compatible. The PostgreSQL JSONB format claims to offer O(1) lookup of elements in objects and arrays. SQLite's JSONB format makes no such claim. SQLite's JSONB has O(N) time complexity for most operations in SQLite, just like text JSON. The advantage of JSONB in SQLite is that it is smaller and faster than text JSON - potentially several times faster. There is space in the on-disk JSONB format to add enhancements and future versions of SQLite might include options to provide O(1) lookup of elements in JSONB, but no such capability is currently available.
The SQLite JSONB format is intended to be private to SQLite and is for use by the built-in SQLite functions only. The JSONB format is not intended as an interchange format. Nevertheless, JSONB is stored in database files which are intended to be readable and writable for many decades into the future. To that end, the JSONB format is well-defined and stable. The separate SQLite JSONB format document provides details of the JSONB format for the curious reader.
The JSONB that is generated by SQLite will always be well-formed. If you treat JSONB as an opaque BLOB that is generated by some JSON functions and consumed by others, then you will not have any problems. But JSONB is just a BLOB, so a mischievous programmer could devise BLOBs that are similar to JSONB but that are technically malformed. When misformatted JSONB is fed into JSON functions, any of the following might happen:
The SQL statement might abort with a "malformed JSON" error.
If the error is in a part of the JSONB that is not required to obtain the correct answer, then the correct answer might be returned.
A goofy or nonsensical answer might be returned.
The way in which SQLite handles invalid JSONB might change from one version of SQLite to the next. The system follows the garbage-in/garbage-out rule: If you feed the JSON functions invalid JSONB, you get back an invalid answer. If you are in doubt about the validity of your JSONB, use the json_valid() function to verify it.
The implementation does make this one promise: Malformed JSONB will never cause a memory error or similar problem that might lead to a vulnerability. Invalid JSONB might lead to crazy answers, or it might cause queries to abort, but it won't cause a crash.
For functions that accept PATH arguments, that PATH must be well-formed or else the function will throw an error. A well-formed PATH is a text value that begins with exactly one '$' character followed by zero or more instances of ".objectlabel" or "[arrayindex]".
The arrayindex is usually a non-negative integer N. In that case, the array element selected is the N-th element of the array, starting with zero on the left. The arrayindex can also be of the form "#-N" in which case the element selected is the N-th from the right. The last element of the array is "#-1". Think of the "#" characters as the "number of elements in the array". Then the expression "#-1" evaluates to the integer that corresponds to the last entry in the array. It is sometimes useful for the array index to be just the # character, for example when appending a value to an existing JSON array:
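The path forms described above can be tried out with a minimal sketch like the following. It assumes Python's standard sqlite3 module linked against an SQLite build that includes the JSON functions and understands the "#" index; the helper name one() and the sample arrays are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

# '$[0]' selects the element at index 0, counting from the left.
first = one("SELECT json_extract('[10,20,30]', '$[0]')")

# '$[#-1]' selects the last element of the array.
last = one("SELECT json_extract('[10,20,30]', '$[#-1]')")

# '$[#]' names the slot just past the end, so json_set() appends there.
appended = one("SELECT json_set('[0,1,2]', '$[#]', 'new')")

print(first, last, appended)
```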
For functions that accept "value" arguments (also shown as "value1" and "value2"), those arguments are usually understood to be literal strings that are quoted and become JSON string values in the result. Even if the input value strings look like well-formed JSON, they are still interpreted as literal strings in the result.
However, if a value argument comes directly from the result of another JSON function or from the -> operator (but not the ->> operator), then the argument is understood to be actual JSON and the complete JSON is inserted rather than a quoted string.
For example, in the following call to json_object(), the value argument looks like a well-formed JSON array. However, because it is just ordinary SQL text, it is interpreted as a literal string and added to the result as a quoted string:
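A short sketch of this distinction, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled; the helper one() is a hypothetical convenience, not part of SQLite. The first call passes ordinary SQL text as the value, the second wraps the same text in json():

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

# Ordinary SQL text: stored as a quoted JSON string.
as_string = one("SELECT json_object('ex', '[52,3.14159]')")

# Result of json(): stored as real JSON, substructure intact.
as_json = one("SELECT json_object('ex', json('[52,3.14159]'))")

print(as_string)  # {"ex":"[52,3.14159]"}
print(as_json)    # {"ex":[52,3.14159]}
```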
To be clear: "json" arguments are always interpreted as JSON regardless of where the value for that argument comes from. But "value" arguments are only interpreted as JSON if those arguments come directly from another JSON function or the -> operator.
Within JSON value arguments interpreted as JSON strings, Unicode escape sequences are not treated as equivalent to the characters or escaped control characters represented by the expressed Unicode code point. Such escape sequences are not translated or specially treated; they are treated as plain text by SQLite's JSON functions.
The current implementation of this JSON library uses a recursive descent parser. In order to avoid using excess stack space, any JSON input that has more than 1000 levels of nesting is considered invalid. Limits on nesting depth are allowed for compatible implementations of JSON by RFC-8259 section 9.
Beginning in version 3.42.0 (2023-05-16), these routines will read and interpret input JSON text that includes JSON5 extensions. However, JSON text generated by these routines will always be strictly conforming to the canonical definition of JSON.
Here is a synopsis of JSON5 extensions (adapted from the JSON5 specification):
To convert string X from JSON5 into canonical JSON, invoke "json(X)". The output of the "json()" function will be canonical JSON regardless of any JSON5 extensions that are present in the input. For backwards compatibility, the json_valid(X) function without a "flags" argument continues to report false for inputs that are not canonical JSON, even if the input is JSON5 that the function is able to understand. To determine whether or not an input string is valid JSON5, include the 0x02 bit in the "flags" argument to json_valid: "json_valid(X,2)".
These routines understand all of JSON5, plus a little more. SQLite extends the JSON5 syntax in these two ways:
Strict JSON5 requires that unquoted object keys must be ECMAScript 5.1 IdentifierNames. But large Unicode tables and lots of code are required in order to determine whether or not a key is an ECMAScript 5.1 IdentifierName. For this reason, SQLite allows object keys to include any Unicode characters greater than U+007f that are not whitespace characters. This relaxed definition of "identifier" greatly simplifies the implementation and allows the JSON parser to be smaller and run faster.
JSON5 allows floating-point infinities to be expressed as "Infinity", "-Infinity", or "+Infinity" in exactly that case - the initial "I" is capitalized and all other characters are lower case. SQLite also allows the abbreviation "Inf" to be used in place of "Infinity" and it allows both keywords to appear in any combination of upper and lower case letters. Similarly, JSON5 allows "NaN" for not-a-number. SQLite extends this to also allow "QNaN" and "SNaN" in any combination of upper and lower case letters. Note that SQLite interprets NaN, QNaN, and SNaN as just alternative spellings for "null". This extension has been added because (we are told) there exists a lot of JSON in the wild that includes these non-standard representations for infinity and not-a-number.
The json(X) function verifies that its argument X is a valid JSON string and returns a minified version of that JSON string (with all unnecessary whitespace removed). If X is not a well-formed JSON string, then this routine throws an error.
In other words, this function converts raw text that looks like JSON into actual JSON so that it may be passed into the value argument of some other JSON function and will be interpreted as JSON rather than a string. This function is not appropriate for testing whether or not a particular string is well-formed JSON - use json_valid() for that task.
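As an illustration of both behaviors of json(), here is a minimal sketch assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The sample strings are illustrative; the exact error message is not relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Well-formed input comes back with unnecessary whitespace removed.
minified = conn.execute(
    """SELECT json(' { "this" : "is", "a": [ "test" ] } ')"""
).fetchone()[0]

# Malformed input makes the statement fail instead of returning a value.
try:
    conn.execute("""SELECT json('{"this":')""").fetchone()
    raised = False
except sqlite3.Error:
    raised = True

print(minified)
print("error raised:", raised)
```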
If the argument X to json(X) contains JSON objects with duplicate labels, then it is undefined whether or not the duplicates are preserved. The current implementation preserves duplicates. However, future enhancements to this routine may choose to silently remove duplicates.
The jsonb(X) function returns the binary JSONB representation of the JSON provided as argument X. An error is raised if X is TEXT that does not have valid JSON syntax. If X is a BLOB and superficially appears to be a well-formed JSONB, then this routine simply returns a copy of X. The deep structure of the JSONB is not validated.
The json_array() SQL function accepts zero or more arguments and returns a well-formed JSON array that is composed from those arguments. If any argument to json_array() is a BLOB then an error is thrown.
An argument with SQL type TEXT is normally converted into a quoted JSON string. However, if the argument is the output from another json1 function, then it is stored as JSON. This allows calls to json_array() and json_object() to be nested. The json() function can also be used to force strings to be recognized as JSON.
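The quoting rule for json_array() can be sketched as follows, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled; one() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

# Numbers stay numbers; SQL text becomes a quoted JSON string.
mixed = one("SELECT json_array(1, 2, '3', 4)")

# Text that merely looks like JSON is still quoted.
quoted = one("SELECT json_array('[1,2]')")

# The output of another JSON function is embedded as JSON (nesting).
nested = one("SELECT json_array(json_array(1, 2))")

# json() forces a string to be recognized as JSON.
forced = one("SELECT json_array(json('[1,2]'))")

print(mixed, quoted, nested, forced)
```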
The jsonb_array() SQL function works just like the json_array() function except that it returns the constructed JSON array in SQLite's private JSONB format rather than in the standard RFC 8259 text format.
The json_array_length(X) function returns the number of elements in the JSON array X, or 0 if X is some kind of JSON value other than an array. The json_array_length(X,P) function locates the array at path P within X and returns the length of that array, or 0 if path P locates an element in X that is not a JSON array, or NULL if path P does not locate any element of X. Errors are thrown if either X is not well-formed JSON or if P is not a well-formed path.
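The three possible outcomes (a length, 0, or NULL) can be seen in a small sketch, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. SQL NULL is surfaced by Python as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def length(doc, path=None):
    # Call json_array_length() with or without a path argument.
    if path is None:
        sql, args = "SELECT json_array_length(?)", (doc,)
    else:
        sql, args = "SELECT json_array_length(?, ?)", (doc, path)
    return conn.execute(sql, args).fetchone()[0]

top_level = length("[1,2,3,4]")                    # array at the top level
not_array = length("[1,2,3,4]", "$[2]")            # path hits a number
at_path = length('{"one":[1,2,3]}', "$.one")       # array found at the path
missing = length('{"one":[1,2,3]}', "$.two")       # path selects nothing

print(top_level, not_array, at_path, missing)
```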
The json_error_position(X) function returns 0 if the input X is a well-formed JSON or JSON5 string. If the input X contains one or more syntax errors, then this function returns the character position of the first syntax error. The left-most character is position 1.
If the input X is a BLOB, then this routine returns 0 if X appears to be a well-formed JSONB blob. If the input X is a BLOB that is clearly not valid JSONB, then some non-zero value is returned. The positive value returned by the json_error_position() function with a BLOB input does not necessarily indicate the position in the BLOB where it deviates from the JSONB spec. Note also that json_error_position() does not do a thorough check of the BLOB and it might miss errors and return 0 even though the BLOB is not a strictly conforming JSONB.
The json_extract(X,P1,P2,...) function extracts and returns one or more values from the well-formed JSON at X. If only a single path P1 is provided, then the SQL datatype of the result is NULL for a JSON null, INTEGER or REAL for a JSON numeric value, an INTEGER zero for a JSON false value, an INTEGER one for a JSON true value, the dequoted text for a JSON string value, and a text representation for JSON object and array values. If there are multiple path arguments (P1, P2, and so forth) then this routine returns SQLite text which is a well-formed JSON array holding the various values.
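The way the result type depends on what is selected can be sketched as follows, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The helper one() and the sample documents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

doc = """'{"a":2,"c":[4,5,{"f":7}]}'"""

container = one(f"SELECT json_extract({doc}, '$.c')")       # text JSON
number = one(f"SELECT json_extract({doc}, '$.c[2].f')")     # SQL INTEGER
string = one("""SELECT json_extract('{"a":"xyz"}', '$.a')""")   # dequoted text
null = one("""SELECT json_extract('{"a":null}', '$.a')""")      # SQL NULL

# Several paths: the result is always a JSON array of the values.
several = one(
    """SELECT json_extract('{"a":2,"c":[4,5],"f":7}', '$.c', '$.a')"""
)

print(container, number, string, null, several)
```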
There is a subtle incompatibility between the json_extract() function in SQLite and the json_extract() function in MySQL. The MySQL version of json_extract() always returns JSON. The SQLite version of json_extract() only returns JSON if there are two or more PATH arguments (because the result is then a JSON array) or if the single PATH argument references an array or object. In SQLite, if json_extract() has only a single PATH argument and that PATH references a JSON null or a string or a numeric value, then json_extract() returns the corresponding SQL NULL, TEXT, INTEGER, or REAL value.
The difference between MySQL json_extract() and SQLite json_extract() really only stands out when accessing individual values within the JSON that are strings or NULLs. The following table demonstrates the difference:
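|Operation|SQLite Result|MySQL Result|
|json_extract('{"a":null,"b":"xyz"}','$.a')|NULL|'null'|
|json_extract('{"a":null,"b":"xyz"}','$.b')|'xyz'|'"xyz"'|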
The jsonb_extract() function works the same as the json_extract() function, except in cases where json_extract() would normally return a text JSON array or object, this routine returns the array or object in the JSONB format. For the common case where a text, numeric, null, or boolean JSON element is returned, this routine works exactly the same as json_extract().
Beginning with SQLite version 3.38.0 (2022-02-22), the -> and ->> operators are available for extracting subcomponents of JSON. The SQLite implementation of -> and ->> strives to be compatible with both MySQL and PostgreSQL. The -> and ->> operators take a JSON string as their left operand and a PATH expression or object field label or array index as their right operand. The -> operator returns a JSON representation of the selected subcomponent or NULL if that subcomponent does not exist. The ->> operator returns an SQL TEXT, INTEGER, REAL, or NULL value that represents the selected subcomponent, or NULL if the subcomponent does not exist.
Both the -> and ->> operators select the same subcomponent of the JSON to their left. The difference is that -> always returns a JSON representation of that subcomponent and the ->> operator always returns an SQL representation of that subcomponent. Thus, these operators are subtly different from a two-argument json_extract() function call. A call to json_extract() with two arguments will return a JSON representation of the subcomponent if and only if the subcomponent is a JSON array or object, and will return an SQL representation of the subcomponent if the subcomponent is a JSON null, string, or numeric value.
When the -> operator returns JSON, it always returns the RFC 8259 text representation of that JSON, not JSONB. Use the jsonb_extract() function if you need a subcomponent in the JSONB format.
The right-hand operand to the -> and ->> operators can be a well-formed JSON path expression. This is the form used by MySQL. For compatibility with PostgreSQL, the -> and ->> operators also accept a text label or integer as their right-hand operand. If the right operand is a text label X, then it is interpreted as the JSON path '$.X'. If the right operand is an integer value N, then it is interpreted as the JSON path '$[N]'.
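The difference between the two operators, and the three accepted forms of right-hand operand, can be sketched as below. This assumes Python's standard sqlite3 module; because the operators only exist in SQLite 3.38.0 and later, the queries are skipped on older libraries. The names HAVE_ARROWS, one(), and results are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The -> and ->> operators need SQLite 3.38.0 or later.
HAVE_ARROWS = sqlite3.sqlite_version_info >= (3, 38, 0)

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

results = {}
if HAVE_ARROWS:
    # Path operand: -> keeps the JSON quoting, ->> returns plain SQL text.
    results["json_string"] = one("""SELECT '{"a":"xyz"}' -> '$.a'""")
    results["sql_string"] = one("""SELECT '{"a":"xyz"}' ->> '$.a'""")
    # Label operand: 'a' is shorthand for the path '$.a'.
    results["by_label"] = one("""SELECT '{"a":"xyz"}' -> 'a'""")
    # Integer operand: 3 is shorthand for the path '$[3]'.
    results["json_number"] = one("SELECT '[11,22,33,44]' -> 3")
    results["sql_number"] = one("SELECT '[11,22,33,44]' ->> 3")

print(results)
```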
The json_insert(), json_replace(), and json_set() functions all take a single JSON value as their first argument followed by zero or more pairs of path and value arguments, and return a new JSON string formed by updating the input JSON by the path/value pairs. The functions differ only in how they deal with creating new values and overwriting preexisting values.
|Function|Overwrite if already exists?|Create if does not exist?|
|json_insert()|No|Yes|
|json_replace()|Yes|No|
|json_set()|Yes|Yes|
The json_insert(), json_replace(), and json_set() functions always take an odd number of arguments. The first argument is always the original JSON to be edited. Subsequent arguments occur in pairs with the first element of each pair being a path and the second element being the value to insert or replace or set on that path.
Edits occur sequentially from left to right. Changes caused by prior edits can affect the path search for subsequent edits.
If the value of a path/value pair is an SQLite TEXT value, then it is normally inserted as a quoted JSON string, even if the string looks like valid JSON. However, if the value is the result of another json function (such as json() or json_array() or json_object()) or if it is the result of the -> operator, then it is interpreted as JSON and is inserted as JSON retaining all of its substructure. Values that are the result of the ->> operator are always interpreted as TEXT and are inserted as a JSON string even if they look like valid JSON.
These routines throw an error if the first JSON argument is not well-formed or if any PATH argument is not well-formed or if any argument is a BLOB.
To append an element onto the end of an array, use json_insert() with an array index of "#". Examples:
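The overwrite and create behavior of the three functions, appending with "#", and the text-versus-JSON treatment of values can all be exercised with a sketch like this one. It assumes Python's standard sqlite3 module and an SQLite build with the JSON functions enabled; the dictionary keys are labels invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

queries = {
    # json_insert(): creates missing values, never overwrites.
    "insert_existing": """SELECT json_insert('{"a":2,"c":4}', '$.a', 99)""",
    "insert_new": """SELECT json_insert('{"a":2,"c":4}', '$.e', 99)""",
    # json_replace(): overwrites existing values, never creates.
    "replace_existing": """SELECT json_replace('{"a":2,"c":4}', '$.a', 99)""",
    "replace_new": """SELECT json_replace('{"a":2,"c":4}', '$.e', 99)""",
    # json_set(): overwrites and creates.
    "set_existing": """SELECT json_set('{"a":2,"c":4}', '$.a', 99)""",
    "set_new": """SELECT json_set('{"a":2,"c":4}', '$.e', 99)""",
    # Appending with the '#' array index.
    "append": "SELECT json_insert('[1,2,3,4]', '$[#]', 99)",
    "append_nested": "SELECT json_insert('[1,[2,3],4]', '$[1][#]', 99)",
    # Plain text is quoted; the result of json() keeps its structure.
    "text_value": """SELECT json_set('{"a":2,"c":4}', '$.c', '[97,96]')""",
    "json_value": """SELECT json_set('{"a":2,"c":4}', '$.c', json('[97,96]'))""",
}

results = {name: conn.execute(sql).fetchone()[0] for name, sql in queries.items()}

for name, value in results.items():
    print(f"{name}: {value}")
```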
The jsonb_insert(), jsonb_replace(), and jsonb_set() functions work the same as json_insert(), json_replace(), and json_set(), respectively, except that "jsonb_" versions return their result in the binary JSONB format.
The json_object() SQL function accepts zero or more pairs of arguments and returns a well-formed JSON object that is composed from those arguments. The first argument of each pair is the label and the second argument of each pair is the value. If any argument to json_object() is a BLOB then an error is thrown.
The json_object() function currently allows duplicate labels without complaint, though this might change in a future enhancement.
An argument with SQL type TEXT is normally converted into a quoted JSON string even if the input text is well-formed JSON. However, if the argument is the direct result from another JSON function or the -> operator (but not the ->> operator), then it is treated as JSON and all of its JSON type information and substructure is preserved. This allows calls to json_object() and json_array() to be nested. The json() function can also be used to force strings to be recognized as JSON.
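A brief sketch of json_object() with plain, text, and nested values, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled; one() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

# Label/value pairs become object members.
simple = one("SELECT json_object('a', 2, 'c', 4)")

# A TEXT value is quoted, whatever it looks like.
text_value = one("SELECT json_object('a', 2, 'c', '{e:5}')")

# The result of a nested json_object() call is embedded as JSON.
nested = one("SELECT json_object('a', 2, 'c', json_object('e', 5))")

print(simple, text_value, nested)
```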
The jsonb_object() function works just like the json_object() function except that the generated object is returned in the binary JSONB format.
The json_patch(T,P) SQL function runs the RFC-7396 MergePatch algorithm to apply patch P against input T. The patched copy of T is returned.
MergePatch can add, modify, or delete elements of a JSON Object, and so for JSON Objects, the json_patch() routine is a generalized replacement for json_set() and json_remove(). However, MergePatch treats JSON Array objects as atomic. MergePatch cannot append to an Array nor modify individual elements of an Array. It can only insert, replace, or delete the whole Array as a single unit. Hence, json_patch() is not as useful when dealing with JSON that includes Arrays, especially Arrays with lots of substructure.
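The add, modify, and delete cases, and the atomic treatment of arrays, can be sketched as follows. The example assumes Python's standard sqlite3 module and an SQLite build that provides json_patch(). Results are parsed with Python's json module so the comparison does not depend on member order.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

def patch(target, patch_doc):
    # Apply json_patch() and decode the resulting JSON text.
    row = conn.execute("SELECT json_patch(?, ?)", (target, patch_doc)).fetchone()
    return json.loads(row[0])

# New members in the patch are added to the target.
added = patch('{"a":1,"b":2}', '{"c":3,"d":4}')

# A null in the patch deletes the member; other members are replaced or added.
edited = patch('{"a":1,"b":2}', '{"a":9,"b":null,"c":8}')

# Nested objects are merged member by member.
merged = patch('{"a":{"x":1,"y":2},"b":3}', '{"a":{"y":9},"c":8}')

# Arrays are atomic: the whole array is replaced, not edited in place.
array_replaced = patch('{"a":[1,2],"b":2}', '{"a":9}')

print(added, edited, merged, array_replaced)
```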
The jsonb_patch() function works just like the json_patch() function except that the patched JSON is returned in the binary JSONB format.
The json_remove(X,P,...) function takes a single JSON value as its first argument followed by zero or more path arguments. The json_remove(X,P,...) function returns a copy of the X parameter with all the elements identified by path arguments removed. Paths that select elements not found in X are silently ignored.
Removals occur sequentially from left to right. Changes caused by prior removals can affect the path search for subsequent arguments.
If the json_remove(X) function is called with no path arguments, then it returns the input X reformatted, with excess whitespace removed.
The json_remove() function throws an error if the first argument is not well-formed JSON or if any later argument is not a well-formed path, or if any argument is a BLOB.
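Because removals are applied left to right, the order of the path arguments matters. A minimal sketch, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled (including the "#" index); one() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

single = one("SELECT json_remove('[0,1,2,3,4]', '$[2]')")

# Same two paths, different order: indexes shift after the first removal.
two_then_zero = one("SELECT json_remove('[0,1,2,3,4]', '$[2]', '$[0]')")
zero_then_two = one("SELECT json_remove('[0,1,2,3,4]', '$[0]', '$[2]')")

# A path that selects nothing is silently ignored.
ignored = one("""SELECT json_remove('{"x":25,"y":42}', '$.z')""")

# Removing a member of an object.
member = one("""SELECT json_remove('{"x":25,"y":42}', '$.y')""")

print(single, two_then_zero, zero_then_two, ignored, member)
```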
The jsonb_remove() function works just like the json_remove() function except that the edited JSON result is returned in the binary JSONB format.
The json_type(X) function returns the "type" of the outermost element of X. The json_type(X,P) function returns the "type" of the element in X that is selected by path P. The "type" returned by json_type() is one of the following SQL text values: 'null', 'true', 'false', 'integer', 'real', 'text', 'array', or 'object'. If the path P in json_type(X,P) selects an element that does not exist in X, then this function returns NULL.
The json_type() function throws an error if any of its arguments is not well-formed or is a BLOB.
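Each of the type names listed above can be observed with a sketch like the following, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The sample document is illustrative; a path that selects nothing yields SQL NULL, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

doc = '{"a":[2,3.5,true,false,null,"x"]}'
paths = ["$", "$.a", "$.a[0]", "$.a[1]", "$.a[2]",
         "$.a[3]", "$.a[4]", "$.a[5]", "$.a[6]"]

# Map each path to the type name json_type() reports for it.
types = {
    path: conn.execute("SELECT json_type(?, ?)", (doc, path)).fetchone()[0]
    for path in paths
}

for path, type_name in types.items():
    print(path, "->", type_name)
```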
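The json_valid(X,Y) function returns 1 if the argument X is well-formed JSON, or returns 0 if X is not well-formed. The Y parameter is an integer bitmask that defines what is meant by "well-formed". The following bits of Y are currently defined:
0x01: The input is text that strictly conforms to canonical RFC-8259 JSON, without any extensions.
0x02: The input is text that is JSON with the JSON5 extensions described above.
0x04: The input is a BLOB that superficially appears to be JSONB.
0x08: The input is a BLOB that strictly conforms to the internal JSONB format.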
By combining bits, the following useful values of Y can be derived:
The Y parameter is optional. If omitted, it defaults to 1, which means that the default behavior is to return true only if the input X is strictly conforming RFC-8259 JSON text without any extensions. This makes the one-argument version of json_valid() compatible with older versions of SQLite, prior to the addition of support for JSON5 and JSONB.
The difference between the 0x04 and 0x08 bits in the Y parameter is that 0x04 only examines the outer wrapper of the BLOB to see if it superficially looks like JSONB. This is sufficient for most purposes and is very fast. The 0x08 bit does a thorough examination of all internal details of the BLOB. The 0x08 bit takes time that is linear in the size of the X input and is much slower. The 0x04 bit is recommended for most purposes.
If you just want to know if a value is a plausible input to one of the other JSON functions, a Y value of 6 is probably what you want to use.
Any Y value less than 1 or greater than 15 raises an error, for the latest version of json_valid(). However, future versions of json_valid() might be enhanced to accept flag values outside of this range, having new meanings that we have not yet thought of.
If either X or Y inputs to json_valid() are NULL, then the function returns NULL.
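A minimal sketch of the one-argument form, which is available in older SQLite releases as well. It assumes Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The two-argument form is deliberately left out here because it depends on a newer SQLite library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def is_valid(text):
    # One-argument json_valid(): strict RFC-8259 text only.
    return conn.execute("SELECT json_valid(?)", (text,)).fetchone()[0]

complete = is_valid('{"x":35}')     # well-formed object
truncated = is_valid('{"x":35')     # missing closing brace

print(complete, truncated)
```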
The json_quote(X) function converts the SQL value X (a number or a string) into its corresponding JSON representation. If X is a JSON value returned by another JSON function, then this function is a no-op.
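The quoting and no-op cases can be sketched as follows, assuming Python's standard sqlite3 module and an SQLite build that provides json_quote(); one() is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def one(sql):
    # Return the single value produced by a one-row, one-column query.
    return conn.execute(sql).fetchone()[0]

# An SQL string becomes a quoted JSON string.
word = one("SELECT json_quote('verdant')")

# Text that looks like JSON is still just an SQL string, so it is quoted.
lookalike = one("SELECT json_quote('[1]')")

# A value that already is JSON passes through unchanged.
passthrough = one("SELECT json_quote(json('[1]'))")

print(word, lookalike, passthrough)
```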
The json_group_array(X) function is an aggregate SQL function that returns a JSON array comprised of all X values in the aggregation. Similarly, the json_group_object(NAME,VALUE) function returns a JSON object comprised of all NAME/VALUE pairs in the aggregation. The "jsonb_" variants are the same except that they return their result in the binary JSONB format.
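A small sketch of the two text aggregates, assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The fruit table and its rows are invented for the example. Since no ORDER BY is given, the element order in the array is not relied upon.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruit(name TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO fruit VALUES (?, ?)",
    [("apple", 3), ("pear", 5)],
)

# One JSON array holding every name in the aggregation.
names_json = conn.execute(
    "SELECT json_group_array(name) FROM fruit"
).fetchone()[0]

# One JSON object built from every NAME/VALUE pair.
quantities_json = conn.execute(
    "SELECT json_group_object(name, qty) FROM fruit"
).fetchone()[0]

names = json.loads(names_json)
quantities = json.loads(quantities_json)
print(names_json, quantities_json)
```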
The json_each(X) and json_tree(X) table-valued functions walk the JSON value provided as their first argument and return one row for each element. The json_each(X) function only walks the immediate children of the top-level array or object, or just the top-level element itself if the top-level element is a primitive value. The json_tree(X) function recursively walks through the JSON substructure starting with the top-level element.
The json_each(X,P) and json_tree(X,P) functions work just like their one-argument counterparts except that they treat the element identified by path P as the top-level element.
The schema for the table returned by json_each() and json_tree() is as follows:
CREATE TABLE json_tree(
    key ANY,             -- key for current element relative to its parent
    value ANY,           -- value for the current element
    type TEXT,           -- 'object','array','text','integer', etc.
    atom ANY,            -- value for primitive types, null for array & object
    id INTEGER,          -- integer ID for this element
    parent INTEGER,      -- integer ID for the parent of this element
    fullkey TEXT,        -- full path describing the current element
    path TEXT,           -- path to the container of the current row
    json JSON HIDDEN,    -- 1st input parameter: the raw JSON
    root TEXT HIDDEN     -- 2nd input parameter: the PATH at which to start
);
The "key" column is the integer array index for elements of a JSON array and the text label for elements of a JSON object. The key column is NULL in all other cases.
The "atom" column is the SQL value corresponding to primitive elements - elements other than JSON arrays and objects. The "atom" column is NULL for a JSON array or object. The "value" column is the same as the "atom" column for primitive JSON elements but takes on the text JSON value for arrays and objects.
The "type" column is an SQL text value taken from ('null', 'true', 'false', 'integer', 'real', 'text', 'array', 'object') according to the type of the current JSON element.
The "id" column is an integer that identifies a specific JSON element within the complete JSON string. The "id" integer is an internal housekeeping number, the computation of which might change in future releases. The only guarantee is that the "id" column will be different for every row.
The "parent" column is always NULL for json_each(). For json_tree(), the "parent" column is the "id" integer for the parent of the current element, or NULL for the top-level JSON element or the element identified by the root path in the second argument.
The "fullkey" column is a text path that uniquely identifies the current row element within the original JSON string. The complete key to the true top-level element is returned even if an alternative starting point is provided by the "root" argument.
The "path" column is the path to the array or object container that holds the current row, or the path to the current row in the case where the iteration starts on a primitive type and thus only provides a single row of output.
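The difference between the shallow walk of json_each() and the recursive walk of json_tree(), along with a few of the columns described above, can be seen in a sketch like this. It assumes Python's standard sqlite3 module and an SQLite build with the JSON functions enabled; the sample documents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# json_each(): one row per immediate child of the top-level array.
each_rows = conn.execute(
    """SELECT key, value, type FROM json_each('[10,20,{"a":30}]')"""
).fetchall()

# json_tree(): one row per element, containers included, at every depth.
tree_rows = conn.execute(
    """SELECT fullkey, path, type FROM json_tree('{"a":[1,2]}')"""
).fetchall()

for row in each_rows:
    print("each:", row)
for row in tree_rows:
    print("tree:", row)
```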
Suppose the table "CREATE TABLE user(name,phone)" stores zero or more phone numbers as a JSON array object in the user.phone field. To find all users who have any phone number with a 704 area code:
SELECT DISTINCT user.name FROM user, json_each(user.phone) WHERE json_each.value LIKE '704-%';
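To try the query above end to end, here is a minimal sketch assuming Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The user names and phone numbers are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user(name, phone)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?)",
    [
        ("alice", '["704-555-0100","212-555-0101"]'),
        ("bob", '["919-555-0102"]'),
        ("carol", '["704-555-0103","704-555-0104"]'),
    ],
)

# json_each() expands each phone array into one row per number;
# DISTINCT collapses users who match more than once.
rows = conn.execute(
    "SELECT DISTINCT user.name "
    "FROM user, json_each(user.phone) "
    "WHERE json_each.value LIKE '704-%'"
).fetchall()

names = sorted(row[0] for row in rows)
print(names)
```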
Now suppose the user.phone field contains plain text if the user has only a single phone number and a JSON array if the user has multiple phone numbers. The same question is posed: "Which users have a phone number in the 704 area code?" But now the json_each() function can only be called for those users that have two or more phone numbers since json_each() requires well-formed JSON as its first argument:
SELECT name FROM user WHERE phone LIKE '704-%'
UNION
SELECT user.name
  FROM user, json_each(user.phone)
 WHERE json_valid(user.phone)
   AND json_each.value LIKE '704-%';
Consider a different database with "CREATE TABLE big(json JSON)". To see a complete line-by-line decomposition of the data:
SELECT big.rowid, fullkey, value FROM big, json_tree(big.json) WHERE json_tree.type NOT IN ('object','array');
In the previous query, the "type NOT IN ('object','array')" term of the WHERE clause suppresses containers and only lets through leaf elements. The same effect could be achieved this way:
SELECT big.rowid, fullkey, atom FROM big, json_tree(big.json) WHERE atom IS NOT NULL;
Suppose each entry in the BIG table is a JSON object with a '$.id' field that is a unique identifier and a '$.partlist' field that can be a deeply nested object. You want to find the id of every entry that contains one or more references to uuid '6fa5181e-5721-11e5-a04e-57f3d7b32808' anywhere in its '$.partlist'.
SELECT DISTINCT json_extract(big.json,'$.id')
  FROM big, json_tree(big.json, '$.partlist')
 WHERE json_tree.key='uuid'
   AND json_tree.value='6fa5181e-5721-11e5-a04e-57f3d7b32808';
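The uuid search can be exercised on a toy data set with a sketch like the one below. It assumes Python's standard sqlite3 module and an SQLite build with the JSON functions enabled. The three sample documents, their nesting, and the field names other than "id", "partlist", and "uuid" are invented for illustration.

```python
import json
import sqlite3

TARGET = "6fa5181e-5721-11e5-a04e-57f3d7b32808"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big(json JSON)")

docs = [
    # The uuid appears inside an array nested in the partlist.
    {"id": 1, "partlist": {"assembly": [{"uuid": TARGET}, {"uuid": "other"}]}},
    # A uuid is present but does not match.
    {"id": 2, "partlist": {"uuid": "not-a-match"}},
    # The uuid appears several object levels down.
    {"id": 3, "partlist": {"x": {"y": {"uuid": TARGET}}}},
]
conn.executemany(
    "INSERT INTO big VALUES (?)", [(json.dumps(d),) for d in docs]
)

# json_tree() starts at '$.partlist' and visits every nested element,
# so the match is found regardless of depth.
rows = conn.execute(
    "SELECT DISTINCT json_extract(big.json, '$.id') "
    "FROM big, json_tree(big.json, '$.partlist') "
    "WHERE json_tree.key = 'uuid' AND json_tree.value = ?",
    (TARGET,),
).fetchall()

ids = sorted(row[0] for row in rows)
print(ids)
```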
This page last modified on 2023-12-05 19:57:45 UTC