Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Further refinement and cleanup of the WITH and SELECT documentation. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
2941cf658c80e7b301fe17a6eda91891 |
User & Date: | drh 2014-01-24 04:05:30.013 |
Context
2014-01-24
| ||
13:20 | Add examples to WITH documentation showing how to control depth-first versus breath-first search of a tree using an ORDER BY clause on the recursive query. (check-in: fb11eda20b user: drh tags: trunk) | |
04:05 | Further refinement and cleanup of the WITH and SELECT documentation. (check-in: 2941cf658c user: drh tags: trunk) | |
01:20 | First cut at WITH documentation. (check-in: 2636279279 user: drh tags: trunk) | |
Changes
Changes to pages/lang.in.
︙ | ︙ | |||
331 332 333 334 335 336 337 | transactions. ^An attempt to invoke the BEGIN command within a transaction will fail with an error, regardless of whether the transaction was started by [SAVEPOINT] or a prior BEGIN. ^The COMMIT command and the ROLLBACK command without the TO clause work the same on [SAVEPOINT] transactions as they do with transactions started by BEGIN.</p> | | | 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 | transactions. ^An attempt to invoke the BEGIN command within a transaction will fail with an error, regardless of whether the transaction was started by [SAVEPOINT] or a prior BEGIN. ^The COMMIT command and the ROLLBACK command without the TO clause work the same on [SAVEPOINT] transactions as they do with transactions started by BEGIN.</p> <tcl>hd_fragment immediate {BEGIN IMMEDIATE} {BEGIN EXCLUSIVE}</tcl> <p> ^Transactions can be deferred, immediate, or exclusive. ^The default transaction behavior is deferred. ^Deferred means that no locks are acquired on the database until the database is first accessed. ^Thus with a deferred transaction, the BEGIN statement itself does nothing to the filesystem. ^Locks |
︙ | ︙ | |||
2265 2266 2267 2268 2269 2270 2271 | funcdef {printf(FORMAT,...)} {} { ^(The printf(FORMAT,...) SQL function works like the [sqlite3_mprintf()] C-language function and the printf() function from the standard C library.)^ The first argument is a format string that specifies how to construct the output string using values taken from subsequent arguments. ^If the FORMAT argument is missing or NULL then the result is NULL. ^The %n format is silently ignored and does not consume an argument. ^The %p format is an alias for %X. ^The %z format | | | 2265 2266 2267 2268 2269 2270 2271 2272 2273 2274 2275 2276 2277 2278 2279 | funcdef {printf(FORMAT,...)} {} { ^(The printf(FORMAT,...) SQL function works like the [sqlite3_mprintf()] C-language function and the printf() function from the standard C library.)^ The first argument is a format string that specifies how to construct the output string using values taken from subsequent arguments. ^If the FORMAT argument is missing or NULL then the result is NULL. ^The %n format is silently ignored and does not consume an argument. ^The %p format is an alias for %X. ^The %z format is interchangeable with %s. ^(If there are too few arguments in the argument list, missing arguments are assumed to have a NULL value, which is translated into 0 or 0.0 for numeric formats or an empty string for %s.)^ } funcdef {quote(X)} {} { ^The quote(X) function returns the text of an SQL literal which |
︙ | ︙ | |||
3035 3036 3037 3038 3039 3040 3041 | RecursiveBubbleDiagram with-clause </tcl> <p>Common Table Expressions or CTEs are temporary views or tables that exist only for the duration of a single SQL statement. There are two kinds of common table expressions: "ordinary" and "recursive". Ordinary common table expressions are helpful for making | | | | | | 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 | RecursiveBubbleDiagram with-clause </tcl> <p>Common Table Expressions or CTEs are temporary views or tables that exist only for the duration of a single SQL statement. There are two kinds of common table expressions: "ordinary" and "recursive". Ordinary common table expressions are helpful for making queries easier to understand by factoring subqueries out of the main SQL statement. Recursive common table expression provide the ability to do a hierarchical or recursive queries of trees and graphs, a capability that is not otherwise available in the SQL language. All common table expressions (ordinary and recursive) are created by prepending a WITH clause in front of a [SELECT], [INSERT], [DELETE], or [UPDATE] statement. A single WITH clause can specify one or more common table expressions. <tcl>hd_fragment ordinarycte {ordinary common table expressions}</tcl> |
︙ | ︙ | |||
3086 3087 3088 3089 3090 3091 3092 | <p>Call the cte-table-name for a recursive common table expression the "recursive table". In the bubble diagram above, the recursive table must appear exactly once in the FROM clause of the recursive-select and must not appear anywhere else in either the initial-select or the recursive-select, including subqueries. The initial-select may be a compound-select, but it may not include an ORDER BY, LIMIT, or OFFSET. | | | 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 | <p>Call the cte-table-name for a recursive common table expression the "recursive table". In the bubble diagram above, the recursive table must appear exactly once in the FROM clause of the recursive-select and must not appear anywhere else in either the initial-select or the recursive-select, including subqueries. The initial-select may be a compound-select, but it may not include an ORDER BY, LIMIT, or OFFSET. The recursive-select must be a simple select, not a compound. The recursive-select is allowed to include an ORDER BY, LIMIT, and/or OFFSET. <p>The basic algorithm for computing the content of the recursive table is as follows: <ol> <li> Run the initial-select and add the results to a queue. |
︙ | ︙ | |||
3127 3128 3129 3130 3131 3132 3133 | with LIMIT, a limit of zero means that no rows are ever added to the recursive table, and a negative limit means an unlimited number of rows may be added to the recursive table.) <li><p> The OFFSET clause, if it is present and has a positive value N, prevents the first N rows from being added to the recursive table. The first N rows are still processed by the recursive-select; they | | > | | 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 | with LIMIT, a limit of zero means that no rows are ever added to the recursive table, and a negative limit means an unlimited number of rows may be added to the recursive table.) <li><p> The OFFSET clause, if it is present and has a positive value N, prevents the first N rows from being added to the recursive table. The first N rows are still processed by the recursive-select; they just are not added to the recursive table. Rows are not counted toward fulfilling the LIMIT until all OFFSET rows have been skipped. <li><p> If an ORDER BY clause is present, it determines the order in which rows are extracted from the queue in step 2a. If there is no ORDER BY clause, then the order in which rows are extracted is undefined. (In the current implementation, the queue becomes a FIFO if the ORDER BY clause is omitted, but applications should not depend on that fact since it might change.) </ul> <tcl>hd_fragment rcex1</tcl> <h4>Recursive Query Examples</h4> <p>The following query returns all integers between 1 and 1000000: <blockquote><pre> WITH RECURSIVE cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000) SELECT x FROM cnt; </pre></blockquote> <p>Consider how this query works. The initial-select, which is this case is just "VALUES(1)", runs first and returns a single row with a single column "1". This one row is added to the queue. In step 2a, that one row is extracted from the queue and added to "cnt". Then the recursive query is run in accordance with step 2c generating a single new row with value "2" to add to the queue. The queue still has one row, so step 2 repeats. The 2 is extracted and added to the recursive table by steps 2a and 2b. Then the row containing 2 is used as if where the complete content of the recursive table and the |
︙ | ︙ | |||
3202 3203 3204 3205 3206 3207 3208 | recursion is stopped by a LIMIT rather than a WHERE clause. The use of LIMIT means that when the one-millionth row is added to the "cnt" table (and immediately returned by the SELECT, thanks to the query optimizer) then the recursion stops immediately regardless of how many rows might be left in the queue. On more complex queries, it can sometimes be difficult to ensure that the WHERE clause will eventually cause the queue to drain and the recursion to terminate. But the LIMIT clause will | | | | | | | | | | 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 | recursion is stopped by a LIMIT rather than a WHERE clause. The use of LIMIT means that when the one-millionth row is added to the "cnt" table (and immediately returned by the SELECT, thanks to the query optimizer) then the recursion stops immediately regardless of how many rows might be left in the queue. On more complex queries, it can sometimes be difficult to ensure that the WHERE clause will eventually cause the queue to drain and the recursion to terminate. But the LIMIT clause will always stop the recursion. So it is good practice to always include a LIMIT clause as a safety if an upper bound on the size of the recursion is known. <tcl>hd_fragment rcex2</tcl> <h4>Hierarchical Query Examples</h4> <p>Consider a table that describes the members of an organization as well as the chain-of-command within that organization. <blockquote><pre> CREATE TABLE org( name TEXT PRIMARY KEY, boss TEXT REFERENCES org, height INT, -- other content omitted ); </pre></blockquote> <p>Every member in the organization has a name, and most members have a single boss. The head of the organization has no boss and so the boss field would be NULL for that one row. The rows of the "org" table form a tree. <p>Here is a query that computes the average height over everyone in Alice's organization, including Alice: <blockquote><pre> WITH RECURSIVE works_for_alice(n) AS ( VALUES('Alice') UNION SELECT name FROM org, works_for_alice WHERE org.boss=works_for_alice.n ) SELECT avg(height) FROM org WHERE org.name IN works_for_alice; </pre></blockquote> <p>Next we consider an example that uses two common table expressions in a single WITH clause. The following table records a family tree: <blockquote><pre> CREATE TABLE family( name TEXT PRIMARY KEY, mom TEXT REFERENCES family, dad TEXT REFERENCES family, born DATETIME, |
︙ | ︙ | |||
3331 3332 3333 3334 3335 3336 3337 | ORDER BY checkin.mtime DESC LIMIT 20 ) SELECT * FROM checkin JOIN ancestor ON id=xfrom; </pre></blockquote> <p> | | | | | 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 | ORDER BY checkin.mtime DESC LIMIT 20 ) SELECT * FROM checkin JOIN ancestor ON id=xfrom; </pre></blockquote> <p> The "ORDERBY checkin.mtime DESC" term in the recursive-select makes the query run much faster by preventing it from following branches that merge checkins from long ago. The ORDER BY forces the recursive-select to focus on the most recent checkins, the ones we want. Without the ORDER BY on the recursive-select, we would be forced to find all ancestors of the desired check-in, sort them all by mtime, then take the top twenty. The ORDER BY essentially sets up a priority queue that forces the recursive query to look at the most recent ancestors first, allowing us to insert a LIMIT close and restrict the scope of the query to just the checkins of interest. <tcl>hd_fragment rcex3</tcl> <h4>Outlandish Recursive Query Examples</h4> |
︙ | ︙ | |||
3369 3370 3371 3372 3373 3374 3375 | a(t) AS ( SELECT group_concat( substr(' .+*#', 1+min(iter/7,4), 1), '') FROM m2 GROUP BY cy ) SELECT group_concat(rtrim(t),x'0a') FROM a; </pre></blockquote> | | | | 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 | a(t) AS ( SELECT group_concat( substr(' .+*#', 1+min(iter/7,4), 1), '') FROM m2 GROUP BY cy ) SELECT group_concat(rtrim(t),x'0a') FROM a; </pre></blockquote> <p>In this query, the "xaxis" and "yaxis" CTEs define the grid of points for which the Mandelbrot Set will be approximated. Each row in the "m(iter,cx,cy,x,y)" CTE means that after "iter" iterations, the Mandelbrot iteration starting at cx,cy has reached point x,y. The number of iterations in this example is limited to 28 (which severely limits the resolution of the computation, but this is just an example, after all). The "m2(iter,cx,cy)" CTE holds the maximum number of iterations reached when starting at point cx,cy. Finally, the entry in the "a(t)" CTE holds a string |
︙ | ︙ | |||
3414 3415 3416 3417 3418 3419 3420 | +. </pre></blockquote> <tcl>hd_fragment mandelbrot</tcl> <p>This next query solves a Sudoku puzzle. The state of the puzzle is defined by an 81-character string formed by reading entries from the puzzle box row by row from left to right and then from top to bottom. | > | | 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 | +. </pre></blockquote> <tcl>hd_fragment mandelbrot</tcl> <p>This next query solves a Sudoku puzzle. The state of the puzzle is defined by an 81-character string formed by reading entries from the puzzle box row by row from left to right and then from top to bottom. Blank squares in the puzzle are denoted by a "." character. Thus the input string <blockquote> 53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79 </blockquote> <p>Corresponds to a puzzle like this: |
︙ | ︙ | |||
3487 3488 3489 3490 3491 3492 3493 | position. The complicated "NOT EXISTS" subquery is the magic that figures out whether or not each candidate "s" string is a valid sudoku puzzle or not. <p>The final answer is found by looking for a string with ind==0. If the original sudoku problem did not have a unique solution, then the query will return all possible solutions. If the original problem | | > > > > > > > | 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 | position. The complicated "NOT EXISTS" subquery is the magic that figures out whether or not each candidate "s" string is a valid sudoku puzzle or not. <p>The final answer is found by looking for a string with ind==0. If the original sudoku problem did not have a unique solution, then the query will return all possible solutions. If the original problem was unsolvable, then no rows will be returned. In this case, the answer of: <blockquote> 534678912672195348198342567859761423426853791713924856961537284287419635345286179 </blockquote> <p>Is computed in about 300 milliseconds. <h3>Limitations And Caveats</h3> <ul> <li><p> The WITH clause cannot be used within a [CREATE TRIGGER]. <li><p> |
︙ | ︙ | |||
3525 3526 3527 3528 3529 3530 3531 | the way the data returned by a SELECT statement is determined as a series of steps. It is important to keep in mind that this is purely illustrative - in practice neither SQLite nor any other SQL engine is required to follow this or any other specific process. <h3>Simple Select Processing</h3> | < | | < | | 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 | the way the data returned by a SELECT statement is determined as a series of steps. It is important to keep in mind that this is purely illustrative - in practice neither SQLite nor any other SQL engine is required to follow this or any other specific process. <h3>Simple Select Processing</h3> <p>Generating the results of a simple SELECT statement is presented as a four step process in the description below: <ol> <li> <p>[FROM clause] processing: The input data for the simple SELECT is determined. The input data is either implicitly a single row with 0 columns (if there is no FROM clause) or is determined by the FROM clause. <li> <p>[WHERE clause] processing: The input data is filtered using the WHERE clause expression. <li> <p>[GROUP BY|GROUP BY, HAVING and result-column expression] processing: The set of result rows is computed by aggregating the data according to any GROUP BY clause and calculating the result-set expressions for the rows of the filtered input dataset. <li> <p>[DISTINCT|DISTINCT/ALL keyword] processing: If the query is a "SELECT |
︙ | ︙ | |||
3564 3565 3566 3567 3568 3569 3570 | <p>^(If the FROM clause is omitted from a simple SELECT statement, then the input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and <i>M</i>=0). <p>If a FROM clause is specified, the data on which a simple SELECT query operates comes from the one or more tables or subqueries (SELECT statements | | > | | | | | | | | | | | | | | | | | | | > | 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 3611 3612 3613 3614 3615 3616 3617 3618 3619 3620 3621 3622 3623 3624 3625 3626 3627 3628 3629 3630 3631 3632 3633 3634 3635 3636 3637 3638 3639 3640 3641 3642 3643 3644 3645 3646 3647 3648 3649 3650 | <p>^(If the FROM clause is omitted from a simple SELECT statement, then the input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and <i>M</i>=0). <p>If a FROM clause is specified, the data on which a simple SELECT query operates comes from the one or more tables or subqueries (SELECT statements in parenthesis) specified following the FROM keyword. ^A subquery specified in the table-or-subquery following the FROM clause in a simple SELECT statement is handled as if it was a table containing the data returned by executing the subquery statement. ^Each column of the subquery has the [collation|collation sequence] and [affinity] of the corresponding expression in the subquery statement. <p>^If there is only a single table or subquery in the FROM clause, then the input data used by the SELECT statement is the contents of the named table. ^If there is more than one table or subquery in FROM clause then the contents of all tables and/or subqueries are joined into a single dataset for the simple SELECT statement to operate on. Exactly how the data is combined depends on the specific [join-operator] and [join-constraint] used to connect the tables or subqueries together. <p>All joins in SQLite are based on the cartesian product of the left and right-hand datasets. ^The columns of the cartesian product dataset are, in order, all the columns of the left-hand dataset followed by all the columns of the right-hand dataset. ^There is a row in the cartesian product dataset formed by combining each unique combination of a row from the left-hand and right-hand datasets. ^(In other words, if the left-hand dataset consists of <i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of <i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^ <p>^If the join-operator is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma (",") and there is no ON or USING clause, then the result of the join is simply the cartesian product of the left and right-hand datasets. If join-operator does have ON or USING clauses, those are handled according to the following bullet points: <ul> <li> <p>^(If there is an ON clause then the ON expression is evaluated for each row of the cartesian product as a [boolean expression]. Only rows for which the expression evaluates to true are included from the dataset.)^ <li> <p>^If there is a USING clause then each of the column names specified must exist in the datasets to both the left and right of the join-operator. ^(For each pair of named columns, the expression "lhs.X = rhs.X" is evaluated for each row of the cartesian product as a [boolean expression]. Only rows for which all such expressions evaluates to true are included from the result set.)^ ^When comparing values as a result of a USING clause, the normal rules for handling affinities, collation sequences and NULL values in comparisons apply. ^The column from the dataset on the left-hand side of the join-operator is considered to be on the left-hand side of the comparison operator (=) for the purposes of collation sequence and affinity precedence. <p>^For each pair of columns identified by a USING clause, the column from the right-hand dataset is omitted from the joined dataset. ^This is the only difference between a USING clause and its equivalent ON constraint. <li> <p>^(If the NATURAL keyword is in the join-operator then an implicit USING clause is added to the join-constraints. The implicit USING clause contains each of the column names that appear in both the left and right-hand input datasets.)^ ^If the left and right-hand input datasets feature no common column names, then the NATURAL keyword has no effect on the results of the join. ^A USING or ON clause may not be added to a join that specifies the NATURAL keyword. <li> <p>^(If the join-operator is a "LEFT JOIN" or "LEFT OUTER JOIN", then after the ON or USING filtering clauses have been applied, an extra row is added to the output for each row in the original left-hand input dataset that corresponds to no rows at all in the composite dataset (if any).)^ ^The added rows contain NULL values in the columns that would normally contain values copied from the right-hand input dataset. </ul> |
︙ | ︙ | |||
3897 3898 3899 3900 3901 3902 3903 3904 3905 3906 3907 3908 3909 3910 | <p>^Instead of a separate OFFSET clause, the LIMIT clause may specify two scalar expressions separated by a comma. ^In this case, the first expression is used as the OFFSET expression and the second as the LIMIT expression. This is counter-intuitive, as when using the OFFSET clause the second of the two expressions is the OFFSET and the first the LIMIT. This is intentional - it maximizes compatibility with other SQL database systems. <tcl> ############################################################################## Section UPDATE update {UPDATE *UPDATEs} RecursiveBubbleDiagram update-stmt </tcl> | > > > > > > > > > > > > > > > > > > | 3906 3907 3908 3909 3910 3911 3912 3913 3914 3915 3916 3917 3918 3919 3920 3921 3922 3923 3924 3925 3926 3927 3928 3929 3930 3931 3932 3933 3934 3935 3936 3937 | <p>^Instead of a separate OFFSET clause, the LIMIT clause may specify two scalar expressions separated by a comma. ^In this case, the first expression is used as the OFFSET expression and the second as the LIMIT expression. This is counter-intuitive, as when using the OFFSET clause the second of the two expressions is the OFFSET and the first the LIMIT. This is intentional - it maximizes compatibility with other SQL database systems. <tcl>hd_fragment values {VALUES clause}</tcl> <h3>VALUES clauses</h3> <p>The phrase "VALUES(<i>expr-list</i>)" means the same thing as "SELECT <i>expr-list</i>". The phrase "VALUES(<i>expr-list-1</i>),...,(<i>expr-list-N</i>)" means the same thing as "SELECT <i>expr-list-1</i> UNION ALL ... UNION ALL SELECT <i>expr-list-N</i>". There is no advantage to using one form over the other. Both forms yield the same result and both forms use the same amount of memory and processing time. <h3>The WITH Clause</h3> <p>SELECT statements may be optional preceded by a single [WITH clause] that defines one or more [common table expressions] for using within the SELECT statement. <tcl> ############################################################################## Section UPDATE update {UPDATE *UPDATEs} RecursiveBubbleDiagram update-stmt </tcl> |
︙ | ︙ | |||
4061 4062 4063 4064 4065 4066 4067 | particular named index on a [DELETE], [SELECT], or [UPDATE] statement. The INDEXED BY phrase is an extension that is particular to SQLite and is not portable to other SQL database engines. The INDEXED BY phrase can be seen in the following syntax diagrams:</p> <tcl> | | < | 4088 4089 4090 4091 4092 4093 4094 4095 4096 4097 4098 4099 4100 4101 4102 | particular named index on a [DELETE], [SELECT], or [UPDATE] statement. The INDEXED BY phrase is an extension that is particular to SQLite and is not portable to other SQL database engines. The INDEXED BY phrase can be seen in the following syntax diagrams:</p> <tcl> RecursiveBubbleDiagram qualified-table-name </tcl> <p>^The "INDEXED BY index-name" phrase specifies that the named index must be used in order to look up values on the preceding table. ^If index-name does not exist or cannot be used for the query, then the preparation of the SQL statement fails. ^(The "NOT INDEXED" clause specifies that no index shall be used when |
︙ | ︙ |