Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | First cut at WITH documentation. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
2636279279f89c8c8e3bfa226206df09 |
User & Date: | drh 2014-01-24 01:20:47.935 |
Context
2014-01-24
| ||
04:05 | Further refinement and cleanup of the WITH and SELECT documentation. (check-in: 2941cf658c user: drh tags: trunk) | |
01:20 | First cut at WITH documentation. (check-in: 2636279279 user: drh tags: trunk) | |
2014-01-23
| ||
22:25 | Incremental checkin of work on the common-table-expression documentation. (check-in: 52484ca596 user: drh tags: trunk) | |
Changes
Changes to pages/lang.in.
︙ | ︙ | |||
3034 3035 3036 3037 3038 3039 3040 | RecursiveBubbleDiagram with-clause </tcl> <p>Common Table Expressions or CTEs are temporary views or tables that exist only for the duration of a single SQL statement. There are two kinds of common table expressions: "ordinary" and "recursive". Ordinary | | | > | | > | 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 3053 3054 | RecursiveBubbleDiagram with-clause </tcl> <p>Common Table Expressions or CTEs are temporary views or tables that exist only for the duration of a single SQL statement. There are two kinds of common table expressions: "ordinary" and "recursive". Ordinary common table expressions are helpful for making queries easier for humans to read and understand by factoring subqueries out of the main SQL statement. Recursive common table expression provide the ability to do a hiearchical or recursive query of a tree or graph, a capability that is not otherwise avaiable in the SQL language. All common table expressions (ordinary and recursive) are created by prepending a WITH clause in front of a [SELECT], [INSERT], [DELETE], or [UPDATE] statement. A single WITH clause can specify one or more common table expressions. <tcl>hd_fragment ordinarycte {ordinary common table expressions}</tcl> |
︙ | ︙ | |||
3076 3077 3078 3079 3080 3081 3082 | exactly once in the FROM clause of the right-most SELECT statement of the compound select, and nowhere else. </ol> <p>To put it another way, a recursive common table expression must look like the following: | | | 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 | exactly once in the FROM clause of the right-most SELECT statement of the compound select, and nowhere else. </ol> <p>To put it another way, a recursive common table expression must look like the following: <tcl>RecursiveBubbleDiagram recursive-cte</tcl> <p>Call the cte-table-name for a recursive common table expression the "recursive table". In the bubble diagram above, the recursive table must appear exactly once in the FROM clause of the recursive-select and must not appear anywhere else in either the initial-select or the recursive-select, including subqueries. The initial-select may be a compound-select, but it may not include an ORDER BY, LIMIT, or OFFSET. |
︙ | ︙ | |||
3114 3115 3116 3117 3118 3119 3120 | UNION, then only add rows to the queue if no identical row has been previously added to the queue. Repeated rows are discarded before being added to the queue even if the repeated rows have already been extracted from the queue by the recursion stop. If the operator is UNION ALL, then all rows generated by both the initial-select and the recursive-select are always added to the queue even if there are repeats. <li><p> | | | | > | | | | < | 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 3139 3140 3141 3142 3143 3144 3145 3146 3147 3148 3149 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 | UNION, then only add rows to the queue if no identical row has been previously added to the queue. Repeated rows are discarded before being added to the queue even if the repeated rows have already been extracted from the queue by the recursion stop. If the operator is UNION ALL, then all rows generated by both the initial-select and the recursive-select are always added to the queue even if there are repeats. <li><p> The LIMIT clause, if present, restricts which rows are added to the recursive table in step 2b. If a LIMIT clause is present, it restricts the total number of rows that will be added to the recursive table. Once the limit is reached, the recursion stops. (As is always the case with LIMIT, a limit of zero means that no rows are ever added to the recursive table, and a negative limit means an unlimited number of rows may be added to the recursive table.) <li><p> The OFFSET clause, if it is present and has a positive value N, prevents the first N rows from being added to the recursive table. The first N rows are still processed by the recursive-select; they just are not added to the recursive table. <li><p> If an ORDER BY clause is present, it determines the order in which rows are extracted from the queue in step 2a. If there is no ORDER BY clause, then the order in which rows are extracted is undefined. (In the current implementation, the queue becomes a FIFO if the ORDER BY clause is omitted, but applications should not depend on that fact since it might change.) </ul> <tcl>hd_fragment rcex1</tcl> <h4>Recursive Query Examples</h4> <p>The following query returns all integers between 1 and 1000000: <blockquote><pre> WITH RECURSIVE cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000) SELECT x FROM cnt; </pre></blockquote> <p>Consider how this query works. The initial-select, which is this case is just "VALUES(1") runs first and returns a single row with a single column "1". This one row is added to the queue. In step 2a, that one row is extracted from the queue and added to "cnt". Then the recursive query is run in accordance with step 2c generating a single new row with value "2" to add to the queue. The queue still has one row, so step 2 repeats. The 2 is extracted and added to the recursive table by steps 2a and 2b. Then the row containing 2 is used as if where the complete content of the recursive table and the recursive-select is run again, resulting in a value of 3 being added to the queue. This repeats 999999 times until finally at step 2a the only value on the queue is a row containing 1000000. That row is extracted and added to the recursive table. But this time, the WHERE clause causes the recursive-select to return no rows, so the queue remains empty and the recursion stops. <p><b>Optimization note:</b> In the discussion above, when we say things like "write the row into the recursive table", we mean than conceptually, not in reality. Read literally, it sounds as if SQLite is accumulating a huge table containing one million rows, then going back and scanning that table from top to bottom to generate the result. What really happens though, |
︙ | ︙ | |||
3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 | left in the queue. On more complex queries, it can sometimes be difficult to ensure that the WHERE clause will eventually cause the queue to drain and the recursion to terminate. But the LIMIT clause will aways stop the recursions. So it is good practice to always include a LIMIT clause as a safety if an upper bound on the size of the recursion is known. <h3>Limitations And Caveats</h3> <ul> <li><p> The WITH clause cannot be used within a [CREATE TRIGGER]. <li><p> The WITH clause must appear at the beginning of a top-level [SELECT] statement or at the beginning of a subquery. The WITH clause cannot be prepended to the second or subsequent SELECT statement of a [compound select]. <li><p> The SQL:1999 spec requires that the RECURSIVE keyword follow WITH in any | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | | 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 3274 3275 3276 3277 3278 3279 3280 3281 3282 3283 3284 3285 3286 3287 3288 3289 3290 3291 3292 3293 3294 3295 3296 3297 3298 3299 3300 3301 3302 3303 3304 3305 3306 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 3443 3444 3445 3446 3447 3448 3449 3450 3451 3452 3453 3454 3455 3456 3457 3458 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 3483 3484 3485 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 3514 | left in the queue. On more complex queries, it can sometimes be difficult to ensure that the WHERE clause will eventually cause the queue to drain and the recursion to terminate. But the LIMIT clause will aways stop the recursions. So it is good practice to always include a LIMIT clause as a safety if an upper bound on the size of the recursion is known. <tcl>hd_fragment rcex2</tcl> <h4>A Hierarchical Query</h4> <p>Consider a table that describes the members of some organization as well as the chain-of-command: <blockquote><pre> CREATE TABLE org( name TEXT PRIMARY KEY, boss TEXT REFERENCES org, height INT, -- other content omitted ); </pre></blockquote> <p>Every member in the organization has a name, and most members have a single boss. The head of the organization has no boss and so the boss field would be NULL for that one row. The rows of the table form a tree. <p>Here is a query that computes the average height over everyone in Alice's organization: <blockquote><pre> WITH RECURSIVE works_for_alice(n) AS ( VALUES('Alice') UNION SELECT name FROM org, works_for_alice WHERE org.boss=works_for_alice.n ) SELECT avg(height) FROM org WHERE org.name IN works_for_alice; </pre></blockquote> <p>Next we considerer a more complicated example that uses two common table expressions in a single WITH clause. Consider a table that records a family tree: <blockquote><pre> CREATE TABLE family( name TEXT PRIMARY KEY, mom TEXT REFERENCES family, dad TEXT REFERENCES family, born DATETIME, died DATETIME, -- NULL if still alive -- other content ); </pre></blockquote> <p>The "family" table is similar to the earlier "org" table except that now there are two parents to each member (as there normally is in real life). We want to know all living ancestors of Alice, from oldest to youngest. An ordinary common table expression, "parent_of", is defined first. That ordinary CTE is a view that can be used to find all parents of any individual. That ordinary CTE is then used in the "ancestor_of_alice" recursive CTE. The recursive CTE is then used in the final query: <blockquote><pre> WITH RECURSIVE parent_of(name, parent) AS (SELECT name, mom FROM family UNION SELECT name, dad FROM family), ancestor_of_alice(name) AS (SELECT parent FROM parent_of WHERE name='Alice' UNION ALL SELECT parent FROM parent_of JOIN ancestor_of_alice USING(name)) SELECT family.name FROM ancestor_of_alice, family WHERE ancestor_of_alice.name=family.name AND died IS NULL ORDER BY born; </pre></blockquote> <tcl>hd_fragment rcex2</tcl> <h4>Queries Against A Graph</h4> <p>A version control system (VCS) will typically store the evolving versions of a project as a directed acyclic graph (DAG). Call each version of the project a "checkin". A single checkin can have zero or more parents. Most checkins (except the first) have a single parent, but in the case of a merge, a checkin might have two or three or more parents. A schema to keep track of checkins and the order in which they occur might look something like this: <blockquote><pre> CREATE TABLE checkin( id INTEGER PRIMARY KEY, mtime INTEGER -- timestamp when this checkin occurred ); CREATE TABLE derivedfrom( xfrom INTEGER NOT NULL REFERENCES checkin, -- parent checkin xto INTEGER NOT NULL REFERENCES checkin, -- derived checkin PRIMARY KEY(xfrom,xto) ); CREATE INDEX derivedfrom_back ON derivedfrom(xto,xfrom); </pre></blockquote> <p>This graph is acyclic. And we assume that the mtime of every child checkin is no less than the mtime of all its parents. But unlike the earlier examples, there might be multiple paths of differing lengths between any two checkins. <p>We want to know the twenty most recent ancestors in time (out of the thousands and thousands of ancestors in the whole DAG) for checkin "@BASELINE". (A query similar to this is used by the <a href="http://www.fossil-scm.org/">Fossil</a> VCS to show the 20 most recent ancestors of a check. For example: <a href="http://www.sqlite.org/src/timeline?p=trunk&n=20">http://www.sqlite.org/src/timeline?p=trunk&n=20</a>.) <blockquote><pre> WITH RECURSIVE ancestor(id,mtime) AS ( SELECT id, mtime FROM checkin WHERE id=@BASELINE UNION SELECT derivedfrom.xfrom, checkin.mtime FROM ancestor, derivedfrom, checkin WHERE ancestor.id=derivedfrom.xto AND checkin.id=derivedfrom.xfrom ORDER BY checkin.mtime DESC LIMIT 20 ) SELECT * FROM checkin JOIN ancestor ON id=xfrom; </pre></blockquote> <p> The "ORDERBY checkin.mtime DESC" term in the recursive query makes the query run much faster by preventing it from following branches that merge checkins from long ago. The ORDER BY forces the recursive query to focus on the most recent checkins, the ones we want. Without the ORDER BY on the recursive-select, we would be forced to find all ancestors of the desired check-in, sort them all by mtime, then take the top twenty. The ORDER BY essentially sets up a priorty queue that forces the recursive query to look at the most recent ancestors first, allowing us to insert a LIMIT close and restrict the scope of the query to just the checkins of interest. <tcl>hd_fragment rcex3</tcl> <h4>Outlandish Recursive Query Examples</h4> <p>The following query computes an approximation of the Mandelbrot Set and outputs the result as ASCII-art: <blockquote><pre> WITH RECURSIVE xaxis(x) AS (VALUES(-2.0) UNION ALL SELECT x+0.05 FROM xaxis WHERE x<1.2), yaxis(y) AS (VALUES(-1.0) UNION ALL SELECT y+0.1 FROM yaxis WHERE y<1.0), m(iter, cx, cy, x, y) AS ( SELECT 0, x, y, 0.0, 0.0 FROM xaxis, yaxis UNION ALL SELECT iter+1, cx, cy, x*x-y*y + cx, 2.0*x*y + cy FROM m WHERE (x*x + y*y) < 4.0 AND iter<28 ), m2(iter, cx, cy) AS ( SELECT max(iter), cx, cy FROM m GROUP BY cx, cy ), a(t) AS ( SELECT group_concat( substr(' .+*#', 1+min(iter/7,4), 1), '') FROM m2 GROUP BY cy ) SELECT group_concat(rtrim(t),x'0a') FROM a; </pre></blockquote> <p>In theis query, the "xaxis" and "yaxis" CTEs the grid of points for which the Mandelbrot Set will be approximated. The "m(iter,cx,cy,x,y)" CTE means that after "iter" iterations, the Mandelbrot iteration starting at cx,cy has reached point x,y. The number of iterations in this example is limited to 28 (which severely limits the resolution of the computation, but this is just an example, after all). The "m2(iter,cx,cy)" CTE holds the maximum number of iterations reached when starting at point cx,cy. Finally, the entry in the "a(t)" CTE holds a string which is a single line of the output ASCII-art. The SELECT statement at the end just queries the "a" CTE to retrieve all lines of ASCII-art, one by one. <p>If you copy/paste the query above into an SQLite [command-line shell], the output should be the following: <blockquote><pre> ....# ..#*.. ..+####+. .......+####.... + ..##+*##########+.++++ .+.##################+. .............+###################+.+ ..++..#.....*#####################+. ...+#######++#######################. ....+*################################. #############################################... ....+*################################. ...+#######++#######################. ..++..#.....*#####################+. .............+###################+.+ .+.##################+. ..##+*##########+.++++ .......+####.... + ..+####+. ..#*.. ....# +. </pre></blockquote> <tcl>hd_fragment mandelbrot</tcl> <p>This next query solves a Sudoku puzzle. The state of the puzzle is defined by an 81-character string formed by reading entries from the puzzle box row by row from left to right and then from top to bottom. Block spots are denoted by a "." character. Thus the input string <blockquote> 53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79 </blockquote> <p>Corresponds to a puzzle like this: <blockquote> <table border="1" cellpadding="5"> <tr><td>5<td>3<td> <td> <td>7<td> <td> <td> <td> <tr><td>6<td> <td> <td>1<td>9<td>5<td> <td> <td> <tr><td> <td>9<td>8<td> <td> <td> <td> <td>6<td> <tr><td>8<td> <td> <td> <td>6<td> <td> <td> <td>3 <tr><td>4<td> <td> <td>8<td> <td>3<td> <td> <td>1 <tr><td>7<td> <td> <td> <td>2<td> <td> <td> <td>6 <tr><td> <td>6<td> <td> <td> <td> <td>2<td>8<td> <tr><td> <td> <td> <td>4<td>1<td>9<td> <td> <td>5 <tr><td> <td> <td> <td> <td>8<td> <td> <td>7<td>9 </table> </blockquote> <p>This is the query that solves the puzzle: <blockquote><pre> WITH RECURSIVE input(sud) AS ( VALUES('53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79') ), digits(z, lp) AS ( VALUES('1', 1) UNION ALL SELECT CAST(lp+1 AS TEXT), lp+1 FROM digits WHERE lp<9 ), x(s, ind) AS ( SELECT sud, instr(sud, '.') FROM input UNION ALL SELECT substr(s, 1, ind-1) || z || substr(s, ind+1), instr( substr(s, 1, ind-1) || z || substr(s, ind+1), '.' ) FROM x, digits AS z WHERE ind>0 AND NOT EXISTS ( SELECT 1 FROM digits AS lp WHERE z.z = substr(s, ((ind-1)/9)*9 + lp, 1) OR z.z = substr(s, ((ind-1)%9) + (lp-1)*9 + 1, 1) OR z.z = substr(s, (((ind-1)/3) % 3) * 3 + ((ind-1)/27) * 27 + lp + ((lp-1) / 3) * 6, 1) ) ) SELECT s FROM x WHERE ind=0; </pre></blockquote> <p>The "input" CTE defines the input puzzle. The "digits" CTE defines a table that holds all digits between 1 and 9. The work of solving the puzzle is undertaking by the "x" CTE. An entry in x(s,ind) means that the 81-character string "s" is a valid sudoku puzzle (it has no conflicts) and that the first unknown character is at position "ind", or ind==0 if all character positions are filled in. The goal, then, is to compute entries for "x" with an "ind" of 0. <p>The solver works by adding new entries to the "x" recursive table. Given prior entries, the recursive-select tries to fill in a single new position with all values between 1 and 9 that actually work in that position. The complicated "NOT EXISTS" subquery is the magic that figures out whether or not each candidate "s" string is a valid sudoku puzzle or not. <p>The final answer is found by looking for a string with ind==0. If the original sudoku problem did not have a unique solution, then the query will return all possible solutions. If the original problem was unsolvable, then no rows will be returned. <h3>Limitations And Caveats</h3> <ul> <li><p> The WITH clause cannot be used within a [CREATE TRIGGER]. <li><p> The WITH clause must appear at the beginning of a top-level [SELECT] statement or at the beginning of a subquery. The WITH clause cannot be prepended to the second or subsequent SELECT statement of a [compound select]. <li><p> The SQL:1999 spec requires that the RECURSIVE keyword follow WITH in any WITH clause that includes a recursive common table expression. However, for compatibility with SqlServer and Oracle, SQLite does not enforce this rule. </ul> <tcl> ############################################################################### Section SELECT select {SELECT query} |
︙ | ︙ |