Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Add examples to WITH documentation showing how to control depth-first versus breath-first search of a tree using an ORDER BY clause on the recursive query. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
fb11eda20b22f82e164c237016167699 |
User & Date: | drh 2014-01-24 13:20:25.547 |
Context
2014-01-24
| ||
23:43 | Fix typos in the WITH clause documentation reported on the mailing list. (check-in: 5415be54c0 user: drh tags: trunk) | |
13:20 | Add examples to WITH documentation showing how to control depth-first versus breath-first search of a tree using an ORDER BY clause on the recursive query. (check-in: fb11eda20b user: drh tags: trunk) | |
04:05 | Further refinement and cleanup of the WITH and SELECT documentation. (check-in: 2941cf658c user: drh tags: trunk) | |
Changes
Changes to pages/lang.in.
︙ | ︙ | |||
3031 3032 3033 3034 3035 3036 3037 | <tcl> ############################################################################### Section {WITH clause} with {{common table expressions}} RecursiveBubbleDiagram with-clause </tcl> | | | | 3031 3032 3033 3034 3035 3036 3037 3038 3039 3040 3041 3042 3043 3044 3045 3046 3047 3048 3049 3050 3051 3052 | <tcl> ############################################################################### Section {WITH clause} with {{common table expressions}} RecursiveBubbleDiagram with-clause </tcl> <p>Common Table Expressions or CTEs act like temporary [views] that exist only for the duration of a single SQL statement. There are two kinds of common table expressions: "ordinary" and "recursive". Ordinary common table expressions are helpful for making queries easier to understand by factoring subqueries out of the main SQL statement. Recursive common table expression provide the ability to do hierarchical or recursive queries of trees and graphs, a capability that is not otherwise available in the SQL language. All common table expressions (ordinary and recursive) are created by prepending a WITH clause in front of a [SELECT], [INSERT], [DELETE], or [UPDATE] statement. A single WITH clause can specify one or more common table expressions. |
︙ | ︙ | |||
3067 3068 3069 3070 3071 3072 3073 | <p>A recursive common table expression can be used to write a query that walks a tree or graph. A recursive common table expression has the same basic syntax as an ordinary common table expression, but with the following additional features: <ol> | | | > | | | | | | | | | > > | | < | | | | 3067 3068 3069 3070 3071 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125 3126 3127 3128 3129 3130 3131 3132 3133 3134 3135 3136 3137 3138 | <p>A recursive common table expression can be used to write a query that walks a tree or graph. A recursive common table expression has the same basic syntax as an ordinary common table expression, but with the following additional features: <ol> <li> The "select-stmt" after the AS keyword must be a [compound select] where the right-most [compound-operator] is either UNION or UNION ALL. <li> The table named on the left-hand side of the AS keyword must appear exactly once in the FROM clause of the right-most SELECT statement of the compound select, and nowhere else. </ol> <p>To put it another way, a recursive common table expression must look like the following: <tcl>RecursiveBubbleDiagram recursive-cte</tcl> <p>We refer to the table named by the cte-table-name in a recursive common table expression as the "recursive table". In the recursive-cte bubble diagram above, the recursive table must appear exactly once in the FROM clause of the recursive-select and must not appear anywhere else in either the initial-select or the recursive-select, including subqueries. The initial-select may be a [compound select], but it may not include an ORDER BY, LIMIT, or OFFSET. The recursive-select must be a simple select, not a compound. The recursive-select is allowed to include an ORDER BY, LIMIT, and/or OFFSET. <p>The basic algorithm for computing the content of the recursive table is as follows: <ol> <li> Run the initial-select and add the results to a queue. <li> While the queue is not empty: <ol type="a"> <li> Extract a single row from the queue. <li> Insert that single row into the recursive table <li> Pretend that the single row just extracted is the only row in the recursive table and run the recursive-select, adding all results to the queue. </ol> </ol> <p>The basic procedure above may modified by the following additional rules: <ul> <li><p> If a UNION operator connects the the initial-select with the recursive-select, then only add rows to the queue if no identical row has been previously added to the queue. Repeated rows are discarded before being added to the queue even if the repeated rows have already been extracted from the queue by the recursion step. If the operator is UNION ALL, then all rows generated by both the initial-select and the recursive-select are always added to the queue even if they are repeats. When determining if a row is repeated, NULL values compare equal to one another and not equal to any other value. <li><p> The LIMIT clause, if present, determines the maximum number of rows that will ever be added to the recursive table in step 2b. Once the limit is reached, the recursion stops. A limit of zero means that no rows are ever added to the recursive table, and a negative limit means an unlimited number of rows may be added to the recursive table. <li><p> The OFFSET clause, if it is present and has a positive value N, prevents the first N rows from being added to the recursive table. The first N rows are still processed by the recursive-select; they just are not added to the recursive table. Rows are not counted toward fulfilling the LIMIT until all OFFSET rows have been skipped. <li><p> |
︙ | ︙ | |||
3148 3149 3150 3151 3152 3153 3154 | <blockquote><pre> WITH RECURSIVE cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000) SELECT x FROM cnt; </pre></blockquote> | | | | | | | | | | | | | | | | 3150 3151 3152 3153 3154 3155 3156 3157 3158 3159 3160 3161 3162 3163 3164 3165 3166 3167 3168 3169 3170 3171 3172 3173 3174 3175 3176 3177 3178 3179 3180 3181 3182 3183 3184 3185 3186 3187 3188 3189 3190 3191 3192 3193 3194 3195 3196 3197 3198 3199 3200 3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 | <blockquote><pre> WITH RECURSIVE cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000) SELECT x FROM cnt; </pre></blockquote> <p>Consider how this query works. The initial-select runs first and returns a single row with a single column "1". This one row is added to the queue. In step 2a, that one row is extracted from the queue and added to "cnt". Then the recursive-select is run in accordance with step 2c generating a single new row with value "2" to add to the queue. The queue still has one row, so step 2 repeats. The "2" row is extracted and added to the recursive table by steps 2a and 2b. Then the row containing 2 is used as if where the complete content of the recursive table and the recursive-select is run again, resulting in a row with value "3" being added to the queue. This repeats 999999 times until finally at step 2a the only value on the queue is a row containing 1000000. That row is extracted and added to the recursive table. But this time, the WHERE clause causes the recursive-select to return no rows, so the queue remains empty and the recursion stops. <p><b>Optimization note:</b> In the discussion above, statements like "insert the row into the recursive table" should be understood conceptually, not literally. It sounds as if SQLite is accumulating a huge table containing one million rows, then going back and scanning that table from top to bottom to generate the result. What really happens is that the query optimizer sees that values in the "cnt" recursive table are only used once. So as each row is added to the recursive table, that row is immediately returned as a result of the main SELECT statement and then discarded. SQLite does <em>not</em> accumulate a temporary table containing a million rows. Very little memory is needed to run the above example. However, if the example had used UNION instead of UNION ALL, then SQLite would have had to keep around all previously generated content in order to check for duplicates. For this reason, programmers should strive to use UNION ALL instead of UNION when feasible. <p>Here is a variation on the previous example: <blockquote><pre> WITH RECURSIVE cnt(x) AS ( SELECT 1 UNION ALL SELECT x+1 FROM cnt LIMIT 1000000 ) SELECT x FROM cnt; </pre></blockquote> <p>There are two differences in this variation. The initial-select is "SELECT 1" instead of "VALUES(1)". But those are just different syntaxes for saying exactly the same thing. The other change is that the recursion is stopped by a LIMIT rather than a WHERE clause. The use of LIMIT means that when the one-millionth row is added to the "cnt" table (and returned by the main SELECT, thanks to the query optimizer) then the recursion stops immediately regardless of how many rows might be left in the queue. On more complex queries, it can sometimes be difficult to ensure that the WHERE clause will eventually cause the queue to drain and the recursion to terminate. But the LIMIT clause will always stop the recursion. So it is good practice to always include a LIMIT clause as a safety if an upper bound on the size of the recursion is known. |
︙ | ︙ | |||
3223 3224 3225 3226 3227 3228 3229 | boss TEXT REFERENCES org, height INT, -- other content omitted ); </pre></blockquote> <p>Every member in the organization has a name, and most members have | | | < | | | 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 3235 3236 3237 3238 3239 3240 3241 3242 3243 3244 3245 3246 3247 3248 3249 3250 3251 3252 3253 3254 3255 3256 3257 3258 3259 3260 3261 3262 3263 3264 3265 3266 3267 3268 3269 3270 3271 3272 3273 | boss TEXT REFERENCES org, height INT, -- other content omitted ); </pre></blockquote> <p>Every member in the organization has a name, and most members have a single boss. (The head of the whole organization has a NULL "boss" field.) The rows of the "org" table form a tree. <p>Here is a query that computes the average height over everyone in Alice's organization, including Alice: <blockquote><pre> WITH RECURSIVE works_for_alice(n) AS ( VALUES('Alice') UNION SELECT name FROM org, works_for_alice WHERE org.boss=works_for_alice.n ) SELECT avg(height) FROM org WHERE org.name IN works_for_alice; </pre></blockquote> <p>The next example uses two common table expressions in a single WITH clause. The following table records a family tree: <blockquote><pre> CREATE TABLE family( name TEXT PRIMARY KEY, mom TEXT REFERENCES family, dad TEXT REFERENCES family, born DATETIME, died DATETIME, -- NULL if still alive -- other content ); </pre></blockquote> <p>The "family" table is similar to the earlier "org" table except that now there are two parents to each member. We want to know all living ancestors of Alice, from oldest to youngest. An ordinary common table expression, "parent_of", is defined first. That ordinary CTE is a view that can be used to find all parents of any individual. That ordinary CTE is then used in the "ancestor_of_alice" recursive CTE. The recursive CTE is then used in the final query: <blockquote><pre> |
︙ | ︙ | |||
3306 3307 3308 3309 3310 3311 3312 | PRIMARY KEY(xfrom,xto) ); CREATE INDEX derivedfrom_back ON derivedfrom(xto,xfrom); </pre></blockquote> <p>This graph is acyclic. And we assume that the mtime of every child checkin is no less than the mtime of all its parents. But | | | | | | | | | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | | 3307 3308 3309 3310 3311 3312 3313 3314 3315 3316 3317 3318 3319 3320 3321 3322 3323 3324 3325 3326 3327 3328 3329 3330 3331 3332 3333 3334 3335 3336 3337 3338 3339 3340 3341 3342 3343 3344 3345 3346 3347 3348 3349 3350 3351 3352 3353 3354 3355 3356 3357 3358 3359 3360 3361 3362 3363 3364 3365 3366 3367 3368 3369 3370 3371 3372 3373 3374 3375 3376 3377 3378 3379 3380 3381 3382 3383 3384 3385 3386 3387 3388 3389 3390 3391 3392 3393 3394 3395 3396 3397 3398 3399 3400 3401 3402 3403 3404 3405 3406 3407 3408 3409 3410 3411 3412 3413 3414 3415 3416 3417 3418 3419 3420 3421 3422 3423 3424 3425 3426 3427 3428 3429 3430 3431 3432 3433 3434 3435 3436 3437 3438 3439 3440 3441 3442 | PRIMARY KEY(xfrom,xto) ); CREATE INDEX derivedfrom_back ON derivedfrom(xto,xfrom); </pre></blockquote> <p>This graph is acyclic. And we assume that the mtime of every child checkin is no less than the mtime of all its parents. But unlike the earlier examples, this graph might have multiple paths of differing lengths between any two checkins. <p>We want to know the twenty most recent ancestors in time (out of the thousands and thousands of ancestors in the whole DAG) for checkin "@BASELINE". (A query similar to this is used by the <a href="http://www.fossil-scm.org/">Fossil</a> VCS to show the N most recent ancestors of a check. For example: <a href="http://www.sqlite.org/src/timeline?p=trunk&n=30">http://www.sqlite.org/src/timeline?p=trunk&n=30</a>.) <blockquote><pre> WITH RECURSIVE ancestor(id,mtime) AS ( SELECT id, mtime FROM checkin WHERE id=@BASELINE UNION SELECT derivedfrom.xfrom, checkin.mtime FROM ancestor, derivedfrom, checkin WHERE ancestor.id=derivedfrom.xto AND checkin.id=derivedfrom.xfrom ORDER BY checkin.mtime DESC LIMIT 20 ) SELECT * FROM checkin JOIN ancestor USING(id); </pre></blockquote> <p> The "ORDERBY checkin.mtime DESC" term in the recursive-select makes the query run much faster by preventing it from following branches that merge checkins from long ago. The ORDER BY forces the recursive-select to focus on the most recent checkins, the ones we want. Without the ORDER BY on the recursive-select, one would be forced to compute the complete set of thousands of ancestors, sort them by all mtime, then take the top twenty. The ORDER BY essentially sets up a priority queue that forces the recursive query to look at the most recent ancestors first, allowing the use of a LIMIT clause to restrict the scope of the query to just the checkins of interest. <tcl>hd_fragment withorderby</tcl> <h4>Controlling Depth-First Versus Breath-First Search Of a Tree Using ORDER BY</h4> <p>An ORDER BY clause on the recursive-select can be used to control whether the search of a tree is depth-first or breath-first. To illustrate, we will use a variation on the "org" table from an example above, without the "height" column, and with some real data inserted: <blockquote><pre> CREATE TABLE org( name TEXT PRIMARY KEY, boss TEXT REFERENCES org ) WITHOUT ROWID; INSERT INTO org VALUES('Alice',NULL); INSERT INTO org VALUES('Bob','Alice'); INSERT INTO org VALUES('Cindy','Alice'); INSERT INTO org VALUES('Dave','Bob'); INSERT INTO org VALUES('Emma','Bob'); INSERT INTO org VALUES('Fred','Cindy'); INSERT INTO org VALUES('Gail','Cindy'); </pre></blockquote> <p>Here is a query to show the tree structure in a breath-first pattern: <blockquote><pre> WITH RECURSIVE under_alice(name,level) AS ( VALUES('Alice',0) UNION ALL SELECT org.name, under_alice.level+1 FROM org JOIN under_alice ON org.name=under_alice.boss ORDER BY 2 ) SELECT substr('..........',1,level*3) || name FROM under_alice; </pre></blockquote> <p>The "ORDER BY 2" (which means the same as "ORDER BY under_alice.level+1") causes higher levels in the organization chart (with smaller "level" values) to be processed first, resulting in a breath-first search. The output is: <blockquote><pre> Alice ...Bob ...Cindy ......Dave ......Emma ......Fred ......Gail </pre></blockquote> <p>But if we change the ORDER BY clause to add the "DESC" modifier, that will cause lower levels in the organization (with larger "level" values) to be processed first by the recursive-select, resulting in a depth-first search: <blockquote><pre> WITH RECURSIVE under_alice(name,level) AS ( VALUES('Alice',0) UNION ALL SELECT org.name, under_alice.level+1 FROM org JOIN under_alice ON org.name=under_alice.boss ORDER BY 2 <b>DESC</b> ) SELECT substr('..........',1,level*3) || name FROM under_alice; </pre></blockquote> <p>The output of this revised query is: <blockquote><pre> Alice ...Bob ......Dave ......Emma ...Cindy ......Fred ......Gail </pre></blockquote> <p>When the ORDER BY clause is omitted from the recursive-select, the queue behaves as a FIFO, which results in a breath-first search. <tcl>hd_fragment mandelbrot</tcl> <h4>Outlandish Recursive Query Examples</h4> <p>The following query computes an approximation of the Mandelbrot Set and outputs the result as ASCII-art: <blockquote><pre> WITH RECURSIVE |
︙ | ︙ | |||
3375 3376 3377 3378 3379 3380 3381 | </pre></blockquote> <p>In this query, the "xaxis" and "yaxis" CTEs define the grid of points for which the Mandelbrot Set will be approximated. Each row in the "m(iter,cx,cy,x,y)" CTE means that after "iter" iterations, the Mandelbrot iteration starting at cx,cy has reached point x,y. The number of iterations in this example is limited to 28 (which severely limits the resolution of | | | | | | 3459 3460 3461 3462 3463 3464 3465 3466 3467 3468 3469 3470 3471 3472 3473 3474 3475 3476 3477 3478 3479 3480 3481 3482 | </pre></blockquote> <p>In this query, the "xaxis" and "yaxis" CTEs define the grid of points for which the Mandelbrot Set will be approximated. Each row in the "m(iter,cx,cy,x,y)" CTE means that after "iter" iterations, the Mandelbrot iteration starting at cx,cy has reached point x,y. The number of iterations in this example is limited to 28 (which severely limits the resolution of the computation, but is sufficient for low-resolution ASCII-art output). The "m2(iter,cx,cy)" CTE holds the maximum number of iterations reached when starting at point cx,cy. Finally, each row in the "a(t)" CTE holds a string which is a single line of the output ASCII-art. The SELECT statement at the end just queries the "a" CTE to retrieve all lines of ASCII-art, one by one. <p>Running the query above into an SQLite [command-line shell] results in the following output: <blockquote><pre> ....# ..#*.. ..+####+. .......+####.... + ..##+*##########+.++++ |
︙ | ︙ | |||
3411 3412 3413 3414 3415 3416 3417 | .......+####.... + ..+####+. ..#*.. ....# +. </pre></blockquote> | | | 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 | .......+####.... + ..+####+. ..#*.. ....# +. </pre></blockquote> <tcl>hd_fragment sudoku</tcl> <p>This next query solves a Sudoku puzzle. The state of the puzzle is defined by an 81-character string formed by reading entries from the puzzle box row by row from left to right and then from top to bottom. Blank squares in the puzzle are denoted by a "." character. Thus the input string <blockquote> |
︙ | ︙ | |||
3489 3490 3491 3492 3493 3494 3495 | position. The complicated "NOT EXISTS" subquery is the magic that figures out whether or not each candidate "s" string is a valid sudoku puzzle or not. <p>The final answer is found by looking for a string with ind==0. If the original sudoku problem did not have a unique solution, then the query will return all possible solutions. If the original problem | | | | > | 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 3584 3585 3586 3587 3588 3589 3590 3591 3592 3593 3594 3595 | position. The complicated "NOT EXISTS" subquery is the magic that figures out whether or not each candidate "s" string is a valid sudoku puzzle or not. <p>The final answer is found by looking for a string with ind==0. If the original sudoku problem did not have a unique solution, then the query will return all possible solutions. If the original problem was unsolvable, then no rows will be returned. In this case, the unique answer is: <blockquote> 534678912672195348198342567859761423426853791713924856961537284287419635345286179 </blockquote> <p>The solution was computed in less than 300 milliseconds on a modern workstation. <h3>Limitations And Caveats</h3> <ul> <li><p> The WITH clause cannot be used within a [CREATE TRIGGER]. <li><p> |
︙ | ︙ |