Documentation Source Text

Check-in [52484ca596]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Incremental checkin of work on the common-table-expression documentation.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 52484ca596d71ae91758706b648004c9892a177e
User & Date: drh 2014-01-23 22:25:07
Context
2014-01-24
01:20
First cut at WITH documentation. check-in: 2636279279 user: drh tags: trunk
2014-01-23
22:25
Incremental checkin of work on the common-table-expression documentation. check-in: 52484ca596 user: drh tags: trunk
18:37
Refactoring many of the bubble syntax diagrams, especially related to SELECT. Attempting to make the bubble diagrams easier to follow. check-in: 9aa1b8df9b user: drh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to pages/lang.in.

    26     26       {{DROP TABLE} droptable}
    27     27       {{DROP INDEX} dropindex}
    28     28       {INSERT insert}
    29     29       {REPLACE replace}
    30     30       {DELETE delete}
    31     31       {UPDATE update}
    32     32       {SELECT select}
           33  +    {{WITH clause} withclause}
    33     34       {comment comment}
    34     35       {EXPLAIN explain}
    35     36       {expression expr}
    36     37       {{BEGIN TRANSACTION} transaction}
    37     38       {{COMMIT TRANSACTION} transaction COMMIT}
    38     39       {{END TRANSACTION} transaction COMMIT}
    39     40       {{ROLLBACK TRANSACTION} transaction ROLLBACK}
................................................................................
  1509   1510   The EXPLAIN QUERY PLAN command is described in 
  1510   1511   [explain query plan|more detail here].
  1511   1512   
  1512   1513   <tcl>
  1513   1514   ##############################################################################
  1514   1515   Section expression expr {*expression {expression syntax}}
  1515   1516   
  1516         -RecursiveBubbleDiagram expr 1
         1517  +RecursiveBubbleDiagram expr
  1517   1518   </tcl>
  1518   1519   
  1519   1520   <p>This section is different from the others.  Most other sections of
  1520   1521   this document talks about a particular SQL command.  This section does
  1521   1522   not talk about a standalone command but about "expressions" which are 
  1522   1523   subcomponents of most other commands.</p>
  1523   1524   
................................................................................
  3023   3024   </tcl>
  3024   3025   
  3025   3026   <p>^The REPLACE command is an alias for the "[ON CONFLICT | INSERT OR REPLACE]"
  3026   3027   variant of the [INSERT] command.  
  3027   3028   This alias is provided for compatibility other SQL database engines.  See the 
  3028   3029   [INSERT] command documentation for additional information.</p>  
  3029   3030   
         3031  +<tcl>
         3032  +###############################################################################
         3033  +Section {WITH clause} with {{common table expressions}}
         3034  +
         3035  +RecursiveBubbleDiagram with-clause
         3036  +</tcl>
         3037  +
         3038  +<p>Common Table Expressions or CTEs are temporary views or tables that exist
         3039  +only for the duration of a single SQL statement.  There are two kinds of
         3040  +common table expressions: "ordinary" and "recursive". Ordinary 
         3041  +common table expressions are
         3042  +a syntactic convenience that make queries easier to read by separating out
         3043  +subqueries.  Recursive common table expression
         3044  +provide the ability to do a hiearchical or
         3045  +recursive query of a tree or graph.
         3046  +
         3047  +All common table expressions (ordinary and recursive) are 
         3048  +created by prepending a WITH clause in front of a [SELECT], [INSERT], [DELETE],
         3049  +or [UPDATE] statement.  A single WITH clause can specify one or more
         3050  +common table expressions.
         3051  +
         3052  +<tcl>hd_fragment ordinarycte {ordinary common table expressions}</tcl>
         3053  +<h3>Ordinary Common Table Expressions</h3>
         3054  +
         3055  +<p>An ordinary common table expression works as if it where a [view] that
         3056  +exists for the duration of a single statement.  Ordinary common table
         3057  +expressions are useful for factoring out subqueries and making the overall
         3058  +SQL statement easier to read and understand.
         3059  +
         3060  +<tcl>
         3061  +hd_fragment recursivecte {recursive common table expressions} \
         3062  +{recursive query}
         3063  +</tcl>
         3064  +<h3>Recursive Common Table Expressions</h3>
         3065  +
         3066  +<p>A recursive common table expression can be used to write a query that
         3067  +walks a tree or graph.  A recursive common table expression has the same
         3068  +basic syntax as an ordinary common table expression, but with the following
         3069  +additional features:
         3070  +
         3071  +<ol>
         3072  +<li> The "select-stmt" after the AS keyword in the common table expression
         3073  +     must be a [compound select] where the right-most compound operator is
         3074  +     either UNION or UNION ALL.
         3075  +<li> The table named on the left-hand side of the AS keyword must appear
         3076  +     exactly once in the FROM clause of the right-most SELECT statement
         3077  +     of the compound select, and nowhere else.
         3078  +</ol>
         3079  +
         3080  +<p>To put it another way, a recursive common table expression must
         3081  +look like the following:
         3082  +
         3083  +RecursiveBubbleDiagram recursive-cte
         3084  +
         3085  +<p>Call the cte-table-name for a recursive common table expression
         3086  +the "recursive table".  In the bubble diagram above, the recursive
         3087  +table must appear exactly once in the FROM clause of the recursive-select
         3088  +and must not appear anywhere else in either the initial-select or the
         3089  +recursive-select, including subqueries.  The initial-select may be
         3090  +a compound-select, but it may not include an ORDER BY, LIMIT, or OFFSET.
         3091  +The recursive-select must be a simple select (not a compound).  The
         3092  +recursive-select is allowed to include an ORDER BY, LIMIT, and/or OFFSET.
         3093  +
         3094  +<p>The basic algorithm for computing the content of the recursive table
         3095  +is as follows:
         3096  +
         3097  +<ol>
         3098  +<li> Run the initial-select and add the results to a queue.
         3099  +<li> While the queue is not empty:
         3100  +<ol type="a">
         3101  +<li> Extract a single row from the queue.
         3102  +<li> Write that single row into the recursive table
         3103  +<li> Pretend that the single row just extracted is the only
         3104  +     row in the recursive table and run the recursive-select,
         3105  +     adding all results to the queue.
         3106  +</ol>
         3107  +</ol>
         3108  +
         3109  +<p>The basic procedure above may modified by the following additional rules:
         3110  +
         3111  +<ul>
         3112  +<li><p>
         3113  +  If the the initial-select and the recursive-select are connected by
         3114  +  UNION, then only add rows to the queue if no identical row has been
         3115  +  previously added to the queue.  Repeated rows are discarded before being
         3116  +  added to the queue even if the repeated rows have already been extracted
         3117  +  from the queue by the recursion stop.  If the operator is UNION ALL,
         3118  +  then all rows generated by both the initial-select and the
         3119  +  recursive-select are always added to the queue even if there are repeats.
         3120  +<li><p>
         3121  +  The LIMIT clause, if present, restrict which rows are added
         3122  +  to the recursive table in step 2a.  If a LIMIT clause is present, it 
         3123  +  restricts the total number of rows that will be added to the recursive table.
         3124  +  Once the limit is reached, the recursion stops.  (As is always the case 
         3125  +  with LIMIT, a limit of zero means that no rows are ever added to the
         3126  +  recursive table, and a negative limit means an unlimited number of rows
         3127  +  may be added to the recursive table.)
         3128  +<li><p>
         3129  +  The OFFSET clause, if it present and has a positive value N, prevents the
         3130  +  first N rows from being added to the recursive table.
         3131  +  The first N rows are still processed by the recursive-select; they
         3132  +  just are not added to the recursive table.
         3133  +<li><p>
         3134  +  If an ORDER BY clause is present, it determines the order in which rows
         3135  +  are extracted from the queue in step 2a.  If there is no ORDER BY clause,
         3136  +  then the order in which rows are extracted is undefined.  (In the current
         3137  +  implementation, the queue becomes a FIFO if the ORDER BY clause is omitted,
         3138  +  but applications should not depend on that fact since it might change.)
         3139  +</ul>
         3140  +
         3141  +<h4>Recursive Query Examples</h4>
         3142  +
         3143  +<p>The following query returns all integers between 1 and 1000000:
         3144  +
         3145  +<blockquote><pre>
         3146  +WITH RECURSIVE
         3147  +  cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000)
         3148  +SELECT x FROM cnt;
         3149  +</pre></blockquote>
         3150  +
         3151  +<p>Consider how this query works.  The initial-select, which is this
         3152  +case is just "VALUES(1") runs first (step 1) and returns a single row
         3153  +with a single column "1".  This one row is added to the queue.  In
         3154  +step 2a, that one row is extracted from the queue and added to "cnt".
         3155  +Then the recursive query is run in accordance with step 2c generating
         3156  +a single new row with value "2" to add to the queue.  The queue still
         3157  +has one row, so step 2 repeats.  The 2 is extracted and added to the
         3158  +recursive table by steps 2a and 2b.  Then the row containing 2 is used 
         3159  +as if where the complete content of the recursive table and the 
         3160  +recursive select is run again, resulting in a value of 3 being added
         3161  +to the queue.  This repeats 999999 times until finally at step 2a the
         3162  +only value on the queue is a row containing 1000000.  That row is
         3163  +extracted and added to the recursive table.  But this time, the
         3164  +WHERE clause of the recursive select prevents it from generating any
         3165  +new rows.  On the next cycle, the queue is empty and the recursion
         3166  +stops.
         3167  +
         3168  +<p><b>Optimization note:</b>
         3169  +In the discussion above, when we say things like "write the row into
         3170  +the recursive table", we mean than conceptually, not in reality.
         3171  +Read literally, it sounds as if SQLite is accumulating a huge table
         3172  +containing one million rows, then going back and scanning that table
         3173  +from top to bottom to generate the result.  What really happens though,
         3174  +in this case, is that the query optimizer realizes that values in the
         3175  +"cnt" recursive table are only used once.  So as each row is added to
         3176  +the recursive table, it is immediately returned as a result of the
         3177  +SELECT statement and then discarded.  SQLite does <em>not</em> accumulate
         3178  +a temporary table containing a million rows.  Very little memory is
         3179  +needed to run the above example.  However, if the example had used
         3180  +UNION instead of UNION ALL, then SQLite would have had to keep around
         3181  +all previously generated content in order to check for duplicates.
         3182  +For this reason, programmers should strive to use UNION ALL instead
         3183  +of UNION when feasible.
         3184  +
         3185  +<p>Here is a variation on the same example:
         3186  +
         3187  +<blockquote><pre>
         3188  +WITH RECURSIVE
         3189  +  cnt(x) AS (
         3190  +     SELECT 1
         3191  +     UNION ALL
         3192  +     SELECT x+1 FROM cnt LIMIT 1000000
         3193  +  )
         3194  +SELECT x FROM cnt;
         3195  +</pre></blockquote>
         3196  +
         3197  +<p>There are two differences in this variation.  The initial-select is
         3198  +"SELECT 1" instead of "VALUES(1)".  But those are really just different
         3199  +syntaxes for saying exactly the same thing.  The other change is that the
         3200  +recursion is stopped by a LIMIT rather than a WHERE clause.  The use of
         3201  +LIMIT means that when the one-millionth row is added to the "cnt" table
         3202  +(and immediately returned by the SELECT, thanks to the query optimizer)
         3203  +then the recursion stops immediately regardless of how many rows might be
         3204  +left in the queue.  On more complex queries, it can sometimes be
         3205  +difficult to ensure that the WHERE clause will eventually cause the
         3206  +queue to drain and the recursion to terminate.  But the LIMIT clause will
         3207  +aways stop the recursions.  So it is good practice to always include a
         3208  +LIMIT clause as a safety if an upper bound on the size of the recursion 
         3209  +is known.
         3210  +
         3211  +
         3212  +
         3213  +
         3214  +<h3>Limitations And Caveats</h3>
         3215  +
         3216  +<ul>
         3217  +<li><p>
         3218  +The WITH clause cannot be used within a [CREATE TRIGGER].
         3219  +<li><p>
         3220  +The WITH clause must appear at the beginning of a top-level [SELECT] statement
         3221  +or at the beginning of a subquery.  The WITH clause cannot be prepended to
         3222  +the second or subsequent SELECT statement of a [compound select].
         3223  +<li><p>
         3224  +The SQL:1999 spec requires that the RECURSIVE keyword follow WITH in any
         3225  +WITH clause that includs a recursive common table expression.  However, for
         3226  +compatibility with SqlServer and Oracle, SQLite does not enforce this rule.
         3227  +</ul>
         3228  +
  3030   3229   <tcl>
  3031   3230   ###############################################################################
  3032   3231   Section SELECT select {SELECT query}
  3033   3232   
  3034   3233   RecursiveBubbleDiagram select-stmt
  3035   3234   </tcl>
  3036   3235   
................................................................................
  3667   3866      CAST
  3668   3867      CHECK
  3669   3868      COLLATE
  3670   3869      COLUMN
  3671   3870      COMMIT
  3672   3871      CONFLICT
  3673   3872      CONSTRAINT
  3674         -   COVERING
  3675   3873      CREATE
  3676   3874      CROSS
  3677   3875      CURRENT_DATE
  3678   3876      CURRENT_TIME
  3679   3877      CURRENT_TIMESTAMP
  3680   3878      DATABASE
  3681   3879      DEFAULT
................................................................................
  3734   3932      ORDER
  3735   3933      OUTER
  3736   3934      PLAN
  3737   3935      PRAGMA
  3738   3936      PRIMARY
  3739   3937      QUERY
  3740   3938      RAISE
         3939  +   RECURSIVE
  3741   3940      REFERENCES
  3742   3941      REGEXP
  3743   3942      REINDEX
  3744   3943      RELEASE
  3745   3944      RENAME
  3746   3945      REPLACE
  3747   3946      RESTRICT
................................................................................
  3762   3961      UNIQUE
  3763   3962      UPDATE
  3764   3963      USING
  3765   3964      VACUUM
  3766   3965      VALUES
  3767   3966      VIEW
  3768   3967      VIRTUAL
         3968  +   WITH
         3969  +   WITHOUT
  3769   3970      WHEN
  3770   3971      WHERE
  3771         -   WITH
  3772         -   WITHOUT
  3773   3972   }]
  3774   3973   
  3775   3974   hd_puts {<DIV class="pdf_section">}
  3776   3975   Section {SQLite Keywords} keywords {{*SQL keyword} {SQL keywords}}
  3777   3976   hd_puts {</DIV>}
  3778   3977   </tcl>
  3779   3978   
................................................................................
  3827   4026   is allowed, then the token is understood to be a string literal instead
  3828   4027   of an identifier.</p></li>
  3829   4028   </ul>
  3830   4029   
  3831   4030   <p>Programmers are cautioned not to use the two exceptions described in
  3832   4031   the previous bullets.  We emphasize that they exist only so that old
  3833   4032   and ill-formed SQL statements will run correctly.  Future versions of
  3834         -SQLite might change to raise errors instead of accepting the malformed
         4033  +SQLite might raise errors instead of accepting the malformed
  3835   4034   statements covered by the exceptions above.</p>
  3836   4035   
  3837   4036   <p>
  3838   4037   SQLite adds new keywords from time to time when it takes on new features.
  3839   4038   So to prevent your code from being broken by future enhancements, you should
  3840   4039   normally quote any identifier that is an English language word, even if
  3841   4040   you do not have to.