Documentation Source Text

Check-in [2941cf658c]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Further refinement and cleanup of the WITH and SELECT documentation.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 2941cf658c80e7b301fe17a6eda91891dd5191e0
User & Date: drh 2014-01-24 04:05:30.013
Context
2014-01-24
13:20
Add examples to WITH documentation showing how to control depth-first versus breath-first search of a tree using an ORDER BY clause on the recursive query. (check-in: fb11eda20b user: drh tags: trunk)
04:05
Further refinement and cleanup of the WITH and SELECT documentation. (check-in: 2941cf658c user: drh tags: trunk)
01:20
First cut at WITH documentation. (check-in: 2636279279 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/lang.in.
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
transactions.  ^An attempt to invoke the BEGIN command within
a transaction will fail with an error, regardless of whether
the transaction was started by [SAVEPOINT] or a prior BEGIN.
^The COMMIT command and the ROLLBACK command without the TO clause
work the same on [SAVEPOINT] transactions as they do with transactions
started by BEGIN.</p>

<tcl>hd_fragment immmediate {BEGIN IMMEDIATE} {BEGIN EXCLUSIVE}</tcl>
<p>
^Transactions can be deferred, immediate, or exclusive.  
^The default transaction behavior is deferred.
^Deferred means that no locks are acquired
on the database until the database is first accessed.  ^Thus with a
deferred transaction, the BEGIN statement itself does nothing to the
filesystem.  ^Locks







|







331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
transactions.  ^An attempt to invoke the BEGIN command within
a transaction will fail with an error, regardless of whether
the transaction was started by [SAVEPOINT] or a prior BEGIN.
^The COMMIT command and the ROLLBACK command without the TO clause
work the same on [SAVEPOINT] transactions as they do with transactions
started by BEGIN.</p>

<tcl>hd_fragment immediate {BEGIN IMMEDIATE} {BEGIN EXCLUSIVE}</tcl>
<p>
^Transactions can be deferred, immediate, or exclusive.  
^The default transaction behavior is deferred.
^Deferred means that no locks are acquired
on the database until the database is first accessed.  ^Thus with a
deferred transaction, the BEGIN statement itself does nothing to the
filesystem.  ^Locks
2265
2266
2267
2268
2269
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
funcdef {printf(FORMAT,...)} {} {
  ^(The printf(FORMAT,...) SQL function works like the [sqlite3_mprintf()] C-language
  function and the printf() function from the standard C library.)^
  The first argument is a format string that specifies how to construct the output
  string using values taken from subsequent arguments.  ^If the FORMAT argument is
  missing or NULL then the result is NULL.  ^The %n format is silently ignored and
  does not consume an argument.  ^The %p format is an alias for %X.  ^The %z format
  is interchangable with %s.  ^(If there are too few arguments in the argument list,
  missing arguments are assumed to have a NULL value, which is translated into
  0 or 0.0 for numeric formats or an empty string for %s.)^
}
  

funcdef {quote(X)} {} {
  ^The quote(X) function returns the text of an SQL literal which







|







2265
2266
2267
2268
2269
2270
2271
2272
2273
2274
2275
2276
2277
2278
2279
funcdef {printf(FORMAT,...)} {} {
  ^(The printf(FORMAT,...) SQL function works like the [sqlite3_mprintf()] C-language
  function and the printf() function from the standard C library.)^
  The first argument is a format string that specifies how to construct the output
  string using values taken from subsequent arguments.  ^If the FORMAT argument is
  missing or NULL then the result is NULL.  ^The %n format is silently ignored and
  does not consume an argument.  ^The %p format is an alias for %X.  ^The %z format
  is interchangeable with %s.  ^(If there are too few arguments in the argument list,
  missing arguments are assumed to have a NULL value, which is translated into
  0 or 0.0 for numeric formats or an empty string for %s.)^
}
  

funcdef {quote(X)} {} {
  ^The quote(X) function returns the text of an SQL literal which
3035
3036
3037
3038
3039
3040
3041
3042
3043
3044
3045
3046
3047
3048
3049
3050
3051
3052
3053
3054
RecursiveBubbleDiagram with-clause
</tcl>

<p>Common Table Expressions or CTEs are temporary views or tables that exist
only for the duration of a single SQL statement.  There are two kinds of
common table expressions: "ordinary" and "recursive". Ordinary 
common table expressions are helpful for making
queries easier for humans to read and understand by factoring
subqueries out of the main SQL statement.
Recursive common table expression
provide the ability to do a hiearchical or
recursive query of a tree or graph, a capability
that is not otherwise avaiable in the SQL language.

All common table expressions (ordinary and recursive) are 
created by prepending a WITH clause in front of a [SELECT], [INSERT], [DELETE],
or [UPDATE] statement.  A single WITH clause can specify one or more
common table expressions.

<tcl>hd_fragment ordinarycte {ordinary common table expressions}</tcl>







|


|
|
|







3035
3036
3037
3038
3039
3040
3041
3042
3043
3044
3045
3046
3047
3048
3049
3050
3051
3052
3053
3054
RecursiveBubbleDiagram with-clause
</tcl>

<p>Common Table Expressions or CTEs are temporary views or tables that exist
only for the duration of a single SQL statement.  There are two kinds of
common table expressions: "ordinary" and "recursive". Ordinary 
common table expressions are helpful for making
queries easier to understand by factoring
subqueries out of the main SQL statement.
Recursive common table expression
provide the ability to do a hierarchical or
recursive queries of trees and graphs, a capability
that is not otherwise available in the SQL language.

All common table expressions (ordinary and recursive) are 
created by prepending a WITH clause in front of a [SELECT], [INSERT], [DELETE],
or [UPDATE] statement.  A single WITH clause can specify one or more
common table expressions.

<tcl>hd_fragment ordinarycte {ordinary common table expressions}</tcl>
3086
3087
3088
3089
3090
3091
3092
3093
3094
3095
3096
3097
3098
3099
3100

<p>Call the cte-table-name for a recursive common table expression
the "recursive table".  In the bubble diagram above, the recursive
table must appear exactly once in the FROM clause of the recursive-select
and must not appear anywhere else in either the initial-select or the
recursive-select, including subqueries.  The initial-select may be
a compound-select, but it may not include an ORDER BY, LIMIT, or OFFSET.
The recursive-select must be a simple select (not a compound).  The
recursive-select is allowed to include an ORDER BY, LIMIT, and/or OFFSET.

<p>The basic algorithm for computing the content of the recursive table
is as follows:

<ol>
<li> Run the initial-select and add the results to a queue.







|







3086
3087
3088
3089
3090
3091
3092
3093
3094
3095
3096
3097
3098
3099
3100

<p>Call the cte-table-name for a recursive common table expression
the "recursive table".  In the bubble diagram above, the recursive
table must appear exactly once in the FROM clause of the recursive-select
and must not appear anywhere else in either the initial-select or the
recursive-select, including subqueries.  The initial-select may be
a compound-select, but it may not include an ORDER BY, LIMIT, or OFFSET.
The recursive-select must be a simple select, not a compound.  The
recursive-select is allowed to include an ORDER BY, LIMIT, and/or OFFSET.

<p>The basic algorithm for computing the content of the recursive table
is as follows:

<ol>
<li> Run the initial-select and add the results to a queue.
3127
3128
3129
3130
3131
3132
3133
3134

3135
3136
3137
3138
3139
3140
3141
3142
3143
3144
3145
3146
3147
3148
3149
3150
3151
3152
3153
3154
3155
3156
3157
3158
3159
3160
3161
3162
  with LIMIT, a limit of zero means that no rows are ever added to the
  recursive table, and a negative limit means an unlimited number of rows
  may be added to the recursive table.)
<li><p>
  The OFFSET clause, if it is present and has a positive value N, prevents the
  first N rows from being added to the recursive table.
  The first N rows are still processed by the recursive-select; they
  just are not added to the recursive table.

<li><p>
  If an ORDER BY clause is present, it determines the order in which rows
  are extracted from the queue in step 2a.  If there is no ORDER BY clause,
  then the order in which rows are extracted is undefined.  (In the current
  implementation, the queue becomes a FIFO if the ORDER BY clause is omitted,
  but applications should not depend on that fact since it might change.)
</ul>

<tcl>hd_fragment rcex1</tcl>
<h4>Recursive Query Examples</h4>

<p>The following query returns all integers between 1 and 1000000:

<blockquote><pre>
WITH RECURSIVE
  cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000)
SELECT x FROM cnt;
</pre></blockquote>

<p>Consider how this query works.  The initial-select, which is this
case is just "VALUES(1") runs first and returns a single row
with a single column "1".  This one row is added to the queue.  In
step 2a, that one row is extracted from the queue and added to "cnt".
Then the recursive query is run in accordance with step 2c generating
a single new row with value "2" to add to the queue.  The queue still
has one row, so step 2 repeats.  The 2 is extracted and added to the
recursive table by steps 2a and 2b.  Then the row containing 2 is used 
as if where the complete content of the recursive table and the 







|
>




















|







3127
3128
3129
3130
3131
3132
3133
3134
3135
3136
3137
3138
3139
3140
3141
3142
3143
3144
3145
3146
3147
3148
3149
3150
3151
3152
3153
3154
3155
3156
3157
3158
3159
3160
3161
3162
3163
  with LIMIT, a limit of zero means that no rows are ever added to the
  recursive table, and a negative limit means an unlimited number of rows
  may be added to the recursive table.)
<li><p>
  The OFFSET clause, if it is present and has a positive value N, prevents the
  first N rows from being added to the recursive table.
  The first N rows are still processed by the recursive-select; they
  just are not added to the recursive table.  Rows are not counted toward
  fulfilling the LIMIT until all OFFSET rows have been skipped.
<li><p>
  If an ORDER BY clause is present, it determines the order in which rows
  are extracted from the queue in step 2a.  If there is no ORDER BY clause,
  then the order in which rows are extracted is undefined.  (In the current
  implementation, the queue becomes a FIFO if the ORDER BY clause is omitted,
  but applications should not depend on that fact since it might change.)
</ul>

<tcl>hd_fragment rcex1</tcl>
<h4>Recursive Query Examples</h4>

<p>The following query returns all integers between 1 and 1000000:

<blockquote><pre>
WITH RECURSIVE
  cnt(x) AS (VALUES(1) UNION ALL SELECT x+1 FROM cnt WHERE x<1000000)
SELECT x FROM cnt;
</pre></blockquote>

<p>Consider how this query works.  The initial-select, which is this
case is just "VALUES(1)", runs first and returns a single row
with a single column "1".  This one row is added to the queue.  In
step 2a, that one row is extracted from the queue and added to "cnt".
Then the recursive query is run in accordance with step 2c generating
a single new row with value "2" to add to the queue.  The queue still
has one row, so step 2 repeats.  The 2 is extracted and added to the
recursive table by steps 2a and 2b.  Then the row containing 2 is used 
as if where the complete content of the recursive table and the 
3202
3203
3204
3205
3206
3207
3208
3209
3210
3211
3212
3213
3214
3215
3216
3217
3218
3219
3220
3221
3222
3223
3224
3225
3226
3227
3228
3229
3230
3231
3232
3233
3234
3235
3236
3237
3238
3239
3240
3241
3242
3243
3244
3245
3246
3247
3248
3249
3250
3251
3252
3253
3254
3255
3256
3257
recursion is stopped by a LIMIT rather than a WHERE clause.  The use of
LIMIT means that when the one-millionth row is added to the "cnt" table
(and immediately returned by the SELECT, thanks to the query optimizer)
then the recursion stops immediately regardless of how many rows might be
left in the queue.  On more complex queries, it can sometimes be
difficult to ensure that the WHERE clause will eventually cause the
queue to drain and the recursion to terminate.  But the LIMIT clause will
aways stop the recursions.  So it is good practice to always include a
LIMIT clause as a safety if an upper bound on the size of the recursion 
is known.

<tcl>hd_fragment rcex2</tcl>
<h4>A Hierarchical Query</h4>

<p>Consider a table that describes the members of some organization as
well as the chain-of-command:

<blockquote><pre>
CREATE TABLE org(
  name TEXT PRIMARY KEY,
  boss TEXT REFERENCES org,
  height INT,
  -- other content omitted
);
</pre></blockquote>

<p>Every member in the organization has a name, and most members have
a single boss.  The head of the organization has no boss and so the 
boss field would be NULL for that one row.  The rows of the table
form a tree.

<p>Here is a query that computes the average height over everyone
in Alice's organization:

<blockquote><pre>
WITH RECURSIVE
  works_for_alice(n) AS (
    VALUES('Alice')
    UNION
    SELECT name FROM org, works_for_alice
     WHERE org.boss=works_for_alice.n
  )
SELECT avg(height) FROM org
 WHERE org.name IN works_for_alice;
</pre></blockquote>

<p>Next we considerer a more complicated example that uses two 
common table expressions in a single WITH clause.  
Consider a table that records a family tree:

<blockquote><pre>
CREATE TABLE family(
  name TEXT PRIMARY KEY,
  mom TEXT REFERENCES family,
  dad TEXT REFERENCES family,
  born DATETIME,







|




|

|
|












|



|













|

|







3203
3204
3205
3206
3207
3208
3209
3210
3211
3212
3213
3214
3215
3216
3217
3218
3219
3220
3221
3222
3223
3224
3225
3226
3227
3228
3229
3230
3231
3232
3233
3234
3235
3236
3237
3238
3239
3240
3241
3242
3243
3244
3245
3246
3247
3248
3249
3250
3251
3252
3253
3254
3255
3256
3257
3258
recursion is stopped by a LIMIT rather than a WHERE clause.  The use of
LIMIT means that when the one-millionth row is added to the "cnt" table
(and immediately returned by the SELECT, thanks to the query optimizer)
then the recursion stops immediately regardless of how many rows might be
left in the queue.  On more complex queries, it can sometimes be
difficult to ensure that the WHERE clause will eventually cause the
queue to drain and the recursion to terminate.  But the LIMIT clause will
always stop the recursion.  So it is good practice to always include a
LIMIT clause as a safety if an upper bound on the size of the recursion 
is known.

<tcl>hd_fragment rcex2</tcl>
<h4>Hierarchical Query Examples</h4>

<p>Consider a table that describes the members of an organization as
well as the chain-of-command within that organization.

<blockquote><pre>
CREATE TABLE org(
  name TEXT PRIMARY KEY,
  boss TEXT REFERENCES org,
  height INT,
  -- other content omitted
);
</pre></blockquote>

<p>Every member in the organization has a name, and most members have
a single boss.  The head of the organization has no boss and so the 
boss field would be NULL for that one row.  The rows of the "org" table
form a tree.

<p>Here is a query that computes the average height over everyone
in Alice's organization, including Alice:

<blockquote><pre>
WITH RECURSIVE
  works_for_alice(n) AS (
    VALUES('Alice')
    UNION
    SELECT name FROM org, works_for_alice
     WHERE org.boss=works_for_alice.n
  )
SELECT avg(height) FROM org
 WHERE org.name IN works_for_alice;
</pre></blockquote>

<p>Next we consider an example that uses two 
common table expressions in a single WITH clause.  
The following table records a family tree:

<blockquote><pre>
CREATE TABLE family(
  name TEXT PRIMARY KEY,
  mom TEXT REFERENCES family,
  dad TEXT REFERENCES family,
  born DATETIME,
3331
3332
3333
3334
3335
3336
3337
3338
3339
3340
3341
3342
3343
3344
3345
3346
3347
3348
3349
3350
3351
3352
     ORDER BY checkin.mtime DESC
     LIMIT 20
  )
SELECT * FROM checkin JOIN ancestor ON id=xfrom;
</pre></blockquote>

<p>
The "ORDERBY checkin.mtime DESC" term in the recursive query makes
the query run much faster by preventing it from following
branches that merge checkins
from long ago.  The ORDER BY forces the recursive query to focus
on the most recent checkins, the ones we want.  Without the ORDER BY
on the recursive-select, we would be forced to find all ancestors
of the desired check-in, sort them all by mtime, then take the top
twenty.  The ORDER BY essentially sets up a priorty queue that
forces the recursive query to look at the most recent ancestors first,
allowing us to insert a LIMIT close and restrict the scope of the
query to just the checkins of interest.

<tcl>hd_fragment rcex3</tcl>
<h4>Outlandish Recursive Query Examples</h4>








|


|



|







3332
3333
3334
3335
3336
3337
3338
3339
3340
3341
3342
3343
3344
3345
3346
3347
3348
3349
3350
3351
3352
3353
     ORDER BY checkin.mtime DESC
     LIMIT 20
  )
SELECT * FROM checkin JOIN ancestor ON id=xfrom;
</pre></blockquote>

<p>
The "ORDERBY checkin.mtime DESC" term in the recursive-select makes
the query run much faster by preventing it from following
branches that merge checkins
from long ago.  The ORDER BY forces the recursive-select to focus
on the most recent checkins, the ones we want.  Without the ORDER BY
on the recursive-select, we would be forced to find all ancestors
of the desired check-in, sort them all by mtime, then take the top
twenty.  The ORDER BY essentially sets up a priority queue that
forces the recursive query to look at the most recent ancestors first,
allowing us to insert a LIMIT close and restrict the scope of the
query to just the checkins of interest.

<tcl>hd_fragment rcex3</tcl>
<h4>Outlandish Recursive Query Examples</h4>

3369
3370
3371
3372
3373
3374
3375
3376
3377
3378
3379
3380
3381
3382
3383
3384
  a(t) AS (
    SELECT group_concat( substr(' .+*#', 1+min(iter/7,4), 1), '') 
    FROM m2 GROUP BY cy
  )
SELECT group_concat(rtrim(t),x'0a') FROM a;
</pre></blockquote>

<p>In theis query, the "xaxis" and "yaxis" CTEs the grid of points for
which the Mandelbrot Set will be approximated.  The
"m(iter,cx,cy,x,y)" CTE means that after "iter" iterations, the Mandelbrot
iteration starting at cx,cy has reached point x,y.  The number of iterations
in this example is limited to 28 (which severely limits the resolution of
the computation, but this is just an example, after all).
The "m2(iter,cx,cy)" CTE holds the maximum number of iterations reached when
starting at point cx,cy.
Finally, the entry in the "a(t)" CTE holds a string 







|
|







3370
3371
3372
3373
3374
3375
3376
3377
3378
3379
3380
3381
3382
3383
3384
3385
  a(t) AS (
    SELECT group_concat( substr(' .+*#', 1+min(iter/7,4), 1), '') 
    FROM m2 GROUP BY cy
  )
SELECT group_concat(rtrim(t),x'0a') FROM a;
</pre></blockquote>

<p>In this query, the "xaxis" and "yaxis" CTEs define the grid of points for
which the Mandelbrot Set will be approximated.  Each row in the
"m(iter,cx,cy,x,y)" CTE means that after "iter" iterations, the Mandelbrot
iteration starting at cx,cy has reached point x,y.  The number of iterations
in this example is limited to 28 (which severely limits the resolution of
the computation, but this is just an example, after all).
The "m2(iter,cx,cy)" CTE holds the maximum number of iterations reached when
starting at point cx,cy.
Finally, the entry in the "a(t)" CTE holds a string 
3414
3415
3416
3417
3418
3419
3420

3421
3422
3423
3424
3425
3426
3427
3428
                                    +.
</pre></blockquote>

<tcl>hd_fragment mandelbrot</tcl>
<p>This next query solves a Sudoku puzzle.  The state of the puzzle is
defined by an 81-character string formed by reading entries from the
puzzle box row by row from left to right and then from top to bottom.

Block spots are denoted by a "." character.  Thus the input string

<blockquote>
53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79
</blockquote>

<p>Corresponds to a puzzle like this:








>
|







3415
3416
3417
3418
3419
3420
3421
3422
3423
3424
3425
3426
3427
3428
3429
3430
                                    +.
</pre></blockquote>

<tcl>hd_fragment mandelbrot</tcl>
<p>This next query solves a Sudoku puzzle.  The state of the puzzle is
defined by an 81-character string formed by reading entries from the
puzzle box row by row from left to right and then from top to bottom.
Blank squares in the puzzle are denoted by a "." character.  
Thus the input string

<blockquote>
53..7....6..195....98....6.8...6...34..8.3..17...2...6.6....28....419..5....8..79
</blockquote>

<p>Corresponds to a puzzle like this:

3487
3488
3489
3490
3491
3492
3493
3494







3495
3496
3497
3498
3499
3500
3501
position.  The complicated "NOT EXISTS" subquery is the magic that
figures out whether or not each candidate "s" string is a valid
sudoku puzzle or not.

<p>The final answer is found by looking for a string with ind==0.
If the original sudoku problem did not have a unique solution, then
the query will return all possible solutions.  If the original problem
was unsolvable, then no rows will be returned.








<h3>Limitations And Caveats</h3>

<ul>
<li><p>
The WITH clause cannot be used within a [CREATE TRIGGER].
<li><p>







|
>
>
>
>
>
>
>







3489
3490
3491
3492
3493
3494
3495
3496
3497
3498
3499
3500
3501
3502
3503
3504
3505
3506
3507
3508
3509
3510
position.  The complicated "NOT EXISTS" subquery is the magic that
figures out whether or not each candidate "s" string is a valid
sudoku puzzle or not.

<p>The final answer is found by looking for a string with ind==0.
If the original sudoku problem did not have a unique solution, then
the query will return all possible solutions.  If the original problem
was unsolvable, then no rows will be returned.  In this case, the
answer of:

<blockquote>
534678912672195348198342567859761423426853791713924856961537284287419635345286179
</blockquote>

<p>Is computed in about 300 milliseconds.

<h3>Limitations And Caveats</h3>

<ul>
<li><p>
The WITH clause cannot be used within a [CREATE TRIGGER].
<li><p>
3525
3526
3527
3528
3529
3530
3531
3532
3533
3534
3535
3536
3537
3538
3539
3540
3541
3542
3543
3544
3545
3546
3547
3548
the way the data returned by a SELECT statement is determined as a series of
steps. It is important to keep in mind that this is purely illustrative -
in practice neither SQLite nor any other SQL engine is required to follow 
this or any other specific process.

<h3>Simple Select Processing</h3>

<p>The syntax for a simple SELECT statement is depicted in the 
[simple-query-stmt syntax diagram]. Generating the results of a simple SELECT
statement is presented as a four step process in the description below:

<ol>
  <li> <p>[FROM clause] processing: The input data for the simple SELECT is
       determined. The input data is either implicitly a single row with 0
       columns (if there is no FROM clause) or is determined by analyzing the
       [join-source syntax diagram|join-source] specification that follows 
       an explicit FROM clause.
  <li> <p>[WHERE clause] processing: The input data is filtered using the WHERE
       clause expression.  
  <li> <p>[GROUP BY|GROUP BY, HAVING and result-column expression] processing: 
       The set of result rows is computed by aggregating the data according to
       any GROUP BY clause and calculating the result-set expressions for the
       rows of the filtered input dataset.  
  <li> <p>[DISTINCT|DISTINCT/ALL keyword] processing: If the query is a "SELECT







<
|





|
<
|







3534
3535
3536
3537
3538
3539
3540

3541
3542
3543
3544
3545
3546
3547

3548
3549
3550
3551
3552
3553
3554
3555
the way the data returned by a SELECT statement is determined as a series of
steps. It is important to keep in mind that this is purely illustrative -
in practice neither SQLite nor any other SQL engine is required to follow 
this or any other specific process.

<h3>Simple Select Processing</h3>


<p>Generating the results of a simple SELECT
statement is presented as a four step process in the description below:

<ol>
  <li> <p>[FROM clause] processing: The input data for the simple SELECT is
       determined. The input data is either implicitly a single row with 0
       columns (if there is no FROM clause) or is determined by the FROM

       clause.
  <li> <p>[WHERE clause] processing: The input data is filtered using the WHERE
       clause expression.  
  <li> <p>[GROUP BY|GROUP BY, HAVING and result-column expression] processing: 
       The set of result rows is computed by aggregating the data according to
       any GROUP BY clause and calculating the result-set expressions for the
       rows of the filtered input dataset.  
  <li> <p>[DISTINCT|DISTINCT/ALL keyword] processing: If the query is a "SELECT
3564
3565
3566
3567
3568
3569
3570
3571

3572
3573
3574
3575
3576
3577
3578
3579
3580
3581
3582
3583
3584
3585
3586
3587
3588
3589
3590
3591
3592
3593
3594
3595
3596
3597
3598
3599
3600
3601
3602
3603
3604
3605
3606
3607
3608
3609
3610
3611
3612
3613
3614
3615
3616
3617
3618
3619
3620
3621
3622
3623
3624
3625
3626
3627
3628
3629
3630
3631
3632
3633
3634

3635
3636
3637
3638
3639
3640
3641

<p>^(If the FROM clause is omitted from a simple SELECT statement, then the 
input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and
<i>M</i>=0).

<p>If a FROM clause is specified, the data on which a simple SELECT query
operates comes from the one or more tables or subqueries (SELECT statements
in parenthesis) specified following the FROM keyword. ^A sub-select specified

in the join-source following the FROM clause in a simple SELECT statement is
handled as if it was a table containing the data returned by executing the
sub-select statement. ^Each column of the sub-select dataset inherits the
[collation|collation sequence] and [affinity] of the corresponding expression
in the sub-select statement.

<p>^If there is only a single table in the join-source following the FROM
clause, then the input data used by the SELECT statement is the contents of the
named table. ^If there is more than one table specified as part of the
join-source following the FROM keyword, then the contents of each named table
are joined into a single dataset for the simple SELECT statement to operate on.
Exactly how the data is combined depends on the specific [join-op] and
[join-constraint] used to connect the tables or subqueries together.

<p>All joins in SQLite are based on the cartesian product of the left and
right-hand datasets. ^The columns of the cartesian product dataset are, in 
order, all the columns of the left-hand dataset followed by all the columns
of the right-hand dataset. ^There is a row in the cartesian product dataset
formed by combining each unique combination of a row from the left-hand 
and right-hand datasets. ^(In other words, if the left-hand dataset consists of
<i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of
<i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a
dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^

<p>^If the join-op is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
If join-op does have ON or USING clauses, those are handled according to
the following bullet points:

<ul>
  <li> <p>^(If there is an ON clause specified, then the ON expression is
       evaluated for each row of the cartesian product as a 
       [boolean expression]. All rows for which the expression evaluates to 
       false are excluded from the dataset.)^

  <li> <p>^If there is a USING clause specified as part of the join-constraint,
       then each of the column names specified must exist in the datasets to 
       both the left and right of the join-op. ^(For each pair of namesake
       columns, the expression "lhs.X = rhs.X" is evaluated for each row of
       the cartesian product as a [boolean expression]. All rows for which one
       or more of the expressions evaluates to false are excluded from the
       result set.)^ ^When comparing values as a result of a USING clause, the
       normal rules for handling affinities, collation sequences and NULL
       values in comparisons apply. ^The column from the dataset on the
       left-hand side of the join operator is considered to be on the left-hand
       side of the comparison operator (=) for the purposes of collation 
       sequence and affinity precedence.

       <p>^For each pair of columns identified by a USING clause, the column
       from the right-hand dataset is omitted from the joined dataset. ^This 
       is the only difference between a USING clause and its equivalent ON
       constraint.

  <li> <p>^(If the NATURAL keyword is added to any of the join-ops, then an
       implicit USING clause is added to the join-constraints. The implicit
       USING clause contains each of the column names that appear in both
       the left and right-hand input datasets.)^ ^If the left and right-hand
       input datasets feature no common column names, then the NATURAL keyword
       has no effect on the results of the join. ^A USING or ON clause may
       not be added to a join that specifies the NATURAL keyword.

  <li> <p>^(If the join-op is a "LEFT JOIN" or "LEFT OUTER JOIN", then after

       the ON or USING filtering clauses have been applied, an extra row is 
       added to the output for each row in the original left-hand input 
       dataset that corresponds to no rows at all in the composite
       dataset (if any).)^ ^The added rows contain NULL values in the columns
       that would normally contain values copied from the right-hand input
       dataset.  
</ul>







|
>
|

|

|

|

|
|

|












|


|



|

|
|

|

|

|
|



|








|







|
>







3571
3572
3573
3574
3575
3576
3577
3578
3579
3580
3581
3582
3583
3584
3585
3586
3587
3588
3589
3590
3591
3592
3593
3594
3595
3596
3597
3598
3599
3600
3601
3602
3603
3604
3605
3606
3607
3608
3609
3610
3611
3612
3613
3614
3615
3616
3617
3618
3619
3620
3621
3622
3623
3624
3625
3626
3627
3628
3629
3630
3631
3632
3633
3634
3635
3636
3637
3638
3639
3640
3641
3642
3643
3644
3645
3646
3647
3648
3649
3650

<p>^(If the FROM clause is omitted from a simple SELECT statement, then the 
input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and
<i>M</i>=0).

<p>If a FROM clause is specified, the data on which a simple SELECT query
operates comes from the one or more tables or subqueries (SELECT statements
in parenthesis) specified following the FROM keyword. ^A subquery specified
in the table-or-subquery following the FROM clause in a 
simple SELECT statement is
handled as if it was a table containing the data returned by executing the
subquery statement. ^Each column of the subquery has the
[collation|collation sequence] and [affinity] of the corresponding expression
in the subquery statement.

<p>^If there is only a single table or subquery in the FROM
clause, then the input data used by the SELECT statement is the contents of the
named table. ^If there is more than one table or subquery in FROM clause
then the contents of all tables and/or subqueries
are joined into a single dataset for the simple SELECT statement to operate on.
Exactly how the data is combined depends on the specific [join-operator] and
[join-constraint] used to connect the tables or subqueries together.

<p>All joins in SQLite are based on the cartesian product of the left and
right-hand datasets. ^The columns of the cartesian product dataset are, in 
order, all the columns of the left-hand dataset followed by all the columns
of the right-hand dataset. ^There is a row in the cartesian product dataset
formed by combining each unique combination of a row from the left-hand 
and right-hand datasets. ^(In other words, if the left-hand dataset consists of
<i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of
<i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a
dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^

<p>^If the join-operator is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
If join-operator does have ON or USING clauses, those are handled according to
the following bullet points:

<ul>
  <li> <p>^(If there is an ON clause then the ON expression is
       evaluated for each row of the cartesian product as a 
       [boolean expression]. Only rows for which the expression evaluates to 
       true are included from the dataset.)^

  <li> <p>^If there is a USING clause
       then each of the column names specified must exist in the datasets to 
       both the left and right of the join-operator. ^(For each pair of named
       columns, the expression "lhs.X = rhs.X" is evaluated for each row of
       the cartesian product as a [boolean expression]. Only rows for which
       all such expressions evaluates to true are included from the
       result set.)^ ^When comparing values as a result of a USING clause, the
       normal rules for handling affinities, collation sequences and NULL
       values in comparisons apply. ^The column from the dataset on the
       left-hand side of the join-operator is considered to be on the left-hand
       side of the comparison operator (=) for the purposes of collation 
       sequence and affinity precedence.

       <p>^For each pair of columns identified by a USING clause, the column
       from the right-hand dataset is omitted from the joined dataset. ^This 
       is the only difference between a USING clause and its equivalent ON
       constraint.

  <li> <p>^(If the NATURAL keyword is in the join-operator then an
       implicit USING clause is added to the join-constraints. The implicit
       USING clause contains each of the column names that appear in both
       the left and right-hand input datasets.)^ ^If the left and right-hand
       input datasets feature no common column names, then the NATURAL keyword
       has no effect on the results of the join. ^A USING or ON clause may
       not be added to a join that specifies the NATURAL keyword.

  <li> <p>^(If the join-operator is a "LEFT JOIN" or "LEFT OUTER JOIN", then
       after
       the ON or USING filtering clauses have been applied, an extra row is 
       added to the output for each row in the original left-hand input 
       dataset that corresponds to no rows at all in the composite
       dataset (if any).)^ ^The added rows contain NULL values in the columns
       that would normally contain values copied from the right-hand input
       dataset.  
</ul>
3897
3898
3899
3900
3901
3902
3903


















3904
3905
3906
3907
3908
3909
3910
<p>^Instead of a separate OFFSET clause, the LIMIT clause may specify two
scalar expressions separated by a comma. ^In this case, the first expression
is used as the OFFSET expression and the second as the LIMIT expression.
This is counter-intuitive, as when using the OFFSET clause the second of
the two expressions is the OFFSET and the first the LIMIT. This is intentional
- it maximizes compatibility with other SQL database systems.



















<tcl>
##############################################################################
Section UPDATE update {UPDATE *UPDATEs}

RecursiveBubbleDiagram update-stmt
</tcl>








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







3906
3907
3908
3909
3910
3911
3912
3913
3914
3915
3916
3917
3918
3919
3920
3921
3922
3923
3924
3925
3926
3927
3928
3929
3930
3931
3932
3933
3934
3935
3936
3937
<p>^Instead of a separate OFFSET clause, the LIMIT clause may specify two
scalar expressions separated by a comma. ^In this case, the first expression
is used as the OFFSET expression and the second as the LIMIT expression.
This is counter-intuitive, as when using the OFFSET clause the second of
the two expressions is the OFFSET and the first the LIMIT. This is intentional
- it maximizes compatibility with other SQL database systems.

<tcl>hd_fragment values {VALUES clause}</tcl>
<h3>VALUES clauses</h3>

<p>The phrase "VALUES(<i>expr-list</i>)" means the same thing
as "SELECT <i>expr-list</i>".  The phrase
"VALUES(<i>expr-list-1</i>),...,(<i>expr-list-N</i>)" means the same
thing as "SELECT <i>expr-list-1</i> UNION ALL ... UNION ALL
SELECT <i>expr-list-N</i>".  There is no advantage to using one form
over the other.  Both forms yield the same result and both forms use
the same amount of memory and processing time.


<h3>The WITH Clause</h3>

<p>SELECT statements may be optional preceded by a single
[WITH clause] that defines one or more [common table expressions]
for using within the SELECT statement.

<tcl>
##############################################################################
Section UPDATE update {UPDATE *UPDATEs}

RecursiveBubbleDiagram update-stmt
</tcl>

4061
4062
4063
4064
4065
4066
4067
4068
4069
4070
4071
4072
4073
4074
4075
4076
particular named index on a [DELETE], [SELECT], or [UPDATE] statement.
The INDEXED BY phrase is an extension that is particular to SQLite and
is not portable to other SQL database engines.
The INDEXED BY phrase can be seen in the following syntax
diagrams:</p>

<tcl>
BubbleDiagram qualified-table-name
BubbleDiagram simple-join-source
</tcl>

<p>^The "INDEXED BY index-name" phrase specifies that the named index
must be used in order to look up values on the preceding table.
^If index-name does not exist or cannot be used for the query, then
the preparation of the SQL statement fails.
^(The "NOT INDEXED" clause specifies that no index shall be used when







|
<







4088
4089
4090
4091
4092
4093
4094
4095

4096
4097
4098
4099
4100
4101
4102
particular named index on a [DELETE], [SELECT], or [UPDATE] statement.
The INDEXED BY phrase is an extension that is particular to SQLite and
is not portable to other SQL database engines.
The INDEXED BY phrase can be seen in the following syntax
diagrams:</p>

<tcl>
RecursiveBubbleDiagram qualified-table-name

</tcl>

<p>^The "INDEXED BY index-name" phrase specifies that the named index
must be used in order to look up values on the preceding table.
^If index-name does not exist or cannot be used for the query, then
the preparation of the SQL statement fails.
^(The "NOT INDEXED" clause specifies that no index shall be used when