Documentation Source Text

Check-in [979599b0b6]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Further enhancements to the SELECT documentation.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 979599b0b68fa9343a5421f5e31c7deff07b784f
User & Date: drh 2014-01-30 15:03:22.485
Context
2014-01-30
15:32
Additional clarification on syntax restrictions in SELECT statements. (check-in: 4c5d78209a user: drh tags: trunk)
15:03
Further enhancements to the SELECT documentation. (check-in: 979599b0b6 user: drh tags: trunk)
14:11
Add the extended-simple-select-stmt syntax diagram. Update the documentation on SELECT statements to better explain limitations on the use of ORDER BY and LIMIT in compound SELECTs. (check-in: c14dc7c23b user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/lang.in.
3609
3610
3611
3612
3613
3614
3615
3616














3617
3618
3619
3620
3621
3622
3623
3624
3625
3626
3627
3628
3629
3630
3631
3632
3633
3634
3635
3636
3637
3638
3639
3640
3641
3642

RecursiveBubbleDiagram select-stmt
</tcl>


<p>The SELECT statement is used to query the database.  The
result of a SELECT is zero or more rows of data where each row
has a fixed number of columns.  















<p>The SELECT statement is the most complicated command in the SQL language.
To make the description easier to follow, some of the passages below describe
the way the data returned by a SELECT statement is determined as a series of
steps. It is important to keep in mind that this is purely illustrative -
in practice neither SQLite nor any other SQL engine is required to follow 
this or any other specific process.

<p>The "[select-stmt]" syntax diagram above attempts to show as much of the
SELECT statement syntax as possible in a single diagram, as some readers
find this helpful.  Alternative syntax diagrams that break the syntax down
into smaller chunks are shown elsewhere in this document.

<p>Note that there are paths through the "select-stmt" syntax diagram that
are not allowed in practice.  For example, an [ORDER BY] or [LIMIT] clause
can only occur on the right-most SELECT of a [compound select], even though
the diagram shows that they can occur anywhere.  Restrictions such as this
are described in the subsequent text.

<tcl>hd_fragment simpleselect {simple SELECT}</tcl>
<h3>Simple Select Processing</h3>

<p>The core of a SELECT statement is a "simple SELECT" shown by the
[simple-select-stmt] syntax diagram below.  
In practice, most SELECT statements are simple SELECT statements, 
though with the possible addition of an [ORDER BY] clause,







|
>
>
>
>
>
>
>
>
>
>
>
>
>
>








<
<
<
<
<
<
<
<
<
<
<







3609
3610
3611
3612
3613
3614
3615
3616
3617
3618
3619
3620
3621
3622
3623
3624
3625
3626
3627
3628
3629
3630
3631
3632
3633
3634
3635
3636
3637
3638











3639
3640
3641
3642
3643
3644
3645

RecursiveBubbleDiagram select-stmt
</tcl>


<p>The SELECT statement is used to query the database.  The
result of a SELECT is zero or more rows of data where each row
has a fixed number of columns.  A SELECT statement does not make
any changes to the database.

<p>The "[select-stmt]" syntax diagram above attempts to show as much of the
SELECT statement syntax as possible in a single diagram, as some readers
find that helpful.  Alternative syntax diagrams that break the syntax down
into smaller chunks are shown elsewhere in this document.

<p>Note that there are paths through the syntax diagrams for SELECT that
are not allowed in practice.  For example, an [ORDER BY] or [LIMIT] clause
can only occur on the right-most SELECT of a [compound select], even though
the [select-stmt] syntax diagram shows that they can occur anywhere. 
And, a [VALUES] clause may not immediately follow a [WITH] clause in
a [simple SELECT] even though the [extended-simple-select-stmt] diagram
shows that it can.  Restrictions such as these are described in the text.

<p>The SELECT statement is the most complicated command in the SQL language.
To make the description easier to follow, some of the passages below describe
the way the data returned by a SELECT statement is determined as a series of
steps. It is important to keep in mind that this is purely illustrative -
in practice neither SQLite nor any other SQL engine is required to follow 
this or any other specific process.












<tcl>hd_fragment simpleselect {simple SELECT}</tcl>
<h3>Simple Select Processing</h3>

<p>The core of a SELECT statement is a "simple SELECT" shown by the
[simple-select-stmt] syntax diagram below.  
In practice, most SELECT statements are simple SELECT statements, 
though with the possible addition of an [ORDER BY] clause,
3707
3708
3709
3710
3711
3712
3713

3714

3715


3716

3717
3718
3719
3720
3721
3722
3723

<p>All joins in SQLite are based on the cartesian product of the left and
right-hand datasets. ^The columns of the cartesian product dataset are, in 
order, all the columns of the left-hand dataset followed by all the columns
of the right-hand dataset. ^There is a row in the cartesian product dataset
formed by combining each unique combination of a row from the left-hand 
and right-hand datasets. ^(In other words, if the left-hand dataset consists of

<i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of

<i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a


dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^


<p>^If the join-operator is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
If join-operator does have ON or USING clauses, those are handled according to
the following bullet points:








>
|
>
|
>
>
|
>







3710
3711
3712
3713
3714
3715
3716
3717
3718
3719
3720
3721
3722
3723
3724
3725
3726
3727
3728
3729
3730
3731

<p>All joins in SQLite are based on the cartesian product of the left and
right-hand datasets. ^The columns of the cartesian product dataset are, in 
order, all the columns of the left-hand dataset followed by all the columns
of the right-hand dataset. ^There is a row in the cartesian product dataset
formed by combining each unique combination of a row from the left-hand 
and right-hand datasets. ^(In other words, if the left-hand dataset consists of
<i>N<sub><small>left</small></sub></i> rows of 
<i>M<sub><small>left</small></sub></i> columns, and the right-hand dataset of
<i>N<sub><small>right</small></sub></i> rows of
<i>M<sub><small>right</small></sub></i> columns, then the cartesian product is a
dataset of 
<i>N<sub><small>left</small></sub>&times;N<sub><small>right</small></sub></i>
rows, each containing 
<i>M<sub><small>left</small></sub>+M<sub><small>right</small></sub></i> columns.)^

<p>^If the join-operator is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
If join-operator does have ON or USING clauses, those are handled according to
the following bullet points:

3787
3788
3789
3790
3791
3792
3793
3794
3795
3796












3797
3798
3799
3800
3801
3802
3803
       

<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>
<p><b>2. WHERE clause filtering.</b>

<p>^(If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the input data as a [boolean expression]. All rows for which the
WHERE clause expression evaluates to false are excluded from the dataset before
continuing.)^













<p><b>3. Generation of the set of result rows.</b>
<tcl>hd_fragment resultset</tcl>
<tcl>hd_keywords {result-set expressions} {GROUP BY}</tcl>

<p>Once the input data from the FROM clause has been filtered by the
WHERE clause expression (if any), the set of result rows for the simple 







|
|
|
>
>
>
>
>
>
>
>
>
>
>
>







3795
3796
3797
3798
3799
3800
3801
3802
3803
3804
3805
3806
3807
3808
3809
3810
3811
3812
3813
3814
3815
3816
3817
3818
3819
3820
3821
3822
3823
       

<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>
<p><b>2. WHERE clause filtering.</b>

<p>^(If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the input data as a [boolean expression]. Only rows for which the
WHERE clause expression evaluates to true are included from the dataset before
continuing.)^  Rows are excluded from the result if the WHERE clause
evaluates to either false or NULL.

<p>For a JOIN or INNER JOIN or CROSS JOIN, there is no difference between 
a constraint expression in the WHERE clause and one in the ON clause.  However,
for a LEFT JOIN or LEFT OUTER JOIN, the difference is very important.  
In a LEFT JOIN,
the extra NULL row for the right-hand table is added after ON clause processing
but before WHERE clause processing.  A constraint of the form "left.x=right.y"
in an ON clause will therefore allow through the added all-NULL rows of the
right table.  But if that same constraint is in the WHERE clause a NULL in
"right.y" will prevent the expression "left.x=right.y" from being true, and
thus exclude that row from the output.

<p><b>3. Generation of the set of result rows.</b>
<tcl>hd_fragment resultset</tcl>
<tcl>hd_keywords {result-set expressions} {GROUP BY}</tcl>

<p>Once the input data from the FROM clause has been filtered by the
WHERE clause expression (if any), the set of result rows for the simple 
4012
4013
4014
4015
4016
4017
4018
4019
4020
4021
4022
4023
4024
4025
4026
<tcl>hd_fragment limitoffset {LIMIT} {OFFSET}</tcl>
<h3>The LIMIT clause</h3>

<p>The LIMIT clause is used to place an upper bound on the number of rows
returned by the entire SELECT statement.

<p>In a [compound SELECT], only the
last (right-most) [simple SELECT] may contain a LIMIT clause.  
In a [compound SELECT], 
the LIMIT clause applies to the entire compound, not just the final SELECT.
If the right-most [simple SELECT] is a [VALUES clause] then no LIMIT clause
is allowed.

<p>^Any scalar expression may be used in the 
LIMIT clause, so long as it evaluates to an integer or a value that can be







|







4032
4033
4034
4035
4036
4037
4038
4039
4040
4041
4042
4043
4044
4045
4046
<tcl>hd_fragment limitoffset {LIMIT} {OFFSET}</tcl>
<h3>The LIMIT clause</h3>

<p>The LIMIT clause is used to place an upper bound on the number of rows
returned by the entire SELECT statement.

<p>In a [compound SELECT], only the
last or right-most [simple SELECT] may contain a LIMIT clause.  
In a [compound SELECT], 
the LIMIT clause applies to the entire compound, not just the final SELECT.
If the right-most [simple SELECT] is a [VALUES clause] then no LIMIT clause
is allowed.

<p>^Any scalar expression may be used in the 
LIMIT clause, so long as it evaluates to an integer or a value that can be
4044
4045
4046
4047
4048
4049
4050
4051

4052



4053
4054
4055
4056
4057
4058
4059
4060
4061
4062
4063












4064
4065
4066
4067
4068
4069
4070
OFFSET clause evaluates to a negative value, the results are the same as if it
had evaluated to zero.

<p>^Instead of a separate OFFSET clause, the LIMIT clause may specify two
scalar expressions separated by a comma. ^In this case, the first expression
is used as the OFFSET expression and the second as the LIMIT expression.
This is counter-intuitive, as when using the OFFSET clause the second of
the two expressions is the OFFSET and the first the LIMIT. This is intentional

- it maximizes compatibility with other SQL database systems.




<tcl>hd_fragment values {VALUES clause} VALUES</tcl>
<h3>The VALUES clause</h3>

<p>The phrase "VALUES(<i>expr-list</i>)" means the same thing
as "SELECT <i>expr-list</i>".  The phrase
"VALUES(<i>expr-list-1</i>),...,(<i>expr-list-N</i>)" means the same
thing as "SELECT <i>expr-list-1</i> UNION ALL ... UNION ALL
SELECT <i>expr-list-N</i>".  There is no advantage to using one form
over the other.  Both forms yield the same result and both forms use
the same amount of memory and processing time.














<h3>The WITH Clause</h3>

<p>SELECT statements may be optional preceded by a single
[WITH clause] that defines one or more [common table expressions]
for using within the SELECT statement.







|
>

>
>
>











>
>
>
>
>
>
>
>
>
>
>
>







4064
4065
4066
4067
4068
4069
4070
4071
4072
4073
4074
4075
4076
4077
4078
4079
4080
4081
4082
4083
4084
4085
4086
4087
4088
4089
4090
4091
4092
4093
4094
4095
4096
4097
4098
4099
4100
4101
4102
4103
4104
4105
4106
OFFSET clause evaluates to a negative value, the results are the same as if it
had evaluated to zero.

<p>^Instead of a separate OFFSET clause, the LIMIT clause may specify two
scalar expressions separated by a comma. ^In this case, the first expression
is used as the OFFSET expression and the second as the LIMIT expression.
This is counter-intuitive, as when using the OFFSET clause the second of
the two expressions is the OFFSET and the first the LIMIT. 
This reversal of the offset and limit is intentional
- it maximizes compatibility with other SQL database systems.
However, to avoid confusion, programmers are strongly encouraged to use
the form of the LIMIT clause that uses the "OFFSET" keyword and avoid
using a LIMIT clause with a comma-separated offset.

<tcl>hd_fragment values {VALUES clause} VALUES</tcl>
<h3>The VALUES clause</h3>

<p>The phrase "VALUES(<i>expr-list</i>)" means the same thing
as "SELECT <i>expr-list</i>".  The phrase
"VALUES(<i>expr-list-1</i>),...,(<i>expr-list-N</i>)" means the same
thing as "SELECT <i>expr-list-1</i> UNION ALL ... UNION ALL
SELECT <i>expr-list-N</i>".  There is no advantage to using one form
over the other.  Both forms yield the same result and both forms use
the same amount of memory and processing time.

<p>There are some restrictions on the use of a VALUES clause that are
not shown on the syntax diagrams:

<ul>
<li><p>
A VALUES clause cannot be followed by [ORDER BY] or [LIMIT].

<li><p>
A VALUES clause cannot be used together with a [WITH] clause in a
[simple SELECT].
</ul>


<h3>The WITH Clause</h3>

<p>SELECT statements may be optional preceded by a single
[WITH clause] that defines one or more [common table expressions]
for using within the SELECT statement.