Documentation Source Text

Check-in [49e2c0eae6]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Clarifications to lang_select.html
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 49e2c0eae66872028baa52df0f79d41b0a7b4206
User & Date: dan 2010-09-08 19:01:22.000
Context
2010-09-09
09:51
Fix a broken link in lang_select.html. (check-in: 53081e3979 user: dan tags: trunk)
2010-09-08
19:01
Clarifications to lang_select.html (check-in: 49e2c0eae6 user: dan tags: trunk)
2010-09-07
14:58
Added link to the Belorussion translation of the FAQ. Added links from parameter binding to SQLITE_LIMIT_VARIABLE_NUMBER. Added section to the "How SQLite Is Tested" document describing disabled optimization tests. (check-in: 71ae536af0 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/lang.in.
2608
2609
2610
2611
2612
2613
2614

2615


2616

2617

2618

2619
2620
2621
2622
2623
2624
2625
2626
2627
2628
2629
2630


2631







2632
2633
2634
2635




2636


2637
2638

2639
2640
2641
2642



















2643
2644
2645
2646
2647
2648
2649
2650
2651
2652
2653
2654
2655
2656
2657
2658
2659
2660
2661
2662
2663
2664
2665
2666
2667
2668
2669
2670
2671
2672
2673
2674
2675
2676
2677
2678
2679
2680
2681
2682
2683
2684
2685
2686
2687
2688
2689
2690
2691
2692
2693
2694








2695
2696
2697
2698
2699
2700
2701
2702
2703
2704
2705
2706


2707
2708
2709
2710
2711
2712
2713
2714


2715
2716
2717
2718
2719
2720
2721
<h3>Simple Select Processing</h3>

<p>The syntax for a simple SELECT statement is depicted in the 
[select-core syntax diagram]. Generating the results of a simple SELECT
statement is presented as a four step process in the description below:

<ol>

  <li> The input dataset is determined by joining data from the tables and


       subqueries specified as part of the FROM clause.

  <li> The input dataset is filtered using the WHERE clause expression.

  <li> The set of result rows is computed by aggregating and calculating

       the result-set expressions for the rows of the filtered input dataset.
  <li> If the query is a "SELECT DISTINCT" query, duplicate rows are removed
       from the set of result rows.
</ol>

<p>^There are two types of simple SELECT statement - aggregate and 
non-aggregate queries. ^A simple SELECT statement is an aggregate query if
it contains either a GROUP BY clause or one or more aggregate functions
in the result-set. ^Otherwise, if a simple SELECT contains no aggregate
functions or a GROUP BY clause, it is a non-aggregate query.

<p><b>1. Calculation of input data (FROM clause processing).</b>










<p>If a FROM clause is specified, the data on which a simple SELECT query
operates comes from the one or more tables or subqueries (SELECT statements
in parenthesis) specified following the FROM keyword. ^If there is only a
single table or subquery following the FROM clause, then the SELECT statement




operates on the contents of the table or the results of the subquery. ^If


there is more than one table or subquery following the FROM clause, then the
data from each table or subquery is combined into a single dataset for the

simple SELECT statement to operate on. Exactly how the data is combined
depends on the specific [join-op|join-ops] and 
[join-constraint|join-constraints] used to connect the tables or subqueries
together, as follows:




















<ul>
  <li> <p>^If the join-op is a comma (","), then the composite dataset is
       the cartesian product of the sets of records from the left and right
       sides of the join-op.

  <li> <p>^If the join-op is a "CROSS JOIN" or "INNER JOIN", then the composite
       dataset is created in the same way as for the comma join-op. There
       is no difference between the "INNER JOIN" and "," join operators. The
       "CROSS JOIN" join operator produces the same data as the "INNER JOIN"
       and "," operators, but is <a href=optoverview.html#manctrl>handled 
       slightly differently by the query optimizer</a>.

  <li> <p>^(If there is an ON clause specified, then the ON expression is
       evaluated for each row of the cartesian product and the result cast to 
       a numeric value as if by a [CAST expression]. All rows for which the
       expression evaluates to NULL or zero (integer value 0 or real value 0.0)
       are excluded from the composite dataset.)^

  <li> <p>^If there is a USING clause specified as part of the join-constraint,
       then each of the column names specified must exist in the datasets to 
       both the left and right of the join-op. ^(For each pair of namesake
       columns, the expression "lhs.X = rhs.X" is evaluated for each row of
       the cartesian product and the result cast to a numeric value. All rows
       for which one or more of the expressions evaluates to NULL or zero are
       excluded from the result set.)^ ^When comparing values as a result of a
       USING clause, the normal rules for handling affinities, collation
       sequences and NULL values in comparisons apply. ^The column from the
       dataset on the left-hand side of the join operator is considered to
       be on the left-hand side of the comparison operator (=) for the purposes
       of collation sequence and affinity precedence.

       <p>^For each pair of columns identified by a USING clause, the column
       from the right-hand dataset is omitted from the joined dataset. ^This 
       is the only difference between a USING clause and its equivalent ON
       constraint.

  <li> <p>^(If the join-op is a "LEFT JOIN" or "LEFT OUTER JOIN", then the
       composite dataset is created as for an "INNER JOIN". Except, after
       the ON or USING filtering clauses have been applied, an extra row is 
       added to the output for each row in the original left-hand input 
       dataset (if any) that corresponds to no rows at all in the composite
       dataset.)^ ^The added rows contain NULL values in the columns that would
       normally contain values copied from the right-hand input dataset.

  <li> <p>^(If the NATURAL keyword is added to any of the join-ops, then an
       implicit USING clause is added to the join-constraints. The implicit
       USING clause contains each of the column names that appear in both
       the left and right-hand input datasets.)^ ^If the left and right-hand
       input datasets feature no common column names, then the NATURAL keyword
       has no effect on the results of the join. ^A USING or ON clause may
       not be added to a join that specifies the NATURAL keyword.








</ul>

<p>^(When more than two tables are joined together as part of a FROM clause,
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is processed as 
((A join-op-1 B) join-op-2 C).)^
       
<p>^If the FROM clause is omitted from a SELECT statement, then the query
operates as if there were an implicit FROM clause that returns a dataset
consisting of a single row with zero columns.

<p><b>2. WHERE clause filtering.</b>



<p>^If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the dataset computed by the explicit or implicit FROM clause and
the result cast to a numeric value. ^All rows for which the WHERE clause
expression evaluates to a NULL value or to zero (integer value 0 or real value
0.0) are excluded from the dataset before continuing.

<p><b>3. Generation of the set of result rows.</b>



<p>Once the input dataset from the FROM clause has been filtered by the
WHERE clause expression (if any), the set of result rows for the simple 
SELECT are calculated. Exactly how this is done depends on whether the simple 
SELECT is an aggregate or non-aggregate query, and whether or not a GROUP
BY clause was specified.








>
|
>
>
|
>
|
>
|
>
|
|
|








|
>
>

>
>
>
>
>
>
>


|
|
>
>
>
>
|
>
>
|
<
>
|
|
|
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


<
<
<
<
<
<
<
<
<
<
<




|



















<
<
<
<
<
<
<
<







>
>
>
>
>
>
>
>




|


<
<
<


>
>








>
>







2608
2609
2610
2611
2612
2613
2614
2615
2616
2617
2618
2619
2620
2621
2622
2623
2624
2625
2626
2627
2628
2629
2630
2631
2632
2633
2634
2635
2636
2637
2638
2639
2640
2641
2642
2643
2644
2645
2646
2647
2648
2649
2650
2651
2652
2653
2654
2655
2656
2657
2658

2659
2660
2661
2662
2663
2664
2665
2666
2667
2668
2669
2670
2671
2672
2673
2674
2675
2676
2677
2678
2679
2680
2681
2682
2683
2684











2685
2686
2687
2688
2689
2690
2691
2692
2693
2694
2695
2696
2697
2698
2699
2700
2701
2702
2703
2704
2705
2706
2707
2708








2709
2710
2711
2712
2713
2714
2715
2716
2717
2718
2719
2720
2721
2722
2723
2724
2725
2726
2727
2728
2729
2730



2731
2732
2733
2734
2735
2736
2737
2738
2739
2740
2741
2742
2743
2744
2745
2746
2747
2748
2749
2750
2751
<h3>Simple Select Processing</h3>

<p>The syntax for a simple SELECT statement is depicted in the 
[select-core syntax diagram]. Generating the results of a simple SELECT
statement is presented as a four step process in the description below:

<ol>
  <li> <p>[FROM clause] processing: The input data for the simple SELECT is
       determined. The input data is either implicitly a single row with 0
       columns (if there is no FROM clause) or is determined by analyzing the
       [join-source syntax diagram|join-source] specification that follows 
       an explicit FROM clause.
  <li> <p>[WHERE clause] processing: The input data is filtered using the WHERE
       clause expression.  
  <li> <p>[GROUP BY|GROUP BY, HAVING and result-column expression] processing: 
       The set of result rows is computed by aggregating the data according to
       any GROUP BY clause and calculating the result-set expressions for the
       rows of the filtered input dataset.  
  <li> <p>[DISTINCT|DISTINCT/ALL keyword] processing: If the query is a "SELECT
       DISTINCT" query, duplicate rows are removed from the set of result rows.
</ol>

<p>^There are two types of simple SELECT statement - aggregate and 
non-aggregate queries. ^A simple SELECT statement is an aggregate query if
it contains either a GROUP BY clause or one or more aggregate functions
in the result-set. ^Otherwise, if a simple SELECT contains no aggregate
functions or a GROUP BY clause, it is a non-aggregate query.

<p><b>1. Determination of input data (FROM clause processing).</b>
<tcl>hd_fragment fromclause</tcl>
<tcl>hd_keywords {FROM clause}</tcl>

<p>The input data used by a simple SELECT query is a set of <i>N</i> rows 
each <i>M</i> columns wide.

<p>^(If the FROM clause is omitted from a simple SELECT statement, then the 
input data is implicitly a single row zero columns wide)^ (i.e. <i>N</i>=1 and
<i>M</i>=0).

<p>If a FROM clause is specified, the data on which a simple SELECT query
operates comes from the one or more tables or subqueries (SELECT statements
in parenthesis) specified following the FROM keyword. ^A sub-select specified
in the join-source following the FROM clause in a simple SELECT statement is
handled as if it was a table containing the data returned by executing the
sub-select statement. ^Each column of the sub-select dataset inherits the
[collation sequence] and [affinity] of the corresponding expression in the
sub-select statement.

<p>^If there is only a single table in the join-source following the FROM
clause, then the input data used by the SELECT statement is the contents of the
named table. ^If there is more than one table specified as part of the

join-source following the FROM keyword, then the contents of each named table
are joined into a single dataset for the simple SELECT statement to operate on.
Exactly how the data is combined depends on the specific [join-op] and
[join-constraint] used to connect the tables or subqueries together.

<p>All joins in SQLite are based on the cartesian product of the left and
right-hand datasets. ^The columns of the cartesian product dataset are, in 
order, all the columns of the left-hand dataset followed by all the columns
of the right-hand dataset. ^There is a row in the cartesian product dataset
formed by combining each unique combination of a row from the left-hand 
and right-hand datasets. ^(In other words, if the left-hand dataset consists of
<i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of
<i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a
dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^

<p>^If the join-op is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. ^(The "CROSS JOIN" join operator produces the same data as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
<a href=optoverview.html#manctrl>handled slightly differently by the query
optimizer</a>. Otherwise, it is the cartesian product modified 
according to one or more of the following bullet points: 

<ul>











  <li> <p>^(If there is an ON clause specified, then the ON expression is
       evaluated for each row of the cartesian product and the result cast to 
       a numeric value as if by a [CAST expression]. All rows for which the
       expression evaluates to NULL or zero (integer value 0 or real value 0.0)
       are excluded from the dataset.)^

  <li> <p>^If there is a USING clause specified as part of the join-constraint,
       then each of the column names specified must exist in the datasets to 
       both the left and right of the join-op. ^(For each pair of namesake
       columns, the expression "lhs.X = rhs.X" is evaluated for each row of
       the cartesian product and the result cast to a numeric value. All rows
       for which one or more of the expressions evaluates to NULL or zero are
       excluded from the result set.)^ ^When comparing values as a result of a
       USING clause, the normal rules for handling affinities, collation
       sequences and NULL values in comparisons apply. ^The column from the
       dataset on the left-hand side of the join operator is considered to
       be on the left-hand side of the comparison operator (=) for the purposes
       of collation sequence and affinity precedence.

       <p>^For each pair of columns identified by a USING clause, the column
       from the right-hand dataset is omitted from the joined dataset. ^This 
       is the only difference between a USING clause and its equivalent ON
       constraint.









  <li> <p>^(If the NATURAL keyword is added to any of the join-ops, then an
       implicit USING clause is added to the join-constraints. The implicit
       USING clause contains each of the column names that appear in both
       the left and right-hand input datasets.)^ ^If the left and right-hand
       input datasets feature no common column names, then the NATURAL keyword
       has no effect on the results of the join. ^A USING or ON clause may
       not be added to a join that specifies the NATURAL keyword.

  <li> <p>^(If the join-op is a "LEFT JOIN" or "LEFT OUTER JOIN", then after
       the ON or USING filtering clauses have been applied, an extra row is 
       added to the output for each row in the original left-hand input 
       dataset that corresponds to no rows at all in the composite
       dataset (if any).)^ ^The added rows contain NULL values in the columns
       that would normally contain values copied from the right-hand input
       dataset.  
</ul>

<p>^(When more than two tables are joined together as part of a FROM clause,
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^
       




<p><b>2. WHERE clause filtering.</b>
<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>

<p>^If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the dataset computed by the explicit or implicit FROM clause and
the result cast to a numeric value. ^All rows for which the WHERE clause
expression evaluates to a NULL value or to zero (integer value 0 or real value
0.0) are excluded from the dataset before continuing.

<p><b>3. Generation of the set of result rows.</b>
<tcl>hd_fragment resultset</tcl>
<tcl>hd_keywords {result-set expressions} {GROUP BY}</tcl>

<p>Once the input dataset from the FROM clause has been filtered by the
WHERE clause expression (if any), the set of result rows for the simple 
SELECT are calculated. Exactly how this is done depends on whether the simple 
SELECT is an aggregate or non-aggregate query, and whether or not a GROUP
BY clause was specified.

2750
2751
2752
2753
2754
2755
2756
2757
2758
2759
2760
2761

2762
2763
2764
2765
2766
2767
2768
    aggregate query without a GROUP BY clause. ^An aggregate query without a
    GROUP BY clause always returns exactly one row of data.

  <li><p>^If the SELECT statement is <b>an aggregate query with a GROUP
    BY</b> clause, then each of the expressions specified as part of the
    GROUP BY clause is evaluated for each row of the filtered input dataset.
    ^Each row is grouped with all other rows for which all GROUP BY 
    expressions evaluate to the same values. For the purposes of grouping 
    rows, NULL values are considered equal. The usual rules for selecting
    a collation sequence with which to compare text values apply.
    ^The expressions in the GROUP BY clause do <em>not</em> have to be
    expressions that appear in the result.  


    <p>^If a HAVING clause is specified, it is evaluated once for each group 
    of rows and cast to an integer value. ^If the HAVING clause is an aggregate
    expression, it is evaluated across all rows in the group. ^If a HAVING 
    clause is a non-aggregate expression, it is evaluated with respect to an
    arbitrarily selected row from the group. ^If the result of evaluating the
    HAVING clause is NULL or zero (integer value 0), the group is discarded.







|
|
|
|
|
>







2780
2781
2782
2783
2784
2785
2786
2787
2788
2789
2790
2791
2792
2793
2794
2795
2796
2797
2798
2799
    aggregate query without a GROUP BY clause. ^An aggregate query without a
    GROUP BY clause always returns exactly one row of data.

  <li><p>^If the SELECT statement is <b>an aggregate query with a GROUP
    BY</b> clause, then each of the expressions specified as part of the
    GROUP BY clause is evaluated for each row of the filtered input dataset.
    ^Each row is grouped with all other rows for which all GROUP BY 
    expressions evaluate to the same values. ^For the purposes of grouping 
    rows, NULL values are considered equal. ^The usual rules for selecting
    a collation sequence with which to compare text values apply when evaluating
    expressions in a GROUP BY clause.  ^The expressions in the GROUP BY clause
    do <em>not</em> have to be expressions that appear in the result. ^The
    expressions in a GROUP BY clause may not be aggregate expressions.

    <p>^If a HAVING clause is specified, it is evaluated once for each group 
    of rows and cast to an integer value. ^If the HAVING clause is an aggregate
    expression, it is evaluated across all rows in the group. ^If a HAVING 
    clause is a non-aggregate expression, it is evaluated with respect to an
    arbitrarily selected row from the group. ^If the result of evaluating the
    HAVING clause is NULL or zero (integer value 0), the group is discarded.
2780
2781
2782
2783
2784
2785
2786


2787
2788
2789
2790
2791
2792
2793
    set of result rows. ^Subject to filtering associated with the DISTINCT
    keyword, the number of rows returned by an aggregate query with a GROUP
    BY clause is the same as the number of groups of rows produced by applying
    the GROUP BY and HAVING clauses to the filtered input dataset.
</ul>

<p><b>4. Removal of duplicate rows (DISTINCT processing).</b>



<p>^(One of the ALL or DISTINCT keywords may follow the SELECT keyword in a 
simple SELECT statement. If neither of these are present, then the behaviour
is as if ALL were specified.)^ ^If the simple SELECT is a SELECT ALL, then the
entire set of result rows are returned by the SELECT. ^If the simple SELECT
is a SELECT DISTINCT, then duplicate rows are removed from the set of result
rows before it is returned. ^For the purposes of detecting duplicate rows,







>
>







2811
2812
2813
2814
2815
2816
2817
2818
2819
2820
2821
2822
2823
2824
2825
2826
    set of result rows. ^Subject to filtering associated with the DISTINCT
    keyword, the number of rows returned by an aggregate query with a GROUP
    BY clause is the same as the number of groups of rows produced by applying
    the GROUP BY and HAVING clauses to the filtered input dataset.
</ul>

<p><b>4. Removal of duplicate rows (DISTINCT processing).</b>
<tcl>hd_fragment distinct</tcl>
<tcl>hd_keywords {DISTINCT}</tcl>

<p>^(One of the ALL or DISTINCT keywords may follow the SELECT keyword in a 
simple SELECT statement. If neither of these are present, then the behaviour
is as if ALL were specified.)^ ^If the simple SELECT is a SELECT ALL, then the
entire set of result rows are returned by the SELECT. ^If the simple SELECT
is a SELECT DISTINCT, then duplicate rows are removed from the set of result
rows before it is returned. ^For the purposes of detecting duplicate rows,