Documentation Source Text

Check-in [84418fef8d]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add further examples to windowfunctions.in.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 84418fef8d6e4deb1b6f5fe36f38fbd66843fafc82109b9d4e349d3c159b0f20
User & Date: dan 2018-06-22 16:14:07
Context
2018-06-25
20:35
Add documentation for implementing new aggregate window functions. check-in: b5a81b3bdf user: dan tags: trunk
2018-06-22
16:14
Add further examples to windowfunctions.in. check-in: 84418fef8d user: dan tags: trunk
2018-06-21
21:00
Improve windowfunctions.in. check-in: 7094fcac78 user: dan tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to pages/windowfunctions.in.

     2      2   
     3      3   <title>SQLite SQL Window Function Support</title>
     4      4   <table_of_contents>
     5      5   
     6      6   <h2 style="margin-left:1.0em" notoc id=overview> Overview</h2>
     7      7   
     8      8   <p>This page describes the support for SQL window functions added to SQLite
     9         -[version 3.25.0] ([dateof:3.25.0]).
            9  +[version 3.25.0] ([dateof:3.25.0]). SQLite's window function support is closely
           10  +modeled on that of <a href=http://www.postgresql.org>PostgreSQL</a>.
    10     11   
    11     12   <h1>Introduction to Window Functions</h1>
    12     13   
    13     14   <p>A window function is a special type of SQL function for which the results
    14     15   depend on the contents of a "window" of one or more contiguous rows returned
    15     16   by the SELECT statement containing the window function. A window function may
    16     17   only be used in the select-list or ORDER BY clause of a SELECT or sub-select
................................................................................
    37     38   
    38     39   <codeblock>
    39     40     CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT);
    40     41     INSERT INTO t0 VALUES (1, 'aaa'), (2, 'ccc'), (3, 'bbb');
    41     42   </codeblock>
    42     43   
    43     44   <p>Then:
           45  +
    44     46   <codeblock>
    45     47     <i>-- The following SELECT statement returns:</i>
    46     48     <i>-- </i>
    47         -  <i>--     1|aaa|1</i>
    48         -  <i>--     2|ccc|3</i>
    49         -  <i>--     3|bbb|2</i>
           49  +  <i>--   x | y | row_number</i>
           50  +  -----------------------
           51  +  <i>--   1 | aaa | 1         </i>
           52  +  <i>--   2 | ccc | 3         </i>
           53  +  <i>--   3 | bbb | 2         </i>
    50     54     <i>-- </i>
    51         -  SELECT x, y, row_number() OVER (ORDER BY y) FROM t0 ORDER BY x;
           55  +  SELECT x, y, row_number() OVER (ORDER BY y) AS row_number FROM t0 ORDER BY x;
    52     56   </codeblock>
    53     57   
    54     58   <p>The example above uses the special built-in window function row_number().
    55     59   This function returns a monotonically increasing integer assigned to each
    56     60   row in order of the the "ORDER BY" clause within the
    57     61   <i>&lt;window-definition&gt;</i> (in this case "ORDER BY y"). Note that 
    58     62   this does not affect the order in which results are returned to the user -
................................................................................
    90     94                             (7, 'G', 'one'  );
    91     95   </codeblock>
    92     96   
    93     97   <p> An aggregate window function is similar to an aggregate function, except
    94     98   adding it to a query does not change the number of rows returned. Instead,
    95     99   for each row the result of the aggregate window function is as if the
    96    100   corresponding aggregate were run over all rows in the "window frame".
          101  +
    97    102   
    98    103   <codeblock>
    99    104     <i>-- The following SELECT statement returns:</i>
   100    105     <i>-- </i>
   101         -  <i>--     1|A|A.B</i>
   102         -  <i>--     2|B|A.B.C</i>
   103         -  <i>--     3|C|B.C.D</i>
   104         -  <i>--     4|D|C.D.E</i>
   105         -  <i>--     5|E|D.E.F</i>
   106         -  <i>--     6|F|E.F.G</i>
   107         -  <i>--     7|G|F.G</i>
          106  +  <i>--   a | b | group_concat</i>
          107  +  -------------------------
          108  +  <i>--   1 | A | A.B         </i>
          109  +  <i>--   2 | B | A.B.C       </i>
          110  +  <i>--   3 | C | B.C.D       </i>
          111  +  <i>--   4 | D | C.D.E       </i>
          112  +  <i>--   5 | E | D.E.F       </i>
          113  +  <i>--   6 | F | E.F.G       </i>
          114  +  <i>--   7 | G | F.G         </i>
   108    115     <i>-- </i>
   109    116     SELECT a, b, group_concat(b, '.') OVER (
   110    117       ORDER BY a ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
   111         -  ) FROM t1;
          118  +  ) AS group_concat FROM t1;
   112    119   </codeblock>
   113    120   
   114    121   <p> In the example above, the window frame consists of all rows between the
   115    122   previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive,
   116    123   where rows are sorted according to the ORDER BY clause in the
   117    124   <i>&lt;window-definition&gt;</i> (in this case "ORDER BY a"). For example, the frame for the row with (a=3) consists of rows (2, 'B', 'two'), (3, 'C', 'three') and (4, 'D', 'one'). The result of group_concat(b, '.') for that row is therefore 'B.C.D'.
   118    125   
   119         -<p> The default &lt;frame-specification&gt; is:
          126  +<p> The default <i>&lt;frame-specification&gt;</i> is:
   120    127   
   121    128   <codeblock>
   122    129       RANGE BETWEEN UNBOUNDED PRECEDING TO CURRENT ROW
   123    130   </codeblock>
   124    131   
   125    132   <p> This means that, after sorting the rows returned by the SELECT according to
   126    133   the ORDER BY clause in the &lt;window-definition&gt;, the window frame consists
   127    134   of all rows between the first row and the last row with the same values as the current row for all ORDER BY expressions. This implies that rows that have the same values for all ORDER BY expressions will also have the same value for the result of the window function (as the window frame is the same). For example:
   128    135   
   129    136   <codeblock>
   130    137     <i>-- The following SELECT statement returns:</i>
   131    138     <i>-- </i>
   132         -  <i>--     1|A|one|A.D.G</i>
   133         -  <i>--     2|B|two|A.D.G.C.F.B.E</i>
   134         -  <i>--     3|C|three|A.D.G.C.F</i>
   135         -  <i>--     4|D|one|A.D.G</i>
   136         -  <i>--     5|E|two|A.D.G.C.F.B.E</i>
   137         -  <i>--     6|F|three|A.D.G.C.F</i>
   138         -  <i>--     7|G|one|A.D.G</i>
          139  +  <i>--   a | b | c | group_concat</i>
          140  +  -----------------------------
          141  +  <i>--   1 | A | one   | A.D.G       </i>
          142  +  <i>--   2 | B | two   | A.D.G.C.F.B.E</i>
          143  +  <i>--   3 | C | three | A.D.G.C.F   </i>
          144  +  <i>--   4 | D | one   | A.D.G       </i>
          145  +  <i>--   5 | E | two   | A.D.G.C.F.B.E</i>
          146  +  <i>--   6 | F | three | A.D.G.C.F   </i>
          147  +  <i>--   7 | G | one   | A.D.G       </i>
   139    148     <i>-- </i>
   140         -  SELECT a, b, c, group_concat(b, '.') OVER (ORDER BY c) FROM t1 ORDER BY a;
          149  +  SELECT a, b, c, 
          150  +         group_concat(b, '.') OVER (ORDER BY c) AS group_concat 
          151  +  FROM t1 ORDER BY a;
   141    152   </codeblock>
   142    153   
   143    154   <p> All of SQLite's [Aggregate Functions|aggregate functions] may be used as aggregate window
   144    155   functions. It is also possible to 
   145    156   [user-defined window functions|create user-defined aggregate window functions].
   146    157   
   147    158   <h2>Frame Specifications</h2>
................................................................................
   189    200   <p> In the following example, the window frame for each row consists of all
   190    201   rows from the current row to the end of the set, where rows are sorted
   191    202   according to "ORDER BY a".
   192    203   
   193    204   <codeblock>
   194    205     <i>-- The following SELECT statement returns:</i>
   195    206     <i>-- </i>
   196         -  <i>-- one  |1|A|A.D.G.C.F.B.E</i>
   197         -  <i>-- one  |4|D|D.G.C.F.B.E</i>
   198         -  <i>-- one  |7|G|G.C.F.B.E</i>
   199         -  <i>-- three|3|C|C.F.B.E</i>
   200         -  <i>-- three|6|F|F.B.E</i>
   201         -  <i>-- two  |2|B|B.E</i>
   202         -  <i>-- two  |5|E|E</i>
          207  +  <i>--   c     | a | b | group_concat</i>
          208  +  ---------------------------------
          209  +  <i>--   one   | 1 | A | A.D.G.C.F.B.E</i>
          210  +  <i>--   one   | 4 | D | D.G.C.F.B.E </i>
          211  +  <i>--   one   | 7 | G | G.C.F.B.E   </i>
          212  +  <i>--   three | 3 | C | C.F.B.E     </i>
          213  +  <i>--   three | 6 | F | F.B.E       </i>
          214  +  <i>--   two   | 2 | B | B.E         </i>
          215  +  <i>--   two   | 5 | E | E           </i>
   203    216     <i>-- </i>
   204    217     SELECT c, a, b, group_concat(b, '.') OVER (
   205    218       ORDER BY c, a ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   206         -  ) 
          219  +  ) AS group_concat
   207    220     FROM t1 ORDER BY c, a;
   208    221   </codeblock>
   209         -
   210    222   
   211    223   <h2>The PARTITION BY Clause</h2>
   212    224   
   213    225   <p> A <i>&lt;window-definition&gt;</i> may also include a PARTITION BY clause.
   214    226   If so, the rows returned by the SELECT statement are divided into groups -
   215    227   partitions - with the same values for each PARTITION BY expression, and then
   216    228   window-function processing performed as described above separately for each
................................................................................
   218    230   are divided into groups before any aggregate processing is performed.
   219    231   
   220    232   <p> For example:
   221    233   
   222    234   <codeblock>
   223    235     <i>-- The following SELECT statement returns:</i>
   224    236     <i>-- </i>
   225         -  <i>-- one  |1|A|A</i>
   226         -  <i>-- one  |4|D|A.D</i>
   227         -  <i>-- one  |7|G|A.D.G</i>
   228         -  <i>-- three|3|C|C</i>
   229         -  <i>-- three|6|F|C.F</i>
   230         -  <i>-- two  |2|B|B</i>
   231         -  <i>-- two  |5|E|B.E</i>
          237  +  <i>--   c     | a | b | group_concat</i>
          238  +  ---------------------------------
          239  +  <i>--   one   | 1 | A | A.D.G       </i>
          240  +  <i>--   one   | 4 | D | D.G         </i>
          241  +  <i>--   one   | 7 | G | G           </i>
          242  +  <i>--   three | 3 | C | C.F         </i>
          243  +  <i>--   three | 6 | F | F           </i>
          244  +  <i>--   two   | 2 | B | B.E         </i>
          245  +  <i>--   two   | 5 | E | E           </i>
   232    246     <i>-- </i>
   233    247     SELECT c, a, b, group_concat(b, '.') OVER (
   234    248       PARTITION BY c ORDER BY a RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
   235         -  ) 
          249  +  ) AS group_concat
   236    250     FROM t1 ORDER BY c, a;
   237    251   </codeblock>
   238    252   
   239    253   <h2>The FILTER Clause</h2>
   240    254   
   241    255   <p>If a FILTER clause is provided, then only rows for which the <i>expr</i> is
   242    256   true are included in the window frame. The aggregate window still returns a
   243    257   value for every row, but those for which the FILTER expression evaluates to
   244    258   other than true are not included in the window frame for any row. For example:
   245    259   
   246    260   <codeblock>
   247    261     <i>-- The following SELECT statement returns:</i>
   248         -  <i>--</i>
   249         -  <i>-- one  |1|A|A.C.D.F.G</i>
   250         -  <i>-- two  |2|B|C.D.F.G</i>
   251         -  <i>-- three|3|C|C.D.F.G</i>
   252         -  <i>-- one  |4|D|D.F.G</i>
   253         -  <i>-- two  |5|E|F.G</i>
   254         -  <i>-- three|6|F|F.G</i>
   255         -  <i>-- one  |7|G|G</i>
          262  +  <i>-- </i>
          263  +  <i>--   c     | a | b | group_concat</i>
          264  +  ---------------------------------
          265  +  <i>--   one   | 1 | A | A           </i>
          266  +  <i>--   two   | 2 | B | A           </i>
          267  +  <i>--   three | 3 | C | A.C         </i>
          268  +  <i>--   one   | 4 | D | A.C.D       </i>
          269  +  <i>--   two   | 5 | E | A.C.D       </i>
          270  +  <i>--   three | 6 | F | A.C.D.F     </i>
          271  +  <i>--   one   | 7 | G | A.C.D.F.G   </i>
   256    272     <i>-- </i>
   257         -  SELECT c, a, b, group_concat(b, '.') FILTER (WHERE c!='two') OVER (ORDER BY a)
          273  +  SELECT c, a, b, group_concat(b, '.') FILTER (WHERE c!='two') OVER (
          274  +    ORDER BY a
          275  +  ) AS group_concat
   258    276     FROM t1 ORDER BY a;
   259    277   </codeblock>
   260         -
   261    278   <h2 tags="user-defined window functions">User-Defined Aggregate Window Functions</h2>
   262    279   
   263    280   <p>TODO: Link to C API docs (sqlite3_create_window_function()).
   264    281   
   265    282   <h1>Built-in Window Functions</h1>
   266    283   
   267    284   <p> As well as aggregate window functions, SQLite features a set of built-in
................................................................................
   363    380          for each row.
   364    381     <dt><p><b>nth_value(expr, N)</b>
   365    382     <dd><p> This built-in window function calculates the window frame for each
   366    383          row in the same way as an aggregate window function. It returns the
   367    384          value of <i>expr</i> evaluated against the row <i>N</i> of the window
   368    385          frame. Rows are numbered within the window frame starting from 1 in
   369    386          the order defined by the ORDER BY clause if one is present, or in
   370         -       arbitrary order otherwise.
          387  +       arbitrary order otherwise. If there is no <i>N</i>th row in the
          388  +       partition, then NULL is returned.
   371    389     <dd>
   372    390   </dl>
   373    391   
          392  +<p>The examples in this section all assume the following data:
          393  +
          394  +<codeblock>
          395  +  CREATE TABLE t2(a, b);
          396  +  INSERT INTO t2 VALUES('a', 'one'), 
          397  +                       ('a', 'two'), 
          398  +                       ('a', 'three'), 
          399  +                       ('b', 'four'), 
          400  +                       ('c', 'five'), 
          401  +                       ('c', 'six');
          402  +</codeblock>
          403  +
          404  +<p>The following example illustrates the behaviour of the five ranking
          405  +functions - row_number(), rank(), dense_rank(), percent_rank() and 
          406  +cume_dist().
          407  +
          408  +<codeblock>
          409  +  <i>-- The following SELECT statement returns:</i>
          410  +  <i>-- </i>
          411  +  <i>--   a | row_number | rank | dense_rank | percent_rank | cume_dist</i>
          412  +  ------------------------------------------------------------------
          413  +  <i>--   a |          1 |    1 |          1 |          0.0 |       0.5</i>
          414  +  <i>--   a |          2 |    1 |          1 |          0.0 |       0.5</i>
          415  +  <i>--   a |          3 |    1 |          1 |          0.0 |       0.5</i>
          416  +  <i>--   b |          4 |    4 |          2 |          0.6 |       0.66</i>
          417  +  <i>--   c |          5 |    5 |          3 |          0.8 |       1.0</i>
          418  +  <i>--   c |          6 |    5 |          3 |          0.8 |       1.0</i>
          419  +  <i>-- </i>
          420  +  SELECT a                        AS a,
          421  +         row_number() OVER win    AS row_number,
          422  +         rank() OVER win          AS rank,
          423  +         dense_rank() OVER win    AS dense_rank,
          424  +         percent_rank() OVER win  AS percent_rank,
          425  +         cume_dist() OVER win     AS cume_dist
          426  +  FROM t2
          427  +  WINDOW win AS (ORDER BY a);
          428  +</codeblock>
          429  +
          430  +<p>The example below uses ntile() to divde the six rows into two groups (the
          431  +ntile(2) call) and into four groups (the ntile(4) call). For ntile(2), there
          432  +are three rows assigned to each group. For ntile(4), there are two groups of
          433  +two and two groups of one. The larger groups of two appear first.
          434  +
          435  +<codeblock>
          436  +  <i>-- The following SELECT statement returns:</i>
          437  +  <i>-- </i>
          438  +  <i>--   a | b     | ntile_2 | ntile_4</i>
          439  +  ----------------------------------
          440  +  <i>--   a | one   |       1 |       1</i>
          441  +  <i>--   a | two   |       1 |       1</i>
          442  +  <i>--   a | three |       1 |       2</i>
          443  +  <i>--   b | four  |       2 |       2</i>
          444  +  <i>--   c | five  |       2 |       3</i>
          445  +  <i>--   c | six   |       2 |       4</i>
          446  +  <i>-- </i>
          447  +  SELECT a                        AS a,
          448  +         b                        AS b,
          449  +         ntile(2) OVER win        AS ntile_2,
          450  +         ntile(4) OVER win        AS ntile_4
          451  +  FROM t2
          452  +  WINDOW win AS (ORDER BY a);
          453  +</codeblock>
          454  +
          455  +<p> The next example demonstrates lag(), lead(), first_value(), last_value()
          456  +and nth_value(). The <i>&lt;frame-specification&gt;</i> is ignored by both lag() and lead(), but respected by first_value(), last_value() and nth_value().
          457  +
          458  +<codeblock>
          459  +  <i>-- The following SELECT statement returns:</i>
          460  +  <i>-- </i>
          461  +  <i>--   b | lead | lag  | first_value | last_value | nth_value_3</i>
          462  +  -------------------------------------------------------------
          463  +  <i>--   A | C    | NULL | A           | A          | NULL       </i>
          464  +  <i>--   B | D    | A    | A           | B          | NULL       </i>
          465  +  <i>--   C | E    | B    | A           | C          | C          </i>
          466  +  <i>--   D | F    | C    | A           | D          | C          </i>
          467  +  <i>--   E | G    | D    | A           | E          | C          </i>
          468  +  <i>--   F | n/a  | E    | A           | F          | C          </i>
          469  +  <i>--   G | n/a  | F    | A           | G          | C          </i>
          470  +  <i>-- </i>
          471  +  SELECT b                          AS b,
          472  +         lead(b, 2, 'n/a') OVER win AS lead,
          473  +         lag(b) OVER win            AS lag,
          474  +         first_value(b) OVER win    AS first_value,
          475  +         last_value(b) OVER win     AS last_value,
          476  +         nth_value(b, 3) OVER win   AS nth_value_3
          477  +  FROM t1
          478  +  WINDOW win AS (ORDER BY b ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          479  +</codeblock>