Documentation Source Text

Check-in [84418fef8d]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add further examples to windowfunctions.in.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 84418fef8d6e4deb1b6f5fe36f38fbd66843fafc82109b9d4e349d3c159b0f20
User & Date: dan 2018-06-22 16:14:07.086
Context
2018-06-25
20:35
Add documentation for implementing new aggregate window functions. (check-in: b5a81b3bdf user: dan tags: trunk)
2018-06-22
16:14
Add further examples to windowfunctions.in. (check-in: 84418fef8d user: dan tags: trunk)
2018-06-21
21:00
Improve windowfunctions.in. (check-in: 7094fcac78 user: dan tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/windowfunctions.in.
1
2
3
4
5
6
7
8
9

10
11
12
13
14
15
16
<tcl>hd_keywords {window functions}</tcl>

<title>SQLite SQL Window Function Support</title>
<table_of_contents>

<h2 style="margin-left:1.0em" notoc id=overview> Overview</h2>

<p>This page describes the support for SQL window functions added to SQLite
[version 3.25.0] ([dateof:3.25.0]).


<h1>Introduction to Window Functions</h1>

<p>A window function is a special type of SQL function for which the results
depend on the contents of a "window" of one or more contiguous rows returned
by the SELECT statement containing the window function. A window function may
only be used in the select-list or ORDER BY clause of a SELECT or sub-select








|
>







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
<tcl>hd_keywords {window functions}</tcl>

<title>SQLite SQL Window Function Support</title>
<table_of_contents>

<h2 style="margin-left:1.0em" notoc id=overview> Overview</h2>

<p>This page describes the support for SQL window functions added to SQLite
[version 3.25.0] ([dateof:3.25.0]). SQLite's window function support is closely
modeled on that of <a href=http://www.postgresql.org>PostgreSQL</a>.

<h1>Introduction to Window Functions</h1>

<p>A window function is a special type of SQL function for which the results
depend on the contents of a "window" of one or more contiguous rows returned
by the SELECT statement containing the window function. A window function may
only be used in the select-list or ORDER BY clause of a SELECT or sub-select
37
38
39
40
41
42
43

44
45
46
47

48
49

50
51
52
53
54
55
56
57
58

<codeblock>
  CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT);
  INSERT INTO t0 VALUES (1, 'aaa'), (2, 'ccc'), (3, 'bbb');
</codeblock>

<p>Then:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--     1|aaa|1</i>

  <i>--     2|ccc|3</i>
  <i>--     3|bbb|2</i>

  <i>-- </i>
  SELECT x, y, row_number() OVER (ORDER BY y) FROM t0 ORDER BY x;
</codeblock>

<p>The example above uses the special built-in window function row_number().
This function returns a monotonically increasing integer assigned to each
row in order of the the "ORDER BY" clause within the
<i>&lt;window-definition&gt;</i> (in this case "ORDER BY y"). Note that 
this does not affect the order in which results are returned to the user -







>



|
>
|
|
>

|







38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62

<codeblock>
  CREATE TABLE t0(x INTEGER PRIMARY KEY, y TEXT);
  INSERT INTO t0 VALUES (1, 'aaa'), (2, 'ccc'), (3, 'bbb');
</codeblock>

<p>Then:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   x | y | row_number</i>
  -----------------------
  <i>--   1 | aaa | 1         </i>
  <i>--   2 | ccc | 3         </i>
  <i>--   3 | bbb | 2         </i>
  <i>-- </i>
  SELECT x, y, row_number() OVER (ORDER BY y) AS row_number FROM t0 ORDER BY x;
</codeblock>

<p>The example above uses the special built-in window function row_number().
This function returns a monotonically increasing integer assigned to each
row in order of the the "ORDER BY" clause within the
<i>&lt;window-definition&gt;</i> (in this case "ORDER BY y"). Note that 
this does not affect the order in which results are returned to the user -
90
91
92
93
94
95
96

97
98
99
100
101

102
103
104
105
106
107

108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132


133
134
135
136
137
138
139

140

141
142
143
144
145
146
147
                          (7, 'G', 'one'  );
</codeblock>

<p> An aggregate window function is similar to an aggregate function, except
adding it to a query does not change the number of rows returned. Instead,
for each row the result of the aggregate window function is as if the
corresponding aggregate were run over all rows in the "window frame".


<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--     1|A|A.B</i>

  <i>--     2|B|A.B.C</i>
  <i>--     3|C|B.C.D</i>
  <i>--     4|D|C.D.E</i>
  <i>--     5|E|D.E.F</i>
  <i>--     6|F|E.F.G</i>
  <i>--     7|G|F.G</i>

  <i>-- </i>
  SELECT a, b, group_concat(b, '.') OVER (
    ORDER BY a ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) FROM t1;
</codeblock>

<p> In the example above, the window frame consists of all rows between the
previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive,
where rows are sorted according to the ORDER BY clause in the
<i>&lt;window-definition&gt;</i> (in this case "ORDER BY a"). For example, the frame for the row with (a=3) consists of rows (2, 'B', 'two'), (3, 'C', 'three') and (4, 'D', 'one'). The result of group_concat(b, '.') for that row is therefore 'B.C.D'.

<p> The default &lt;frame-specification&gt; is:

<codeblock>
    RANGE BETWEEN UNBOUNDED PRECEDING TO CURRENT ROW
</codeblock>

<p> This means that, after sorting the rows returned by the SELECT according to
the ORDER BY clause in the &lt;window-definition&gt;, the window frame consists
of all rows between the first row and the last row with the same values as the current row for all ORDER BY expressions. This implies that rows that have the same values for all ORDER BY expressions will also have the same value for the result of the window function (as the window frame is the same). For example:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--     1|A|one|A.D.G</i>


  <i>--     2|B|two|A.D.G.C.F.B.E</i>
  <i>--     3|C|three|A.D.G.C.F</i>
  <i>--     4|D|one|A.D.G</i>
  <i>--     5|E|two|A.D.G.C.F.B.E</i>
  <i>--     6|F|three|A.D.G.C.F</i>
  <i>--     7|G|one|A.D.G</i>
  <i>-- </i>

  SELECT a, b, c, group_concat(b, '.') OVER (ORDER BY c) FROM t1 ORDER BY a;

</codeblock>

<p> All of SQLite's [Aggregate Functions|aggregate functions] may be used as aggregate window
functions. It is also possible to 
[user-defined window functions|create user-defined aggregate window functions].

<h2>Frame Specifications</h2>







>




|
>
|
|
|
|
|
|
>



|







|












|
>
>
|
|
|
|
|
|

>
|
>







94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
                          (7, 'G', 'one'  );
</codeblock>

<p> An aggregate window function is similar to an aggregate function, except
adding it to a query does not change the number of rows returned. Instead,
for each row the result of the aggregate window function is as if the
corresponding aggregate were run over all rows in the "window frame".


<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   a | b | group_concat</i>
  -------------------------
  <i>--   1 | A | A.B         </i>
  <i>--   2 | B | A.B.C       </i>
  <i>--   3 | C | B.C.D       </i>
  <i>--   4 | D | C.D.E       </i>
  <i>--   5 | E | D.E.F       </i>
  <i>--   6 | F | E.F.G       </i>
  <i>--   7 | G | F.G         </i>
  <i>-- </i>
  SELECT a, b, group_concat(b, '.') OVER (
    ORDER BY a ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
  ) AS group_concat FROM t1;
</codeblock>

<p> In the example above, the window frame consists of all rows between the
previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive,
where rows are sorted according to the ORDER BY clause in the
<i>&lt;window-definition&gt;</i> (in this case "ORDER BY a"). For example, the frame for the row with (a=3) consists of rows (2, 'B', 'two'), (3, 'C', 'three') and (4, 'D', 'one'). The result of group_concat(b, '.') for that row is therefore 'B.C.D'.

<p> The default <i>&lt;frame-specification&gt;</i> is:

<codeblock>
    RANGE BETWEEN UNBOUNDED PRECEDING TO CURRENT ROW
</codeblock>

<p> This means that, after sorting the rows returned by the SELECT according to
the ORDER BY clause in the &lt;window-definition&gt;, the window frame consists
of all rows between the first row and the last row with the same values as the current row for all ORDER BY expressions. This implies that rows that have the same values for all ORDER BY expressions will also have the same value for the result of the window function (as the window frame is the same). For example:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   a | b | c | group_concat</i>
  -----------------------------
  <i>--   1 | A | one   | A.D.G       </i>
  <i>--   2 | B | two   | A.D.G.C.F.B.E</i>
  <i>--   3 | C | three | A.D.G.C.F   </i>
  <i>--   4 | D | one   | A.D.G       </i>
  <i>--   5 | E | two   | A.D.G.C.F.B.E</i>
  <i>--   6 | F | three | A.D.G.C.F   </i>
  <i>--   7 | G | one   | A.D.G       </i>
  <i>-- </i>
  SELECT a, b, c, 
         group_concat(b, '.') OVER (ORDER BY c) AS group_concat 
  FROM t1 ORDER BY a;
</codeblock>

<p> All of SQLite's [Aggregate Functions|aggregate functions] may be used as aggregate window
functions. It is also possible to 
[user-defined window functions|create user-defined aggregate window functions].

<h2>Frame Specifications</h2>
189
190
191
192
193
194
195


196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225

226
227
228
229
230
231

232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249

250
251
252
253
254
255

256
257


258
259
260
261
262
263
264
265
266
267
<p> In the following example, the window frame for each row consists of all
rows from the current row to the end of the set, where rows are sorted
according to "ORDER BY a".

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>


  <i>-- one  |1|A|A.D.G.C.F.B.E</i>
  <i>-- one  |4|D|D.G.C.F.B.E</i>
  <i>-- one  |7|G|G.C.F.B.E</i>
  <i>-- three|3|C|C.F.B.E</i>
  <i>-- three|6|F|F.B.E</i>
  <i>-- two  |2|B|B.E</i>
  <i>-- two  |5|E|E</i>
  <i>-- </i>
  SELECT c, a, b, group_concat(b, '.') OVER (
    ORDER BY c, a ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) 
  FROM t1 ORDER BY c, a;
</codeblock>


<h2>The PARTITION BY Clause</h2>

<p> A <i>&lt;window-definition&gt;</i> may also include a PARTITION BY clause.
If so, the rows returned by the SELECT statement are divided into groups -
partitions - with the same values for each PARTITION BY expression, and then
window-function processing performed as described above separately for each
partition. This is similar to the way the rows traversed by an aggregate query
are divided into groups before any aggregate processing is performed.

<p> For example:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>-- one  |1|A|A</i>

  <i>-- one  |4|D|A.D</i>
  <i>-- one  |7|G|A.D.G</i>
  <i>-- three|3|C|C</i>
  <i>-- three|6|F|C.F</i>
  <i>-- two  |2|B|B</i>
  <i>-- two  |5|E|B.E</i>

  <i>-- </i>
  SELECT c, a, b, group_concat(b, '.') OVER (
    PARTITION BY c ORDER BY a RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) 
  FROM t1 ORDER BY c, a;
</codeblock>

<h2>The FILTER Clause</h2>

<p>If a FILTER clause is provided, then only rows for which the <i>expr</i> is
true are included in the window frame. The aggregate window still returns a
value for every row, but those for which the FILTER expression evaluates to
other than true are not included in the window frame for any row. For example:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>--</i>
  <i>-- one  |1|A|A.C.D.F.G</i>

  <i>-- two  |2|B|C.D.F.G</i>
  <i>-- three|3|C|C.D.F.G</i>
  <i>-- one  |4|D|D.F.G</i>
  <i>-- two  |5|E|F.G</i>
  <i>-- three|6|F|F.G</i>
  <i>-- one  |7|G|G</i>

  <i>-- </i>
  SELECT c, a, b, group_concat(b, '.') FILTER (WHERE c!='two') OVER (ORDER BY a)


  FROM t1 ORDER BY a;
</codeblock>

<h2 tags="user-defined window functions">User-Defined Aggregate Window Functions</h2>

<p>TODO: Link to C API docs (sqlite3_create_window_function()).

<h1>Built-in Window Functions</h1>

<p> As well as aggregate window functions, SQLite features a set of built-in







>
>
|
|
|
|
|
|
|



|


<















|
>
|
|
|
|
|
|
>



|












|
|
>
|
|
|
|
|
|
>

|
>
>


<







200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221

222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277

278
279
280
281
282
283
284
<p> In the following example, the window frame for each row consists of all
rows from the current row to the end of the set, where rows are sorted
according to "ORDER BY a".

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   c     | a | b | group_concat</i>
  ---------------------------------
  <i>--   one   | 1 | A | A.D.G.C.F.B.E</i>
  <i>--   one   | 4 | D | D.G.C.F.B.E </i>
  <i>--   one   | 7 | G | G.C.F.B.E   </i>
  <i>--   three | 3 | C | C.F.B.E     </i>
  <i>--   three | 6 | F | F.B.E       </i>
  <i>--   two   | 2 | B | B.E         </i>
  <i>--   two   | 5 | E | E           </i>
  <i>-- </i>
  SELECT c, a, b, group_concat(b, '.') OVER (
    ORDER BY c, a ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS group_concat
  FROM t1 ORDER BY c, a;
</codeblock>


<h2>The PARTITION BY Clause</h2>

<p> A <i>&lt;window-definition&gt;</i> may also include a PARTITION BY clause.
If so, the rows returned by the SELECT statement are divided into groups -
partitions - with the same values for each PARTITION BY expression, and then
window-function processing performed as described above separately for each
partition. This is similar to the way the rows traversed by an aggregate query
are divided into groups before any aggregate processing is performed.

<p> For example:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   c     | a | b | group_concat</i>
  ---------------------------------
  <i>--   one   | 1 | A | A.D.G       </i>
  <i>--   one   | 4 | D | D.G         </i>
  <i>--   one   | 7 | G | G           </i>
  <i>--   three | 3 | C | C.F         </i>
  <i>--   three | 6 | F | F           </i>
  <i>--   two   | 2 | B | B.E         </i>
  <i>--   two   | 5 | E | E           </i>
  <i>-- </i>
  SELECT c, a, b, group_concat(b, '.') OVER (
    PARTITION BY c ORDER BY a RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
  ) AS group_concat
  FROM t1 ORDER BY c, a;
</codeblock>

<h2>The FILTER Clause</h2>

<p>If a FILTER clause is provided, then only rows for which the <i>expr</i> is
true are included in the window frame. The aggregate window still returns a
value for every row, but those for which the FILTER expression evaluates to
other than true are not included in the window frame for any row. For example:

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   c     | a | b | group_concat</i>
  ---------------------------------
  <i>--   one   | 1 | A | A           </i>
  <i>--   two   | 2 | B | A           </i>
  <i>--   three | 3 | C | A.C         </i>
  <i>--   one   | 4 | D | A.C.D       </i>
  <i>--   two   | 5 | E | A.C.D       </i>
  <i>--   three | 6 | F | A.C.D.F     </i>
  <i>--   one   | 7 | G | A.C.D.F.G   </i>
  <i>-- </i>
  SELECT c, a, b, group_concat(b, '.') FILTER (WHERE c!='two') OVER (
    ORDER BY a
  ) AS group_concat
  FROM t1 ORDER BY a;
</codeblock>

<h2 tags="user-defined window functions">User-Defined Aggregate Window Functions</h2>

<p>TODO: Link to C API docs (sqlite3_create_window_function()).

<h1>Built-in Window Functions</h1>

<p> As well as aggregate window functions, SQLite features a set of built-in
363
364
365
366
367
368
369
370

371
372
373
























































































       for each row.
  <dt><p><b>nth_value(expr, N)</b>
  <dd><p> This built-in window function calculates the window frame for each
       row in the same way as an aggregate window function. It returns the
       value of <i>expr</i> evaluated against the row <i>N</i> of the window
       frame. Rows are numbered within the window frame starting from 1 in
       the order defined by the ORDER BY clause if one is present, or in
       arbitrary order otherwise.

  <dd>
</dl>
































































































|
>



>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
       for each row.
  <dt><p><b>nth_value(expr, N)</b>
  <dd><p> This built-in window function calculates the window frame for each
       row in the same way as an aggregate window function. It returns the
       value of <i>expr</i> evaluated against the row <i>N</i> of the window
       frame. Rows are numbered within the window frame starting from 1 in
       the order defined by the ORDER BY clause if one is present, or in
       arbitrary order otherwise. If there is no <i>N</i>th row in the
       partition, then NULL is returned.
  <dd>
</dl>

<p>The examples in this section all assume the following data:

<codeblock>
  CREATE TABLE t2(a, b);
  INSERT INTO t2 VALUES('a', 'one'), 
                       ('a', 'two'), 
                       ('a', 'three'), 
                       ('b', 'four'), 
                       ('c', 'five'), 
                       ('c', 'six');
</codeblock>

<p>The following example illustrates the behaviour of the five ranking
functions - row_number(), rank(), dense_rank(), percent_rank() and 
cume_dist().

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   a | row_number | rank | dense_rank | percent_rank | cume_dist</i>
  ------------------------------------------------------------------
  <i>--   a |          1 |    1 |          1 |          0.0 |       0.5</i>
  <i>--   a |          2 |    1 |          1 |          0.0 |       0.5</i>
  <i>--   a |          3 |    1 |          1 |          0.0 |       0.5</i>
  <i>--   b |          4 |    4 |          2 |          0.6 |       0.66</i>
  <i>--   c |          5 |    5 |          3 |          0.8 |       1.0</i>
  <i>--   c |          6 |    5 |          3 |          0.8 |       1.0</i>
  <i>-- </i>
  SELECT a                        AS a,
         row_number() OVER win    AS row_number,
         rank() OVER win          AS rank,
         dense_rank() OVER win    AS dense_rank,
         percent_rank() OVER win  AS percent_rank,
         cume_dist() OVER win     AS cume_dist
  FROM t2
  WINDOW win AS (ORDER BY a);
</codeblock>

<p>The example below uses ntile() to divde the six rows into two groups (the
ntile(2) call) and into four groups (the ntile(4) call). For ntile(2), there
are three rows assigned to each group. For ntile(4), there are two groups of
two and two groups of one. The larger groups of two appear first.

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   a | b     | ntile_2 | ntile_4</i>
  ----------------------------------
  <i>--   a | one   |       1 |       1</i>
  <i>--   a | two   |       1 |       1</i>
  <i>--   a | three |       1 |       2</i>
  <i>--   b | four  |       2 |       2</i>
  <i>--   c | five  |       2 |       3</i>
  <i>--   c | six   |       2 |       4</i>
  <i>-- </i>
  SELECT a                        AS a,
         b                        AS b,
         ntile(2) OVER win        AS ntile_2,
         ntile(4) OVER win        AS ntile_4
  FROM t2
  WINDOW win AS (ORDER BY a);
</codeblock>

<p> The next example demonstrates lag(), lead(), first_value(), last_value()
and nth_value(). The <i>&lt;frame-specification&gt;</i> is ignored by both lag() and lead(), but respected by first_value(), last_value() and nth_value().

<codeblock>
  <i>-- The following SELECT statement returns:</i>
  <i>-- </i>
  <i>--   b | lead | lag  | first_value | last_value | nth_value_3</i>
  -------------------------------------------------------------
  <i>--   A | C    | NULL | A           | A          | NULL       </i>
  <i>--   B | D    | A    | A           | B          | NULL       </i>
  <i>--   C | E    | B    | A           | C          | C          </i>
  <i>--   D | F    | C    | A           | D          | C          </i>
  <i>--   E | G    | D    | A           | E          | C          </i>
  <i>--   F | n/a  | E    | A           | F          | C          </i>
  <i>--   G | n/a  | F    | A           | G          | C          </i>
  <i>-- </i>
  SELECT b                          AS b,
         lead(b, 2, 'n/a') OVER win AS lead,
         lag(b) OVER win            AS lag,
         first_value(b) OVER win    AS first_value,
         last_value(b) OVER win     AS last_value,
         nth_value(b, 3) OVER win   AS nth_value_3
  FROM t1
  WINDOW win AS (ORDER BY b ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
</codeblock>