Documentation Source Text

Check-in [219fa13637]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Further enhancement to the optimizer overview document, giving the change log for 3.23.0 something to link to.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 219fa1363791b94165b75b0a15c2705a7b31b59827a0de82885e3a1152754c6c
User & Date: drh 2018-03-22 18:55:29.425
Context
2018-03-23
11:47
Fix typos. Clarification to the CoC in response to criticism. (check-in: 79f3efbc86 user: drh tags: trunk)
2018-03-22
18:55
Further enhancement to the optimizer overview document, giving the change log for 3.23.0 something to link to. (check-in: 219fa13637 user: drh tags: trunk)
13:36
Refactor the optoverview.html document to be written in HTML with occasional <tcl> tags, rather than in pure TCL, so that it works better with fancy-format. (check-in: 69fd8c5ba9 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/changes.in.
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
<li> If the [xColumn] method in a [virtual table] implementation returns
     an error message using [sqlite3_result_error()] then give that error
     message preference over internally-generated messages.
<li> Added the -A command-line option to the [CLI] to make it easier to manage
     [SQLite Archive files].
<li> Query optimizer enhancements:
<ol type='a'>
  <li> Improve the omit-left-join optimization so that it works in cases where
       the right-hand table is UNIQUE but not necessarily NOT NULL.
  <li> Improve the push-down optimization so that works for many LEFT JOINs.
  <li> Add the left join strength reduction optimization that converts a LEFT
       JOIN into an ordinary JOIN if there exist terms in the WHERE clause
       that would prevent the extra all-NULL row of the LEFT JOIN from
       appearing in the output set.
  <li> Avoid unnecessary writes to the sqlite_sequence table when an
       AUTOINCREMENT table is updated with an rowid that is less than the
       maximum.
</ol>
<li> Bug fixes:
<ol type='a'>
  <li> Fix the parser to accept valid [row value] syntax.
       Ticket [https://www.sqlite.org/src/info/7310e2fb3d046a5|7310e2fb3d046a5]
  <li> Fix the query planner so that it takes into account dependencies in







|

|
|




|







36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
<li> If the [xColumn] method in a [virtual table] implementation returns
     an error message using [sqlite3_result_error()] then give that error
     message preference over internally-generated messages.
<li> Added the -A command-line option to the [CLI] to make it easier to manage
     [SQLite Archive files].
<li> Query optimizer enhancements:
<ol type='a'>
  <li> Improve the [omit-left-join optimization] so that it works in cases where
       the right-hand table is UNIQUE but not necessarily NOT NULL.
  <li> Improve the [push-down optimization] so that works for many LEFT JOINs.
  <li> Add the [LEFT JOIN strength reduction optimization] that converts a LEFT
       JOIN into an ordinary JOIN if there exist terms in the WHERE clause
       that would prevent the extra all-NULL row of the LEFT JOIN from
       appearing in the output set.
  <li> Avoid unnecessary writes to the sqlite_sequence table when an
       [AUTOINCREMENT] table is updated with an rowid that is less than the
       maximum.
</ol>
<li> Bug fixes:
<ol type='a'>
  <li> Fix the parser to accept valid [row value] syntax.
       Ticket [https://www.sqlite.org/src/info/7310e2fb3d046a5|7310e2fb3d046a5]
  <li> Fix the query planner so that it takes into account dependencies in
Changes to pages/optoverview.in.
980
981
982
983
984
985
986
987

988
989
990
991
992
993
994
  the ORDER BY are compared against the previous row.  If they have
  changed, the current sort is finished and output and a new sort is
  started.  This results in a slightly faster sort.  But the bigger
  advantages are that many fewer rows need to be held in memory,
  reducing memory requirements, and outputs can begin to appear before
  the core query has run to completion.

<tcl>hd_fragment flattening {flattening optimization} {query flattener}</tcl>

<h1>Subquery flattening</h1>

<p>
  When a subquery occurs in the FROM clause of a SELECT, the simplest
  behavior is to evaluate the subquery into a transient table, then run
  the outer SELECT against the transient table.  But such a plan
  can be suboptimal since the transient table will not have any indices







|
>







980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
  the ORDER BY are compared against the previous row.  If they have
  changed, the current sort is finished and output and a new sort is
  started.  This results in a slightly faster sort.  But the bigger
  advantages are that many fewer rows need to be held in memory,
  reducing memory requirements, and outputs can begin to appear before
  the core query has run to completion.

<tcl>hd_fragment flattening {flattening optimization} \
   {query flattener} {flattened}</tcl>
<h1>Subquery flattening</h1>

<p>
  When a subquery occurs in the FROM clause of a SELECT, the simplest
  behavior is to evaluate the subquery into a transient table, then run
  the outer SELECT against the transient table.  But such a plan
  can be suboptimal since the transient table will not have any indices
1296
1297
1298
1299
1300
1301
1302





































































































































  The automatic indexes described here exist only for the duration of a
  single query, are never persisted to disk, and are only visible to a
  single database connection.  Internal indexes are part of the implementation
  of PRIMARY KEY and UNIQUE constraints, are long-lasting and persisted
  to disk, and are visible to all database connections.  The term "autoindex"
  appears in the names of [internal indexes] for legacy reasons and does
  not indicate that internal indexes and automatic indexes are related.












































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
  The automatic indexes described here exist only for the duration of a
  single query, are never persisted to disk, and are only visible to a
  single database connection.  Internal indexes are part of the implementation
  of PRIMARY KEY and UNIQUE constraints, are long-lasting and persisted
  to disk, and are visible to all database connections.  The term "autoindex"
  appears in the names of [internal indexes] for legacy reasons and does
  not indicate that internal indexes and automatic indexes are related.

<tcl>hd_fragment pushdown {push-down optimization}</tcl>
<h1>The Push-Down Optimization</h1>

<p>
  If a subquery cannot be [flattened] into the outer query, it might
  still be possible to enhance performance by "pushing down" WHERE clause
  terms from the outer query into the subquery.  Consider an example:

<codeblock>
  CREATE TABLE t1(a INT, b INT);
  CREATE TABLE t2(x INT, y INT);
  CREATE VIEW v1(a,b) AS SELECT DISTINCT a, b FROM t1;

  SELECT x, y, b
    FROM t2 JOIN v1 ON (x=a)
   WHERE b BETWEEN 10 AND 20;
</codeblock>

<p>
  The view v1 cannot be [flattened] because it is DISTINCT.  It must
  instead be run as a subquery with the results being stored in a
  transient table, then the join is performed between t2 and the
  transient table.  The push-down optimization pushes down the
  "b BETWEEN 10 AND 20" term into the view in order to make the transient
  table smaller, or (if there were an index on t1.b) help the subquery
  to run faster.  The resulting evaluation is like this:

<codeblock>
  SELECT x, y, b
    FROM t2
    JOIN (SELECT DISTINCT a, b FROM t1 WHERE b BETWEEN 10 AND 20)
   WHERE b BETWEEN 10 AND 10;
</codeblock>

<p>
  The push-down optimization cannot always be used.  For example,
  if the subquery contains a LIMIT, then pushing down any part of
  the WHERE clause from the outer query could change the result of
  the inner query.  There are other restrictions, explained in a
  comment in the source code on the pushDownWhereTerms() routine
  that implements this optimization.

<tcl>hd_fragment leftjoinreduction \
   {LEFT JOIN strength reduction optimization}</tcl>
<h1>The LEFT JOIN Strength Reduction Optimization</h1>

<p>
  Sometimes a LEFT JOIN can be converted into an ordinary JOIN,
  if there are terms in the WHERE clause that guarantee that the
  two joins will give identical results.  In particular, if any
  column in the right-hand table of the LEFT JOIN must be non-NULL
  in order for the WHERE clause to be true, then the LEFT JOIN is
  demoted to an ordinary JOIN.

<p>
  The prover that determines whether any column of the right-hand
  table of a LEFT JOIN must be non-NULL in the WHERE clause is
  imperfect.  It sometimes returns a false negative.  In other words,
  it sometimes fails to reduce the strength of a LEFT JOIN when doing
  so was in fact possible.  For example, the prover does not know
  the [datetime() SQL function] will always return NULL if its first
  argument is NULL, and so it will not recognize that the LEFT JOIN
  in the following query could be strength-reduced:

<codeblock>
  SELECT urls.url
    FROM urls
    LEFT JOIN
      (SELECT *
        FROM (SELECT url_id AS uid, max(retrieval_time) AS rtime
                FROM lookups GROUP BY 1 ORDER BY 1)
        WHERE uid IN (358341,358341,358341)
      ) recent
      ON u.source_seed_id = recent.xyz OR u.url_id = recent.xyz
   WHERE
       DATETIME(recent.rtime) > DATETIME('now', '-5 days');
</codeblock>

<p>
  It is possible that future enhancements to prover might enable it
  to recognize that NULL inputs to certain built-in functions
  always result in a NULL answer.  But not all built-in
  functions have that property (for example [coalesce()]) and, of
  course, the prover will never be able to reason about
  [application-defined SQL functions].


<tcl>hd_fragment omitnoopjoin {omit-left-join optimization}</tcl>
<h1>The Omit LEFT JOIN Optimization</h1>

<p>
  Sometimes a LEFT JOIN can be completely omitted from a query without
  changing the result.  This can happen if all of the following are
  true:

<p>
  <ol>
  <li> The query is not an aggregate
  <li> Either the query is DISTINCT or else the ON or USING clause
       on the LEFT JOIN must constraint the join such that it matches
       only a single row
  <li> The right-hand table of the LEFT JOIN must not be used anywhere
       in the query outside of its own USING or ON clause.
  </ol>

<p>
  LEFT JOIN elimination often comes up when LEFT JOINs are used
  inside of views, and then the view is used in such as way that
  none of the columns of the right-hand table of the LEFT JOIN are
  referenced.

<p>
  Here is an simple example of omitting a LEFT JOIN:

<codeblock>
  CREATE TABLE t1(ipk INTEGER PRIMARY KEY, v1);
  CREATE TABLE t2(ipk INTEGER PRIMARY KEY, v2);
  CREATE TABLE t3(ipk INTEGER PRIMARY KEY, v3);

  SELECT v1, v3 FROM t1 
    LEFT JOIN t2 USING (t1.ipk=t2.ipk)
    LEFT JOIN t3 USING (t1.ipk=t3.ipk)
</codeblock>

<p>
  The t2 table is completely unused in the query above, and so the
  query planner is able to implement the query as if it were written:

<codeblock>
  SELECT v1, v3 FROM t1 
    LEFT JOIN t3 USING (t1.ipk=t3.ipk)
</codeblock>