Documentation Source Text

Check-in [219fa13637]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Further enhancement to the optimizer overview document, giving the change log for 3.23.0 something to link to.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 219fa1363791b94165b75b0a15c2705a7b31b59827a0de82885e3a1152754c6c
User & Date: drh 2018-03-22 18:55:29
Context
2018-03-23
11:47
Fix typos. Clarification to the CoC in response to criticism. check-in: 79f3efbc86 user: drh tags: trunk
2018-03-22
18:55
Further enhancement to the optimizer overview document, giving the change log for 3.23.0 something to link to. check-in: 219fa13637 user: drh tags: trunk
13:36
Refactor the optoverview.html document to be written in HTML with occasional <tcl> tags, rather than in pure TCL, so that it works better with fancy-format. check-in: 69fd8c5ba9 user: drh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to pages/changes.in.

    36     36   <li> If the [xColumn] method in a [virtual table] implementation returns
    37     37        an error message using [sqlite3_result_error()] then give that error
    38     38        message preference over internally-generated messages.
    39     39   <li> Added the -A command-line option to the [CLI] to make it easier to manage
    40     40        [SQLite Archive files].
    41     41   <li> Query optimizer enhancements:
    42     42   <ol type='a'>
    43         -  <li> Improve the omit-left-join optimization so that it works in cases where
           43  +  <li> Improve the [omit-left-join optimization] so that it works in cases where
    44     44          the right-hand table is UNIQUE but not necessarily NOT NULL.
    45         -  <li> Improve the push-down optimization so that works for many LEFT JOINs.
    46         -  <li> Add the left join strength reduction optimization that converts a LEFT
           45  +  <li> Improve the [push-down optimization] so that works for many LEFT JOINs.
           46  +  <li> Add the [LEFT JOIN strength reduction optimization] that converts a LEFT
    47     47          JOIN into an ordinary JOIN if there exist terms in the WHERE clause
    48     48          that would prevent the extra all-NULL row of the LEFT JOIN from
    49     49          appearing in the output set.
    50     50     <li> Avoid unnecessary writes to the sqlite_sequence table when an
    51         -       AUTOINCREMENT table is updated with an rowid that is less than the
           51  +       [AUTOINCREMENT] table is updated with an rowid that is less than the
    52     52          maximum.
    53     53   </ol>
    54     54   <li> Bug fixes:
    55     55   <ol type='a'>
    56     56     <li> Fix the parser to accept valid [row value] syntax.
    57     57          Ticket [https://www.sqlite.org/src/info/7310e2fb3d046a5|7310e2fb3d046a5]
    58     58     <li> Fix the query planner so that it takes into account dependencies in

Changes to pages/optoverview.in.

   980    980     the ORDER BY are compared against the previous row.  If they have
   981    981     changed, the current sort is finished and output and a new sort is
   982    982     started.  This results in a slightly faster sort.  But the bigger
   983    983     advantages are that many fewer rows need to be held in memory,
   984    984     reducing memory requirements, and outputs can begin to appear before
   985    985     the core query has run to completion.
   986    986   
   987         -<tcl>hd_fragment flattening {flattening optimization} {query flattener}</tcl>
          987  +<tcl>hd_fragment flattening {flattening optimization} \
          988  +   {query flattener} {flattened}</tcl>
   988    989   <h1>Subquery flattening</h1>
   989    990   
   990    991   <p>
   991    992     When a subquery occurs in the FROM clause of a SELECT, the simplest
   992    993     behavior is to evaluate the subquery into a transient table, then run
   993    994     the outer SELECT against the transient table.  But such a plan
   994    995     can be suboptimal since the transient table will not have any indices
................................................................................
  1296   1297     The automatic indexes described here exist only for the duration of a
  1297   1298     single query, are never persisted to disk, and are only visible to a
  1298   1299     single database connection.  Internal indexes are part of the implementation
  1299   1300     of PRIMARY KEY and UNIQUE constraints, are long-lasting and persisted
  1300   1301     to disk, and are visible to all database connections.  The term "autoindex"
  1301   1302     appears in the names of [internal indexes] for legacy reasons and does
  1302   1303     not indicate that internal indexes and automatic indexes are related.
         1304  +
         1305  +<tcl>hd_fragment pushdown {push-down optimization}</tcl>
         1306  +<h1>The Push-Down Optimization</h1>
         1307  +
         1308  +<p>
         1309  +  If a subquery cannot be [flattened] into the outer query, it might
         1310  +  still be possible to enhance performance by "pushing down" WHERE clause
         1311  +  terms from the outer query into the subquery.  Consider an example:
         1312  +
         1313  +<codeblock>
         1314  +  CREATE TABLE t1(a INT, b INT);
         1315  +  CREATE TABLE t2(x INT, y INT);
         1316  +  CREATE VIEW v1(a,b) AS SELECT DISTINCT a, b FROM t1;
         1317  +
         1318  +  SELECT x, y, b
         1319  +    FROM t2 JOIN v1 ON (x=a)
         1320  +   WHERE b BETWEEN 10 AND 20;
         1321  +</codeblock>
         1322  +
         1323  +<p>
         1324  +  The view v1 cannot be [flattened] because it is DISTINCT.  It must
         1325  +  instead be run as a subquery with the results being stored in a
         1326  +  transient table, then the join is performed between t2 and the
         1327  +  transient table.  The push-down optimization pushes down the
         1328  +  "b BETWEEN 10 AND 20" term into the view in order to make the transient
         1329  +  table smaller, or (if there were an index on t1.b) help the subquery
         1330  +  to run faster.  The resulting evaluation is like this:
         1331  +
         1332  +<codeblock>
         1333  +  SELECT x, y, b
         1334  +    FROM t2
         1335  +    JOIN (SELECT DISTINCT a, b FROM t1 WHERE b BETWEEN 10 AND 20)
         1336  +   WHERE b BETWEEN 10 AND 10;
         1337  +</codeblock>
         1338  +
         1339  +<p>
         1340  +  The push-down optimization cannot always be used.  For example,
         1341  +  if the subquery contains a LIMIT, then pushing down any part of
         1342  +  the WHERE clause from the outer query could change the result of
         1343  +  the inner query.  There are other restrictions, explained in a
         1344  +  comment in the source code on the pushDownWhereTerms() routine
         1345  +  that implements this optimization.
         1346  +
         1347  +<tcl>hd_fragment leftjoinreduction \
         1348  +   {LEFT JOIN strength reduction optimization}</tcl>
         1349  +<h1>The LEFT JOIN Strength Reduction Optimization</h1>
         1350  +
         1351  +<p>
         1352  +  Sometimes a LEFT JOIN can be converted into an ordinary JOIN,
         1353  +  if there are terms in the WHERE clause that guarantee that the
         1354  +  two joins will give identical results.  In particular, if any
         1355  +  column in the right-hand table of the LEFT JOIN must be non-NULL
         1356  +  in order for the WHERE clause to be true, then the LEFT JOIN is
         1357  +  demoted to an ordinary JOIN.
         1358  +
         1359  +<p>
         1360  +  The prover that determines whether any column of the right-hand
         1361  +  table of a LEFT JOIN must be non-NULL in the WHERE clause is
         1362  +  imperfect.  It sometimes returns a false negative.  In other words,
         1363  +  it sometimes fails to reduce the strength of a LEFT JOIN when doing
         1364  +  so was in fact possible.  For example, the prover does not know
         1365  +  the [datetime() SQL function] will always return NULL if its first
         1366  +  argument is NULL, and so it will not recognize that the LEFT JOIN
         1367  +  in the following query could be strength-reduced:
         1368  +
         1369  +<codeblock>
         1370  +  SELECT urls.url
         1371  +    FROM urls
         1372  +    LEFT JOIN
         1373  +      (SELECT *
         1374  +        FROM (SELECT url_id AS uid, max(retrieval_time) AS rtime
         1375  +                FROM lookups GROUP BY 1 ORDER BY 1)
         1376  +        WHERE uid IN (358341,358341,358341)
         1377  +      ) recent
         1378  +      ON u.source_seed_id = recent.xyz OR u.url_id = recent.xyz
         1379  +   WHERE
         1380  +       DATETIME(recent.rtime) > DATETIME('now', '-5 days');
         1381  +</codeblock>
         1382  +
         1383  +<p>
         1384  +  It is possible that future enhancements to prover might enable it
         1385  +  to recognize that NULL inputs to certain built-in functions
         1386  +  always result in a NULL answer.  But not all built-in
         1387  +  functions have that property (for example [coalesce()]) and, of
         1388  +  course, the prover will never be able to reason about
         1389  +  [application-defined SQL functions].
         1390  +
         1391  +
         1392  +<tcl>hd_fragment omitnoopjoin {omit-left-join optimization}</tcl>
         1393  +<h1>The Omit LEFT JOIN Optimization</h1>
         1394  +
         1395  +<p>
         1396  +  Sometimes a LEFT JOIN can be completely omitted from a query without
         1397  +  changing the result.  This can happen if all of the following are
         1398  +  true:
         1399  +
         1400  +<p>
         1401  +  <ol>
         1402  +  <li> The query is not an aggregate
         1403  +  <li> Either the query is DISTINCT or else the ON or USING clause
         1404  +       on the LEFT JOIN must constraint the join such that it matches
         1405  +       only a single row
         1406  +  <li> The right-hand table of the LEFT JOIN must not be used anywhere
         1407  +       in the query outside of its own USING or ON clause.
         1408  +  </ol>
         1409  +
         1410  +<p>
         1411  +  LEFT JOIN elimination often comes up when LEFT JOINs are used
         1412  +  inside of views, and then the view is used in such as way that
         1413  +  none of the columns of the right-hand table of the LEFT JOIN are
         1414  +  referenced.
         1415  +
         1416  +<p>
         1417  +  Here is an simple example of omitting a LEFT JOIN:
         1418  +
         1419  +<codeblock>
         1420  +  CREATE TABLE t1(ipk INTEGER PRIMARY KEY, v1);
         1421  +  CREATE TABLE t2(ipk INTEGER PRIMARY KEY, v2);
         1422  +  CREATE TABLE t3(ipk INTEGER PRIMARY KEY, v3);
         1423  +
         1424  +  SELECT v1, v3 FROM t1 
         1425  +    LEFT JOIN t2 USING (t1.ipk=t2.ipk)
         1426  +    LEFT JOIN t3 USING (t1.ipk=t3.ipk)
         1427  +</codeblock>
         1428  +
         1429  +<p>
         1430  +  The t2 table is completely unused in the query above, and so the
         1431  +  query planner is able to implement the query as if it were written:
         1432  +
         1433  +<codeblock>
         1434  +  SELECT v1, v3 FROM t1 
         1435  +    LEFT JOIN t3 USING (t1.ipk=t3.ipk)
         1436  +</codeblock>