Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Further enhancement to the optimizer overview document, giving the change log for 3.23.0 something to link to. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
219fa1363791b94165b75b0a15c2705a |
User & Date: | drh 2018-03-22 18:55:29.425 |
Context
2018-03-23
| ||
11:47 | Fix typos. Clarification to the CoC in response to criticism. (check-in: 79f3efbc86 user: drh tags: trunk) | |
2018-03-22
| ||
18:55 | Further enhancement to the optimizer overview document, giving the change log for 3.23.0 something to link to. (check-in: 219fa13637 user: drh tags: trunk) | |
13:36 | Refactor the optoverview.html document to be written in HTML with occasional <tcl> tags, rather than in pure TCL, so that it works better with fancy-format. (check-in: 69fd8c5ba9 user: drh tags: trunk) | |
Changes
Changes to pages/changes.in.
︙ | ︙ | |||
36 37 38 39 40 41 42 | <li> If the [xColumn] method in a [virtual table] implementation returns an error message using [sqlite3_result_error()] then give that error message preference over internally-generated messages. <li> Added the -A command-line option to the [CLI] to make it easier to manage [SQLite Archive files]. <li> Query optimizer enhancements: <ol type='a'> | | | | | | 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | <li> If the [xColumn] method in a [virtual table] implementation returns an error message using [sqlite3_result_error()] then give that error message preference over internally-generated messages. <li> Added the -A command-line option to the [CLI] to make it easier to manage [SQLite Archive files]. <li> Query optimizer enhancements: <ol type='a'> <li> Improve the [omit-left-join optimization] so that it works in cases where the right-hand table is UNIQUE but not necessarily NOT NULL. <li> Improve the [push-down optimization] so that works for many LEFT JOINs. <li> Add the [LEFT JOIN strength reduction optimization] that converts a LEFT JOIN into an ordinary JOIN if there exist terms in the WHERE clause that would prevent the extra all-NULL row of the LEFT JOIN from appearing in the output set. <li> Avoid unnecessary writes to the sqlite_sequence table when an [AUTOINCREMENT] table is updated with an rowid that is less than the maximum. </ol> <li> Bug fixes: <ol type='a'> <li> Fix the parser to accept valid [row value] syntax. Ticket [https://www.sqlite.org/src/info/7310e2fb3d046a5|7310e2fb3d046a5] <li> Fix the query planner so that it takes into account dependencies in |
︙ | ︙ |
Changes to pages/optoverview.in.
︙ | ︙ | |||
980 981 982 983 984 985 986 | the ORDER BY are compared against the previous row. If they have changed, the current sort is finished and output and a new sort is started. This results in a slightly faster sort. But the bigger advantages are that many fewer rows need to be held in memory, reducing memory requirements, and outputs can begin to appear before the core query has run to completion. | | > | 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 | the ORDER BY are compared against the previous row. If they have changed, the current sort is finished and output and a new sort is started. This results in a slightly faster sort. But the bigger advantages are that many fewer rows need to be held in memory, reducing memory requirements, and outputs can begin to appear before the core query has run to completion. <tcl>hd_fragment flattening {flattening optimization} \ {query flattener} {flattened}</tcl> <h1>Subquery flattening</h1> <p> When a subquery occurs in the FROM clause of a SELECT, the simplest behavior is to evaluate the subquery into a transient table, then run the outer SELECT against the transient table. But such a plan can be suboptimal since the transient table will not have any indices |
︙ | ︙ | |||
1296 1297 1298 1299 1300 1301 1302 | The automatic indexes described here exist only for the duration of a single query, are never persisted to disk, and are only visible to a single database connection. Internal indexes are part of the implementation of PRIMARY KEY and UNIQUE constraints, are long-lasting and persisted to disk, and are visible to all database connections. The term "autoindex" appears in the names of [internal indexes] for legacy reasons and does not indicate that internal indexes and automatic indexes are related. | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1297 1298 1299 1300 1301 1302 1303 1304 1305 1306 1307 1308 1309 1310 1311 1312 1313 1314 1315 1316 1317 1318 1319 1320 1321 1322 1323 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 1339 1340 1341 1342 1343 1344 1345 1346 1347 1348 1349 1350 1351 1352 1353 1354 1355 1356 1357 1358 1359 1360 1361 1362 1363 1364 1365 1366 1367 1368 1369 1370 1371 1372 1373 1374 1375 1376 1377 1378 1379 1380 1381 1382 1383 1384 1385 1386 1387 1388 1389 1390 1391 1392 1393 1394 1395 1396 1397 1398 1399 1400 1401 1402 1403 1404 1405 1406 1407 1408 1409 1410 1411 1412 1413 1414 1415 1416 1417 1418 1419 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 1430 1431 1432 1433 1434 1435 1436 | The automatic indexes described here exist only for the duration of a single query, are never persisted to disk, and are only visible to a single database connection. Internal indexes are part of the implementation of PRIMARY KEY and UNIQUE constraints, are long-lasting and persisted to disk, and are visible to all database connections. The term "autoindex" appears in the names of [internal indexes] for legacy reasons and does not indicate that internal indexes and automatic indexes are related. <tcl>hd_fragment pushdown {push-down optimization}</tcl> <h1>The Push-Down Optimization</h1> <p> If a subquery cannot be [flattened] into the outer query, it might still be possible to enhance performance by "pushing down" WHERE clause terms from the outer query into the subquery. Consider an example: <codeblock> CREATE TABLE t1(a INT, b INT); CREATE TABLE t2(x INT, y INT); CREATE VIEW v1(a,b) AS SELECT DISTINCT a, b FROM t1; SELECT x, y, b FROM t2 JOIN v1 ON (x=a) WHERE b BETWEEN 10 AND 20; </codeblock> <p> The view v1 cannot be [flattened] because it is DISTINCT. It must instead be run as a subquery with the results being stored in a transient table, then the join is performed between t2 and the transient table. The push-down optimization pushes down the "b BETWEEN 10 AND 20" term into the view in order to make the transient table smaller, or (if there were an index on t1.b) help the subquery to run faster. The resulting evaluation is like this: <codeblock> SELECT x, y, b FROM t2 JOIN (SELECT DISTINCT a, b FROM t1 WHERE b BETWEEN 10 AND 20) WHERE b BETWEEN 10 AND 10; </codeblock> <p> The push-down optimization cannot always be used. For example, if the subquery contains a LIMIT, then pushing down any part of the WHERE clause from the outer query could change the result of the inner query. There are other restrictions, explained in a comment in the source code on the pushDownWhereTerms() routine that implements this optimization. <tcl>hd_fragment leftjoinreduction \ {LEFT JOIN strength reduction optimization}</tcl> <h1>The LEFT JOIN Strength Reduction Optimization</h1> <p> Sometimes a LEFT JOIN can be converted into an ordinary JOIN, if there are terms in the WHERE clause that guarantee that the two joins will give identical results. In particular, if any column in the right-hand table of the LEFT JOIN must be non-NULL in order for the WHERE clause to be true, then the LEFT JOIN is demoted to an ordinary JOIN. <p> The prover that determines whether any column of the right-hand table of a LEFT JOIN must be non-NULL in the WHERE clause is imperfect. It sometimes returns a false negative. In other words, it sometimes fails to reduce the strength of a LEFT JOIN when doing so was in fact possible. For example, the prover does not know the [datetime() SQL function] will always return NULL if its first argument is NULL, and so it will not recognize that the LEFT JOIN in the following query could be strength-reduced: <codeblock> SELECT urls.url FROM urls LEFT JOIN (SELECT * FROM (SELECT url_id AS uid, max(retrieval_time) AS rtime FROM lookups GROUP BY 1 ORDER BY 1) WHERE uid IN (358341,358341,358341) ) recent ON u.source_seed_id = recent.xyz OR u.url_id = recent.xyz WHERE DATETIME(recent.rtime) > DATETIME('now', '-5 days'); </codeblock> <p> It is possible that future enhancements to prover might enable it to recognize that NULL inputs to certain built-in functions always result in a NULL answer. But not all built-in functions have that property (for example [coalesce()]) and, of course, the prover will never be able to reason about [application-defined SQL functions]. <tcl>hd_fragment omitnoopjoin {omit-left-join optimization}</tcl> <h1>The Omit LEFT JOIN Optimization</h1> <p> Sometimes a LEFT JOIN can be completely omitted from a query without changing the result. This can happen if all of the following are true: <p> <ol> <li> The query is not an aggregate <li> Either the query is DISTINCT or else the ON or USING clause on the LEFT JOIN must constraint the join such that it matches only a single row <li> The right-hand table of the LEFT JOIN must not be used anywhere in the query outside of its own USING or ON clause. </ol> <p> LEFT JOIN elimination often comes up when LEFT JOINs are used inside of views, and then the view is used in such as way that none of the columns of the right-hand table of the LEFT JOIN are referenced. <p> Here is an simple example of omitting a LEFT JOIN: <codeblock> CREATE TABLE t1(ipk INTEGER PRIMARY KEY, v1); CREATE TABLE t2(ipk INTEGER PRIMARY KEY, v2); CREATE TABLE t3(ipk INTEGER PRIMARY KEY, v3); SELECT v1, v3 FROM t1 LEFT JOIN t2 USING (t1.ipk=t2.ipk) LEFT JOIN t3 USING (t1.ipk=t3.ipk) </codeblock> <p> The t2 table is completely unused in the query above, and so the query planner is able to implement the query as if it were written: <codeblock> SELECT v1, v3 FROM t1 LEFT JOIN t3 USING (t1.ipk=t3.ipk) </codeblock> |