Overview
Comment: Minor enhancements and updates to various documents.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 9e12c649c9ba5aafb8145b387e97d48e
User & Date: drh 2018-11-24 20:24:59.829
Context
2018-11-24
20:58 In the GeoPoly documentation, mention that geopoly_ccw() can be used to correct vertex order after geopoly_xform(). (check-in: bfc897c9c0 user: drh tags: trunk)
20:24 Minor enhancements and updates to various documents. (check-in: 9e12c649c9 user: drh tags: trunk)
19:14 Update the speed-and-size spreadsheet and the cpu.html page. Also make minor tweaks to the omitted.html page. (check-in: 555bf82e10 user: drh tags: trunk)
Changes
Changes to pages/amalgamation.in.
<title>The SQLite Amalgamation</title>
<tcl>hd_keywords {amalgamation} {the amalgamation}</tcl>
<fancy_format>
<h1>Executive Summary</h1>

<p>Over 100 separate source files are concatenated into a single
large file of C-code named "sqlite3.c" and called "the amalgamation".
The amalgamation contains everything an application needs to embed
SQLite.  The amalgamation file is more than 220,000 lines long and
over 7.5 megabytes in size (as of 2018-11-24).

<p>Combining all the code for SQLite into one big file makes SQLite
easier to deploy — there is just one file to keep track of.  And
because all code is in a single translation unit, compilers can do
better inter-procedure optimization resulting in machine code that
is between 5% and 10% faster.
︙
Changes to pages/arch.in.
︙
Memory allocation, caseless string comparison routines, portable
text-to-number conversion routines, and other utilities are located
in <file>util.c</file>.  Symbol tables used by the parser are
maintained by hash tables found in <file>hash.c</file>.  The
<file>utf.c</file> source file contains Unicode conversion
subroutines.  SQLite has its own private implementation of
[built-in printf|printf()] (with some extensions) in
<file>printf.c</file> and its own pseudo-random number generator
(PRNG) in <file>random.c</file>.
</p>

<h1>Test Code</h1>

<p>
︙
Changes to pages/optoverview.in.
︙
<p>
Additional background information is available in
the [indexing tutorial] document.

<p>
With release 3.8.0 ([dateof:3.8.0]), the SQLite query planner was
reimplemented as the [Next Generation Query Planner] or "NGQP".  All
of the features, techniques, and algorithms described in this document
are applicable to both the pre-3.8.0 legacy query planner and to the
NGQP.  For further information on how the NGQP differs from the legacy
query planner, see the [NGQP | detailed description of the NGQP].
︙
<codeblock>
CREATE TABLE t1(a,b);
CREATE TABLE t2(c,d);
-- Insert many rows into both t1 and t2
SELECT * FROM t1, t2 WHERE a=c;
</codeblock>

<p>
In the query above, if both t1 and t2 have approximately N rows, then
without any indices the query will require O(N*N) time.  On the other
hand, creating an index on table t2 requires O(NlogN) time and then
using that index to evaluate the query requires an additional O(NlogN)
time.  In the absence of [ANALYZE] information, SQLite guesses that N
is one million and hence it believes that constructing the automatic
index will be the cheaper approach.

<p>
An automatic index might also be used for a subquery:

<codeblock>
CREATE TABLE t1(a,b);
CREATE TABLE t2(c,d);
-- Insert many rows into both t1 and t2
SELECT a, (SELECT d FROM t2 WHERE c=b) FROM t1;
</codeblock>

<p>
In this example, the t2 table is used in a subquery to translate
values of the t1.b column.  If each table contains N rows, SQLite
expects that the subquery will run N times, and hence it will believe
it is faster to construct an automatic, transient index on t2 first
and then use that index to satisfy the N instances of the subquery.

<p>
The automatic indexing capability can be disabled at run-time using
the [automatic_index pragma].  Automatic indexing is turned on by
default, but this can be changed so that automatic indexing is off
by default using the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time
option.
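The automatic-index decision described above can be observed directly with EXPLAIN QUERY PLAN. The following sketch is an illustration added here, not part of the check-in; it uses Python's bundled SQLite, and the exact wording of the plan output may vary between SQLite versions:

```python
import sqlite3

# In-memory database with the two unindexed tables from the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(a, b);
    CREATE TABLE t2(c, d);
""")

# With no ANALYZE data and no explicit indexes, SQLite assumes the tables
# are large and builds a transient automatic index on t2 for the join.
plan = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1, t2 WHERE a = c")]
for step in plan:
    print(step)

# Disabling the capability at run time removes the automatic index,
# leaving a plain nested scan of both tables.
conn.execute("PRAGMA automatic_index = OFF")
plan_off = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1, t2 WHERE a = c")]
```

In a typical build the first plan reports an AUTOMATIC COVERING INDEX on t2, while the second falls back to scanning both tables.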
︙
Changes to pages/queryplanner-ng.in.
︙
However, there may exist legacy applications that unknowingly depend
on undefined and/or suboptimal behavior in the legacy query planner,
and upgrading to the NGQP on those legacy applications could cause
performance regressions.  This risk is considered and a checklist is
provided for reducing the risk and for fixing any issues that do
arise.</p>

<p>This document focuses on the NGQP.  For a more general overview of
the SQLite query planner that encompasses the entire history of
SQLite, see the "[query planner | The SQLite Query Optimizer
Overview]" and "[indexing | How Indexes Work]" documents.</p>

<h1> Background</h1>

<p>For simple queries against a single table with few indexes, there
is usually an obvious choice for the best algorithm.  But for larger
and more complex queries, such as multi-way joins with many indexes
and subqueries, there can be hundreds, thousands, or millions of
reasonable algorithms for computing the result.  The job of the query
planner is to choose the single "best" query plan from this multitude
of possibilities.</p>

<p>Query planners are what make SQL database engines so amazingly
useful and powerful.  (This is true of all SQL database engines, not
just SQLite.)
︙
<h2> Query Planning In SQLite</h2>

<p>SQLite computes joins using nested loops, one loop for each table
in the join.  (Additional loops might be inserted for IN and OR
operators in the WHERE clause.  SQLite considers those too, but for
simplicity we will ignore them in this essay.)  One or more indexes
might be used on each loop to speed the search, or a loop might be a
"full table scan" that reads every row in the table.  Thus query
planning decomposes into two subtasks:</p>

<ol>
<li> Picking the nested order of the various loops
<li> Choosing good indexes for each loop
</ol>

<p>Picking the nesting order is generally the more challenging
problem.  Once the nesting order of the join is established, the
choice of indexes for each loop is normally obvious.</p>

<tcl>hd_fragment qpstab {query planner stability guarantee} QPSG</tcl>
<h2> The SQLite Query Planner Stability Guarantee</h2>

<p>When the Query Planner Stability Guarantee (QPSG) is enabled,
SQLite will always pick the same query plan for any given SQL
statement as long as:
<ol type="a">
<li>the database schema does not change in significant ways such as
adding or dropping indexes,</li>
<li>the ANALYZE command is not rerun, and</li>
<li>the same version of SQLite is used.</li>
</ol>

<p>The QPSG is disabled by default.  It can be enabled at
compile-time using the [SQLITE_ENABLE_QPSG] compile-time option, or
at run-time by invoking
[sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_QPSG],1,0).

<p>The QPSG means that if all of your queries run efficiently during
testing, and if your application does not change the schema, then
SQLite will not suddenly decide to start using a different query
plan, possibly causing a performance problem, after your application
is released to users.  If your application works in the lab, it will
continue working the same way after deployment.</p>

<p>Enterprise-class client/server SQL database engines do not
normally make this guarantee.  In client/server SQL database engines,
the server keeps track of statistics on the sizes of tables and on
the quality of indexes and the query planner uses those statistics to
help select the best plans.  As content is added, deleted, or changed
in the database, the statistics will evolve and may cause the query
planner to begin using a different query plan for some particular
query.  Usually the new plan will be better for the evolving
structure of the data.  But sometimes the new query plan will cause a
performance reduction.  With a client/server database engine, there
is typically a Database Administrator (DBA) on hand to deal with these
︙
<h2> Complications</h2>

<p>The presentation of the query planner problem above is a
simplification.  The costs are estimates.  We cannot know what the
true cost of running a loop is until we actually run the loop.
SQLite makes guesses for the cost of running a loop based on the
availability of indexes and constraints found in the WHERE clause.
These guesses are usually pretty good, but they can sometimes be off.
Using the [ANALYZE] command to collect additional statistical
information about the database can sometimes enable SQLite to make
better guesses about the cost.</p>

<p>The costs comprise multiple numbers, not a single number as shown
in the graph.
︙
<p>The initial implementation of NGQP chooses N=1 for simple queries,
N=5 for two-way joins and N=10 for all joins with three or more
tables.  This formula for selecting N might change in subsequent
releases.</p>

<tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
<h1> Hazards Of Upgrading To NGQP</h1>

<p><i>Update on 2018-11-24: This section was important when the NGQP
was new.  But five years have elapsed, the NGQP has been deployed
successfully to billions of devices, and everyone has upgraded.  The
upgrade hazard has vanished.  This section is retained for historical
reference only.  Modern readers can skip ahead to the
[query planner checklist].</i>

<p>For most applications, upgrading from the legacy query planner to
the NGQP requires little thought or effort.  Simply replace the older
SQLite version with the newer version of SQLite and recompile and the
application will run faster.  There are no API changes nor
modifications to compilation procedures.</p>

<p>But as with any query planner change, upgrading to the NGQP does
carry a small risk of introducing performance regressions.  The
problem here is not that the NGQP is incorrect or buggy or inferior
to the legacy query planner.  Given reliable information about the
selectivity of indexes, the NGQP should always pick a plan that is as
good or better than before.  The problem is that some applications
may be using low-quality and low-selectivity indexes without having
run [ANALYZE].  The older query planners look at many fewer possible
implementations for each query and so they may have stumbled over a
good plan by stupid luck.  The NGQP, on the other hand, looks at many
more query plan possibilities, and it may choose a different query
plan that works better in theory, assuming good indexes, but which
gives a performance regression in practice, because of the shape of
the data.</p>

<p>Key points:</p>

<ul>
<li><p>The NGQP will always find an equal or better query plan,
compared to prior query planners, as long as it has access to
accurate [ANALYZE] data in the [SQLITE_STAT1] file.</p>

<li><p>The NGQP will always find a good query plan as long as the
schema does not contain indexes that have more than about 10 or 20
rows with the same value in the left-most column of the index.</p>
</ul>

<p>Not all applications meet these conditions.  Fortunately, the NGQP
will still usually find good query plans, even without these
conditions.  However, cases do arise (rarely) where performance
regressions can occur.</p>
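The [ANALYZE] data referred to in the key points above lives in the sqlite_stat1 table. The following sketch is an added illustration (table and index names are invented, not from the check-in) showing, via Python's bundled SQLite, how ANALYZE records the quality of a deliberately low-selectivity index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagxref_demo(tagid INTEGER, rid INTEGER)")
conn.execute("CREATE INDEX demo_i1 ON tagxref_demo(tagid)")

# A low-quality index in the sense used above: far more than 10-20 rows
# share the same value (tagid=1) in the left-most column of demo_i1.
conn.executemany("INSERT INTO tagxref_demo VALUES (?, ?)",
                 [(1, n) for n in range(1000)] + [(2, 0), (3, 0)])

# ANALYZE gathers per-index statistics into sqlite_stat1; the query
# planner consults this table when estimating plan costs.
conn.execute("ANALYZE")
stats = conn.execute(
    "SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

The stat column holds the total row count followed by the average number of rows matching each distinct left-most value, which is how the planner learns that the first column of demo_i1 narrows the search very little.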
︙
Intuitively, we humans understand that algorithm-1 is best.  Each
check-in is likely to have few children (one child is the most common
case) and each child can be tested for the $trunk tag in logarithmic
time.  Indeed, algorithm-1 is the faster choice in practice.  But the
NGQP has no intuition.  The NGQP must use hard math, and algorithm-2
is slightly better mathematically.  This is because, in the absence
of other information, the NGQP must assume that the indexes PLINK_I1
and TAGXREF_I1 are of equal quality and are equally selective.
Algorithm-2 uses one field of the TAGXREF_I1 index and both fields of
the PLINK_I1 index whereas algorithm-1 only uses the first field of
each index.  Since algorithm-2 uses more index material, the NGQP is
correct to judge it to be the better algorithm.  The scores are close
and algorithm-2 just barely squeaks ahead of algorithm-1.  But
algorithm-2 really is the correct choice here.
</p>

<p>
Unfortunately, algorithm-2 is slower than algorithm-1 in this
application.
</p>

<p>
The problem is that the indexes are not of equal quality.  A check-in
is likely to only have one child.  So the first field of PLINK_I1
will usually narrow down the search to just a single row.  But there
are thousands and thousands of check-ins tagged with "trunk", so the
first field of TAGXREF_I1 will be of little help in narrowing down
the search.
</p>

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in
this query, unless [ANALYZE] has been run on the database.  The
[ANALYZE] command gathers statistics on the quality of the various
indexes and stores those statistics in the [SQLITE_STAT1] table.
Having access to this statistical information, the NGQP easily
chooses algorithm-1 as the best algorithm, by a wide margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm
︙
<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually
quite rare.  You are unlikely to run across any problems in your
application.  If you are not having performance issues, you do not
need to worry about any of this.</p>

<li><p><b>Create appropriate indexes.</b>
Most SQL performance problems arise not because of query planner
issues but rather due to lack of appropriate indexes.  Make sure
indexes are available to assist all large queries.  Most performance
issues can be resolved by one or two CREATE INDEX commands and with
no changes to application code.</p>

<li><p><b>Avoid creating low-quality indexes.</b>
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same
value for the left-most column of the index.  In particular, avoid
using boolean or "enum" columns as the left-most columns of your
indexes.</p>

<p>The Fossil performance problem described in the previous section
of this document arose because there were over ten-thousand entries
in the TAGXREF table with the same value for the left-most column
(the TAGID column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run
[ANALYZE].</b>
Low-quality indexes will not confuse the query planner as long as the
query planner knows that the indexes are of low quality.  And the way
the query planner knows this is by the content of the [SQLITE_STAT1]
table, which is computed by the ANALYZE command.</p>

<p>Of course, ANALYZE only works effectively if you have a
significant amount of content in your database in the first place.
When creating a new database that you expect to accumulate a lot of
data, you can run the command "ANALYZE sqlite_master" to create the
SQLITE_STAT1 table, then prepopulate the SQLITE_STAT1 table (using
ordinary INSERT statements) with content that describes a typical
database for your application - perhaps content that you extracted
after running ANALYZE on a well-populated template database in the
lab.</p>

<li><p><b>Instrument your code.</b>
Add logic that lets you know quickly and easily which queries are
taking too much time.  Then work on just those specific queries.</p>

<li><p><b>Use the [unlikely()] and [likelihood()] SQL functions.</b>
SQLite normally assumes that terms in the WHERE clause that cannot be
used by indexes have a strong probability of being true.  If this
assumption is incorrect, it could lead to a suboptimal query plan.
The [unlikely()] and [likelihood()] SQL functions can be used to
provide hints to the query planner about WHERE clause terms that are
probably not true, and thus aid the query planner in selecting the
best possible plan.</p>

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular loop
nesting order on queries that might use low-quality indexes in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table
to the left to be an outer loop relative to the table on the
right.</p>

<p>Avoid this step if possible, as it defeats one of the huge
advantages of the whole SQL language concept, specifically that the
application programmer does not need to get involved with query
planning.  If you
︙
Avoid using this trick if at all possible, and especially avoid it
early in the application development cycle.  Beware that adding a
unary "+" operator to an equality expression might change the result
of that expression if
<a href="datatype3.html#affinity">type affinity</a> is involved.</p>

<li><p><b>Use the [INDEXED BY] syntax to enforce the selection of
particular indexes on problem queries.</b>
As with the previous two bullets, avoid this step if possible, and
especially avoid doing this early in development as it is clearly a
premature optimization.</p>
</ol>

<h1> Summary</h1>
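The two plan-pinning checklist items, CROSS JOIN and INDEXED BY, can both be verified with EXPLAIN QUERY PLAN. This sketch is an added illustration (table and index names are invented) using Python's bundled SQLite; the exact plan wording may differ between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1(a, b);
    CREATE TABLE t2(c, d);
    CREATE INDEX t1b ON t1(b);
""")

# CROSS JOIN forces t1 (the left table) to be the outer loop and t2
# the inner loop, regardless of what the planner would otherwise pick.
cross_plan = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1 CROSS JOIN t2 WHERE a = c")]

# INDEXED BY pins the index choice: the statement either uses t1b
# or fails to prepare at all.
indexed_plan = [row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t1 INDEXED BY t1b WHERE b = 5")]

print(cross_plan)
print(indexed_plan)
```

Both constructs trade planner flexibility for predictability, which is why the checklist recommends them only as a last resort.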
︙