Overview

Comment: Further improvement and expansion of the NGQP documentation, including adding a checklist for preventing and dealing with bad choices by the query planner.
SHA1: 828ca81aaeb7d4c32381e17aefe08b83
User & Date: drh 2013-06-25 16:31:40.016
Context

2013-06-26
  14:35  Updates to the changes.html page. Fix typos on the NGQP documentation. (check-in: e0c107dec3 user: drh tags: trunk)
2013-06-25
  16:31  Further improvement and expansion of the NGQP documentation, including adding a checklist for preventing and dealing with bad choices by the query planner. (check-in: 828ca81aae user: drh tags: trunk)
  01:44  Second draft of the NGQP document. (check-in: e09abae6c3 user: drh tags: trunk)
Changes
Changes to pages/queryplanner-ng.in.
︙
hd_puts [string trimright $txt]\n
hd_puts "</pre></table></center>\n"
}
hd_keywords {next generation query planner}
</tcl>
<h1 align="center">The Next Generation Query Planner</h1>

<h2>1.0 Introduction</h2>

<p>
The task of the "query planner" is to figure out the best algorithm
or "query plan" to accomplish an SQL statement.
Beginning with SQLite version 3.8.0, the query planner component has been
rewritten so that it runs faster and generates better plans.
The rewrite is called the "next generation query planner" or "NGQP".
</p>

<p>This article overviews the importance of query planning, describes some
of the problems inherent to query planning, and outlines how the NGQP
solves those problems.</p>

<p>The NGQP is almost always better than the legacy query planner.
However, there may exist legacy applications that unknowingly depend on
undefined and/or suboptimal behavior in the legacy query planner, and
upgrading to the NGQP on those legacy applications could cause performance
regressions. This risk is considered and a checklist is provided
for reducing the risk and fixing any issues that do arise.</p>

<h2>2.0 Background</h2>

<p>For simple queries against a single table with few indices, there is
usually an obvious choice for the best algorithm. But for larger and
more complex queries, such as multi-way joins with many indices and
subqueries, there can be hundreds, thousands, or millions of
reasonable algorithms for computing the result. The job of the query
planner is to choose a single "best" query plan from this multitude of
possibilities.</p>

<p>The existence of a query planner is what makes SQL database engines so
amazingly useful and powerful. The query planner frees the programmer from
the chore of selecting a particular query plan, and thereby allows the
programmer to focus more mental energy on higher-level application issues
and on providing more value to the end user. For simple queries where the
choice of query plans is obvious, this is not of huge importance. But as
applications and schemas and queries grow more complex, a clever query
planner can greatly speed and simplify the work of application development.
There is amazing power in being able to tell the database engine what
content is desired, and then let the database engine work out the best way
to retrieve that content.</p>

<p>Writing a good query planner is an inexact science. The query planner
must work with incomplete information. It cannot determine how long any
particular plan will take without actually running that plan. So when
comparing two or more plans to figure out which is "best", the query
planner has to make some guesses and assumptions, and those guesses and
assumptions will sometimes be wrong. A good query planner is one that will
find the correct solution often enough that application programmers only
rarely need to get involved.</p>

<h3>2.1 Query Planning In SQLite</h3>

<p>SQLite computes joins using nested loops, one loop for each table in the
join. (Additional loops might be inserted to handle IN operators in the
WHERE clause, but those extra loops are not considered here.) One or more
indices might be used on each loop to speed the search, or a loop might be
a "full table scan" using only the original table. Thus query planning
decomposes into two subtasks:</p>
<ol>
︙
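<p>To make the two subtasks concrete: SQLite's EXPLAIN QUERY PLAN command
reports the loop nesting order and the index choices that the planner has
settled on for a query. The following sketch uses a hypothetical two-table
schema invented for illustration; it is not taken from this article:</p>

<blockquote><pre>
-- Hypothetical schema, for illustration only.
CREATE TABLE artist(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track(id INTEGER PRIMARY KEY, artistid INTEGER, title TEXT);
CREATE INDEX track_i1 ON track(artistid);

-- Ask SQLite to describe its chosen plan.  The output shows which table
-- runs in the outer loop and which index (if any) each loop uses.
EXPLAIN QUERY PLAN
SELECT artist.name, track.title
  FROM artist JOIN track ON track.artistid=artist.id;
</pre></blockquote>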
then SQLite will not suddenly decide to start using a new and different
query plan once your application is in the field, thereby possibly causing
a performance problem. If your application works in the lab, it will
continue working the same way after deployment.</p>

<p>Enterprise-class client/server SQL database engines do not normally
make this guarantee. In client/server SQL database engines, the server
keeps track of statistics on the sizes of tables and on the quality of
indices, and the query planner uses those statistics to help select the
best plans. As content is added, deleted, or changed in the database, the
statistics will evolve and may cause the query planner to begin using a
different query plan for some particular query. Usually the new plan will
be better for the evolving structure of the data. But sometimes the new
query plan will cause a performance reduction. With a client/server
database engine, there is typically a Database Administrator (DBA) on hand
to deal with these rare problems as they come up. But DBAs are not
available to fix problems in an embedded database like SQLite, and hence
SQLite is careful to ensure that plans do not change unexpectedly after
deployment.</p>

<p>This stability guarantee applies equally to the legacy query planner in
SQLite and to the NGQP.</p>

<p>It is important to note that changing versions of SQLite might cause
changes in query plans, as new enhancements are added to the query
planner. The same version of SQLite will
︙
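<p>SQLite's analog of those server-side statistics is the optional
[ANALYZE] command, discussed further in sections 4.1 and 5.0 below. The
statistics are gathered only when the application asks for them, never
behind the application's back, which is part of how the stability
guarantee is kept. A minimal sketch (the exact rows that appear in
SQLITE_STAT1 depend on the schema and its content):</p>

<blockquote><pre>
-- Gather index statistics on demand.
ANALYZE;

-- The statistics land in an ordinary table that the query planner
-- consults when estimating costs.
SELECT * FROM sqlite_stat1;
</pre></blockquote>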
<img src="images/qp/tpchq8.gif">
</center>

<p>
In the diagram, each of the 8 tables in the FROM clause of the query is
identified by a large circle with the label of the FROM-clause term:
N2, S, L, P, O, C, N1 and R.
The arcs in the graph represent the estimated cost of computing each term
assuming that the origin of the arc is in an outer loop. For example, the
cost of running the S loop as an inner loop to L is 2.30 whereas the
cost of running the S loop as an outer loop to L is 9.17.</p>

<p>The "cost" here is logarithmic. With nested loops, the work
is multiplied, not added. But it is customary to think of graphs
with additive weights and so the graph shows the logarithm of the
various costs. The graph shows a cost advantage of S being inside of
︙
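<p>A worked illustration of why taking logarithms makes the arc weights
additive (the row counts here are invented for illustration):</p>

<blockquote><pre>
\log(W_{\mathrm{outer}} \cdot W_{\mathrm{inner}})
    = \log W_{\mathrm{outer}} + \log W_{\mathrm{inner}}

\text{e.g. } 100 \times 50 = 5000 \text{ units of work;}\quad
\log_{10}100 + \log_{10}50 = 2.00 + 1.70 = 3.70 = \log_{10}5000
</pre></blockquote>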
result. One can think of the *-costs as a short-hand notation indicating
multiple arcs with that cost, one from each of the other nodes in the
graph. The graph is therefore "complete", meaning that there are arcs
(some explicit and some implied) in both directions between every pair of
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

<p>(Side note: The cost estimates in the TPC-H Q8 graph were computed
by the query planner in SQLite 3.7.16 and converted using a base-10
logarithm.)
</p>

<h3>3.2 Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates. We cannot know what the true cost of running
a loop is until we actually run the loop. SQLite makes guesses for the cost
of running a loop based on
︙
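<p>To get a feel for the size of this search space: a path that visits
each of the K nodes exactly once is an ordering of the nodes, so there are
K! candidate paths to consider:</p>

<blockquote><pre>
\text{number of candidate paths} = K!
\qquad 8! = 40{,}320
\qquad 20! \approx 2.4 \times 10^{18}
</pre></blockquote>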
<p>Unfortunately, the query plan computed by NN for TPC-H Q8 is not optimal.
The plan computed using NN is R-N1-N2-S-C-O-L-P with a cost of 36.92.
The notation in the previous sentence means that the R table is run in the
outer loop, N1 is in the next inner loop, N2 is in the third loop, and so
forth down to P which is in the inner-most loop. The shortest path through
the graph (as found via exhaustive search) is P-L-O-C-N1-R-S-N2
with a cost of 27.38. The difference might not seem like much, but
remember that the costs are logarithmic, so the shortest path is nearly
14,000 times faster than the path found using the NN heuristic.</p>

<p>One solution to this problem is to change SQLite to do an exhaustive
search for the best path. But an exhaustive search requires time
proportional to K! (where K is the number of tables in the join) and so
when you get beyond a 10-way join, the time to run [sqlite3_prepare()]
becomes very large.</p>

<h3>3.4 The N Nearest Neighbors or "N3" Heuristic</h3>

<p>The NGQP uses a new heuristic for seeking the best path through the
graph: "N Nearest Neighbors" (hereafter "N3"). With N3, instead of
choosing just one nearest neighbor for each step, the algorithm keeps
track of the N best paths at each step for some small integer N.</p>

<p>Suppose N=4. Then for the TPC-H Q8 graph, the first step finds the four
shortest paths to visit any single node in the graph:

<blockquote>
R (cost: 3.56) <br>
N1 (cost: 5.52) <br>
N2 (cost: 5.52) <br>
P (cost: 7.71) <br>
︙
R-N2-S (cost: 15.08) <br>
</blockquote>

<p>And so forth. There are 8 nodes in the TPC-H Q8 query,
so this process repeats a total of 8 times. In the general case of
a K-way join, the storage requirement is O(N) and the computation time
is O(K*N), which is significantly faster than the
O(2<small><sup>K</sup></small>) exact solution.</p>

<p>But what value should be chosen for N? One might try N=K. This makes
the algorithm O(K<small><sup>2</sup></small>), which is actually still
quite efficient, since the maximum value of K is 64 and K rarely exceeds
10. But that is not enough for the TPC-H Q8 problem. With N=8 on TPC-H Q8
the N3 algorithm finds the solution R-N1-C-O-L-S-N2-P with a cost of
29.78. That is a big improvement over NN, but it is still not optimal.
N3 finds the optimal solution for TPC-H Q8 when N is 10 or greater.</p>

<p>The initial implementation of NGQP chooses N=1 for simple queries, N=5
for two-way joins, and N=10 for all joins with three or more tables. This
formula for selecting N might change in subsequent releases.</p>

<tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
<h2>4.0 Hazards Of Upgrading To NGQP</h2>

<p>For most applications, upgrading from the legacy query planner to the
NGQP requires little thought or effort. Simply replace the older version
of SQLite with the newer version and recompile, and the application will
run faster. There are no API changes nor modifications to compilation
procedures.</p>

<p>But as with any query planner change, upgrading to the NGQP does carry
a small risk of introducing performance regressions. The problem here is
not that the NGQP is incorrect or buggy or inferior to the legacy query
planner. Given accurate cost estimates, the NGQP will always pick a better
plan than older query planners. The problem is that cost estimates might
sometimes be inaccurate and the legacy query planner just happened to
stumble over a good plan by stupid luck while the NGQP is not as
lucky.</p>

<h3>4.1 Case Study: Upgrading Fossil to the NGQP</h3>

<p>The <a href="http://www.fossil-scm.org/">Fossil DVCS</a> is the version
control system used to track all of the SQLite source code.
A Fossil repository is an SQLite database file.
(Readers are invited to ponder this recursion as an independent exercise.)
︙
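<p>Plugging illustrative numbers into the bounds stated in section 3.4,
using the N=10 that the NGQP picks for larger joins, shows how lopsided
the comparison is, even at the 64-table limit:</p>

<blockquote><pre>
K = 8,\ N = 10:\quad K \cdot N = 80
    \quad\text{vs}\quad 2^{8} = 256
K = 64,\ N = 10:\quad K \cdot N = 640
    \quad\text{vs}\quad 2^{64} \approx 1.8 \times 10^{19}
</pre></blockquote>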
<p><ol>
<li> The check-in has a "trunk" tag.
<li> The check-in has a child that has a "trunk" tag.
<li> The check-in has a parent that has a "trunk" tag.
</ol></p>

<p>
The first condition causes all of the trunk check-ins to be displayed and
the second and third cause check-ins that merge into or are derived from
the trunk to also be included. The three conditions are implemented by
the three OR-connected EXISTS statements in the WHERE clause of the query.
The slowdown that occurred with the NGQP was caused by the second and
third conditions. The problem is the same in each, so we will examine just
the second condition. The subquery of the second condition can be
rewritten (with minor and immaterial simplifications) as follows:</p>

<blockquote><pre>
SELECT 1
  FROM plink JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk
   AND plink.pid=$ckid;
</pre></blockquote>

<p>The PLINK table holds parent-child relationships between check-ins.
The TAGXREF table maps tags into check-ins. For reference, the relevant
portions of the schemas for these two tables are shown here:</p>

<blockquote><pre>
CREATE TABLE plink(
  pid INTEGER REFERENCES blob,
  cid INTEGER REFERENCES blob
);
CREATE UNIQUE INDEX plink_i1 ON plink(pid,cid);

CREATE TABLE tagxref(
  tagid INTEGER REFERENCES tag,
  mtime TIMESTAMP,
  rid INTEGER REFERENCES blob,
  UNIQUE(rid, tagid)
);
CREATE INDEX tagxref_i1 ON tagxref(tagid, mtime);
</pre></blockquote>

<p>There are only two reasonable ways to implement this query. (There are
many other possible algorithms, but none of the others are contenders for
being the "best" algorithm.)</p>

<ol type="1">
<li value=1><p>
Find all children of check-in $ckid and test each one to see if it has
the $trunk tag.

<li value=2><p>
Find all check-ins with the $trunk tag and test each one to see if it is
a child of $ckid.
</ol>

<p>
Intuitively, we humans understand that algorithm-1 is best. Each check-in
is likely to have few children (one child is the most common case) and
each child can be tested for the $trunk tag in logarithmic time. Indeed,
algorithm-1 is the faster choice in practice. But the NGQP has no
intuition. The NGQP must use hard math, and algorithm-2 is slightly better
mathematically. This is because, in the absence of other information, the
NGQP must assume that the indices PLINK_I1 and TAGXREF_I1 are of equal
quality and are equally selective. Algorithm-2 uses one field of the
TAGXREF_I1 index and both fields of the PLINK_I1 index whereas
︙
<p>
Unfortunately, algorithm-2 is slower than algorithm-1 in this
application.
</p>

<p>
The problem is that the indices are not of equal quality.
A check-in is likely to have only one child. So the first field
of PLINK_I1 will usually narrow down the search to just a single row.
But there are thousands and thousands of check-ins tagged with "trunk",
so the first field of TAGXREF_I1 will be of little help in narrowing
down the search.
</p>

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database. The [ANALYZE]
command gathers statistics on the quality of the various indices and
stores the result in the SQLITE_STAT1 table. Having access to this
statistical information, the NGQP easily chooses algorithm-1 as the best
algorithm, by a wide margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2? Easy: because
the NN algorithm never even considered algorithm-2. Graphs of the planning
problem look like this:</p>

<center>
<img src="images/qp/fqp1.gif">
</center>

<p>
In the "without ANALYZE" case on the left, the NN algorithm chooses
loop P (PLINK) as the outer loop because 4.9 is less than 5.2, resulting
in path P-T which is algorithm-1. NN only looks at the single best choice
at each step so it completely misses the fact that 5.2+4.4 makes a
slightly cheaper plan than 4.9+4.8. But the N3 algorithm keeps track of
the 5 best paths for a 2-way join, so it ends up selecting path T-P
because of its slightly lower overall cost. Path T-P is algorithm-2.
</p>

<p>
Note that with ANALYZE the cost estimates are better aligned with reality
and algorithm-1 is selected by both NN and N3.
</p>

<p>(Side note: The cost estimates in the two most recent graphs were
computed by the NGQP using a base-2 logarithm and slightly different cost
assumptions compared to the legacy query planner. Hence, the cost
estimates in these latter two graphs are not directly comparable to the
cost estimates in the TPC-H Q8 graph.)</p>
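<p>One way to observe this effect directly is to ask SQLite for its plan
before and after gathering statistics. This is a sketch; the exact
EXPLAIN QUERY PLAN output format varies between SQLite versions:</p>

<blockquote><pre>
-- Which algorithm did the planner pick?  The table named in the first
-- (outermost) row of the EXPLAIN QUERY PLAN output tells us:
-- plink first means algorithm-1, tagxref first means algorithm-2.
EXPLAIN QUERY PLAN
SELECT 1 FROM plink JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk AND plink.pid=$ckid;

-- Gather index statistics, then ask again.  With SQLITE_STAT1
-- populated, the planner should settle on algorithm-1.
ANALYZE;
EXPLAIN QUERY PLAN
SELECT 1 FROM plink JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk AND plink.pid=$ckid;
</pre></blockquote>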
<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem. However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been
analyzed. For this reason, the query was modified to use the CROSS JOIN
operator instead of simply JOIN.
(<a href="http://www.fossil-scm.org/fossil/fdiff?v1=c498d980579e18f8&v2=2a35f6ffed42e024&sbs=1">The
difference</a>)
SQLite will not reorder the tables of a CROSS JOIN. This is a
long-standing feature of SQLite that is specifically designed to allow
knowledgeable programmers to enforce a particular loop nesting order.
Once the join was changed to CROSS JOIN (the addition of a single keyword)
the NGQP was forced to choose the faster algorithm-1 regardless of whether
or not statistical information had been gathered using ANALYZE.</p>

<p>We say that algorithm-1 is "faster", but this is not strictly true.
Algorithm-1 is faster in common repositories, but it is possible to
construct a repository in which every check-in is on a different
uniquely-named branch and all check-ins are children of the root check-in.
In that case, TAGXREF_I1 would become more selective than PLINK_I1 and
algorithm-2 really would be the faster choice. However, such repositories
are very unlikely to appear in practice and so hard-coding the loop
nesting order using the CROSS JOIN syntax is a reasonable solution to the
problem in this case.</p>

<tcl>hd_fragment howtofix</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare. You are unlikely to run across any problems in your application.
If you are not having performance issues, then you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indices.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indices. Make sure indices are
available to assist all large queries. Most performance issues can be
resolved by one or two CREATE INDEX commands and with no changes to
application code.</p>

<li><p><b>Avoid creating low-quality indices.</b>
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index. In particular, avoid using boolean
or "enum" columns as the left-most columns of your indices.</p>

<p>The performance problem in Fossil described in the previous section
came about because there were over ten thousand entries in the TAGXREF
table with the same value for the left-most column (the TAGID column) of
the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run ANALYZE.</b>
Low-quality indices will not confuse the query planner as long as the
query planner knows that the indices are of low quality. And the way the
query planner knows this is by the content of the SQLITE_STAT1 table,
normally computed by the ANALYZE command.</p>

<p>Of course, ANALYZE only works effectively if you have a significant
amount of content in your database in the first place. When creating a
new database that you expect to accumulate a lot of data, you can run the
command "ANALYZE sqlite_master" to create the SQLITE_STAT1 table, then
prepopulate the SQLITE_STAT1 table (using ordinary INSERT statements)
with content that describes a typical database for your application -
perhaps content that you extracted after running ANALYZE on a
well-populated template database in the lab.</p>

<li><p><b>Instrument your code.</b>
Add logic that lets you know quickly and easily which queries are taking
too much time. Then work on just those specific queries.</p>

<li><p><b>Use the CROSS JOIN syntax to enforce a particular loop nesting
order on queries that might use low-quality indices in an unanalyzed
database.</b>
SQLite treats the keyword CROSS JOIN specially, forcing the table to the
left to be an outer loop relative to the table on the right. (A sketch of
this syntax, and of INDEXED BY, follows this checklist.)</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application
programmer does not need to get involved with query planning. If you do
use CROSS JOIN, wait until late in your development cycle to do so, and
comment the use of CROSS JOIN carefully so that you can take it out later
if possible. Avoid using CROSS JOIN early in the development cycle as
doing so is a premature optimization, which is well known to be
<a href="http://c2.com/cgi/wiki?PrematureOptimization">the root of all
evil</a>.</p>

<li><p><b>Use the [INDEXED BY] syntax to enforce the selection of
particular indices on problem queries.</b>
As with the previous bullet, avoid this step if possible, and especially
avoid doing this early in development as it is clearly a premature
optimization.</p>
</ol>
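<p>The sketch below applies both overrides to the subquery from section
4.1. The CROSS JOIN form is the fix Fossil actually adopted; the INDEXED
BY form is shown only to illustrate the syntax and is not how Fossil
solved the problem:</p>

<blockquote><pre>
-- Checklist item 6: CROSS JOIN forces plink into the outer loop,
-- which is algorithm-1, with or without ANALYZE.
SELECT 1
  FROM plink CROSS JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk
   AND plink.pid=$ckid;

-- Checklist item 7: INDEXED BY pins a particular index rather than a
-- loop order.  The statement fails with an error if the named index
-- cannot be used, for example after the index is dropped.
SELECT 1
  FROM plink INDEXED BY plink_i1
  JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk
   AND plink.pid=$ckid;
</pre></blockquote>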
<h2>6.0 Summary</h2>

<p>The query planner in SQLite normally does a terrific job of selecting
fast algorithms for running your SQL statements. This is true of the
legacy query planner and even more true of the new NGQP. There may be an
occasional situation where, due to incomplete information, the query
planner selects a suboptimal plan. This will happen less often with the
NGQP than with the legacy query planner, but it might still happen. Only
in those rare cases do application developers need to get involved and
help the query planner to do the right thing. In the common case, the
NGQP is just a new enhancement to SQLite that makes the application run a
little faster and that requires no new developer thought or action.</p>