Documentation Source Text

Check-in [8471968fb0]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix some typos in queryplanner-ng.in.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 8471968fb0bd6aeb997e30f65eb670b35bcdf4d0
User & Date: dan 2014-07-02 16:24:28.570
Context
2014-07-03
11:15
Fix a typo in vtab.html. (check-in: 34e181958c user: dan tags: trunk)
2014-07-02
16:24
Fix some typos in queryplanner-ng.in. (check-in: 8471968fb0 user: dan tags: trunk)
2014-06-30
17:52
Enhance the description of the sqlite_stat4 format for WITHOUT ROWID tables. (check-in: d7f83670d7 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/queryplanner-ng.in.
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59

<p>For simple queries against a single table with few indices, there is usually
an obvious choice for the best algorithm.
But for larger and more complex queries, such as
multi-way joins with many indices
and subqueries, there can be hundreds, thousands, or millions
of reasonable algorithms for computing the result.
The job of the query planner is to chose a single "best" query plan from
this multitude of possibilities.</p>

<p>Query planners are what make SQL database engines so amazingly useful and powerful.
(This is true of all SQL database engines, not just SQLite.)
The query planner frees the programmer from the chore of selecting
a particular query plan, and thereby allows the programmer to
focus more mental energy on higher-level application issues and on 







|







45
46
47
48
49
50
51
52
53
54
55
56
57
58
59

<p>For simple queries against a single table with few indices, there is usually
an obvious choice for the best algorithm.
But for larger and more complex queries, such as
multi-way joins with many indices
and subqueries, there can be hundreds, thousands, or millions
of reasonable algorithms for computing the result.
The job of the query planner is to choose the single "best" query plan from
this multitude of possibilities.</p>

<p>Query planners are what make SQL database engines so amazingly useful and powerful.
(This is true of all SQL database engines, not just SQLite.)
The query planner frees the programmer from the chore of selecting
a particular query plan, and thereby allows the programmer to
focus more mental energy on higher-level application issues and on 
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82

<p>Writing a good query planner is more art than science.
The query planner must work with incomplete information.
It cannot determine how long any particular plan will take
without actually running that plan.  So when comparing two
or more plans to figure out which is "best", the query planner has to make
some guesses and assumptions and those guesses and assumptions will 
sometimes be wrong. A good query plan is one that will
find the correct solution often enough that application
programmers rarely need to get involved.</p>

<h3>2.1 Query Planning In SQLite</h3>

<p>SQLite computes joins using nested loops, 
one loop for each table







|







68
69
70
71
72
73
74
75
76
77
78
79
80
81
82

<p>Writing a good query planner is more art than science.
The query planner must work with incomplete information.
It cannot determine how long any particular plan will take
without actually running that plan.  So when comparing two
or more plans to figure out which is "best", the query planner has to make
some guesses and assumptions and those guesses and assumptions will 
sometimes be wrong. A good query planner is one that will
find the correct solution often enough that application
programmers rarely need to get involved.</p>

<h3>2.1 Query Planning In SQLite</h3>

<p>SQLite computes joins using nested loops, 
one loop for each table
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
planner.  Given reliable information about the selectivity of indices, 
the NGQP should always pick a plan that is as good or better than before.
The problem is that some applications may be using low-quality and
low-selectivity indices without having run [ANALYZE].  The older query
planners look at many fewer possible implementations for each query and 
so they may have stumbled over a good plan by stupid luck.  The NGQP, on 
the other hand, looks at many more query plan possibilities, and it may 
chose a different query plan that
works better in theory, assuming good indices, but which gives a performance
regression in practice, because of the shape of the data.</p>

<p>Key points:</p>

<ul>
<li><p>The NGQP will always find an equal or better query plan, compared to







|







364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
planner.  Given reliable information about the selectivity of indices, 
the NGQP should always pick a plan that is as good or better than before.
The problem is that some applications may be using low-quality and
low-selectivity indices without having run [ANALYZE].  The older query
planners look at many fewer possible implementations for each query and 
so they may have stumbled over a good plan by stupid luck.  The NGQP, on 
the other hand, looks at many more query plan possibilities, and it may 
choose a different query plan that
works better in theory, assuming good indices, but which gives a performance
regression in practice, because of the shape of the data.</p>

<p>Key points:</p>

<ul>
<li><p>The NGQP will always find an equal or better query plan, compared to
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
For this reason, the query was modified to use the CROSS JOIN operator 
instead of the plain JOIN operator.
SQLite will not reorder the tables of a CROSS JOIN.
This is a long-standing feature of SQLite that is specifically designed
to allow knowledgeable programmers
to enforce a particular loop nesting order.  Once the join
was changed to CROSS JOIN (the addition of a single keyword) the NGQP was
forced to chose the faster algorithm-1 regardless of whether or not
statistical information had been gathered using ANALYZE.</p>

<p>We say that algorithm-1 "faster", but this
is not strictly true.  Algorithm-1 is faster in common repositories, but
it is possible to construct a repository in which
every check-in is on a different uniquely-named branch and
all check-ins are children of the root check-in.
In that case, TAGXREF_I1 would become more selective
than PLINK_I1 and algorithm-2 really would be the faster choice.
However such repositories are very unlikely to appear in
practice and so hard-coding the loop nested order using the
CROSS JOIN syntax is a reasonable solution
to the problem in this case.</p>

<tcl>hd_fragment howtofix {query planner checklist}</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run across any problems in your application.
If you are not having performance issues, the you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indices.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indices.  Make sure indices are
available to assist all large queries.  Most performance issues can be
resolved by one or two CREATE INDEX commands and with no changes to
application code.</p>

<li><p><b>Avoid creating low-quality indices.</b>.
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index.  In particular, avoid using
boolean or "enum" columns as the left-most columns of your indices.</p>

<p>The Fossil performance problem described in the previous section of
this document arose because there were over
ten thousands entries in the TAGXREF table with the same value for the 
left-most column (the TAGID column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run [ANALYZE].</b>
Low-quality indices will not confuse the query planner as long as the
query planner knows that the indices are of low quality.  And the way
the query planner knows this is by the content of the [SQLITE_STAT1] table,
which is computed by the ANALYZE command.</p>







|


|


















|

















|







594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
For this reason, the query was modified to use the CROSS JOIN operator 
instead of the plain JOIN operator.
SQLite will not reorder the tables of a CROSS JOIN.
This is a long-standing feature of SQLite that is specifically designed
to allow knowledgeable programmers
to enforce a particular loop nesting order.  Once the join
was changed to CROSS JOIN (the addition of a single keyword) the NGQP was
forced to choose the faster algorithm-1 regardless of whether or not
statistical information had been gathered using ANALYZE.</p>

<p>We say that algorithm-1 is "faster", but this
is not strictly true.  Algorithm-1 is faster in common repositories, but
it is possible to construct a repository in which
every check-in is on a different uniquely-named branch and
all check-ins are children of the root check-in.
In that case, TAGXREF_I1 would become more selective
than PLINK_I1 and algorithm-2 really would be the faster choice.
However such repositories are very unlikely to appear in
practice and so hard-coding the loop nested order using the
CROSS JOIN syntax is a reasonable solution
to the problem in this case.</p>

<tcl>hd_fragment howtofix {query planner checklist}</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run across any problems in your application.
If you are not having performance issues, you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indices.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indices.  Make sure indices are
available to assist all large queries.  Most performance issues can be
resolved by one or two CREATE INDEX commands and with no changes to
application code.</p>

<li><p><b>Avoid creating low-quality indices.</b>.
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index.  In particular, avoid using
boolean or "enum" columns as the left-most columns of your indices.</p>

<p>The Fossil performance problem described in the previous section of
this document arose because there were over
ten-thousand entries in the TAGXREF table with the same value for the 
left-most column (the TAGID column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run [ANALYZE].</b>
Low-quality indices will not confuse the query planner as long as the
query planner knows that the indices are of low quality.  And the way
the query planner knows this is by the content of the [SQLITE_STAT1] table,
which is computed by the ANALYZE command.</p>
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676

<li><p><b>Use [unlikely()] and [likelihood()] SQL functions.</b>
SQLite normally assumes that terms in the WHERE clause that cannot be used
by indices have a strong probability of being true.  If this assumption
is incorrect, it could lead to a suboptimal query plan.  The
[unlikely()] and [likelihood()] SQL functions can be used to provide
hints to the query planner about WHERE clause terms that are probably
not true, and thus aid the query planner is selecting the best possible
plan.

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be an outer loop relative to the table on the right.</p>







|







662
663
664
665
666
667
668
669
670
671
672
673
674
675
676

<li><p><b>Use [unlikely()] and [likelihood()] SQL functions.</b>
SQLite normally assumes that terms in the WHERE clause that cannot be used
by indices have a strong probability of being true.  If this assumption
is incorrect, it could lead to a suboptimal query plan.  The
[unlikely()] and [likelihood()] SQL functions can be used to provide
hints to the query planner about WHERE clause terms that are probably
not true, and thus aid the query planner in selecting the best possible
plan.

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be an outer loop relative to the table on the right.</p>