Documentation Source Text

Check-in [9747293881]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix typos in the queryplanner-ng document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 97472938818bab78bfc37c3f1c488bee469d9286
User & Date: drh 2013-08-27 13:34:32.076
Context
2013-08-29
14:29
Preparations for the 3.8.0.1 patch release. (check-in: 8d74017a56 user: drh tags: branch-3.8.0)
2013-08-28
14:13
Add documentation for STAT4. (check-in: 04ab4fa5ec user: drh tags: trunk)
2013-08-27
13:34
Fix typos in the queryplanner-ng document. (check-in: 9747293881 user: drh tags: trunk)
2013-08-26
12:54
Version-3.8.0 (check-in: e672a5f4d8 user: drh tags: trunk, release, version-3.8.0)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/queryplanner-ng.in.
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
engine figure out the best way to retrieve that content.</p>

<p>Writing a good query planner is more art than science.
The query planner must work with incomplete information.
It cannot determine how long any particular plan will take
without actually running that plan.  So when comparing two
or more plans to figure out which is "best", the query planner has to make
some guesses and assumption and those guesses and assumptions will 
sometimes be wrong. A good query plan is one that will
find the correct solution often enough that application
programmers rarely need to get involved.</p>

<h3>2.1 Query Planning In SQLite</h3>

<p>SQLite computes joins using nested loops, 







|







67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
engine figure out the best way to retrieve that content.</p>

<p>Writing a good query planner is more art than science.
The query planner must work with incomplete information.
It cannot determine how long any particular plan will take
without actually running that plan.  So when comparing two
or more plans to figure out which is "best", the query planner has to make
some guesses and assumptions and those guesses and assumptions will 
sometimes be wrong. A good query plan is one that will
find the correct solution often enough that application
programmers rarely need to get involved.</p>

<h3>2.1 Query Planning In SQLite</h3>

<p>SQLite computes joins using nested loops, 
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
ensure that plans do not change unexpectedly after deployment.</p>

<p>The SQLite stability guarantee applies equally to the legacy query planner 
and to the NGQP.</p>

<p>It is important to note that changing versions of SQLite might cause
changes in query plans.
The same version of SQLite it will
always pick the same query plan, but if you relink your application to use
a different version of SQLite, then query plans might change.  In rare
cases, an SQLite version change might lead to a performance regression.  
This is one reason
you should consider statically linking your applications against SQLite 
rather than use a system-wide SQLite shared library which might 
change without your knowledge or control.</p>







|







130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
ensure that plans do not change unexpectedly after deployment.</p>

<p>The SQLite stability guarantee applies equally to the legacy query planner 
and to the NGQP.</p>

<p>It is important to note that changing versions of SQLite might cause
changes in query plans.
The same version of SQLite will
always pick the same query plan, but if you relink your application to use
a different version of SQLite, then query plans might change.  In rare
cases, an SQLite version change might lead to a performance regression.  
This is one reason
you should consider statically linking your applications against SQLite 
rather than use a system-wide SQLite shared library which might 
change without your knowledge or control.</p>
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
N2, S, L, P, O, C, N1 and R.
The arcs in the graph represent the estimated cost of computing each term 
assuming that the origin of the arc is in an outer loop.  For example, the
cost of running the S loop as an inner loop to L is 2.30 whereas the
cost of running the S loop as an outer loop to L is 9.17.</p>

<p>The "cost" here is logarithmic.  With nested loops, the work
is multiplied, not added.  But it customary to think of graphs
with additive weights and so the graph shows the logarithm of the
various costs.  The graph shows a cost advantage of S being inside of
L of about 6.87, but this translates into the query running about
963 times faster when S loop is inside of the L loop rather than
being outside of it.</p>

<p>The arrows from the small circles labeled with "*" indicate the cost
of running each loop with no dependencies.  The outer loop must use this
*-cost.  Inner loops have the option of using the *-cost or a cost assuming
one of the other terms is in an outer loop, whichever gives the best
result.  One can think of the *-costs as a short-hand notation indicating
multiple arcs, one from each of the other nodes in the
graph.  The graph is therefore "complete", meaning that there are arcs
(some explicit and some implied) in both directions between every pair of 
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

<p>(Side note:  The costs estimates in the TPC-H Q8 graph where computed
by the query planner in SQLite 3.7.16 and converted using a base-10 logarithm.)
</p>

<h3>3.2 Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates.  We cannot







|




















|







179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
N2, S, L, P, O, C, N1 and R.
The arcs in the graph represent the estimated cost of computing each term 
assuming that the origin of the arc is in an outer loop.  For example, the
cost of running the S loop as an inner loop to L is 2.30 whereas the
cost of running the S loop as an outer loop to L is 9.17.</p>

<p>The "cost" here is logarithmic.  With nested loops, the work
is multiplied, not added.  But it is customary to think of graphs
with additive weights and so the graph shows the logarithm of the
various costs.  The graph shows a cost advantage of S being inside of
L of about 6.87, but this translates into the query running about
963 times faster when S loop is inside of the L loop rather than
being outside of it.</p>

<p>The arrows from the small circles labeled with "*" indicate the cost
of running each loop with no dependencies.  The outer loop must use this
*-cost.  Inner loops have the option of using the *-cost or a cost assuming
one of the other terms is in an outer loop, whichever gives the best
result.  One can think of the *-costs as a short-hand notation indicating
multiple arcs, one from each of the other nodes in the
graph.  The graph is therefore "complete", meaning that there are arcs
(some explicit and some implied) in both directions between every pair of 
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

<p>(Side note:  The costs estimates in the TPC-H Q8 graph were computed
by the query planner in SQLite 3.7.16 and converted using a base-10 logarithm.)
</p>

<h3>3.2 Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates.  We cannot
700
701
702
703
704
705
706
707
legacy query planner and even more true of the new NGQP.  There may be
an occasional situation where, due to incomplete information, the query
planner selects a suboptimal plan.  This will happen less often with the
NGQP than with the legacy query planner, but it might still happen.  Only
in those rare cases do application developers need to get involved and
help the query planner to do the right thing.  In the common case, the
NGQP is just a new enhancement to SQLite that makes the application run
a little faster and which requires no new developer though or action.</p>







|
700
701
702
703
704
705
706
707
legacy query planner and even more true of the new NGQP.  There may be
an occasional situation where, due to incomplete information, the query
planner selects a suboptimal plan.  This will happen less often with the
NGQP than with the legacy query planner, but it might still happen.  Only
in those rare cases do application developers need to get involved and
help the query planner to do the right thing.  In the common case, the
NGQP is just a new enhancement to SQLite that makes the application run
a little faster and which requires no new developer thought or action.</p>