Documentation Source Text

Check-in [28554e7596]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to the next-generation-query-planner document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 28554e75969d632ff9111f5750d0db22cbc25f4c
User & Date: drh 2013-04-30 22:20:00.775
Context
2013-05-01
00:14
Fix a typo in intern-v-extern-blob.html (check-in: c8b280b8eb user: drh tags: trunk)
2013-04-30
22:20
Updates to the next-generation-query-planner document. (check-in: 28554e7596 user: drh tags: trunk)
16:49
Add the Next Generation Query Planner document. (check-in: 74fbc7b392 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/queryplanner-ng.in.
162
163
164
165
166
167
168
169



170
171
172
173
174
175

176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210





    R (cost: 3.56) <br>
    N1 (cost: 5.52) <br>
    N2 (cost: 5.52) <br>
    P (cost: 7.71) <br>
</blockquote>

<p>The second step would find the four shortest paths to visit two nodes 
and that begin with one of the four paths from the previous step:</p>




<blockquote>
    R-N1 (cost: 7.03) <br>
    N1-R (cost: 7.31) <br>
    R-N2 (cost: 9.08) <br>
    N2-R (cost: 9.08) <br>

</blockquote>

<p>The third stop starts with the four shortest two-node paths and finds
the four shortest three-node paths:</p>

<blockquote>
    R-N1-N2 (cost: 12.55) <br>
    R-N2-N1 (cost: 12.55) <br>
    N2-R-N1 (cost: 12.55) <br>
    N1-R-N2 (cost: 12.83) <br>
</blockquote>

<p>And so forth.  There are 8 nodes in the TPC-H Q8 query, 
so this process repeats a total of 8
times.  In the general case of a K-way join, the storage requirements
is O(N) and the computation time is O(K*N).  That is a lot less than an
exact solution which requires O(2**K) time.</p>

<p>But what value to choose for N?  We might make N==K.  This makes the
algorithm O(N**2) which is actually still quite efficient, since the
maximum value of K is 64.  But that is not enough for the TPC-H Q8
problem.  With N==8 on TPC-H Q8 the NNN algorithm finds 
the solution R-N1-C-O-L-S-N2-P with a cost of 29.78.  
That is an improvement over NN, but it is still
not optimal.</p>

<p>NNN finds the optimal solution for TPC-H Q8 when N is 21 or greater.</p>

<p>The current thinking is to make N a parameter with a default
value of 30 or so.  This will likely find at least a very good plan in most
circumstances.  We think you will need to work very hard to
construct a query where NNN with N==30 does not find the optimal plan.
Remember that the all SQLite versions prior to 3.7.18 do
an excellent job with N==1.  The value of N==30 would be the compile-time
default, but can be changed using a PRAGMA.</p>












|
>
>
>



<

|
>



|



|
|
|
















|


|

|

|

>
>
>
>
>
162
163
164
165
166
167
168
169
170
171
172
173
174
175

176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
    R (cost: 3.56) <br>
    N1 (cost: 5.52) <br>
    N2 (cost: 5.52) <br>
    P (cost: 7.71) <br>
</blockquote>

<p>The second step would find the four shortest paths to visit two nodes 
and that begin with one of the four paths from the previous step.  In the
case where two or more paths are equivalent (they have the same set of
visited nodes, though possibly in a different order) then only the
first and lowest cost path is retained.  We have:</p>

<blockquote>
    R-N1 (cost: 7.03) <br>

    R-N2 (cost: 9.08) <br>
    N2-N1 (cost: 11.04) <br>
    R-P {cost: 11.27} <br>
</blockquote>

<p>The third stop starts with the four shortest two-node paths and finds
the four shortest non-equivalent three-node paths:</p>

<blockquote>
    R-N1-N2 (cost: 12.55) <br>
    R-N1-C (cost: 13.43) <br>
    R-N1-P (cost: 14.74) <br>
    R-N2-S (cost: 15.08) <br>
</blockquote>

<p>And so forth.  There are 8 nodes in the TPC-H Q8 query, 
so this process repeats a total of 8
times.  In the general case of a K-way join, the storage requirements
is O(N) and the computation time is O(K*N).  That is a lot less than an
exact solution which requires O(2**K) time.</p>

<p>But what value to choose for N?  We might make N==K.  This makes the
algorithm O(N**2) which is actually still quite efficient, since the
maximum value of K is 64.  But that is not enough for the TPC-H Q8
problem.  With N==8 on TPC-H Q8 the NNN algorithm finds 
the solution R-N1-C-O-L-S-N2-P with a cost of 29.78.  
That is an improvement over NN, but it is still
not optimal.</p>

<p>NNN finds the optimal solution for TPC-H Q8 when N is 10 or greater.</p>

<p>The current thinking is to make N a parameter with a default
value of 15 or so.  This will likely find at least a very good plan in most
circumstances.  We think you will need to work very hard to
construct a query where NNN with N==15 does not find the optimal plan.
Remember that the all SQLite versions prior to 3.7.18 do
an excellent job with N==1.  The value of N==15 would be the compile-time
default, but can be changed using a PRAGMA.</p>

<p>Another idea begin considered is to initially attempt to find a query
plan using N==1 and then fall back and make a second attempt with a larger
N if the first attempt fails to find a provably optimal plan.  A provably
optimal plan is one that uses the lowest-cost arc into each node.</p>