Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
SHA1:  28554e75969d632ff9111f5750d0db22cbc25f4c 

Date:  20130430 22:20:00 
User:  drh 
Comment:  Updates to the nextgenerationqueryplanner document. 
Tags And Properties
 branch=trunk inherited from [b2e03e]
 symtrunk inherited from [b2e03e]
Context
20130501
 
00:14  [c8b280] Fix a typo in internvexternblob.html (user: drh, tags: trunk)  
20130430
 
22:20  [28554e] Updates to the nextgenerationqueryplanner document. (user: drh, tags: trunk)  
16:49  [74fbc7] Add the Next Generation Query Planner document. (user: drh, tags: trunk)  
Changes
Changes to pages/queryplannerng.in.
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
...
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210

R (cost: 3.56) <br> N1 (cost: 5.52) <br> N2 (cost: 5.52) <br> P (cost: 7.71) <br> </blockquote> <p>The second step would find the four shortest paths to visit two nodes and that begin with one of the four paths from the previous step:</p> <blockquote> RN1 (cost: 7.03) <br> N1R (cost: 7.31) <br> RN2 (cost: 9.08) <br> N2R (cost: 9.08) <br> </blockquote> <p>The third stop starts with the four shortest twonode paths and finds the four shortest threenode paths:</p> <blockquote> RN1N2 (cost: 12.55) <br> RN2N1 (cost: 12.55) <br> N2RN1 (cost: 12.55) <br> N1RN2 (cost: 12.83) <br> </blockquote> <p>And so forth. There are 8 nodes in the TPCH Q8 query, so this process repeats a total of 8 times. In the general case of a Kway join, the storage requirements is O(N) and the computation time is O(K*N). That is a lot less than an exact solution which requires O(2**K) time.</p> ................................................................................ algorithm O(N**2) which is actually still quite efficient, since the maximum value of K is 64. But that is not enough for the TPCH Q8 problem. With N==8 on TPCH Q8 the NNN algorithm finds the solution RN1COLSN2P with a cost of 29.78. That is an improvement over NN, but it is still not optimal.</p> <p>NNN finds the optimal solution for TPCH Q8 when N is 21 or greater.</p> <p>The current thinking is to make N a parameter with a default value of 30 or so. This will likely find at least a very good plan in most circumstances. We think you will need to work very hard to construct a query where NNN with N==30 does not find the optimal plan. Remember that the all SQLite versions prior to 3.7.18 do an excellent job with N==1. The value of N==30 would be the compiletime default, but can be changed using a PRAGMA.</p> 

>
>
>
<
>









>
>
>
>
>

162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
...
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218

R (cost: 3.56) <br> N1 (cost: 5.52) <br> N2 (cost: 5.52) <br> P (cost: 7.71) <br> </blockquote> <p>The second step would find the four shortest paths to visit two nodes and that begin with one of the four paths from the previous step. In the case where two or more paths are equivalent (they have the same set of visited nodes, though possibly in a different order) then only the first and lowest cost path is retained. We have:</p> <blockquote> RN1 (cost: 7.03) <br> RN2 (cost: 9.08) <br> N2N1 (cost: 11.04) <br> RP {cost: 11.27} <br> </blockquote> <p>The third stop starts with the four shortest twonode paths and finds the four shortest nonequivalent threenode paths:</p> <blockquote> RN1N2 (cost: 12.55) <br> RN1C (cost: 13.43) <br> RN1P (cost: 14.74) <br> RN2S (cost: 15.08) <br> </blockquote> <p>And so forth. There are 8 nodes in the TPCH Q8 query, so this process repeats a total of 8 times. In the general case of a Kway join, the storage requirements is O(N) and the computation time is O(K*N). That is a lot less than an exact solution which requires O(2**K) time.</p> ................................................................................ algorithm O(N**2) which is actually still quite efficient, since the maximum value of K is 64. But that is not enough for the TPCH Q8 problem. With N==8 on TPCH Q8 the NNN algorithm finds the solution RN1COLSN2P with a cost of 29.78. That is an improvement over NN, but it is still not optimal.</p> <p>NNN finds the optimal solution for TPCH Q8 when N is 10 or greater.</p> <p>The current thinking is to make N a parameter with a default value of 15 or so. This will likely find at least a very good plan in most circumstances. We think you will need to work very hard to construct a query where NNN with N==15 does not find the optimal plan. Remember that the all SQLite versions prior to 3.7.18 do an excellent job with N==1. The value of N==15 would be the compiletime default, but can be changed using a PRAGMA.</p> <p>Another idea begin considered is to initially attempt to find a query plan using N==1 and then fall back and make a second attempt with a larger N if the first attempt fails to find a provably optimal plan. A provably optimal plan is one that uses the lowestcost arc into each node.</p> 