# Documentation Source Text

Check-in [28554e7596]

Overview
Comment: Updates to the next-generation-query-planner document. family | ancestors | descendants | both | trunk files | file ages | folders 28554e75969d632ff9111f5750d0db22cbc25f4c drh 2013-04-30 22:20:00
Context
 2013-05-01 00:14 Fix a typo in intern-v-extern-blob.html check-in: c8b280b8eb user: drh tags: trunk 2013-04-30 22:20 Updates to the next-generation-query-planner document. check-in: 28554e7596 user: drh tags: trunk 16:49 Add the Next Generation Query Planner document. check-in: 74fbc7b392 user: drh tags: trunk
Changes

Changes to pages/queryplanner-ng.in.

 ```162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 ... 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 ``` ``` R (cost: 3.56)
N1 (cost: 5.52)
N2 (cost: 5.52)
P (cost: 7.71)

The second step would find the four shortest paths to visit two nodes and that begin with one of the four paths from the previous step:

R-N1 (cost: 7.03)
N1-R (cost: 7.31)
R-N2 (cost: 9.08)
N2-R (cost: 9.08)

The third stop starts with the four shortest two-node paths and finds the four shortest three-node paths:

R-N1-N2 (cost: 12.55)
R-N2-N1 (cost: 12.55)
N2-R-N1 (cost: 12.55)
N1-R-N2 (cost: 12.83)

And so forth. There are 8 nodes in the TPC-H Q8 query, so this process repeats a total of 8 times. In the general case of a K-way join, the storage requirements is O(N) and the computation time is O(K*N). That is a lot less than an exact solution which requires O(2**K) time.

................................................................................ algorithm O(N**2) which is actually still quite efficient, since the maximum value of K is 64. But that is not enough for the TPC-H Q8 problem. With N==8 on TPC-H Q8 the NNN algorithm finds the solution R-N1-C-O-L-S-N2-P with a cost of 29.78. That is an improvement over NN, but it is still not optimal.

NNN finds the optimal solution for TPC-H Q8 when N is 21 or greater.

The current thinking is to make N a parameter with a default value of 30 or so. This will likely find at least a very good plan in most circumstances. We think you will need to work very hard to construct a query where NNN with N==30 does not find the optimal plan. Remember that the all SQLite versions prior to 3.7.18 do an excellent job with N==1. The value of N==30 would be the compile-time default, but can be changed using a PRAGMA.

``` ``` | > > > < > | | | | | | | | | > > > > > ``` ```162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 ... 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 ``` ``` R (cost: 3.56)
N1 (cost: 5.52)
N2 (cost: 5.52)
P (cost: 7.71)

The second step would find the four shortest paths to visit two nodes and that begin with one of the four paths from the previous step. In the case where two or more paths are equivalent (they have the same set of visited nodes, though possibly in a different order) then only the first and lowest cost path is retained. We have:

R-N1 (cost: 7.03)
R-N2 (cost: 9.08)
N2-N1 (cost: 11.04)
R-P {cost: 11.27}

The third stop starts with the four shortest two-node paths and finds the four shortest non-equivalent three-node paths:

R-N1-N2 (cost: 12.55)
R-N1-C (cost: 13.43)
R-N1-P (cost: 14.74)
R-N2-S (cost: 15.08)

And so forth. There are 8 nodes in the TPC-H Q8 query, so this process repeats a total of 8 times. In the general case of a K-way join, the storage requirements is O(N) and the computation time is O(K*N). That is a lot less than an exact solution which requires O(2**K) time.

................................................................................ algorithm O(N**2) which is actually still quite efficient, since the maximum value of K is 64. But that is not enough for the TPC-H Q8 problem. With N==8 on TPC-H Q8 the NNN algorithm finds the solution R-N1-C-O-L-S-N2-P with a cost of 29.78. That is an improvement over NN, but it is still not optimal.

NNN finds the optimal solution for TPC-H Q8 when N is 10 or greater.

The current thinking is to make N a parameter with a default value of 15 or so. This will likely find at least a very good plan in most circumstances. We think you will need to work very hard to construct a query where NNN with N==15 does not find the optimal plan. Remember that the all SQLite versions prior to 3.7.18 do an excellent job with N==1. The value of N==15 would be the compile-time default, but can be changed using a PRAGMA.

Another idea begin considered is to initially attempt to find a query plan using N==1 and then fall back and make a second attempt with a larger N if the first attempt fails to find a provably optimal plan. A provably optimal plan is one that uses the lowest-cost arc into each node.

```