Documentation Source Text

Check-in [9e12c649c9]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Minor enhancements and updates to various documents.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256:9e12c649c9ba5aafb8145b387e97d48e055f845049da86c592681f5b9e8b3fb0
User & Date: drh 2018-11-24 20:24:59
Context
2018-11-24
20:58
In the GeoPoly documentation, mention that geopoly_ccw() can be used to correct vertex order after geopoly_xform(). check-in: bfc897c9c0 user: drh tags: trunk
20:24
Minor enhancements and updates to various documents. check-in: 9e12c649c9 user: drh tags: trunk
19:14
Update the speed-and-size spreadsheet and the cpu.html page. Also make minor tweaks to the omitted.html page. check-in: 555bf82e10 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Show Whitespace Changes Patch

Changes to pages/amalgamation.in.

5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

<h1>Executive Summary</h1>

<p>Over 100 separate source files are concatenated into a
single large files of C-code named "sqlite3.c" and
called "the amalgamation". The amalgamation
contains everything an application needs to embed SQLite.
The amalgamation file is more than 180,000 lines long and over 6
megabytes in size.

<p>Combining all the code for SQLite into one big file makes SQLite
easier to deploy &mdash; there is just one file to keep track of.  
And because all code is in
a single translation unit, compilers can do better inter-procedure
optimization resulting in machine code that is between 5% and 10% faster.








|
|







5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

<h1>Executive Summary</h1>

<p>Over 100 separate source files are concatenated into a
single large files of C-code named "sqlite3.c" and
called "the amalgamation". The amalgamation
contains everything an application needs to embed SQLite.
The amalgamation file is more than 220,000 lines long and over 7.5
megabytes in size (as of 2018-11-24).

<p>Combining all the code for SQLite into one big file makes SQLite
easier to deploy &mdash; there is just one file to keep track of.  
And because all code is in
a single translation unit, compilers can do better inter-procedure
optimization resulting in machine code that is between 5% and 10% faster.

Changes to pages/arch.in.

219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
Memory allocation, caseless string comparison routines, 
portable text-to-number conversion routines, and other utilities
are located in <file>util.c</file>.
Symbol tables used by the parser are maintained by hash tables found
in <file>hash.c</file>.  The <file>utf.c</file> source file contains Unicode
conversion subroutines.
SQLite has its own private implementation of 
[sqlite3_mprintf|printf()] (with
some extensions) in <file>printf.c</file> and its own
pseudo-random number generator (PRNG) in <file>random.c</file>.
</p>

<h1>Test Code</h1>

<p>
Files in the "src/" folder of the source tree whose names begin with
<b>test</b> are for testing only and are not included in a standard
build of the library.
</p>
}
</tcl>







|













219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
Memory allocation, caseless string comparison routines, 
portable text-to-number conversion routines, and other utilities
are located in <file>util.c</file>.
Symbol tables used by the parser are maintained by hash tables found
in <file>hash.c</file>.  The <file>utf.c</file> source file contains Unicode
conversion subroutines.
SQLite has its own private implementation of 
[built-in printf|printf()] (with
some extensions) in <file>printf.c</file> and its own
pseudo-random number generator (PRNG) in <file>random.c</file>.
</p>

<h1>Test Code</h1>

<p>
Files in the "src/" folder of the source tree whose names begin with
<b>test</b> are for testing only and are not included in a standard
build of the library.
</p>
}
</tcl>

Changes to pages/optoverview.in.

20
21
22
23
24
25
26

27
28
29
30
31
32
33
34
....
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
....
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282

<p>
  Additional background information is available in the
  [indexing tutorial] document.


<p>

  With release 3.8.0, the SQLite query planner was reimplemented as the
  [Next Generation Query Planner] or "NGQP".  All of the features, techniques,
  and algorithms described in this document are applicable to both the
  pre-3.8.0 legacy query planner and to the NGQP.  For further information on
  how the NGQP differs from the legacy query planner, see the 
  [NGQP | detailed description of the NGQP].


................................................................................
  CREATE TABLE t2(c,d);
  -- Insert many rows into both t1 and t2
  SELECT * FROM t1, t2 WHERE a=c;
</codeblock>
<p>
  In the query above, if both t1 and t2 have approximately N rows, then
  without any indices the query will require O(N*N) time.  On the other
  hand, creating an index on table t2 requires O(NlogN) time and then using 
  that index to evaluate the query requires an additional O(NlogN) time.
  In the absence of [ANALYZE] information, SQLite guesses that N is one
  million and hence it believes that constructing the automatic index will
  be the cheaper approach.

<p>
  An automatic index might also be used for a subquery:
................................................................................
  -- Insert many rows into both t1 and t2
  SELECT a, (SELECT d FROM t2 WHERE c=b) FROM t1;
</codeblock>
<p>
  In this example, the t2 table is used in a subquery to translate values
  of the t1.b column.  If each table contains N rows, SQLite expects that
  the subquery will run N times, and hence it will believe it is faster
  to construct an automatic, transient index on t2 first and then using
  that index to satisfy the N instances of the subquery.

<p>
  The automatic indexing capability can be disabled at run-time using
  the [automatic_index pragma].  Automatic indexing is turned on by
  default, but this can be changed so that automatic indexing is off
  by default using the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option.







>
|







 







|







 







|







20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
....
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
....
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283

<p>
  Additional background information is available in the
  [indexing tutorial] document.


<p>
  With release 3.8.0 ([dateof:3.8.0]),
  the SQLite query planner was reimplemented as the
  [Next Generation Query Planner] or "NGQP".  All of the features, techniques,
  and algorithms described in this document are applicable to both the
  pre-3.8.0 legacy query planner and to the NGQP.  For further information on
  how the NGQP differs from the legacy query planner, see the 
  [NGQP | detailed description of the NGQP].


................................................................................
  CREATE TABLE t2(c,d);
  -- Insert many rows into both t1 and t2
  SELECT * FROM t1, t2 WHERE a=c;
</codeblock>
<p>
  In the query above, if both t1 and t2 have approximately N rows, then
  without any indices the query will require O(N*N) time.  On the other
  hand, creating an index on table t2 requires O(NlogN) time and then use
  that index to evaluate the query requires an additional O(NlogN) time.
  In the absence of [ANALYZE] information, SQLite guesses that N is one
  million and hence it believes that constructing the automatic index will
  be the cheaper approach.

<p>
  An automatic index might also be used for a subquery:
................................................................................
  -- Insert many rows into both t1 and t2
  SELECT a, (SELECT d FROM t2 WHERE c=b) FROM t1;
</codeblock>
<p>
  In this example, the t2 table is used in a subquery to translate values
  of the t1.b column.  If each table contains N rows, SQLite expects that
  the subquery will run N times, and hence it will believe it is faster
  to construct an automatic, transient index on t2 first and then use
  that index to satisfy the N instances of the subquery.

<p>
  The automatic indexing capability can be disabled at run-time using
  the [automatic_index pragma].  Automatic indexing is turned on by
  default, but this can be changed so that automatic indexing is off
  by default using the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option.

Changes to pages/queryplanner-ng.in.

36
37
38
39
40
41
42
43
44

45
46
47
48
49
50
51
52
53
54
55
56
57
58
..
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
...
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
...
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
...
350
351
352
353
354
355
356







357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
...
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
...
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
...
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
...
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
...
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
However, there may exist legacy applications that unknowingly depend on 
undefined and/or suboptimal behavior in the legacy query planner, and
upgrading to the NGQP on those legacy applications could cause performance
regressions.  This risk is considered and a checklist is provided
for reducing the risk and for fixing any issues that do arise.</p>

<p>This document focuses on the NGQP.  For a more general overview of the
SQLite query planner that encompasses the entire history of SQLite, see
"[query planner | The SQLite Query Optimizer Overview]".</p>


<h1> Background</h1>

<p>For simple queries against a single table with few indices, there is usually
an obvious choice for the best algorithm.
But for larger and more complex queries, such as
multi-way joins with many indices
and subqueries, there can be hundreds, thousands, or millions
of reasonable algorithms for computing the result.
The job of the query planner is to choose the single "best" query plan from
this multitude of possibilities.</p>

<p>Query planners are what make SQL database engines so amazingly useful and powerful.
(This is true of all SQL database engines, not just SQLite.)
................................................................................
<h2> Query Planning In SQLite</h2>

<p>SQLite computes joins using nested loops, 
one loop for each table
in the join.  (Additional loops might be inserted for IN
and OR operators in the WHERE clause.  SQLite considers those too,
but for simplicity we will ignore them in this essay.)
One or more indices might be used on each loop to speed the search,
or a loop might be a "full table scan" that reads every row in the
table.  Thus query planning decomposes into two subtasks:</p>
<ol>
<li> Picking the nested order of the various loops
<li> Choosing good indices for each loop
</ol>
<p>Picking the nesting order is generally the more challenging problem.
Once the nesting order of the join is established, the choice of indices
for each loop is normally obvious.</p>

<tcl>hd_fragment qpstab {query planner stability guarantee} QPSG</tcl>
<h2> The SQLite Query Planner Stability Guarantee</h2>

<p>When the Query Planner Stability Guarantee (QPSG) is enabled
SQLite will always pick the same query plan for any
given SQL statement as long as:

<ol type="a">
<li>the database schema does not change in significant ways such as 
    adding or dropping indices,</li>
<li>the ANALYZE command is not rerun, </li>
<li>the same version of SQLite is used.</li>
</ol>

<p>The QPSG is disabled by default.  It can be enabled at compile-time
using the [SQLITE_ENABLE_QPSG] compile-time option, or at run-time by
invoking [sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_QPSG],1,0).
................................................................................
query plan, possibly causing a performance problem, after your application 
is released to users.  If your application works in the lab, it
will continue working the same way after deployment.</p>

<p>Enterprise-class client/server SQL database engines do not normally 
make this guarantee.
In client/server SQL database engines, the server keeps track of
statistics on the sizes of tables and on the quality of indices 
and the query planner uses those statistics to help select the best plans.
As content is added, deleted, or changed in the database, the statistics 
will evolve and may cause the query planner to begin using a different 
query plan for some particular query.  Usually the new plan will be better 
for the evolving structure of the data.  But sometimes the new query plan will
cause a performance reduction.  With a client/server database engine, there
is typically a Database Administrator (DBA) on hand to deal with these 
................................................................................

<h2> Complications</h2>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates.  We cannot
know what the true cost of running a loop is until we actually run the loop.
SQLite makes guesses for the cost of running a loop based on
the availability of indices and constraints found in the WHERE
clause.  These guesses are usually pretty good, but they can sometimes be
off.  Using the [ANALYZE] command to collect additional statistical
information about the database can sometimes enable SQLite to make better
guesses about the cost.</p>

<p>The costs are comprised of multiple numbers, not a single number as
shown in the graph.
................................................................................
<p>The initial implementation of NGQP chooses N=1 for simple queries, N=5
for two-way joins and N=10 for all joins with three or more tables.  This
formula for selecting N might change in subsequent releases.</p>

<tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
<h1> Hazards Of Upgrading To NGQP</h1>








<p>For most applications, upgrading from the legacy query planner to the NGQP
requires little thought or effort.
Simply replace the older SQLite version with the newer version of SQLite 
and recompile and the application will run faster.  
There are no API changes nor modifications
to compilation procedures.</p>

<p>But as with any query planner change, upgrading to the NGQP does carry
a small risk of introducing performance regressions.  The problem here is
not that the NGQP is incorrect or buggy or inferior to the legacy query
planner.  Given reliable information about the selectivity of indices, 
the NGQP should always pick a plan that is as good or better than before.
The problem is that some applications may be using low-quality and
low-selectivity indices without having run [ANALYZE].  The older query
planners look at many fewer possible implementations for each query and 
so they may have stumbled over a good plan by stupid luck.  The NGQP, on 
the other hand, looks at many more query plan possibilities, and it may 
choose a different query plan that
works better in theory, assuming good indices, but which gives a performance
regression in practice, because of the shape of the data.</p>

<p>Key points:</p>

<ul>
<li><p>The NGQP will always find an equal or better query plan, compared to
    prior query planners, as long as it
    has access to accurate [ANALYZE] data in the [SQLITE_STAT1] file.</p>
<li><p>The NGQP will always find a good query plan 
    as long as the schema does not contain indices that have more than
    about 10 or 20 rows with the same value in the left-most column of the
    index.</p>
</ul>

<p>Not all applications meet these conditions.  Fortunately,
the NGQP will still usually find good query plans, even without these conditions.
However, cases do arise (rarely) where performance regressions can occur.</p>
................................................................................
Intuitively, we humans understand that algorithm-1 is best.
Each check-in is likely to have few children (one child is
the most common case) and each child can be tested for the
$trunk tag in logarithmic time.  Indeed, algorithm-1 is the
faster choice in practice.  But the NGQP has no intuition.  The
NGQP must use hard math, and algorithm-2 is slightly
better mathematically.  This is because, in the absence of other information,
the NGQP must assume that the indices PLINK_I1 and TAGXREF_I1 are of
equal quality and are equally selective.  Algorithm-2 uses one field
of the TAGXREF_I1 index and both fields of the PLINK_I1 index whereas
algorithm-1 only uses the first field of each index.  Since 
algorithm-2 uses more index material, the NGQP is correct
to judge it to be the better algorithm.  The scores are close and
algorithm-2 just barely squeaks ahead of algorithm-1.  But
algorithm-2 really is the correct choice here.
................................................................................

<p>
Unfortunately, algorithm-2 is slower than algorithm-1 in
this application.
</p>

<p>
The problem is that the indices are not of equal quality.
A check-in is likely to only have one child.  So the first
field of PLINK_I1 will usually narrow down the search to just a single
row.  But there are thousands and thousands check-ins tagged with "trunk", 
so the first field of TAGXREF_I1 will be
of little help in narrowing down the search.
</p>

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database.  The [ANALYZE] command
gathers statistics on the quality of the various indices and stores those
statistics in [SQLITE_STAT1] table.  
Having access to this statistical information,
the NGQP easily chooses algorithm-1 as the best algorithm, by a wide
margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm
................................................................................
<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run across any problems in your application.
If you are not having performance issues, you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indices.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indices.  Make sure indices are
available to assist all large queries.  Most performance issues can be
resolved by one or two CREATE INDEX commands and with no changes to
application code.</p>

<li><p><b>Avoid creating low-quality indices.</b>.
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index.  In particular, avoid using
boolean or "enum" columns as the left-most columns of your indices.</p>

<p>The Fossil performance problem described in the previous section of
this document arose because there were over
ten-thousand entries in the TAGXREF table with the same value for the 
left-most column (the TAGID column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run [ANALYZE].</b>
Low-quality indices will not confuse the query planner as long as the
query planner knows that the indices are of low quality.  And the way
the query planner knows this is by the content of the [SQLITE_STAT1] table,
which is computed by the ANALYZE command.</p>

<p>Of course, ANALYZE only works effectively if you have a significant 
amount of content in your database in the first place.  When creating a 
new database that you expect to accumulate a lot of data, you can run 
the command "ANALYZE sqlite_master" to create the SQLITE_STAT1 table,
................................................................................

<li><p><b>Instrument your code.</b>
Add logic that lets you know quickly and easily which queries are taking
too much time.  Then work on just those specific queries.</p>

<li><p><b>Use [unlikely()] and [likelihood()] SQL functions.</b>
SQLite normally assumes that terms in the WHERE clause that cannot be used
by indices have a strong probability of being true.  If this assumption
is incorrect, it could lead to a suboptimal query plan.  The
[unlikely()] and [likelihood()] SQL functions can be used to provide
hints to the query planner about WHERE clause terms that are probably
not true, and thus aid the query planner in selecting the best possible
plan.

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be an outer loop relative to the table on the right.</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application 
programmer does not need to get involved with query planning.  If you
................................................................................
Avoid using this trick if at all possible, and especially avoid it
early in the application development cycle.  Beware that
adding a unary "+" operator to an equality expression might change
the result of that expression if 
<a href="datatype3.html#affinity">type affinity</a> is involved.</p>

<li><p><b>Use the [INDEXED BY] syntax to enforce the selection of
particular indices on problem queries.</b>
As with the previous two bullets, avoid this step if possible, and 
especially avoid doing this early in development as it is clearly a
premature optimization.</p>
</ol>

<h1> Summary</h1>








|
|
>



|


|







 







|




|


|











|







 







|







 







|







 







>
>
>
>
>
>
>










|


|




|









|







 







|







 







|










|







 







|

|




|



|







|
|







 







|







|







 







|







36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
..
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
...
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
...
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
...
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
...
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
...
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
...
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
...
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
...
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
However, there may exist legacy applications that unknowingly depend on 
undefined and/or suboptimal behavior in the legacy query planner, and
upgrading to the NGQP on those legacy applications could cause performance
regressions.  This risk is considered and a checklist is provided
for reducing the risk and for fixing any issues that do arise.</p>

<p>This document focuses on the NGQP.  For a more general overview of the
SQLite query planner that encompasses the entire history of SQLite, see the
"[query planner | The SQLite Query Optimizer Overview]" and
"[indexing | How Indexes Work]" documents.</p>

<h1> Background</h1>

<p>For simple queries against a single table with few indexes, there is usually
an obvious choice for the best algorithm.
But for larger and more complex queries, such as
multi-way joins with many indexes
and subqueries, there can be hundreds, thousands, or millions
of reasonable algorithms for computing the result.
The job of the query planner is to choose the single "best" query plan from
this multitude of possibilities.</p>

<p>Query planners are what make SQL database engines so amazingly useful and powerful.
(This is true of all SQL database engines, not just SQLite.)
................................................................................
<h2> Query Planning In SQLite</h2>

<p>SQLite computes joins using nested loops, 
one loop for each table
in the join.  (Additional loops might be inserted for IN
and OR operators in the WHERE clause.  SQLite considers those too,
but for simplicity we will ignore them in this essay.)
One or more indexes might be used on each loop to speed the search,
or a loop might be a "full table scan" that reads every row in the
table.  Thus query planning decomposes into two subtasks:</p>
<ol>
<li> Picking the nested order of the various loops
<li> Choosing good indexes for each loop
</ol>
<p>Picking the nesting order is generally the more challenging problem.
Once the nesting order of the join is established, the choice of indexes
for each loop is normally obvious.</p>

<tcl>hd_fragment qpstab {query planner stability guarantee} QPSG</tcl>
<h2> The SQLite Query Planner Stability Guarantee</h2>

<p>When the Query Planner Stability Guarantee (QPSG) is enabled
SQLite will always pick the same query plan for any
given SQL statement as long as:

<ol type="a">
<li>the database schema does not change in significant ways such as 
    adding or dropping indexes,</li>
<li>the ANALYZE command is not rerun, </li>
<li>the same version of SQLite is used.</li>
</ol>

<p>The QPSG is disabled by default.  It can be enabled at compile-time
using the [SQLITE_ENABLE_QPSG] compile-time option, or at run-time by
invoking [sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_QPSG],1,0).
................................................................................
query plan, possibly causing a performance problem, after your application 
is released to users.  If your application works in the lab, it
will continue working the same way after deployment.</p>

<p>Enterprise-class client/server SQL database engines do not normally 
make this guarantee.
In client/server SQL database engines, the server keeps track of
statistics on the sizes of tables and on the quality of indexes 
and the query planner uses those statistics to help select the best plans.
As content is added, deleted, or changed in the database, the statistics 
will evolve and may cause the query planner to begin using a different 
query plan for some particular query.  Usually the new plan will be better 
for the evolving structure of the data.  But sometimes the new query plan will
cause a performance reduction.  With a client/server database engine, there
is typically a Database Administrator (DBA) on hand to deal with these 
................................................................................

<h2> Complications</h2>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates.  We cannot
know what the true cost of running a loop is until we actually run the loop.
SQLite makes guesses for the cost of running a loop based on
the availability of indexes and constraints found in the WHERE
clause.  These guesses are usually pretty good, but they can sometimes be
off.  Using the [ANALYZE] command to collect additional statistical
information about the database can sometimes enable SQLite to make better
guesses about the cost.</p>

<p>The costs are comprised of multiple numbers, not a single number as
shown in the graph.
................................................................................
<p>The initial implementation of NGQP chooses N=1 for simple queries, N=5
for two-way joins and N=10 for all joins with three or more tables.  This
formula for selecting N might change in subsequent releases.</p>

<tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
<h1> Hazards Of Upgrading To NGQP</h1>

<p><i>Update on 2018-11-24: This section was important 
when the NGQP was new.  But five years have elapsed, the NGQP has been
deployed successfully to billions of devices, and everyone has upgraded.
The upgrade hazard has vanished.
This section is retained for historical reference only.
Modern reads can skip ahead to the [query planner checklist].</i>

<p>For most applications, upgrading from the legacy query planner to the NGQP
requires little thought or effort.
Simply replace the older SQLite version with the newer version of SQLite 
and recompile and the application will run faster.  
There are no API changes nor modifications
to compilation procedures.</p>

<p>But as with any query planner change, upgrading to the NGQP does carry
a small risk of introducing performance regressions.  The problem here is
not that the NGQP is incorrect or buggy or inferior to the legacy query
planner.  Given reliable information about the selectivity of indexes, 
the NGQP should always pick a plan that is as good or better than before.
The problem is that some applications may be using low-quality and
low-selectivity indexes without having run [ANALYZE].  The older query
planners look at many fewer possible implementations for each query and 
so they may have stumbled over a good plan by stupid luck.  The NGQP, on 
the other hand, looks at many more query plan possibilities, and it may 
choose a different query plan that
works better in theory, assuming good indexes, but which gives a performance
regression in practice, because of the shape of the data.</p>

<p>Key points:</p>

<ul>
<li><p>The NGQP will always find an equal or better query plan, compared to
    prior query planners, as long as it
    has access to accurate [ANALYZE] data in the [SQLITE_STAT1] file.</p>
<li><p>The NGQP will always find a good query plan 
    as long as the schema does not contain indexes that have more than
    about 10 or 20 rows with the same value in the left-most column of the
    index.</p>
</ul>

<p>Not all applications meet these conditions.  Fortunately,
the NGQP will still usually find good query plans, even without these conditions.
However, cases do arise (rarely) where performance regressions can occur.</p>
................................................................................
Intuitively, we humans understand that algorithm-1 is best.
Each check-in is likely to have few children (one child is
the most common case) and each child can be tested for the
$trunk tag in logarithmic time.  Indeed, algorithm-1 is the
faster choice in practice.  But the NGQP has no intuition.  The
NGQP must use hard math, and algorithm-2 is slightly
better mathematically.  This is because, in the absence of other information,
the NGQP must assume that the indexes PLINK_I1 and TAGXREF_I1 are of
equal quality and are equally selective.  Algorithm-2 uses one field
of the TAGXREF_I1 index and both fields of the PLINK_I1 index whereas
algorithm-1 only uses the first field of each index.  Since 
algorithm-2 uses more index material, the NGQP is correct
to judge it to be the better algorithm.  The scores are close and
algorithm-2 just barely squeaks ahead of algorithm-1.  But
algorithm-2 really is the correct choice here.
................................................................................

<p>
Unfortunately, algorithm-2 is slower than algorithm-1 in
this application.
</p>

<p>
The problem is that the indexes are not of equal quality.
A check-in is likely to only have one child.  So the first
field of PLINK_I1 will usually narrow down the search to just a single
row.  But there are thousands and thousands check-ins tagged with "trunk", 
so the first field of TAGXREF_I1 will be
of little help in narrowing down the search.
</p>

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database.  The [ANALYZE] command
gathers statistics on the quality of the various indexes and stores those
statistics in [SQLITE_STAT1] table.  
Having access to this statistical information,
the NGQP easily chooses algorithm-1 as the best algorithm, by a wide
margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm
................................................................................
<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run across any problems in your application.
If you are not having performance issues, you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indexes.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indexes.  Make sure indexes are
available to assist all large queries.  Most performance issues can be
resolved by one or two CREATE INDEX commands and with no changes to
application code.</p>

<li><p><b>Avoid creating low-quality indexes.</b>.
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index.  In particular, avoid using
boolean or "enum" columns as the left-most columns of your indexes.</p>

<p>The Fossil performance problem described in the previous section of
this document arose because there were over
ten-thousand entries in the TAGXREF table with the same value for the 
left-most column (the TAGID column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run [ANALYZE].</b>
Low-quality indexes will not confuse the query planner as long as the
query planner knows that the indexes are of low quality.  And the way
the query planner knows this is by the content of the [SQLITE_STAT1] table,
which is computed by the ANALYZE command.</p>

<p>Of course, ANALYZE only works effectively if you have a significant 
amount of content in your database in the first place.  When creating a 
new database that you expect to accumulate a lot of data, you can run 
the command "ANALYZE sqlite_master" to create the SQLITE_STAT1 table,
................................................................................

<li><p><b>Instrument your code.</b>
Add logic that lets you know quickly and easily which queries are taking
too much time.  Then work on just those specific queries.</p>

<li><p><b>Use [unlikely()] and [likelihood()] SQL functions.</b>
SQLite normally assumes that terms in the WHERE clause that cannot be used
by indexes have a strong probability of being true.  If this assumption
is incorrect, it could lead to a suboptimal query plan.  The
[unlikely()] and [likelihood()] SQL functions can be used to provide
hints to the query planner about WHERE clause terms that are probably
not true, and thus aid the query planner in selecting the best possible
plan.

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indexes in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be an outer loop relative to the table on the right.</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application 
programmer does not need to get involved with query planning.  If you
................................................................................
Avoid using this trick if at all possible, and especially avoid it
early in the application development cycle.  Beware that
adding a unary "+" operator to an equality expression might change
the result of that expression if 
<a href="datatype3.html#affinity">type affinity</a> is involved.</p>

<li><p><b>Use the [INDEXED BY] syntax to enforce the selection of
particular indexes on problem queries.</b>
As with the previous two bullets, avoid this step if possible, and 
especially avoid doing this early in development as it is clearly a
premature optimization.</p>
</ol>

<h1> Summary</h1>