Documentation Source Text

Check-in [641e107a66]

Overview
Comment: Lots of tweaking and enhancements to the documentation, especially regarding the query planner and ways of controlling the query planner.
SHA1: 641e107a6680cc33216443b007804205773ef0d5
User & Date: drh 2013-07-01 16:07:50.618
Context
2013-07-01
20:52
Fix a typo in an example in the fts4aux virtual table documentation. (check-in: 6743aed594 user: drh tags: trunk)
16:07
Lots of tweaking and enhancements to the documentation, especially regarding the query planner and ways of controlling the query planner. (check-in: 641e107a66 user: drh tags: trunk)
2013-06-26
14:35
Updates to the changes.html page. Fix typos on the NGQP documentation. (check-in: e0c107dec3 user: drh tags: trunk)
Changes
Changes to pages/changes.in.

chng {2013-08-15 (3.8.0)} {
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.
<li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()].
<li>Added the "percentile()" function as a loadable extension in the ext/misc
    subdirectory of the source tree.
<li>






}

chng {2013-05-20 (3.7.17)} {
<li>Add support for [memory-mapped I/O].
<li>Add the [sqlite3_strglob()] convenience interface.
<li>Assigned the integer at offset 68 in the [database header] as the
    [Application ID] for when SQLite is used as an [application file-format].















chng {2013-08-15 (3.8.0)} {
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.
<li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()].
<li>Added the "percentile()" function as a loadable extension in the ext/misc
    subdirectory of the source tree.
<li>A running SELECT statement that lacks a FROM clause (or any other statement that
    never reads or writes from any database file) will not prevent a read
    transaction from closing.
<li>Add the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option that, if set to 0,
    disables automatic indices by default.
<li>Issue an [SQLITE_WARNING_AUTOINDEX] warning on the [SQLITE_CONFIG_LOG] whenever
    the query planner uses an automatic index.
}

chng {2013-05-20 (3.7.17)} {
<li>Add support for [memory-mapped I/O].
<li>Add the [sqlite3_strglob()] convenience interface.
<li>Assigned the integer at offset 68 in the [database header] as the
    [Application ID] for when SQLite is used as an [application file-format].
Changes to pages/compile.in.
    hd_fragment [string tolower $all]
    hd_keywords $all
  }
  hd_puts <p><b>$name</b></p>
  regsub -all "\n\\s*\n" $text "</p>\n\n<p>" text
  hd_resolve <blockquote><p>$text</p></blockquote>
}











COMPILE_OPTION {SQLITE_DEFAULT_AUTOVACUUM=<i>&lt;0 or 1 or 2&gt;</i>} {
  This macro determines if SQLite creates databases with the 
  [auto_vacuum] flag set by default to OFF (0), FULL (1), or
  INCREMENTAL (2). The default value is 0 meaning that databases
  are created with auto-vacuum turned off.
  In any case the compile-time default may be overridden by the 














    hd_fragment [string tolower $all]
    hd_keywords $all
  }
  hd_puts <p><b>$name</b></p>
  regsub -all "\n\\s*\n" $text "</p>\n\n<p>" text
  hd_resolve <blockquote><p>$text</p></blockquote>
}

COMPILE_OPTION {SQLITE_DEFAULT_AUTOMATIC_INDEX=<i>&lt;0 or 1&gt;</i>} {
  This macro determines the initial setting for [PRAGMA automatic_index]
  for newly opened [database connections].
  For all versions of SQLite through 3.7.17,
  automatic indices are normally enabled for new database connections if
  this compile-time option is omitted.
  However, that might change in future releases of SQLite.
  <p>See also: [SQLITE_OMIT_AUTOMATIC_INDEX]
}
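As a run-time complement to the compile-time default described in this new option, the setting can be inspected and toggled per-connection with PRAGMA automatic_index. A minimal sketch using Python's built-in sqlite3 module (the initial value observed depends on how the library was built):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Current setting: usually 1 unless built with SQLITE_DEFAULT_AUTOMATIC_INDEX=0
enabled = conn.execute("PRAGMA automatic_index").fetchone()[0]
conn.execute("PRAGMA automatic_index=0")   # disable for this connection only
disabled = conn.execute("PRAGMA automatic_index").fetchone()[0]
print(enabled, disabled)
```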

COMPILE_OPTION {SQLITE_DEFAULT_AUTOVACUUM=<i>&lt;0 or 1 or 2&gt;</i>} {
  This macro determines if SQLite creates databases with the 
  [auto_vacuum] flag set by default to OFF (0), FULL (1), or
  INCREMENTAL (2). The default value is 0 meaning that databases
  are created with auto-vacuum turned off.
  In any case the compile-time default may be overridden by the 
  to invoke [sqlite3_initialize()] directly prior to beginning use of the
  SQLite library.
}

COMPILE_OPTION {SQLITE_OMIT_AUTOMATIC_INDEX} {
  This option is used to omit the 
  [automatic indexing] functionality.

}

COMPILE_OPTION {SQLITE_OMIT_AUTORESET} {
  By default, the [sqlite3_step()] interface will automatically invoke
  [sqlite3_reset()] to reset the [prepared statement] if necessary.  This
  compile-time option changes that behavior so that [sqlite3_step()] will
  return [SQLITE_MISUSE] if it is called again after returning anything other














  to invoke [sqlite3_initialize()] directly prior to beginning use of the
  SQLite library.
}

COMPILE_OPTION {SQLITE_OMIT_AUTOMATIC_INDEX} {
  This option is used to omit the 
  [automatic indexing] functionality.
  See also: [SQLITE_DEFAULT_AUTOMATIC_INDEX].
}

COMPILE_OPTION {SQLITE_OMIT_AUTORESET} {
  By default, the [sqlite3_step()] interface will automatically invoke
  [sqlite3_reset()] to reset the [prepared statement] if necessary.  This
  compile-time option changes that behavior so that [sqlite3_step()] will
  return [SQLITE_MISUSE] if it is called again after returning anything other
Changes to pages/features.in.
</ul>
</p>

<h2>Suggested Uses For SQLite:</h2>

<p><ul>
<li><p><b>Application File Format.</b>
Rather than using fopen() to write XML or some proprietary format into

disk files used by your application, use an SQLite database instead.
You'll avoid having to write and troubleshoot a parser, your data
will be more easily accessible and cross-platform, and your updates
will be transactional.</p></li>

<li><p><b>Database For Gadgets.</b>
SQLite is a popular choice for the database engine in cellphones,
PDAs, MP3 players, set-top boxes, and other electronic gadgets.
SQLite has a small code footprint, makes efficient use of memory,
disk space, and disk bandwidth, is highly reliable, and requires
no maintenance from a Database Administrator.</p></li>







</ul>
</p>

<h2>Suggested Uses For SQLite:</h2>

<p><ul>
<li><p><b>Application File Format.</b>
Rather than using fopen() to write XML, JSON, CSV,
or some proprietary format into
disk files used by your application, use an SQLite database.
You'll avoid having to write and troubleshoot a parser, your data
will be more easily accessible and cross-platform, and your updates
will be transactional.  ([application file-format | more...])</p></li>

<li><p><b>Database For Gadgets.</b>
SQLite is a popular choice for the database engine in cellphones,
PDAs, MP3 players, set-top boxes, and other electronic gadgets.
SQLite has a small code footprint, makes efficient use of memory,
disk space, and disk bandwidth, is highly reliable, and requires
no maintenance from a Database Administrator.</p></li>
Changes to pages/fileformat2.in.

<p>^Application code is allowed to modify the sqlite_sequence table, to add
new rows, to delete rows, or to modify existing rows.  ^However, application
code cannot create the sqlite_sequence table if it does not already exist.
^Application code can delete all entries from the sqlite_sequence table,
but application code cannot drop the sqlite_sequence table.

<tcl>hd_fragment stat1tab {sqlite_stat1}</tcl>
<h4>2.5.3 The sqlite_stat1 table</h4>

<p>^The sqlite_stat1 is an internal table created by the [ANALYZE] command
and used to hold supplemental information about tables and indices that the
query planner can use to help it find better ways of performing queries.
^Applications can update, delete from, insert into or drop the sqlite_stat1
table, but may not create or alter the sqlite_stat1 table.















<p>^Application code is allowed to modify the sqlite_sequence table, to add
new rows, to delete rows, or to modify existing rows.  ^However, application
code cannot create the sqlite_sequence table if it does not already exist.
^Application code can delete all entries from the sqlite_sequence table,
but application code cannot drop the sqlite_sequence table.

<tcl>hd_fragment stat1tab {sqlite_stat1} SQLITE_STAT1 </tcl>
<h4>2.5.3 The sqlite_stat1 table</h4>

<p>^The sqlite_stat1 is an internal table created by the [ANALYZE] command
and used to hold supplemental information about tables and indices that the
query planner can use to help it find better ways of performing queries.
^Applications can update, delete from, insert into or drop the sqlite_stat1
table, but may not create or alter the sqlite_stat1 table.
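The lifecycle described in this hunk can be observed directly; a minimal sketch using Python's sqlite3 module (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b)")  # illustrative schema
conn.execute("CREATE INDEX i1 ON t1(b)")
conn.executemany("INSERT INTO t1(b) VALUES (?)", [(i % 10,) for i in range(100)])
conn.execute("ANALYZE")            # creates and populates sqlite_stat1
rows = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
# Applications may update/delete from/insert into the table...
conn.execute("DELETE FROM sqlite_stat1")
# ...but may not create it themselves: "CREATE TABLE sqlite_stat1(...)"
# fails because the name is reserved for internal use.
print(rows)
```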
Conceptually, the index space is divided into
10 uniform buckets and the samples are the middle row from each bucket.

<p>The format for sqlite_stat2 is recorded here for legacy reference.  
Recent versions of SQLite no longer support sqlite_stat2 and the
sqlite_stat2 table, if it exists, is simply ignored.

<tcl>hd_fragment stat3tab {sqlite_stat3}</tcl>
<h4>2.5.5 The sqlite_stat3 table</h4>

<p>The sqlite_stat3 table is only created and only used if SQLite is compiled
with [SQLITE_ENABLE_STAT3] and if the SQLite version number is
3.7.9 or greater.  The sqlite_stat3 table is neither read nor written by any
version of SQLite before 3.7.9.
The sqlite_stat3 table contains additional information














Conceptually, the index space is divided into
10 uniform buckets and the samples are the middle row from each bucket.

<p>The format for sqlite_stat2 is recorded here for legacy reference.  
Recent versions of SQLite no longer support sqlite_stat2 and the
sqlite_stat2 table, if it exists, is simply ignored.

<tcl>hd_fragment stat3tab {sqlite_stat3} SQLITE_STAT3</tcl>
<h4>2.5.5 The sqlite_stat3 table</h4>

<p>The sqlite_stat3 table is only created and only used if SQLite is compiled
with [SQLITE_ENABLE_STAT3] and if the SQLite version number is
3.7.9 or greater.  The sqlite_stat3 table is neither read nor written by any
version of SQLite before 3.7.9.
The sqlite_stat3 table contains additional information
Changes to pages/lang.in.
[INSERT], and [UPDATE] commands.
^(The [DROP TABLE] command works on sqlite_stat1 and
sqlite_stat3 as of SQLite version 3.7.9.)^
Appropriate care should be used when changing the content of the statistics
tables as invalid content can cause SQLite to select inefficient
query plans.  Generally speaking, one should not modify the content of
the statistics tables by any mechanism other than invoking the
ANALYZE command.</p>



<p> ^Statistics gathered by ANALYZE are <u>not</u> automatically updated as
the content of the database changes.  If the content of the database
changes significantly, or if the database schema changes, then one should
consider rerunning the ANALYZE command in order to update the statistics.</p>

<p> The query planner might not notice manual changes to the
sqlite_stat1 and/or sqlite_stat3 tables.  ^An application
can force the query planner to reread the statistics tables by running
<b>ANALYZE sqlite_master</b>. </p>



<tcl>
##############################################################################
Section {ATTACH DATABASE} attach *ATTACH

BubbleDiagram attach-stmt 1
</tcl>














[INSERT], and [UPDATE] commands.
^(The [DROP TABLE] command works on sqlite_stat1 and
sqlite_stat3 as of SQLite version 3.7.9.)^
Appropriate care should be used when changing the content of the statistics
tables as invalid content can cause SQLite to select inefficient
query plans.  Generally speaking, one should not modify the content of
the statistics tables by any mechanism other than invoking the
ANALYZE command.  
See "[Manual Control Of Query Plans Using SQLITE_STAT Tables]" for
further information.</p>

<p> ^Statistics gathered by ANALYZE are not automatically updated as
the content of the database changes.  If the content of the database
changes significantly, or if the database schema changes, then one should
consider rerunning the ANALYZE command in order to update the statistics.</p>

<p> The query planner might not notice manual changes to the
sqlite_stat1 and/or sqlite_stat3 tables.  ^An application
can force the query planner to reread the statistics tables by running
<b>ANALYZE sqlite_master</b>. </p>
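The workflow in the paragraph above can be sketched with Python's sqlite3 module: gather statistics, edit them by hand, then force a re-read by running ANALYZE sqlite_master (the stat value written here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a)")               # illustrative schema
conn.execute("CREATE INDEX ta ON t(a)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(50)])
conn.execute("ANALYZE")
# Hand-edit the gathered statistics (values are invented)...
conn.execute("UPDATE sqlite_stat1 SET stat='1000000 10' WHERE idx='ta'")
# ...then force the query planner to re-read the statistics tables:
conn.execute("ANALYZE sqlite_master")
stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx='ta'").fetchone()[0]
print(stat)
```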


<tcl>
##############################################################################
Section {ATTACH DATABASE} attach *ATTACH

BubbleDiagram attach-stmt 1
</tcl>
<i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of
<i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a
dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^

<p>^If the join-op is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. ^(The "CROSS JOIN" join operator produces the same data as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
<a href=optoverview.html#manctrl>handled slightly differently by the query
optimizer</a>. Otherwise, it is the cartesian product modified 
according to one or more of the following bullet points: 

<ul>
  <li> <p>^(If there is an ON clause specified, then the ON expression is
       evaluated for each row of the cartesian product as a 
       [boolean expression]. All rows for which the expression evaluates to 
       false are excluded from the dataset.)^















<i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of
<i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a
dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^

<p>^If the join-op is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma
(",") and there is no ON or USING clause, then the result of the join is
simply the cartesian product of the left and right-hand datasets. 
If the join-op does have ON or USING clauses, those are handled according to
the following bullet points:

<ul>
  <li> <p>^(If there is an ON clause specified, then the ON expression is
       evaluated for each row of the cartesian product as a 
       [boolean expression]. All rows for which the expression evaluates to 
       false are excluded from the dataset.)^

       dataset.  
</ul>

<p>^(When more than two tables are joined together as part of a FROM clause,
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^
       

















<p><b>2. WHERE clause filtering.</b>
<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>


<p>^(If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the input data as a [boolean expression]. All rows for which the
WHERE clause expression evaluates to false are excluded from the dataset before
continuing.)^

<p><b>3. Generation of the set of result rows.</b>














       dataset.  
</ul>

<p>^(When more than two tables are joined together as part of a FROM clause,
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^

<tcl>hd_fragment crossjoin {treats the CROSS JOIN operator specially}</tcl>
<p><b>Side note: Special handling of CROSS JOIN.</b>
^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. They are completely interchangeable in SQLite.
^(The "CROSS JOIN" join operator produces the same result as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
<a href=optoverview.html#crossjoin>handled differently by the query
optimizer</a> in that it prevents the query optimizer from reordering
the tables in the join.  An application programmer can use the CROSS JOIN 
operator to directly influence the algorithm that is chosen to implement
the SELECT statement.  Avoid using CROSS JOIN except in specific situations 
where manual control of the query optimizer is desired.  Avoid using
CROSS JOIN early in the development of an application as doing so is
a <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature
optimization</a>.  The special handling of CROSS JOIN is an SQLite-specific
feature and is not a part of standard SQL.
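The claim that CROSS JOIN changes only the planner's freedom, never the result rows, can be checked with Python's sqlite3 module (tables and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
  CREATE TABLE big(x);            -- illustrative tables
  CREATE TABLE small(y);
  INSERT INTO big VALUES (1),(2),(3);
  INSERT INTO small VALUES (1),(2);
""")
inner = conn.execute(
    "SELECT * FROM big JOIN small ON x=y ORDER BY x").fetchall()
# CROSS JOIN yields the same rows; it only pins the join order
# (big stays in the outer loop) for the query planner.
cross = conn.execute(
    "SELECT * FROM big CROSS JOIN small ON x=y ORDER BY x").fetchall()
print(inner, cross)
```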
       

<tcl>hd_fragment whereclause</tcl>
<tcl>hd_keywords {WHERE clause}</tcl>
<p><b>2. WHERE clause filtering.</b>

<p>^(If a WHERE clause is specified, the WHERE expression is evaluated for 
each row in the input data as a [boolean expression]. All rows for which the
WHERE clause expression evaluates to false are excluded from the dataset before
continuing.)^

<p><b>3. Generation of the set of result rows.</b>


<tcl>
##############################################################################
Section {INDEXED BY} indexedby {{INDEXED BY} {NOT INDEXED}}

</tcl>
<p>^The INDEXED BY phrase is an SQL extension found only in SQLite which can
be used to verify that the correct indices are being used on a [DELETE],
[SELECT], or [UPDATE] statement.
^The INDEXED BY phrase always follows the name of a table that SQLite will

be reading.  The INDEXED BY phrase can be seen in the following syntax
diagrams:</p>

<tcl>
BubbleDiagram qualified-table-name
BubbleDiagram single-source
</tcl>

<p>^The "INDEXED BY index-name" clause specifies that the named index
must be used in order to look up values on the preceding table.
^If index-name does not exist or cannot be used for the query, then
the preparation of the SQL statement fails.
^(The "NOT INDEXED" clause specifies that no index shall be used when
accessing the preceding table, including implied indices created by
UNIQUE and PRIMARY KEY constraints.  However, the INTEGER PRIMARY KEY
can still be used to look up entries even when "NOT INDEXED" is specified.)^</p>
















<tcl>
##############################################################################
Section {INDEXED BY} indexedby {{INDEXED BY} {NOT INDEXED}}

</tcl>
<p>^The INDEXED BY phrase forces the [SQLite query planner] to use a
particular named index on a [DELETE], [SELECT], or [UPDATE] statement.
The INDEXED BY phrase is an extension that is particular to SQLite and
is not portable to other SQL database engines.
The INDEXED BY phrase can be seen in the following syntax
diagrams:</p>

<tcl>
BubbleDiagram qualified-table-name
BubbleDiagram single-source
</tcl>

<p>^The "INDEXED BY index-name" phrase specifies that the named index
must be used in order to look up values on the preceding table.
^If index-name does not exist or cannot be used for the query, then
the preparation of the SQL statement fails.
^(The "NOT INDEXED" clause specifies that no index shall be used when
accessing the preceding table, including implied indices created by
UNIQUE and PRIMARY KEY constraints.  However, the INTEGER PRIMARY KEY
can still be used to look up entries even when "NOT INDEXED" is specified.)^</p>
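Both behaviors described above can be sketched with Python's sqlite3 module (names are illustrative): INDEXED BY with a usable index prepares and runs, while naming a nonexistent index makes statement preparation fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a, b)")            # illustrative schema
conn.execute("CREATE INDEX t_a ON t(a)")
conn.execute("INSERT INTO t VALUES (1, 'one')")
# The named index exists and can satisfy a=1, so this succeeds:
row = conn.execute("SELECT b FROM t INDEXED BY t_a WHERE a=1").fetchone()
err = None
try:
    # A nonexistent index name causes preparation to fail.
    conn.execute("SELECT b FROM t INDEXED BY nosuch WHERE a=1")
except sqlite3.OperationalError as exc:
    err = exc
print(row, err)
```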
Developers are admonished to omit all use of INDEXED BY during
application design, implementation, testing, and tuning.  If
INDEXED BY is to be used at all, it should be inserted at the very
end of the development process when "locking down" a design.</p>

<h3>See Also:</h3>














<p>The [sqlite3_stmt_status()] C/C++ interface together with the
[SQLITE_STMTSTATUS_FULLSCAN_STEP] and [SQLITE_STMTSTATUS_SORT] verbs
can be used to detect at run-time when an SQL statement is not
making effective use of indices.  Many applications may prefer to
use the [sqlite3_stmt_status()] interface to detect index misuse
rather than the INDEXED BY phrase described here.</p>


<tcl>
#############################################################################
# A list of keywords.  An asterisk occurs after the keyword if it is on
# the fallback list.
#
set keyword_list [lsort {














Developers are admonished to omit all use of INDEXED BY during
application design, implementation, testing, and tuning.  If
INDEXED BY is to be used at all, it should be inserted at the very
end of the development process when "locking down" a design.</p>

<h3>See Also:</h3>

<ol>
<li><p>The [query planner checklist] describes steps that application
developers should follow to help resolve query planner problems.
Note that the use of INDEXED BY is a last resort, to be used only
when all other measures fail.</p>

<li><p>[upluscontrol|The unary + operator]
can be used to disqualify terms in the WHERE clause from use by indices.
Careful use of unary + can sometimes prevent the query planner from
choosing a poor index without restricting it to using one specific index,
which makes careful placement of unary + operators a better method for
controlling which indices are used in a query.</p>

<li><p>The [sqlite3_stmt_status()] C/C++ interface together with the
[SQLITE_STMTSTATUS_FULLSCAN_STEP] and [SQLITE_STMTSTATUS_SORT] verbs
can be used to detect at run-time when an SQL statement is not
making effective use of indices.  Many applications may prefer to
use the [sqlite3_stmt_status()] interface to detect index misuse
rather than the INDEXED BY phrase described here.</p>
</ol>
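The unary + technique from the list above can be observed through EXPLAIN QUERY PLAN; a sketch using Python's sqlite3 module (the exact plan text varies between SQLite versions, so the check only looks for the index name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a, b)")            # illustrative schema
conn.execute("CREATE INDEX t_b ON t(b)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i) for i in range(10)])
plan_idx = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a FROM t WHERE b=5").fetchall()
# The unary + disqualifies the b=5 term from using index t_b,
# so the planner falls back to a full table scan.
plan_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a FROM t WHERE +b=5").fetchall()
print(plan_idx, plan_scan)
```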

<tcl>
#############################################################################
# A list of keywords.  An asterisk occurs after the keyword if it is on
# the fallback list.
#
set keyword_list [lsort {
Changes to pages/optoverview.in.
<title>The SQLite Query Optimizer Overview</title>
<tcl>hd_keywords {optimizer} {query planner}</tcl>

<tcl>
proc CODE {text} {
  hd_puts "<blockquote><pre>"
  hd_puts $text
  hd_puts "</pre></blockquote>"
}








<title>The SQLite Query Optimizer Overview</title>
<tcl>hd_keywords {optimizer} {query planner} {SQLite query planner}</tcl>

<tcl>
proc CODE {text} {
  hd_puts "<blockquote><pre>"
  hd_puts $text
  hd_puts "</pre></blockquote>"
}
  } else {
    set num $level(1)
    for {set i 2} {$i<=$n} {incr i} {
      append num .$level($i)
    }
  }
  incr n 1



  hd_puts "<h$n>$num $name</h$n>"

}

HEADING 0 {The SQLite Query Planner}

PARAGRAPH {
  This document provides an overview of how the query planner and optimizer
  for SQLite work.
}

PARAGRAPH {
  Given a single SQL statement, there might be dozens, hundreds, or even
  thousands of ways to implement that statement, depending on the complexity
  of the statement itself and of the underlying database schema.  The 
  task of the query planner is to select an algorithm from among the many
  choices that provides the answer with a minimum of disk I/O and CPU
  overhead.
}










HEADING 1 {WHERE clause analysis} where_clause

PARAGRAPH {
  ^The WHERE clause on a query is broken up into "terms" where each term
  is separated from the others by an AND operator.
  If the WHERE clause is composed of constraints separated by the OR
  operator then the entire clause is considered to be a single "term"
  to which the <a href="#or_opt">OR-clause optimization</a> is applied.
}
PARAGRAPH {
  ^All terms of the WHERE clause are analyzed to see if they can be
  satisfied using indices.
  ^Terms that cannot be satisfied through the use of indices become
  tests that are evaluated against each row of the relevant input
  tables.  ^No tests are done for terms that are completely satisfied by
  indices.  ^Sometimes
  one or more terms will provide hints to indices but still must be
  evaluated against each row of the input tables.
}

PARAGRAPH {
  ^The analysis of a term might cause new "virtual" terms to
  be added to the WHERE clause.  ^Virtual terms can be used with
  indices to restrict a search.  ^But virtual terms never generate code
  that is tested against input rows.
}

PARAGRAPH {
  ^(To be usable by an index a term must be of one of the following
  forms:
}
SYNTAX {
  /column/ = /expression/
  /column/ > /expression/
  /column/ >= /expression/














  } else {
    set num $level(1)
    for {set i 2} {$i<=$n} {incr i} {
      append num .$level($i)
    }
  }
  incr n 1
  if {$n==1} {
    hd_puts "<h$n align='center'>$num $name</h$n>"
  } else {
    hd_puts "<h$n>$num $name</h$n>"
  }
}

HEADING 0 {The SQLite Query Planner}

PARAGRAPH {
  This document provides an overview of how the query planner and optimizer
  for SQLite work.
}

PARAGRAPH {
  Given a single SQL statement, there might be dozens, hundreds, or even
  thousands of ways to implement that statement, depending on the complexity
  of the statement itself and of the underlying database schema.  The 
  task of the query planner is to select an algorithm from among the many
  choices that provides the answer with a minimum of disk I/O and CPU
  overhead.
}

PARAGRAPH {
  With release 3.8.0, the SQLite query planner was reimplemented as the
  [Next Generation Query Planner] or "NGQP".  All of the features, techniques,
  and algorithms described in this document are applicable to both the
  pre-3.8.0 legacy query planner and to the NGQP.  For further information on
  how the NGQP differs from the legacy query planner, see the 
  [NGQP | detailed description of the NGQP].
}

HEADING 1 {WHERE clause analysis} where_clause

PARAGRAPH {
  ^The WHERE clause on a query is broken up into "terms" where each term
  is separated from the others by an AND operator.
  If the WHERE clause is composed of constraints separated by the OR
  operator then the entire clause is considered to be a single "term"
  to which the <a href="#or_opt">OR-clause optimization</a> is applied.
}
PARAGRAPH {
  ^All terms of the WHERE clause are analyzed to see if they can be
  satisfied using indices.
  ^(To be usable by an index a term must be of one of the following
  forms:
}
SYNTAX {
  /column/ = /expression/
  /column/ > /expression/
  /column/ >= /expression/
CODE {
  CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
}
PARAGRAPH {
  Then the index might be used if the initial columns of the index
  (columns a, b, and so forth) appear in WHERE clause terms.)^
  ^The initial columns of the index must be used with
  the *=* or *IN* operators.  
  ^The right-most column that is used can employ inequalities.  
  ^For the right-most
  column of an index that is used, there can be up to two inequalities
  that must sandwich the allowed values of the column between two extremes.
}
PARAGRAPH {
  ^It is not necessary for every column of an index to appear in a














CODE {
  CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
}
PARAGRAPH {
  Then the index might be used if the initial columns of the index
  (columns a, b, and so forth) appear in WHERE clause terms.)^
  ^The initial columns of the index must be used with
  the *=* or *IN* or *IS NULL* operators.  
  ^The right-most column that is used can employ inequalities.  
  ^For the right-most
  column of an index that is used, there can be up to two inequalities
  that must sandwich the allowed values of the column between two extremes.
}
PARAGRAPH {
  ^It is not necessary for every column of an index to appear in a
PARAGRAPH {
  ^(If a term of the WHERE clause is of the following form:
}
SYNTAX {
  /expr1/ BETWEEN /expr2/ AND /expr3/
}
PARAGRAPH {
  Then two virtual terms are added as follows:
}
SYNTAX {
  /expr1/ >= /expr2/ AND /expr1/ <= /expr3/
}
PARAGRAPH {)^


  ^If both virtual terms end up being used as constraints on an index,
  then the original BETWEEN term is omitted and the corresponding test
  is not performed on input rows.
  ^Thus if the BETWEEN term ends up being used as an index constraint
  no tests are ever performed on that term.
  ^On the other hand, the
  virtual terms themselves never cause tests to be performed on
  input rows.
  ^Thus if the BETWEEN term is not used as an index constraint and
  instead must be used to test input rows, the <i>expr1</i> expression is
  only evaluated once.
}

HEADING 1 {OR optimizations} or_opt
hd_keywords {or optimization}

PARAGRAPH {
  WHERE clause constraints that are connected by OR instead of AND are
  handled in one of two ways.
  ^(If a term consists of multiple subterms containing a common column
  name and separated by OR, like this:
}
SYNTAX {
  /column/ = /expr1/ OR /column/ = /expr2/ OR /column/ = /expr3/ OR ...
}
PARAGRAPH {
  Then that term is rewritten as follows:
}
SYNTAX {
  /column/ IN (/expr1/,/expr2/,/expr3/,/expr4/,...)
}
PARAGRAPH {)^
  ^The rewritten term then might go on to constrain an index using the
  normal rules for *IN* operators.  ^Note that <i>column</i> must be
  the same column in every OR-connected subterm,
  although the column can occur on either the left or the right side of
  the *=* operator.














194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
PARAGRAPH {
  ^(If a term of the WHERE clause is of the following form:
}
SYNTAX {
  /expr1/ BETWEEN /expr2/ AND /expr3/
}
PARAGRAPH {
  Then two "virtual" terms are added as follows:
}
SYNTAX {
  /expr1/ >= /expr2/ AND /expr1/ <= /expr3/
}
PARAGRAPH {)^
  ^Virtual terms are used for analysis only and do not cause any VDBE
  code to be generated.
  ^If both virtual terms end up being used as constraints on an index,
  then the original BETWEEN term is omitted and the corresponding test
  is not performed on input rows.
  ^Thus if the BETWEEN term ends up being used as an index constraint
  no tests are ever performed on that term.
  ^On the other hand, the
  virtual terms themselves never cause tests to be performed on
  input rows.
  ^Thus if the BETWEEN term is not used as an index constraint and
  instead must be used to test input rows, the <i>expr1</i> expression is
  only evaluated once.
}
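The equivalence can be checked from any host language. A minimal Python sketch (the table, index, and data are invented for illustration):

```python
import sqlite3

# The BETWEEN form and the two virtual >= / <= terms select exactly the
# same rows, and either form can constrain the index on x.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t(x INTEGER);
    CREATE INDEX t_x ON t(x);
    INSERT INTO t(x) VALUES (1),(5),(7),(12);
""")

between = db.execute(
    "SELECT x FROM t WHERE x BETWEEN 5 AND 10 ORDER BY x").fetchall()
explicit = db.execute(
    "SELECT x FROM t WHERE x >= 5 AND x <= 10 ORDER BY x").fetchall()
```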

HEADING 1 {OR optimizations} or_opt
hd_keywords {or optimization}

PARAGRAPH {
  WHERE clause constraints that are connected by OR instead of AND can
  be handled in two different ways.
  ^(If a term consists of multiple subterms containing a common column
  name and separated by OR, like this:
}
SYNTAX {
  /column/ = /expr1/ OR /column/ = /expr2/ OR /column/ = /expr3/ OR ...
}
PARAGRAPH {
  Then that term is rewritten as follows:
}
SYNTAX {
  /column/ IN (/expr1/,/expr2/,/expr3/,...)
}
PARAGRAPH {)^
  ^The rewritten term then might go on to constrain an index using the
  normal rules for *IN* operators.  ^Note that <i>column</i> must be
  the same column in every OR-connected subterm,
  although the column can occur on either the left or the right side of
  the *=* operator.
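A small Python sketch of this equivalence (schema and data invented for illustration); note that the column may appear on either side of the = sign:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t(a INTEGER, b TEXT);
    CREATE INDEX t_a ON t(a);
    INSERT INTO t VALUES (1,'one'),(2,'two'),(3,'three'),(4,'four');
""")

# OR-connected equality terms on the same column ("4 = a" shows the
# column on the right side of the = operator)...
with_or = db.execute(
    "SELECT b FROM t WHERE a = 1 OR a = 3 OR 4 = a ORDER BY a").fetchall()
# ...behave like an IN operator, which can constrain the index t_a.
with_in = db.execute(
    "SELECT b FROM t WHERE a IN (1, 3, 4) ORDER BY a").fetchall()
```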
  *a=5* or *x>y* or they can be LIKE or BETWEEN expressions, or a subterm
  can be a parenthesized list of AND-connected sub-subterms.
  ^Each subterm is analyzed as if it were itself the entire WHERE clause
  in order to see if the subterm is indexable by itself.
  ^If <u>every</u> subterm of an OR clause is separately indexable
  then the OR clause might be coded such that a separate index is used
  to evaluate each term of the OR clause.  One way to think about how
  SQLite uses separate indices for each OR clause term is to imagine
  that the WHERE clause were rewritten as follows:
}
SYNTAX {
  rowid IN (SELECT rowid FROM /table/ WHERE /expr1/
            UNION SELECT rowid FROM /table/ WHERE /expr2/
            UNION SELECT rowid FROM /table/ WHERE /expr3/)
}
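The UNION rewrite above is only a mental model, but its equivalence to the OR query can be checked directly. A Python sketch with invented schema and data:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t(a INTEGER, b INTEGER);
    CREATE INDEX t_a ON t(a);
    CREATE INDEX t_b ON t(b);
    INSERT INTO t VALUES (1,10),(2,20),(3,30),(2,30);
""")

# Each OR subterm is indexable on its own (t_a for a=2, t_b for b=30).
direct = db.execute(
    "SELECT rowid FROM t WHERE a = 2 OR b = 30 ORDER BY rowid").fetchall()

# The conceptual rewrite from the text: a UNION of per-index lookups.
rewritten = db.execute("""
    SELECT rowid FROM t WHERE rowid IN (
        SELECT rowid FROM t WHERE a = 2
        UNION
        SELECT rowid FROM t WHERE b = 30)
    ORDER BY rowid""").fetchall()
```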
  ^Inner joins can be freely reordered.  ^However a left outer join is
  neither commutative nor associative and hence will not be reordered.
  ^Inner joins to the left and right of the outer join might be reordered
  if the optimizer thinks that is advantageous but the outer joins are
  always evaluated in the order in which they occur.
}
PARAGRAPH {
  SQLite [treats the CROSS JOIN operator specially].
  The CROSS JOIN operator is commutative in theory.  But SQLite chooses to
  never reorder tables in a CROSS JOIN.  This provides a mechanism
  by which the programmer can force SQLite to choose a particular loop nesting
  order.  
}
PARAGRAPH {
  ^When selecting the order of tables in a join, SQLite uses an efficient
  polynomial-time algorithm.  ^Because of this,
  SQLite is able to plan queries with 50- or 60-way joins in a matter of
  microseconds.
}
PARAGRAPH {
  Join reordering is automatic and usually works well enough that
  programmers do not have to think about it, especially if [ANALYZE]
  has been used to gather statistics about the available indices.
  But occasionally some hints from the programmer are needed.
  Consider, for example, the following schema:
  This query asks for all information about edges that go from
  nodes labeled "alice" to nodes labeled "bob".
  The query optimizer in SQLite has basically two choices on how to
  implement this query.  (There are actually six different choices, but
  we will only consider two of them here.)
  The pseudocode below demonstrates these two choices.
}
hd_fragment option1
PARAGRAPH {Option 1:}
CODE {
  foreach n1 where n1.name='alice' do:
    foreach n2 where n2.name='bob' do:
      foreach e where e.orig=n1.id and e.dest=n2.id
        return n1.*, n2.*, e.*
      end
    end
  end
}
hd_fragment option2
PARAGRAPH {Option 2:}
CODE {
  foreach n1 where n1.name='alice' do:
    foreach e where e.orig=n1.id do:
      foreach n2 where n2.id=e.dest and n2.name='bob' do:
        return n1.*, n2.*, e.*
      end
  SQLite choose by default?  ^(As of version 3.6.18, without running [ANALYZE],
  SQLite will choose option 2.)^
  ^But if the [ANALYZE] command is run in order to gather statistics,
  a different choice might be made if the statistics indicate that the
  alternative is likely to run faster.
}

HEADING 2 {Manual Control Of Query Plans Using SQLITE_STAT Tables} \
  manctrl {Manual Control Of Query Plans Using SQLITE_STAT Tables}

PARAGRAPH {
  SQLite provides the ability for advanced programmers to exercise control
  over the query plan chosen by the optimizer. One method for doing this
  is to fudge the [ANALYZE] results in the [sqlite_stat1] and
  [sqlite_stat3] tables.  That approach is not recommended except
  for the one scenario described in the next paragraph.
}
PARAGRAPH {
  For a program that uses an SQLite database as its 
  [application file-format],
  when a new database instance is first created the [ANALYZE]
  command is ineffective because the database contains no data from which
  to gather statistics.  In that case, one could construct a large prototype
  database containing typical data during development and run the 
  [ANALYZE] command on this prototype database to gather statistics,
  then save the prototype statistics as part of the application.
  After deployment, when the application goes to create a new database file,
  it can run the [ANALYZE] command in order to create the [sqlite_stat1]
  and [sqlite_stat3] tables, then copy the precomputed statistics obtained
  from the prototype database into these new statistics tables.
  In that way, statistics from large working data sets can be preloaded
  into newly created application files.
}
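The prototype-statistics workflow above can be sketched from a host language. The following Python sketch (schema and data are invented for illustration) copies sqlite_stat1 rows from a prototype database into a freshly created one; the final ANALYZE of sqlite_master asks SQLite to reload the modified statistics:

```python
import sqlite3

SCHEMA = """
    CREATE TABLE t(a INTEGER, b INTEGER);
    CREATE INDEX t_a ON t(a);
"""

# Prototype database filled with representative data during development.
proto = sqlite3.connect(":memory:")
proto.executescript(SCHEMA)
proto.executemany("INSERT INTO t VALUES (?, ?)",
                  [(i % 10, i) for i in range(1000)])
proto.execute("ANALYZE")
stats = proto.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()

# Freshly created deployment database: ANALYZE creates the statistics
# tables, which are then filled with the prototype statistics.
fresh = sqlite3.connect(":memory:")
fresh.executescript(SCHEMA)
fresh.execute("ANALYZE")
fresh.execute("DELETE FROM sqlite_stat1")
fresh.executemany("INSERT INTO sqlite_stat1(tbl, idx, stat) VALUES (?,?,?)",
                  stats)
fresh.execute("ANALYZE sqlite_master")   # reload statistics into the planner
copied = fresh.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
```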

HEADING 2 {Manual Control Of Query Plans Using CROSS JOIN} \
  crossjoin {Manual Control Of Query Plans Using CROSS JOIN} {CROSS JOIN}

PARAGRAPH {
  Programmers can force SQLite to use a particular loop nesting order
  for a join by using the CROSS JOIN operator instead of just JOIN, 
  INNER JOIN, NATURAL JOIN, or a "," join.  Though CROSS JOINs are
  commutative in theory, SQLite chooses to never reorder the tables in
  a CROSS JOIN.  Hence, the left table of a CROSS JOIN will always be
  in an outer loop relative to the right table.
}
PARAGRAPH {
  ^(In the following query, the optimizer is free to reorder the 
  tables of the FROM clause any way it sees fit:
}
CODE {
  SELECT *
    FROM node AS n1,
         edge AS e,
         node AS n2
   WHERE n1.name = 'alice'
     AND n2.name = 'bob'
     AND e.orig = n1.id
     AND e.dest = n2.id;
}
PARAGRAPH {)^
  ^(But in the following logically equivalent formulation of the same query,
  the substitution of "CROSS JOIN" for the "," means that the order
  of tables must be N1, E, N2.
}
CODE {
  SELECT *
    FROM node AS n1 CROSS JOIN
         edge AS e CROSS JOIN
         node AS n2
   WHERE n1.name = 'alice'
     AND n2.name = 'bob'
     AND e.orig = n1.id
     AND e.dest = n2.id;
}
PARAGRAPH {)^
  In the latter query, the query plan must be 
  <a href="#option2">option 2</a>.  ^Note that
  you must use the keyword "CROSS" in order to disable the table reordering
  optimization; INNER JOIN, NATURAL JOIN, JOIN, and other similar
  combinations work just like a comma join in that the optimizer is
  free to reorder tables as it sees fit. (Table reordering is also
  disabled on an outer join, but that is because outer joins are not
  associative or commutative. Reordering tables in an OUTER JOIN changes
  the result.)
}
PARAGRAPH {
  See "[The Fossil NGQP Upgrade Case Study]" for another real-world example
  of using CROSS JOIN to manually control the nesting order of a join.
  The [query planner checklist] found later in the same document provides
  further guidance on manual control of the query planner.
}
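Using the node/edge schema from the example above, the following Python sketch (data values invented) verifies that the comma join and the CROSS JOIN formulation return identical rows; only the loop nesting order differs:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE node(id INTEGER PRIMARY KEY, name TEXT);
    CREATE INDEX node_name ON node(name);
    CREATE TABLE edge(orig INTEGER, dest INTEGER);
    CREATE INDEX edge_orig ON edge(orig);
    INSERT INTO node VALUES (1,'alice'),(2,'bob');
    INSERT INTO edge VALUES (1,2);
""")

# Comma join: the planner may nest the three loops in any order.
comma_join = db.execute("""
    SELECT n1.id, n2.id FROM node AS n1, edge AS e, node AS n2
     WHERE n1.name='alice' AND n2.name='bob'
       AND e.orig=n1.id AND e.dest=n2.id""").fetchall()

# CROSS JOIN: same rows, but the nesting is pinned to n1, e, n2.
cross_join = db.execute("""
    SELECT n1.id, n2.id
      FROM node AS n1 CROSS JOIN edge AS e CROSS JOIN node AS n2
     WHERE n1.name='alice' AND n2.name='bob'
       AND e.orig=n1.id AND e.dest=n2.id""").fetchall()
```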

HEADING 1 {Choosing between multiple indices} multi_index

PARAGRAPH {
  Each table in the FROM clause of a query can use at most one index
  (except when the <a href="#or_opt">OR-clause optimization</a> comes into
  play)
  changes so after making significant changes it might be prudent to
  rerun [ANALYZE].
  ^The results of an ANALYZE command are only available to database connections
  that are opened after the ANALYZE command completes.
}
PARAGRAPH {
  The various <b>sqlite_stat</b><i>N</i> tables contain information on how
  selective the various indices are.  ^(For example, the [sqlite_stat1]
  table might indicate that an equality constraint on column x reduces the
  search space to 10 rows on average, whereas an equality constraint on
  column y reduces the search space to 3 rows on average.  In that case,
  SQLite would prefer to use index ex2i2 since that index is more selective.)^
}
HEADING 2 {Disqualifying WHERE Clause Terms Using Unary-"+"} \
   {uplus} {*upluscontrol}
PARAGRAPH {
  ^Terms of the WHERE clause can be manually disqualified for use with
  indices by prepending a unary *+* operator to the column name.  ^The
  unary *+* is a no-op and will not generate any byte code in the prepared
  statement.
  But the unary *+* operator will prevent the term from constraining an index.
  ^(So, in the example above, if the query were rewritten as:
}
CODE {
  SELECT z FROM ex2 WHERE +x=5 AND y=6;
}
PARAGRAPH {
  The *+* operator on the *x* column will prevent that term from 
  reduce the search space by a factor of only 10.  So the ex2i1 index
  should be preferred.
}
PARAGRAPH {
  ^SQLite will make this determination, but only if it has been compiled
  with [SQLITE_ENABLE_STAT3].  ^The [SQLITE_ENABLE_STAT3] option causes
  the [ANALYZE] command to collect a histogram of column content in the
  [sqlite_stat3] table and to use this histogram to make a better
  guess at the best query to use for range constraints such as the above.
}
PARAGRAPH {
  ^The histogram data is only useful if the right-hand side of the constraint
  is a simple compile-time constant or [parameter] and not an expression.
}
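The unary-+ disqualification described above is easy to try out. A Python sketch using the ex2 schema from the example (data values invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE ex2(x INTEGER, y INTEGER, z TEXT);
    CREATE INDEX ex2i1 ON ex2(x);
    CREATE INDEX ex2i2 ON ex2(y);
    INSERT INTO ex2 VALUES (5,6,'both'),(5,0,'x-only'),(0,6,'y-only');
""")

# The unary + turns +x=5 into an expression in the planner's eyes, so
# only the y=6 term can constrain an index; the result is unchanged.
plain = db.execute(
    "SELECT z FROM ex2 WHERE x=5 AND y=6").fetchall()
disqualified = db.execute(
    "SELECT z FROM ex2 WHERE +x=5 AND y=6").fetchall()
```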
PARAGRAPH {
PARAGRAPH {
  Here the inequalities are on columns x and y which are not the
  left-most index columns.  ^Hence, the histogram data, which is collected on
  the left-most column of indices, is useless in helping to choose between the
  range constraints on columns x and y.
}

HEADING 1 {Covering Indices}

PARAGRAPH {
  When doing an indexed lookup of a row, the usual procedure is to
  do a binary search on the index to find the index entry, then extract
  the [rowid] from the index and use that [rowid] to do a binary search on
  the original table.  Thus a typical indexed lookup involves two
  binary searches.
  ^If, however, all columns that were to be fetched from the table are
  already available in the index itself, SQLite will use the values
  contained in the index and will never look up the original table
  row.  This saves one binary search for each row and can make many
  queries run twice as fast.
}

PARAGRAPH {
  When an index contains all of the data needed for a query and when the
  original table never needs to be consulted, we call that index a
  "covering index".
}
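A covering index can be observed with EXPLAIN QUERY PLAN. A Python sketch (schema and data invented; the exact query-plan wording varies between SQLite versions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t(a INTEGER, b INTEGER, c TEXT);
    CREATE INDEX t_ab ON t(a, b);
    INSERT INTO t VALUES (1, 10, 'x'), (2, 20, 'y');
""")

# Only columns a and b are referenced, so the index t_ab alone can
# answer the query; the t table itself is never consulted.
rows = db.execute("SELECT b FROM t WHERE a = 2").fetchall()

# The plan detail mentions the covering index (wording varies by
# version, but includes "USING COVERING INDEX").
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT b FROM t WHERE a = 2").fetchall()
```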

HEADING 1 {ORDER BY optimizations} order_by

PARAGRAPH {
  ^SQLite attempts to use an index to satisfy the ORDER BY clause of a
  query when possible.
  ^When faced with the choice of using an index to satisfy WHERE clause
  constraints or satisfying an ORDER BY clause, SQLite does the same
  work analysis described above
  and chooses the index that it believes will result in the fastest answer.
}

PARAGRAPH {
  ^SQLite will also attempt to use indices to help satisfy GROUP BY clauses
  and the DISTINCT keyword.  If the nested loops of the join can be arranged
  such that rows that are equivalent for the GROUP BY or for the DISTINCT are
  consecutive, then the GROUP BY or DISTINCT logic can determine if the 
  current row is part of the same group or if the current row is distinct
  simply by comparing the current row to the previous row.
  This can be much faster than the alternative of comparing each row to
  all prior rows.
}

HEADING 1 {Subquery flattening} flattening
hd_keywords {flattening optimization}

PARAGRAPH {
  When a subquery occurs in the FROM clause of a SELECT, the simplest
}

HEADING 1 {Automatic Indices} autoindex {automatic indexing} {Automatic indexing}

PARAGRAPH {
  ^(When no indices are available to aid the evaluation of a query, SQLite
  might create an automatic index that lasts only for the duration
  of a single SQL statement.)^
  Since the cost of constructing the automatic index is
  O(NlogN) (where N is the number of entries in the table) and the cost of
  doing a full table scan is only O(N), an automatic index will
  only be created if SQLite expects that the lookup will be run more than
  logN times during the course of the SQL statement. Consider an example:
}
CODE {
  CREATE TABLE t1(a,b);
  of the t1.b column.  If each table contains N rows, SQLite expects that
  the subquery will run N times, and hence it will believe it is faster
  to construct an automatic, transient index on t2 first and then use
  that index to satisfy the N instances of the subquery.
}
PARAGRAPH {
  The automatic indexing capability can be disabled at run-time using
  the [automatic_index pragma].  Automatic indexing is turned on by
  default, but this can be changed so that automatic indexing is off
  by default using the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option.
  The ability to create automatic indices can be completely disabled by
  compiling with the [SQLITE_OMIT_AUTOMATIC_INDEX] compile-time option.
}
PARAGRAPH {
  In SQLite version 3.8.0, an [SQLITE_WARNING_AUTOINDEX] message is sent
  to the [error log] every time a statement is prepared that uses an
  automatic index.  Application developers can and should use these warnings
  to identify the need for new persistent indices in the schema.
}
PARAGRAPH {
  Future releases of SQLite may disable automatic indices by default.
}
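The pragma default and the effect of disabling automatic indexing can be checked from Python (schema and data invented; the default of 1 assumes SQLite 3.7.17 or later):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE t1(a INTEGER, b INTEGER);
    CREATE TABLE t2(c INTEGER, d INTEGER);
    INSERT INTO t1 VALUES (1,2),(3,4);
    INSERT INTO t2 VALUES (2,20),(4,40);
""")

# Automatic indexing defaults to on.
default_setting = db.execute("PRAGMA automatic_index").fetchone()[0]

# Disable it for this connection; the join still works, but SQLite
# falls back to nested scans instead of building a transient index.
db.execute("PRAGMA automatic_index = 0")
rows = db.execute("""
    SELECT a, d FROM t1 JOIN t2 ON t1.b = t2.c ORDER BY a""").fetchall()
```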
</tcl>
Changes to pages/pragma.in.
}

Pragma {automatic_index} {
    <p>^(<b>PRAGMA automatic_index;
     <br>PRAGMA automatic_index = </b><i>boolean</i><b>;</b></p>

    <p>Query, set, or clear the [automatic indexing] capability.)^
    <p>^[Automatic indexing] is enabled by default as of version 3.7.17,
    but this might change in future releases of SQLite.
}

Pragma {auto_vacuum} {
    <p><b>PRAGMA auto_vacuum;<br>
          PRAGMA auto_vacuum = </b>
           <i>0 | NONE | 1 | FULL | 2 | INCREMENTAL</i><b>;</b></p>

Changes to pages/queryplanner-ng.in.
}
proc code {txt} {
  hd_puts "<center><table><tr><td><pre>\n"
  regsub {^[ \n]*\n} $txt {} txt
  hd_puts [string trimright $txt]\n
  hd_puts "</pre></table></center>\n"
}
hd_keywords {next generation query planner} {Next Generation Query Planner} NGQP
</tcl>
<h1 align="center">The Next Generation Query Planner</h1>

<h2>1.0 Introduction</h2>

<p>
The task of the "query planner" is to figure
solves those problems.</p>

<p>The NGQP is almost always better than the legacy query planner.
However, there may exist legacy applications that unknowingly depend on 
undefined and/or suboptimal behavior in the legacy query planner, and
upgrading to the NGQP on those legacy applications could cause performance
regressions.  This risk is considered and a checklist is provided
for reducing the risk and for fixing any issues that do arise.</p>

<p>This document focuses on the NGQP.  For a more general overview of the
SQLite query planner that encompasses the entire history of SQLite, see
"[query planner | The SQLite Query Optimizer Overview]".</p>

<h2>2.0 Background</h2>

<p>For simple queries against a single table with few indices, there is usually
an obvious choice for the best algorithm.
But for larger and more complex queries, such as
multi-way joins with many indices
and subqueries, there can be hundreds, thousands, or millions
of reasonable algorithms for computing the result.
The job of the query planner is to choose a single "best" query plan from
this multitude of possibilities.</p>

<p>The existence of a query planner is what makes SQL database engines
so amazingly useful and powerful.
(This is true of all SQL database engines, not just SQLite.)
The query planner frees the programmer from the chore of selecting
a particular query plan, and thereby allows the programmer to
focus more mental energy on higher-level application issues and on 
providing more value to the end user.  For simple queries where the choice
of query plan is obvious, this is convenient but not hugely important.
But as applications and schemas and queries grow more complex, a
clever query planner can greatly speed and simplify the work of application
development. 
There is amazing power in being able to tell
the database engine what content is desired, and then let the database
engine figure out the best way to retrieve that content.</p>

97
98
99
100
101
102
103
104

105
106
107
108
109
110
111
<ol type="a">
<li>the database schema does not change in significant ways such as 
    adding or dropping indices,</li>
<li>the ANALYZE command is not run, </li>
<li>SQLite is not compiled with [SQLITE_ENABLE_STAT3], and</li>
<li>the same version of SQLite is used.</li>
</ol>
This stability guarantee means that if all of your queries run efficiently

during testing, and if your application does not change the schema,
then SQLite will not suddenly decide to start using a different
query plan, possibly causing a performance problem, after your application 
is released to users.  If your application works in the lab, it
will continue working the same way after deployment.</p>

<p>Enterprise-class client/server SQL database engines do not normally 







|
>







102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
<ol type="a">
<li>the database schema does not change in significant ways such as 
    adding or dropping indices,</li>
<li>the ANALYZE command is not run, </li>
<li>SQLite is not compiled with [SQLITE_ENABLE_STAT3], and</li>
<li>the same version of SQLite is used.</li>
</ol>
The SQLite stability guarantee means that if all of your queries run 
efficiently
during testing, and if your application does not change the schema,
then SQLite will not suddenly decide to start using a different
query plan, possibly causing a performance problem, after your application 
is released to users.  If your application works in the lab, it
will continue working the same way after deployment.</p>
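The stability guarantee can be checked empirically in a test harness by capturing the plan SQLite picks with EXPLAIN QUERY PLAN and comparing it across runs. A minimal sketch in Python (the table, index, and query are illustrative, not from any particular application):

```python
import sqlite3

# Hypothetical schema, standing in for any application's tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t1(a INTEGER PRIMARY KEY, b TEXT)")
db.execute("CREATE INDEX t1b ON t1(b)")

def query_plan(conn, sql):
    """Return the plan SQLite chose for sql as a list of detail strings."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Prepare the same statement twice: same schema, same statistics, and
# the same SQLite version, so the guarantee says the plans must match.
plan1 = query_plan(db, "SELECT a FROM t1 WHERE b='x'")
plan2 = query_plan(db, "SELECT a FROM t1 WHERE b='x'")
```

Recording such plans during testing, and diffing them after a schema or SQLite-version change, is an easy way to detect plan changes before users do.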

<p>Enterprise-class client/server SQL database engines do not normally 
for the evolving structure of the data.  But sometimes the new query plan will
cause a performance reduction.  With a client/server database engine, there
is typically a Database Administrator (DBA) on hand to deal with these 
rare problems as they come up.  But DBAs are not available to fix problems 
in an embedded database like SQLite, and hence SQLite is careful to
ensure that plans do not change unexpectedly after deployment.</p>

<p>The SQLite stability guarantee applies equally to the legacy query planner 
and to the NGQP.</p>

<p>It is important to note that changing versions of SQLite might cause
changes in query plans, as new enhancements are added to the query planner.
The same version of SQLite will
always pick the same query plan, but if you relink your application to use
a different version of SQLite, then query plans might change.  In rare
cases, an SQLite version change might lead to a performance regression.  
This is one reason
you should consider statically linking your applications against SQLite 
rather than try to use a system-wide SQLite shared library which might 
change without your knowledge or control.</p>

<h2>3.0 A Difficult Case</h2>

<p>
a small risk of introducing performance regressions.  The problem here is
not that the NGQP is incorrect or buggy or inferior to the legacy query
planner.  Given accurate cost estimates, the NGQP will always pick a better
plan than older query planners.  The problem is that cost estimates might
sometimes be inaccurate and the legacy query planner just happened to stumble
over a good plan by stupid luck while the NGQP is not as lucky.</p>

<tcl>hd_fragment fossilcasestudy {The Fossil NGQP Upgrade Case Study}</tcl>
<h3>4.1 Case Study: Upgrading Fossil to the NGQP</h3>

<p>The <a href="http://www.fossil-scm.org/">Fossil DVCS</a> is the version
control system used to track all of the SQLite source code.
A Fossil repository is an SQLite database file.
(Readers are invited to ponder this recursion as an independent exercise.)


Fossil is both the version-control system for SQLite and a test
platform for SQLite.  Whenever enhancements are made to SQLite, 
Fossil is one of the first applications to test and evaluate those
enhancements.  So Fossil was an early adopter of the NGQP.</p>

<p>Unfortunately, the NGQP caused a
performance regression in Fossil.</p>

<p>One of the many reports that Fossil makes available is a timeline of
changes to a single branch showing all merges in and out of that branch.  See
<a href="http://www.sqlite.org/src/timeline?nd&n=200&r=trunk">http://www.sqlite.org/src/timeline?nd&n=200&r=trunk</a>
for a typical
example of such a report.  Generating such a report normally takes just
a few milliseconds.  But after upgrading to the NGQP we noticed that
The first condition causes all of the trunk check-ins to be displayed and
the second and third cause check-ins that merge into or are derived from
the trunk to also be included.
The three conditions are implemented by the three OR-connected
EXISTS statements in the WHERE clause of the query.
The slowdown that occurred with the NGQP was caused by the second and
third conditions.  The problem is the same in each, so we will examine
just the second one.
The subquery of the second condition can be rewritten (with minor
and immaterial simplifications) as follows:</p>

<blockquote><pre>
SELECT 1
  FROM plink JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk
so the first field of TAGXREF_I1 will be
of little help in narrowing down the search.
</p>

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database.  The [ANALYZE] command
gathers statistics on the quality of the various indices and stores those
statistics in the [SQLITE_STAT1] table.  
Having access to this statistical information,
the NGQP easily chooses algorithm-1 as the best algorithm, by a wide
margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm
never even considered algorithm-2.  Graphs of the planning
problem look like this:</p>

<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem.  However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been analyzed.
For this reason, the query was modified to use the CROSS JOIN operator 
instead of the plain JOIN operator.


SQLite will not reorder the tables of a CROSS JOIN.
This is a long-standing feature of SQLite that is specifically designed
to allow knowledgeable programmers
to enforce a particular loop nesting order.  Once the join
was changed to CROSS JOIN (the addition of a single keyword) the NGQP was
forced to choose the faster algorithm-1 regardless of whether or not
statistical information had been gathered using ANALYZE.</p>
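The effect of the CROSS JOIN keyword can be demonstrated outside of Fossil with any two-table join. A sketch in Python, using toy tables that stand in for PLINK and TAGXREF (the schema is a simplification, not Fossil's actual schema):

```python
import sqlite3

# Toy tables standing in for Fossil's PLINK and TAGXREF.
db = sqlite3.connect(":memory:")
db.executescript("""
  CREATE TABLE plink(pid, cid);
  CREATE INDEX plink_i1 ON plink(pid);
  CREATE TABLE tagxref(tagid, mtime, rid);
  CREATE INDEX tagxref_i1 ON tagxref(tagid, mtime);
""")

def outermost(conn, sql):
    # EXPLAIN QUERY PLAN lists loops in nesting order, so the first
    # row names the table chosen for the outermost loop.
    return next(iter(conn.execute("EXPLAIN QUERY PLAN " + sql)))[-1]

# CROSS JOIN forces the left-hand table (plink) into the outer loop,
# no matter what the planner's cost estimates say.
detail = outermost(db, """
    SELECT 1 FROM plink CROSS JOIN tagxref ON tagxref.rid=plink.cid
     WHERE tagxref.tagid=5""")
```

With a plain JOIN the planner is free to put either table on the outside; the single CROSS keyword removes that freedom.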

<p>We say that algorithm-1 is "faster", but this
is not strictly true.  Algorithm-1 is faster in common repositories, but
it is possible to construct a repository in which
every check-in is on a different uniquely-named branch and
all check-ins are children of the root check-in.
In that case, TAGXREF_I1 would become more selective
than PLINK_I1 and algorithm-2 really would be the faster choice.
However such repositories are very unlikely to appear in
practice and so hard-coding the loop nesting order using the
CROSS JOIN syntax is a reasonable solution
to the problem in this case.</p>

<tcl>hd_fragment howtofix {query planner checklist}</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run across any problems in your application.
If you are not having performance issues, then you do not need to worry

<li><p><b>Avoid creating low-quality indices.</b>
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index.  In particular, avoid using
boolean or "enum" columns as the left-most columns of your indices.</p>

<p>The Fossil performance problem described in the previous section of
this document arose because there were over
ten thousand entries in the TAGXREF table with the same value for the 
left-most column (the TAGID column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run [ANALYZE].</b>
Low-quality indices will not confuse the query planner as long as the
query planner knows that the indices are of low quality.  And the way
the query planner knows this is by the content of the [SQLITE_STAT1] table,
which is computed by the ANALYZE command.</p>

<p>Of course, ANALYZE only works effectively if you have a significant 
amount of content in your database in the first place.  When creating a 
new database that you expect to accumulate a lot of data, you can run 
the command "ANALYZE sqlite_master" to create the SQLITE_STAT1 table,
then prepopulate the SQLITE_STAT1 table (using ordinary INSERT statements)
with content that describes a typical
database for your application - perhaps content that you extracted after
running ANALYZE on a well-populated template database in the lab.</p>

<li><p><b>Instrument your code.</b>
Add logic that lets you know quickly and easily which queries are taking
too much time.  Then work on just those specific queries.</p>

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be the outer loop relative to the table on the right.</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application 
programmer does not need to get involved with query planning.  If you
do use CROSS JOIN, wait until late in your development cycle to do so,
and comment the use of CROSS JOIN carefully so that you can take it out
later if possible.  Avoid using CROSS JOIN early in the development
cycle as doing so is a premature optimization, which is well known to
be <a href="http://c2.com/cgi/wiki?PrematureOptimization">the root of
all evil</a>.</p>

<li><p><b>Use unary "+" operators to disqualify WHERE clause terms.</b>
If the query planner insists on selecting a poor-quality index for a particular
query when a much higher-quality index is available, then
[upluscontrol | careful use of unary "+" operators] in the WHERE clause
can force the query planner away from the poor-quality index.
Avoid using this trick if at all possible, and especially avoid it
early in the application development cycle.  Beware that
adding a unary "+" operator to an equality expression might change
the result of that expression if 
<a href="datatype3.html#affinity">type affinity</a> is involved.</p>

<li><p><b>Use the [INDEXED BY] syntax to enforce the selection of
particular indices on problem queries.</b>
As with the previous two bullets, avoid this step if possible, and 
especially avoid doing this early in development as it is clearly a
premature optimization.</p>
</ol>
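The SQLITE_STAT1 prepopulation trick from item 4 of the checklist can be sketched as follows. The table, index, and statistics values here are illustrative, as if copied from an analyzed template database:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tagxref(tagid, mtime, rid)")
db.execute("CREATE INDEX tagxref_i1 ON tagxref(tagid, mtime)")

# "ANALYZE sqlite_master" creates an empty sqlite_stat1 table without
# scanning any application data.
db.execute("ANALYZE sqlite_master")

# Prepopulate sqlite_stat1 with statistics extracted from a
# well-populated template database: here, roughly 1000000 rows with
# about 10000 rows sharing each tagid value.
db.execute("INSERT INTO sqlite_stat1(tbl, idx, stat) "
           "VALUES('tagxref', 'tagxref_i1', '1000000 10000 1')")

# Statistics are read when the schema is loaded, so run
# "ANALYZE sqlite_master" again (or reopen the database) to make the
# query planner take the new numbers into account.
db.execute("ANALYZE sqlite_master")
```

The planner will then cost queries against TAGXREF as if the database already held the expected production data.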
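The unary "+" trick from item 7 of the checklist is visible directly in EXPLAIN QUERY PLAN output. In this sketch (the table and index are hypothetical), a low-quality index on a boolean-like column is abandoned once its WHERE term is written as +flag:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE event(flag, mtime)")
# A low-quality index: a boolean-like column as the left-most column.
db.execute("CREATE INDEX event_flag ON event(flag)")

def plan(conn, sql):
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Without the "+", the planner is free to use the low-quality index.
p1 = plan(db, "SELECT mtime FROM event WHERE flag=1")
# The unary "+" disqualifies the term from index use, forcing a scan.
p2 = plan(db, "SELECT mtime FROM event WHERE +flag=1")
```

Since the columns here carry no declared type, no type affinity is in play; as the checklist warns, adding "+" to an equality test on a typed column can change the result of the comparison.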

<h2>6.0 Summary</h2>

<p>The query planner in SQLite normally does a terrific job of selecting