Documentation Source Text

Check-in [d8db9f9b5c]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Query planner documentation update.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: d8db9f9b5c838302ba3e03a0ae3ee35c57c495a3
User & Date: drh 2009-08-31 13:47:55
Context
2009-09-01
12:27
Make the "developers page" a hard-coded document, not a wiki page in CVSTrac. check-in: faaa8cbcf3 user: drh tags: trunk
2009-08-31
13:47
Query planner documentation update. check-in: d8db9f9b5c user: drh tags: trunk
2009-08-26
13:13
Tweaks to the testing documentation. check-in: 3b5097e255 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/compile.in.

382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
COMPILE_OPTION {SQLITE_ENABLE_STAT2} {
  This option adds additional logic to the [ANALYZE] command and to
  the [query planner] that can help SQLite to chose a better query plan
  under certain situations.  The [ANALYZE] command is enhanced to collect
  a 10-sample histogram of the data in each index and store that histogram
  in the <b>sqlite_stat2</b> table.  The query planner will then use the
  histogram data to help it estimate how many rows will be selected by a
  range or bound constraint (an inequality constraint) in a WHERE clause.
}

COMPILE_OPTION {SQLITE_ENABLE_UPDATE_DELETE_LIMIT} {
  This option enables an optional ORDER BY and LIMIT clause on 
  [UPDATE] and [DELETE] statements.

  <p>If this option is defined, then it must also be 







|







382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
COMPILE_OPTION {SQLITE_ENABLE_STAT2} {
  This option adds additional logic to the [ANALYZE] command and to
  the [query planner] that can help SQLite to chose a better query plan
  under certain situations.  The [ANALYZE] command is enhanced to collect
  a 10-sample histogram of the data in each index and store that histogram
  in the <b>sqlite_stat2</b> table.  The query planner will then use the
  histogram data to help it estimate how many rows will be selected by a
  <a href="optoverview.html#rangequery">range constraint</a> in a WHERE clause.
}

COMPILE_OPTION {SQLITE_ENABLE_UPDATE_DELETE_LIMIT} {
  This option enables an optional ORDER BY and LIMIT clause on 
  [UPDATE] and [DELETE] statements.

  <p>If this option is defined, then it must also be 

Changes to pages/optoverview.in.

38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
...
109
110
111
112
113
114
115
116
117

118
119
120
121
122
123
124
125
126
127
128
129
130
131














132
133
134
135
136
137
138
139
140
141



































142
143
144
145
146
147
148
...
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534



535
536

537
538
539
540
541
542
543
544
545
...
580
581
582
583
584
585
586
587


588
589
590
591
592
593
594
...
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
...
653
654
655
656
657
658
659

























































660
661
662
663
664
665
666
      append num .$level($i)
    }
  }
  incr n 1
  hd_puts "<h$n>$num $name</h$n>"
}

HEADING 0 {The SQLite Query Planner/Optimizer Overview}

PARAGRAPH {
  This document provides overview of how the query planner and optimizer
  for SQLite works.
}

PARAGRAPH {
................................................................................
}
CODE {
  CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
}
PARAGRAPH {
  Then the index might be used if the initial columns of the index
  (columns a, b, and so forth) appear in WHERE clause terms.
  All index columns must be used with
  the *=* or *IN* operators except for

  the right-most column which can use inequalities.  For the right-most
  column of an index that is used, there can be up to two inequalities
  that must sandwich the allowed values of the column between two extremes.
}
PARAGRAPH {
  It is not necessary for every column of an index to appear in a
  WHERE clause term in order for that index to be used. 
  But there can not be gaps in the columns of the index that are used.
  Thus for the example index above, if there is no WHERE clause term
  that constraints column c, then terms that constrain columns a and b can
  be used with the index but not terms that constraint columns d through z.
  Similarly, no index column will be used (for indexing purposes)
  that is to the right of a 
  column that is constrained only by inequalities.














  For the index above and WHERE clause like this:
}
CODE {
  ... WHERE a=5 AND b IN (1,2,3) AND c>12 AND d='hello'
}
PARAGRAPH {
  Only columns a, b, and c of the index would be usable.  The d column
  would not be usable because it occurs to the right of c and c is
  constrained only by inequalities.
}




































HEADING 1 {The BETWEEN optimization} between_opt

PARAGRAPH {
  If a term of the WHERE clause is of the following form:
}
SYNTAX {
................................................................................
HEADING 2 {Manual Control Of Query Plans}

PARAGRAPH {
  SQLite provides the ability for advanced programmers to exercise control
  over the query plan chosen by the optimizer. One method for doing this
  is to fudge the [ANALYZE] results in the <b>sqlite_stat1</b> and
  <b>sqlite_stat2</b> tables.  That approach is not recommended except
  in one scenario.
}
PARAGRAPH {
  For an program that uses an SQLite database as its application file
  format, one development approach would be to construct a large
  database containing typical data and run the [ANALYZE] command to
  gather statistics. Then modify the program so that when it creates
  new SQLite databases, it runs the [ANALYZE] command during schema
  creation in order to create the <b>sqlite_stat1</b> and



  <b>sqlite_stat2</b> tables, then loads those tables up with content
  that was saved from trial runs during development.  In that way,

  statistics from large working data sets can be preloaded into newly
  created application files.
}
PARAGRAPH {
  If you really must take manual control of join loop nesting order,
  the preferred method is to use some peculiar (though valid) SQL syntax
  to specify the join. If you use the keyword CROSS in a join, then 
  the two tables connected by that join will not be reordered.
  So in the query, the optimizer is free to reorder the tables of
................................................................................
  associative or commutative. Reordering tables in outer joins changes
  the result.)
}

HEADING 1 {Choosing between multiple indices} multi_index

PARAGRAPH {
  Each table in the FROM clause of a query can use at most one index,


  and SQLite strives to use at least one index on each table.  Sometimes,
  two or more indices might be candidates for use on a single table.
  For example:
}
CODE {
  CREATE TABLE ex2(x,y,z);
  CREATE INDEX ex2i1 ON ex2(x);
................................................................................
  The content of these tables is not updated as the database
  changes so after making significant changes it might be prudent to
  rerun [ANALYZE].
  The results of an ANALYZE command are only available to database connections
  that are opened after the ANALYZE command completes.
}
PARAGRAPH {
  Once created, the various <b>sqlite_stat</b><i>N</i> tables
  cannot be dropped.  But their
  content can be viewed, modified, or erased.  Erasing the entire content
  of the statistics table has the effect of undoing the ANALYZE command.
  Changing the content of the statistics tables can get the optimizer
  deeply confused and cause it to make silly index choices.  Hence, making
  updates to the statistics table (except by running [ANALYZE]) is
  not recommended.
}
PARAGRAPH {
  Terms of the WHERE clause can be manually disqualified for use with
  indices by prepending a unary *+* operator to the column name.  The
  unary *+* is a no-op and will not slow down the evaluation of the test
  specified by the term.
  But it will prevent the term from constraining an index.
................................................................................
  the meaning of an expression.
  In the example above,
  if column *x* has <a href="datatype3.html#affinity">TEXT affinity</a>
  then the comparison "x=5" will be done as text.  But the *+* operator
  removes the affinity.  So the comparison "+x=5" will compare the text
  in column *x* with the numeric value 5 and will always be false.
}


























































HEADING 1 {Avoidance of table lookups} index_only

PARAGRAPH {
  When doing an indexed lookup of a row, the usual procedure is to
  do a binary search on the index to find the index entry, then extract
  the [rowid] from the index and use that [rowid] to do a binary search on







|







 







|
|
>
|













>
>
>
>
>
>
>
>
>
>
>
>
>
>










>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







|



|
|
|
|
|
>
>
>
|
<
>
|
|







 







|
>
>







 







|
|
|
|
|
|
<
<







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
...
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
...
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588

589
590
591
592
593
594
595
596
597
598
...
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
...
673
674
675
676
677
678
679
680
681
682
683
684
685


686
687
688
689
690
691
692
...
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
      append num .$level($i)
    }
  }
  incr n 1
  hd_puts "<h$n>$num $name</h$n>"
}

HEADING 0 {The SQLite Query Planner}

PARAGRAPH {
  This document provides overview of how the query planner and optimizer
  for SQLite works.
}

PARAGRAPH {
................................................................................
}
CODE {
  CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z);
}
PARAGRAPH {
  Then the index might be used if the initial columns of the index
  (columns a, b, and so forth) appear in WHERE clause terms.
  The initial columns of the index must be used with
  the *=* or *IN* operators.  
  The right-most column that is used can employ inequalities.  
  For the right-most
  column of an index that is used, there can be up to two inequalities
  that must sandwich the allowed values of the column between two extremes.
}
PARAGRAPH {
  It is not necessary for every column of an index to appear in a
  WHERE clause term in order for that index to be used. 
  But there can not be gaps in the columns of the index that are used.
  Thus for the example index above, if there is no WHERE clause term
  that constraints column c, then terms that constrain columns a and b can
  be used with the index but not terms that constraint columns d through z.
  Similarly, no index column will be used (for indexing purposes)
  that is to the right of a 
  column that is constrained only by inequalities.
}
HEADING 2 {Index term usage examples}
PARAGRAPH {
  For the index above and WHERE clause like this:
}
CODE {
  ... WHERE a=5 AND b IN (1,2,3) AND c IS NULL AND d='hello'
}
PARAGRAPH {
  The first four columns a, b, c, and d of the index would be usable since
  those four columns form a prefix of the index and are all bound by
  equality constraints.
}
PARAGRAPH {
  For the index above and WHERE clause like this:
}
CODE {
  ... WHERE a=5 AND b IN (1,2,3) AND c>12 AND d='hello'
}
PARAGRAPH {
  Only columns a, b, and c of the index would be usable.  The d column
  would not be usable because it occurs to the right of c and c is
  constrained only by inequalities.
}
PARAGRAPH {
  For the index above and WHERE clause like this:
}
CODE {
  ... WHERE a=5 AND b IN (1,2,3) AND d='hello'
}
PARAGRAPH {
  Only columns a and b of the index would be usable.  The d column
  would not be usable because column c is not constrained and there can
  be no gaps in the set of columns that usable by the index.
}
PARAGRAPH {
  For the index above and WHERE clause like this:
}
CODE {
  ... WHERE b IN (1,2,3) AND c NOT NULL AND d='hello'
}
PARAGRAPH {
  The index is not usable at all becaues the left-most column of the
  index (column "a") is not constrained.  Assuming there are no other
  indices, the query above would result in a full table scan.
}
PARAGRAPH {
  For the index above and WHERE clause like this:
}
CODE {
  ... WHERE a=5 OR b IN (1,2,3) OR c NOT NULL OR d='hello'
}
PARAGRAPH {
  The index is not usable because the WHERE clause terms are connected
  by OR instead of AND. This query would result in a full table scan.
  However, if three additional indices where added that contained columns
  b, c, and d as their left-most columns, then the
  <a href="#or_opt">OR-clause optimization</a> might apply.
}

HEADING 1 {The BETWEEN optimization} between_opt

PARAGRAPH {
  If a term of the WHERE clause is of the following form:
}
SYNTAX {
................................................................................
HEADING 2 {Manual Control Of Query Plans}

PARAGRAPH {
  SQLite provides the ability for advanced programmers to exercise control
  over the query plan chosen by the optimizer. One method for doing this
  is to fudge the [ANALYZE] results in the <b>sqlite_stat1</b> and
  <b>sqlite_stat2</b> tables.  That approach is not recommended except
  for the one scenario described in the following paragraph.
}
PARAGRAPH {
  For an program that uses an SQLite database as its application file
  format, when a new database instances is first created the [ANALYZE]
  command is ineffective because the database contain no data from which
  to gather statistics.  In that case, one could construct a large prototype
  database containing typical data during development and run the 
  [ANALYZE] command on this prototype database to gather statistics,
  then save the prototype statistics as part of the application.
  After deployment, when the application goes to create a new database file,
  it can run the [ANALYZE] command in order to create the <b>sqlite_stat1</b>
  and <b>sqlite_stat2</b> tables, then copy the precomputed statistics obtained

  from the prototype database into these new statistics tables.
  In that way, statistics from large working data sets can be preloaded
  into newly created application files.
}
PARAGRAPH {
  If you really must take manual control of join loop nesting order,
  the preferred method is to use some peculiar (though valid) SQL syntax
  to specify the join. If you use the keyword CROSS in a join, then 
  the two tables connected by that join will not be reordered.
  So in the query, the optimizer is free to reorder the tables of
................................................................................
  associative or commutative. Reordering tables in outer joins changes
  the result.)
}

HEADING 1 {Choosing between multiple indices} multi_index

PARAGRAPH {
  Each table in the FROM clause of a query can use at most one index
  (except when the <a href="#or_opt">OR-clause optimization</a> comes into
  paly)
  and SQLite strives to use at least one index on each table.  Sometimes,
  two or more indices might be candidates for use on a single table.
  For example:
}
CODE {
  CREATE TABLE ex2(x,y,z);
  CREATE INDEX ex2i1 ON ex2(x);
................................................................................
  The content of these tables is not updated as the database
  changes so after making significant changes it might be prudent to
  rerun [ANALYZE].
  The results of an ANALYZE command are only available to database connections
  that are opened after the ANALYZE command completes.
}
PARAGRAPH {
  The various <b>sqlite_stat</b><i>N</i> tables contain information on how
  selective the various indices are.  For example, the <b>sqlite_stat1</b>
  table might indicate that an equality constraint on column x reduces the
  search space to 10 rows on average, whereas an equality constraint on
  column y reduces the search space to 3 rows on average.  In that case,
  SQLite would prefer to use index ex2i2 since that index.


}
PARAGRAPH {
  Terms of the WHERE clause can be manually disqualified for use with
  indices by prepending a unary *+* operator to the column name.  The
  unary *+* is a no-op and will not slow down the evaluation of the test
  specified by the term.
  But it will prevent the term from constraining an index.
................................................................................
  the meaning of an expression.
  In the example above,
  if column *x* has <a href="datatype3.html#affinity">TEXT affinity</a>
  then the comparison "x=5" will be done as text.  But the *+* operator
  removes the affinity.  So the comparison "+x=5" will compare the text
  in column *x* with the numeric value 5 and will always be false.
}

HEADING 2 {Range Queries} rangequery

PARAGRAPH {
  Consider a slightly different scenario:
}
CODE {
  CREATE TABLE ex2(x,y,z);
  CREATE INDEX ex2i1 ON ex2(x);
  CREATE INDEX ex2i2 ON ex2(y);
  SELECT z FROM ex2 WHERE x BETWEEN 1 AND 100 AND y BETWEEN 1 AND 100;
}
PARAGRAPH {
  Further suppose that column x contains values spread out
  between 0 and 1,000,000 and column y contains values
  that span between 0 and 1,000.  In that scenario,
  the range constraint on column x should reduce the search space by
  a factor of 10,000 whereas the range constraint on column y should
  reduce the search space by a factor of only 10.  So the ex2i1 index
  should be preferred.
}
PARAGRAPH {
  SQLite will make this determination, but only if it has been compiled
  with [SQLITE_ENABLE_STAT2].  The [SQLITE_ENABLE_STAT2] option causes
  the [ANALYZE] command to collect a histogram of column content in the
  <b>sqlite_stat2</b> table and to use this histogram to make a better
  guess at the best query to use for range constraints such as the above.
}
PARAGRAPH {
  The histogram data is only useful if the right-hand side of the constraint
  is a compile-time constant.  Consider this query:
}
CODE {
  SELECT z FROM ex2 WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?
}
PARAGRAPH {
  Because the bounds on columns x and y are [parameters] and are unknown
  to the query planner, SQLite has no way of using the histogram data in
  <b>sqlite_stat2</b> and so the index choice falls back to being
  arbitrary.
}
PARAGRAPH {
  Another limitation of the histogram data is that it only applies to the
  left-most column on an index.  Consider this scenario:
}
CODE {
  CREATE TABLE ex3(w,x,y,z);
  CREATE INDEX ex3i1 ON ex2(w, x);
  CREATE INDEX ex3i2 ON ex2(w, y);
  SELECT z FROM ex3 WHERE w=5 AND x BETWEEN 1 AND 100 AND y BETWEEN 1 AND 100;
}
PARAGRAPH {
  Here the inequalities are on columns x and y which are not the
  left-most index columns.  Hence, the histogram data which is collected no
  left-most column of indices is useless in helping to choose between the
  range constraints on columns x and y.
}

HEADING 1 {Avoidance of table lookups} index_only

PARAGRAPH {
  When doing an indexed lookup of a row, the usual procedure is to
  do a binary search on the index to find the index entry, then extract
  the [rowid] from the index and use that [rowid] to do a binary search on