Overview
Comment: | Lots of tweaking and enhancements to the documentation, especially regarding the query planner and ways of controlling the query planner. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: | 641e107a6680cc33216443b007804205 |
User & Date: | drh 2013-07-01 16:07:50.618 |
Context
2013-07-01
| ||
20:52 | Fix a typo in an example in the fts4aux virtual table documentation. (check-in: 6743aed594 user: drh tags: trunk) | |
16:07 | Lots of tweaking and enhancements to the documentation, especially regarding the query planner and ways of controlling the query planner. (check-in: 641e107a66 user: drh tags: trunk) | |
2013-06-26
| ||
14:35 | Updates to the changes.html page. Fix typos on the NGQP documentation. (check-in: e0c107dec3 user: drh tags: trunk) | |
Changes
Changes to pages/changes.in.
︙ | ︙ | |||
44 45 46 47 48 49 50 | chng {2013-08-15 (3.8.0)} { <li>Cut-over to the [next generation query planner] for faster and better query plans. <li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table. <li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()]. <li>Added the "percentile()" function as a loadable extension in the ext/misc subdirectory of the source tree. | | > > > > > > | 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 | chng {2013-08-15 (3.8.0)} { <li>Cut-over to the [next generation query planner] for faster and better query plans. <li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table. <li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()]. <li>Added the "percentile()" function as a loadable extension in the ext/misc subdirectory of the source tree. <li>A running SELECT statement that lacks a FROM clause (or any other statement that never reads or writes from any database file) will not prevent a read transaction from closing. <li>Add the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option that, if set to 0, disables automatic indices by default. <li>Issue an [SQLITE_WARNING_AUTOINDEX] warning on the [SQLITE_CONFIG_LOG] whenever the query planner uses an automatic index. } chng {2013-05-20 (3.7.17)} { <li>Add support for [memory-mapped I/O]. <li>Add the [sqlite3_strglob()] convenience interface. <li>Assigned the integer at offset 68 in the [database header] as the [Application ID] for when SQLite is used as an [application file-format]. |
︙ | ︙ |
Changes to pages/compile.in.
︙ | ︙ | |||
32 33 34 35 36 37 38 39 40 41 42 43 44 45 | hd_fragment [string tolower $all] hd_keywords $all } hd_puts <p><b>$name</b></p> regsub -all "\n\\s*\n" $text "</p>\n\n<p>" text hd_resolve <blockquote><p>$text</p></blockquote> } COMPILE_OPTION {SQLITE_DEFAULT_AUTOVACUUM=<i><0 or 1 or 2></i>} { This macro determines if SQLite creates databases with the [auto_vacuum] flag set by default to OFF (0), FULL (1), or INCREMENTAL (2). The default value is 0 meaning that databases are created with auto-vacuum turned off. In any case the compile-time default may be overridden by the | > > > > > > > > > > | 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | hd_fragment [string tolower $all] hd_keywords $all } hd_puts <p><b>$name</b></p> regsub -all "\n\\s*\n" $text "</p>\n\n<p>" text hd_resolve <blockquote><p>$text</p></blockquote> } COMPILE_OPTION {SQLITE_DEFAULT_AUTOMATIC_INDEX=<i><0 or 1></i>} { This macro determines the initial setting for [PRAGMA automatic_index] for newly opened [database connections]. For all versions of SQLite through 3.7.17, automatic indices are normally enabled for new database connections if this compile-time option is omitted. However, that might change in future releases of SQLite. <p>See also: [SQLITE_OMIT_AUTOMATIC_INDEX] } COMPILE_OPTION {SQLITE_DEFAULT_AUTOVACUUM=<i><0 or 1 or 2></i>} { This macro determines if SQLite creates databases with the [auto_vacuum] flag set by default to OFF (0), FULL (1), or INCREMENTAL (2). The default value is 0 meaning that databases are created with auto-vacuum turned off. In any case the compile-time default may be overridden by the |
︙ | ︙ | |||
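The SQLITE_DEFAULT_AUTOMATIC_INDEX option added in this hunk only chooses the initial value of PRAGMA automatic_index; the setting remains adjustable per connection at run time. A minimal sketch using Python's built-in sqlite3 module (an illustrative aside, not part of the documentation source):

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Initial value comes from the compile-time default: normally 1 (enabled),
# or 0 if the library was built with SQLITE_DEFAULT_AUTOMATIC_INDEX=0.
initial = db.execute("PRAGMA automatic_index").fetchone()[0]

# The pragma can still override the compile-time default for this connection.
db.execute("PRAGMA automatic_index=0")
disabled = db.execute("PRAGMA automatic_index").fetchone()[0]
```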
746 747 748 749 750 751 752 753 754 755 756 757 758 759 | to invoke [sqlite3_initialize()] directly prior to beginning use of the SQLite library. } COMPILE_OPTION {SQLITE_OMIT_AUTOMATIC_INDEX} { This option is used to omit the [automatic indexing] functionality. } COMPILE_OPTION {SQLITE_OMIT_AUTORESET} { By default, the [sqlite3_step()] interface will automatically invoke [sqlite3_reset()] to reset the [prepared statement] if necessary. This compile-time option changes that behavior so that [sqlite3_step()] will return [SQLITE_MISUSE] if it called again after returning anything other | > | 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 | to invoke [sqlite3_initialize()] directly prior to beginning use of the SQLite library. } COMPILE_OPTION {SQLITE_OMIT_AUTOMATIC_INDEX} { This option is used to omit the [automatic indexing] functionality. See also: [SQLITE_DEFAULT_AUTOMATIC_INDEX]. } COMPILE_OPTION {SQLITE_OMIT_AUTORESET} { By default, the [sqlite3_step()] interface will automatically invoke [sqlite3_reset()] to reset the [prepared statement] if necessary. This compile-time option changes that behavior so that [sqlite3_step()] will return [SQLITE_MISUSE] if it called again after returning anything other |
︙ | ︙ |
Changes to pages/features.in.
︙ | ︙ | |||
39 40 41 42 43 44 45 | </ul> </p> <h2>Suggested Uses For SQLite:</h2> <p><ul> <li><p><b>Application File Format.</b> | | > | | | 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | </ul> </p> <h2>Suggested Uses For SQLite:</h2> <p><ul> <li><p><b>Application File Format.</b> Rather than using fopen() to write XML, JSON, CSV, or some proprietary format into disk files used by your application, use an SQLite database. You'll avoid having to write and troubleshoot a parser, your data will be more easily accessible and cross-platform, and your updates will be transactional. ([application file-format | more...])</p></li> <li><p><b>Database For Gadgets.</b> SQLite is a popular choice for the database engine in cellphones, PDAs, MP3 players, set-top boxes, and other electronic gadgets. SQLite has a small code footprint, makes efficient use of memory, disk space, and disk bandwidth, is highly reliable, and requires no maintenance from a Database Administrator.</p></li>
︙ | ︙ |
Changes to pages/fileformat2.in.
︙ | ︙ | |||
1154 1155 1156 1157 1158 1159 1160 | <p>^Application code is allowed to modify the sqlite_sequence table, to add new rows, to delete rows, or to modify existing rows. ^However, application code cannot create the sqlite_sequence table if it does not already exist. ^Application code can delete all entries from the sqlite_sequence table, but application code cannot drop the sqlite_sequence table. | | | 1154 1155 1156 1157 1158 1159 1160 1161 1162 1163 1164 1165 1166 1167 1168 | <p>^Application code is allowed to modify the sqlite_sequence table, to add new rows, to delete rows, or to modify existing rows. ^However, application code cannot create the sqlite_sequence table if it does not already exist. ^Application code can delete all entries from the sqlite_sequence table, but application code cannot drop the sqlite_sequence table. <tcl>hd_fragment stat1tab {sqlite_stat1} SQLITE_STAT1 </tcl> <h4>2.5.3 The sqlite_stat1 table</h4> <p>^The sqlite_stat1 is an internal table created by the [ANALYZE] command and used to hold supplemental information about tables and indices that the query planner can use to help it find better ways of performing queries. ^Applications can update, delete from, insert into or drop the sqlite_stat1 table, but may not create or alter the sqlite_stat1 table. |
︙ | ︙ | |||
1230 1231 1232 1233 1234 1235 1236 | Conceptually, the index space is divided into 10 uniform buckets and the samples are the middle row from each bucket. <p>The format for sqlite_stat2 is recorded here for legacy reference. Recent versions of SQLite no longer support sqlite_stat2 and the sqlite_stat2 table, if it exists, is simply ignored. | | | 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 | Conceptually, the index space is divided into 10 uniform buckets and the samples are the middle row from each bucket. <p>The format for sqlite_stat2 is recorded here for legacy reference. Recent versions of SQLite no longer support sqlite_stat2 and the sqlite_stat2 table, if it exists, is simply ignored. <tcl>hd_fragment stat3tab {sqlite_stat3} SQLITE_STAT3</tcl> <h4>2.5.5 The sqlite_stat3 table</h4> <p>The sqlite_stat3 is only created and is only used if SQLite is compiled with [SQLITE_ENABLE_STAT3] and if the SQLite version number is 3.7.9 or greater. The sqlite_stat3 table is neither read nor written by any version of SQLite before 3.7.9. The sqlite_stat3 table contains additional information |
︙ | ︙ |
Changes to pages/lang.in.
︙ | ︙ | |||
217 218 219 220 221 222 223 | [INSERT], and [UPDATE] commands. ^(The [DROP TABLE] command works on sqlite_stat1 and sqlite_stat3 as of SQLite version 3.7.9.)^ Appropriate care should be used when changing the content of the statistics tables as invalid content can cause SQLite to select inefficient query plans. Generally speaking, one should not modify the content of the statistics tables by any mechanism other than invoking the | | > > | > > | 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | [INSERT], and [UPDATE] commands. ^(The [DROP TABLE] command works on sqlite_stat1 and sqlite_stat3 as of SQLite version 3.7.9.)^ Appropriate care should be used when changing the content of the statistics tables as invalid content can cause SQLite to select inefficient query plans. Generally speaking, one should not modify the content of the statistics tables by any mechanism other than invoking the ANALYZE command. See "[Manual Control Of Query Plans Using SQLITE_STAT Tables]" for further information.</p> <p> ^Statistics gathered by ANALYZE are not automatically updated as the content of the database changes. If the content of the database changes significantly, or if the database schema changes, then one should consider rerunning the ANALYZE command in order to update the statistics.</p> <p> The query planner might not notice manual changes to the sqlite_stat1 and/or sqlite_stat3 tables. ^An application can force the query planner to reread the statistics tables by running <b>ANALYZE sqlite_master</b>. </p> <p> <tcl> ############################################################################## Section {ATTACH DATABASE} attach *ATTACH BubbleDiagram attach-stmt 1 </tcl> |
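The sequence described in this hunk — gather statistics with ANALYZE, optionally edit the statistics tables, then run ANALYZE sqlite_master so the planner rereads them — can be sketched with Python's built-in sqlite3 module; the table and index names here are invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a, b)")
db.execute("CREATE INDEX t_a ON t(a)")
db.executemany("INSERT INTO t VALUES(?, ?)", [(i % 10, i) for i in range(100)])

db.execute("ANALYZE")  # populates sqlite_stat1
stats = db.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
# stat holds "<total rows> <avg rows per distinct a>": "100 10" here

# After a manual edit, force the planner to reread the statistics:
db.execute("UPDATE sqlite_stat1 SET stat = '1000000 2' WHERE idx = 't_a'")
db.execute("ANALYZE sqlite_master")
```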
︙ | ︙ | |||
3020 3021 3022 3023 3024 3025 3026 | <i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of <i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^ <p>^If the join-op is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma (",") and there is no ON or USING clause, then the result of the join is simply the cartesian product of the left and right-hand datasets. | | < < < < | | 3024 3025 3026 3027 3028 3029 3030 3031 3032 3033 3034 3035 3036 3037 3038 3039 | <i>Nlhs</i> rows of <i>Mlhs</i> columns, and the right-hand dataset of <i>Nrhs</i> rows of <i>Mrhs</i> columns, then the cartesian product is a dataset of <i>Nlhs.Nrhs</i> rows, each containing <i>Mlhs+Mrhs</i> columns.)^ <p>^If the join-op is "CROSS JOIN", "INNER JOIN", "JOIN" or a comma (",") and there is no ON or USING clause, then the result of the join is simply the cartesian product of the left and right-hand datasets. If join-op does have ON or USING clauses, those are handled according to the following bullet points: <ul> <li> <p>^(If there is an ON clause specified, then the ON expression is evaluated for each row of the cartesian product as a [boolean expression]. All rows for which the expression evaluates to false are excluded from the dataset.)^ |
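The cartesian-product rule above is easy to confirm directly; a minimal sketch using Python's built-in sqlite3 module (table names invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lhs(a, b)")   # Mlhs = 2 columns
db.execute("CREATE TABLE rhs(x)")      # Mrhs = 1 column
db.executemany("INSERT INTO lhs VALUES(?, ?)", [(1, 2), (3, 4), (5, 6)])  # Nlhs = 3
db.executemany("INSERT INTO rhs VALUES(?)", [(7,), (8,)])                 # Nrhs = 2

# A comma join with no ON or USING clause is a plain cartesian product:
rows = db.execute("SELECT * FROM lhs, rhs").fetchall()
# Nlhs*Nrhs = 6 rows, each with Mlhs+Mrhs = 3 columns
```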
︙ | ︙ | |||
3072 3073 3074 3075 3076 3077 3078 | dataset. </ul> <p>^(When more than two tables are joined together as part of a FROM clause, the join operations are processed in order from left to right. In other words, the FROM clause (A join-op-1 B join-op-2 C) is computed as ((A join-op-1 B) join-op-2 C).)^ | | > > > > > > > > > > > > > > > > | | > | 3072 3073 3074 3075 3076 3077 3078 3079 3080 3081 3082 3083 3084 3085 3086 3087 3088 3089 3090 3091 3092 3093 3094 3095 3096 3097 3098 3099 3100 3101 3102 3103 3104 3105 3106 3107 | dataset. </ul> <p>^(When more than two tables are joined together as part of a FROM clause, the join operations are processed in order from left to right. In other words, the FROM clause (A join-op-1 B join-op-2 C) is computed as ((A join-op-1 B) join-op-2 C).)^ <tcl>hd_fragment crossjoin {treats the CROSS JOIN operator specially}</tcl> <p><b>Side note: Special handling of CROSS JOIN.</b> ^There is no difference between the "INNER JOIN", "JOIN" and "," join operators. They are completely interchangeable in SQLite. ^(The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators)^, but is <a href=optoverview.html#crossjoin>handled differently by the query optimizer</a> in that it prevents the query optimizer from reordering the tables in the join. An application programmer can use the CROSS JOIN operator to directly influence the algorithm that is chosen to implement the SELECT statement. Avoid using CROSS JOIN except in specific situations where manual control of the query optimizer is desired. Avoid using CROSS JOIN early in the development of an application as doing so is a <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature optimization</a>. The special handling of CROSS JOIN is an SQLite-specific feature and is not a part of standard SQL. <tcl>hd_fragment whereclause</tcl> <tcl>hd_keywords {WHERE clause}</tcl> <p><b>2.
WHERE clause filtering.</b> <p>^(If a WHERE clause is specified, the WHERE expression is evaluated for each row in the input data as a [boolean expression]. All rows for which the WHERE clause expression evaluates to false are excluded from the dataset before continuing.)^ <p><b>3. Generation of the set of result rows.</b> |
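The special handling of CROSS JOIN described in this hunk can be observed with EXPLAIN QUERY PLAN, which lists the outermost loop first; with CROSS JOIN that is always the left-hand table. A sketch using Python's built-in sqlite3 module (table names invented; the exact wording of the plan text varies between SQLite versions):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t1(a)")
db.execute("CREATE TABLE t2(b)")

plan = db.execute("EXPLAIN QUERY PLAN SELECT * FROM t1 CROSS JOIN t2").fetchall()
# The last column of each row is a description such as "SCAN t1"
# ("SCAN TABLE t1" in older versions); t1 is listed before t2 because
# CROSS JOIN forbids reordering the tables of the join.
details = [row[-1] for row in plan]
```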
︙ | ︙ | |||
3469 3470 3471 3472 3473 3474 3475 | <tcl> ############################################################################## Section {INDEXED BY} indexedby {{INDEXED BY} {NOT INDEXED}} </tcl> | | < | | > | | | 3486 3487 3488 3489 3490 3491 3492 3493 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 | <tcl> ############################################################################## Section {INDEXED BY} indexedby {{INDEXED BY} {NOT INDEXED}} </tcl> <p>^The INDEXED BY phrase forces the [SQLite query planner] to use a particular named index on a [DELETE], [SELECT], or [UPDATE] statement. The INDEXED BY phrase is an extension that is particular to SQLite and is not portable to other SQL database engines. The INDEXED BY phrase can be seen in the following syntax diagrams:</p> <tcl> BubbleDiagram qualified-table-name BubbleDiagram single-source </tcl> <p>^The "INDEXED BY index-name" phrase specifies that the named index must be used in order to look up values on the preceding table. ^If index-name does not exist or cannot be used for the query, then the preparation of the SQL statement fails. ^(The "NOT INDEXED" clause specifies that no index shall be used when accessing the preceding table, including implied indices created by UNIQUE and PRIMARY KEY constraints. However, the INTEGER PRIMARY KEY can still be used to look up entries even when "NOT INDEXED" is specified.)^</p> |
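A sketch of INDEXED BY and NOT INDEXED using Python's built-in sqlite3 module (names invented). Note that, as the hunk states, naming a nonexistent or unusable index makes preparation of the statement fail:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(x, y)")
db.execute("CREATE INDEX t_x ON t(x)")

# INDEXED BY forces the named index to be used:
forced = db.execute(
    "EXPLAIN QUERY PLAN SELECT y FROM t INDEXED BY t_x WHERE x = 5"
).fetchall()

# NOT INDEXED forbids index use, producing a full-table scan:
scanned = db.execute(
    "EXPLAIN QUERY PLAN SELECT y FROM t NOT INDEXED WHERE x = 5"
).fetchall()

# Naming an index that does not exist makes statement preparation fail:
try:
    db.execute("SELECT y FROM t INDEXED BY no_such_index WHERE x = 5")
    prepared = True
except sqlite3.OperationalError:
    prepared = False
```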
︙ | ︙ | |||
3512 3513 3514 3515 3516 3517 3518 | Developers are admonished to omit all use of INDEXED BY during application design, implementation, testing, and tuning. If INDEXED BY is to be used at all, it should be inserted at the very end of the development process when "locking down" a design.</p> <h3>See Also:</h3> | > > > > > > > > > > > > > | > | 3529 3530 3531 3532 3533 3534 3535 3536 3537 3538 3539 3540 3541 3542 3543 3544 3545 3546 3547 3548 3549 3550 3551 3552 3553 3554 3555 3556 3557 3558 3559 3560 3561 3562 | Developers are admonished to omit all use of INDEXED BY during application design, implementation, testing, and tuning. If INDEXED BY is to be used at all, it should be inserted at the very end of the development process when "locking down" a design.</p> <h3>See Also:</h3> <ol> <li><p>The [query planner checklist] describes steps that application developers should follow to help resolve query planner problems. Notice that the use of INDEXED BY is a last resort, to be used only when all other measures fail.</p> <li><p>[upluscontrol|The unary + operator] can be used to disqualify terms in the WHERE clause from use by indices. Careful use of unary + can sometimes help prevent the query planner from choosing a poor index without restricting it to using one specific index. Careful placement of unary + operators is a better method for controlling which indices are used in a query.</p> <li><p>The [sqlite3_stmt_status()] C/C++ interface together with the [SQLITE_STMTSTATUS_FULLSCAN_STEP] and [SQLITE_STMTSTATUS_SORT] verbs can be used to detect at run-time when an SQL statement is not making effective use of indices. Many applications may prefer to use the [sqlite3_stmt_status()] interface to detect index misuse rather than the INDEXED BY phrase described here.</p> </ol> <tcl> ############################################################################# # A list of keywords. An asterisk occurs after the keyword if it is on # the fallback list.
# set keyword_list [lsort { |
︙ | ︙ |
Changes to pages/optoverview.in.
1 | <title>The SQLite Query Optimizer Overview</title> | | | 1 2 3 4 5 6 7 8 9 | <title>The SQLite Query Optimizer Overview</title> <tcl>hd_keywords {optimizer} {query planner} {SQLite query planner}</tcl> <tcl> proc CODE {text} { hd_puts "<blockquote><pre>" hd_puts $text hd_puts "</pre></blockquote>" } |
︙ | ︙ | |||
37 38 39 40 41 42 43 | } else { set num $level(1) for {set i 2} {$i<=$n} {incr i} { append num .$level($i) } } incr n 1 | > > > | > > > > > > > > > > < < < < < < < < < < < < < < < < | 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 | } else { set num $level(1) for {set i 2} {$i<=$n} {incr i} { append num .$level($i) } } incr n 1 if {$n==1} { hd_puts "<h$n align='center'>$num $name</h$n>" } else { hd_puts "<h$n>$num $name</h$n>" } } HEADING 0 {The SQLite Query Planner} PARAGRAPH { This document provides an overview of how the query planner and optimizer for SQLite works. } PARAGRAPH { Given a single SQL statement, there might be dozens, hundreds, or even thousands of ways to implement that statement, depending on the complexity of the statement itself and of the underlying database schema. The task of the query planner is to select an algorithm from among the many choices that provides the answer with a minimum of disk I/O and CPU overhead. } PARAGRAPH { With release 3.8.0, the SQLite query planner was reimplemented as the [Next Generation Query Planner] or "NGQP". All of the features, techniques, and algorithms described in this document are applicable to both the pre-3.8.0 legacy query planner and to the NGQP. For further information on how the NGQP differs from the legacy query planner, see the [NGQP | detailed description of the NGQP]. } HEADING 1 {WHERE clause analysis} where_clause PARAGRAPH { ^The WHERE clause on a query is broken up into "terms" where each term is separated from the others by an AND operator. If the WHERE clause is composed of constraints separated by the OR operator then the entire clause is considered to be a single "term" to which the <a href="#or_opt">OR-clause optimization</a> is applied. } PARAGRAPH { ^All terms of the WHERE clause are analyzed to see if they can be satisfied using indices.
^(To be usable by an index a term must be of one of the following forms: } SYNTAX { /column/ = /expression/ /column/ > /expression/ /column/ >= /expression/ |
︙ | ︙ | |||
112 113 114 115 116 117 118 | CODE { CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z); } PARAGRAPH { Then the index might be used if the initial columns of the index (columns a, b, and so forth) appear in WHERE clause terms.)^ ^The initial columns of the index must be used with | | | 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 | CODE { CREATE INDEX idx_ex1 ON ex1(a,b,c,d,e,...,y,z); } PARAGRAPH { Then the index might be used if the initial columns of the index (columns a, b, and so forth) appear in WHERE clause terms.)^ ^The initial columns of the index must be used with the *=* or *IN* or *IS NULL* operators. ^The right-most column that is used can employ inequalities. ^For the right-most column of an index that is used, there can be up to two inequalities that must sandwich the allowed values of the column between two extremes. } PARAGRAPH { ^It is not necessary for every column of an index to appear in a |
︙ | ︙ | |||
197 198 199 200 201 202 203 | PARAGRAPH { ^(If a term of the WHERE clause is of the following form: } SYNTAX { /expr1/ BETWEEN /expr2/ AND /expr3/ } PARAGRAPH { | | > > | | | | 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 | PARAGRAPH { ^(If a term of the WHERE clause is of the following form: } SYNTAX { /expr1/ BETWEEN /expr2/ AND /expr3/ } PARAGRAPH { Then two "virtual" terms are added as follows: } SYNTAX { /expr1/ >= /expr2/ AND /expr1/ <= /expr3/ } PARAGRAPH {)^ ^Virtual terms are used for analysis only and do not cause any VDBE code to be generated. ^If both virtual terms end up being used as constraints on an index, then the original BETWEEN term is omitted and the corresponding test is not performed on input rows. ^Thus if the BETWEEN term ends up being used as an index constraint no tests are ever performed on that term. ^On the other hand, the virtual terms themselves never cause tests to be performed on input rows. ^Thus if the BETWEEN term is not used as an index constraint and instead must be used to test input rows, the <i>expr1</i> expression is only evaluated once. } HEADING 1 {OR optimizations} or_opt hd_keywords {or optimization} PARAGRAPH { WHERE clause constraints that are connected by OR instead of AND can be handled in two different ways. ^(If a term consists of multiple subterms containing a common column name and separated by OR, like this: } SYNTAX { /column/ = /expr1/ OR /column/ = /expr2/ OR /column/ = /expr3/ OR ... } PARAGRAPH { Then that term is rewritten as follows: } SYNTAX { /column/ IN (/expr1/,/expr2/,/expr3/,...) } PARAGRAPH {)^ ^The rewritten term then might go on to constrain an index using the normal rules for *IN* operators.
^Note that <i>column</i> must be the same column in every OR-connected subterm, although the column can occur on either the left or the right side of the *=* operator. |
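The OR-to-IN rewrite described above shows up in query plans: with an index on the column, an OR of equality tests on that one column is satisfied by an index search rather than a full scan. A sketch using Python's built-in sqlite3 module (names invented):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(x, y)")
db.execute("CREATE INDEX t_x ON t(x)")
db.executemany("INSERT INTO t VALUES(?, ?)", [(i, -i) for i in range(20)])

plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT y FROM t WHERE x = 1 OR x = 2"
).fetchall()
# Expect an index SEARCH (via the IN rewrite or the OR optimization)
# rather than a full-table SCAN.
details = [row[-1] for row in plan]
```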
︙ | ︙ | |||
258 259 260 261 262 263 264 | *a=5* or *x>y* or they can be LIKE or BETWEEN expressions, or a subterm can be a parenthesized list of AND-connected sub-subterms. ^Each subterm is analyzed as if it were itself the entire WHERE clause in order to see if the subterm is indexable by itself. ^If <u>every</u> subterm of an OR clause is separately indexable then the OR clause might be coded such that a separate index is used to evaluate each term of the OR clause. One way to think about how | | | 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 | *a=5* or *x>y* or they can be LIKE or BETWEEN expressions, or a subterm can be a parenthesized list of AND-connected sub-subterms. ^Each subterm is analyzed as if it were itself the entire WHERE clause in order to see if the subterm is indexable by itself. ^If <u>every</u> subterm of an OR clause is separately indexable then the OR clause might be coded such that a separate index is used to evaluate each term of the OR clause. One way to think about how SQLite uses separate indices for each OR clause term is to imagine that the WHERE clause were rewritten as follows: } SYNTAX { rowid IN (SELECT rowid FROM /table/ WHERE /expr1/ UNION SELECT rowid FROM /table/ WHERE /expr2/ UNION SELECT rowid FROM /table/ WHERE /expr3/) }
︙ | ︙ | |||
486 487 488 489 490 491 492 | ^Inner joins can be freely reordered. ^However a left outer join is neither commutative nor associative and hence will not be reordered. ^Inner joins to the left and right of the outer join might be reordered if the optimizer thinks that is advantageous but the outer joins are always evaluated in the order in which they occur. } PARAGRAPH { | > > > > > > > | | | > | 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 | ^Inner joins can be freely reordered. ^However a left outer join is neither commutative nor associative and hence will not be reordered. ^Inner joins to the left and right of the outer join might be reordered if the optimizer thinks that is advantageous but the outer joins are always evaluated in the order in which they occur. } PARAGRAPH { SQLite [treats the CROSS JOIN operator specially]. The CROSS JOIN operator is commutative in theory. But SQLite chooses to never reorder tables in a CROSS JOIN. This provides a mechanism by which the programmer can force SQLite to choose a particular loop nesting order. } PARAGRAPH { ^When selecting the order of tables in a join, SQLite uses an efficient polynomial-time algorithm. ^Because of this, SQLite is able to plan queries with 50- or 60-way joins in a matter of microseconds. } PARAGRAPH { Join reordering is automatic and usually works well enough that programmers do not have to think about it, especially if [ANALYZE] has been used to gather statistics about the available indices. But occasionally some hints from the programmer are needed. Consider, for example, the following schema: |
︙ | ︙ | |||
532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 | This query asks for all information about edges that go from nodes labeled "alice" to nodes labeled "bob". The query optimizer in SQLite has basically two choices on how to implement this query. (There are actually six different choices, but we will only consider two of them here.) Pseudocode below demonstrates these two choices. } PARAGRAPH {Option 1:} CODE { foreach n1 where n1.name='alice' do: foreach n2 where n2.name='bob' do: foreach e where e.orig=n1.id and e.dest=n2.id return n1.*, n2.*, e.* end end end } PARAGRAPH {Option 2:} CODE { foreach n1 where n1.name='alice' do: foreach e where e.orig=n1.id do: foreach n2 where n2.id=e.dest and n2.name='bob' do: return n1.*, n2.*, e.* end | > > | 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 | This query asks for all information about edges that go from nodes labeled "alice" to nodes labeled "bob". The query optimizer in SQLite has basically two choices on how to implement this query. (There are actually six different choices, but we will only consider two of them here.) Pseudocode below demonstrates these two choices. } hd_fragment option1 PARAGRAPH {Option 1:} CODE { foreach n1 where n1.name='alice' do: foreach n2 where n2.name='bob' do: foreach e where e.orig=n1.id and e.dest=n2.id return n1.*, n2.*, e.* end end end } hd_fragment option2 PARAGRAPH {Option 2:} CODE { foreach n1 where n1.name='alice' do: foreach e where e.orig=n1.id do: foreach n2 where n2.id=e.dest and n2.name='bob' do: return n1.*, n2.*, e.* end
︙ | ︙ | |||
598 599 600 601 602 603 604 | SQLite choose by default? ^(As of version 3.6.18, without running [ANALYZE], SQLite will choose option 2.)^ ^But if the [ANALYZE] command is run in order to gather statistics, a different choice might be made if the statistics indicate that the alternative is likely to run faster. } | | > | | | | > | | | > > > > < < > > | < > > > > > | | | | > | | > > > > > > | 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 | SQLite choose by default? ^(As of version 3.6.18, without running [ANALYZE], SQLite will choose option 2.)^ ^But if the [ANALYZE] command is run in order to gather statistics, a different choice might be made if the statistics indicate that the alternative is likely to run faster. } HEADING 2 {Manual Control Of Query Plans Using SQLITE_STAT Tables} \ manctrl {Manual Control Of Query Plans Using SQLITE_STAT Tables} PARAGRAPH { SQLite provides the ability for advanced programmers to exercise control over the query plan chosen by the optimizer. One method for doing this is to fudge the [ANALYZE] results in the [sqlite_stat1] and [sqlite_stat3] tables. That approach is not recommended except for the one scenario described in the next paragraph. } PARAGRAPH { For a program that uses an SQLite database as its [application file-format], when a new database instance is first created the [ANALYZE] command is ineffective because the database contains no data from which to gather statistics.
In that case, one could construct a large prototype database containing typical data during development and run the [ANALYZE] command on this prototype database to gather statistics, then save the prototype statistics as part of the application. After deployment, when the application goes to create a new database file, it can run the [ANALYZE] command in order to create the [sqlite_stat1] and [sqlite_stat3] tables, then copy the precomputed statistics obtained from the prototype database into these new statistics tables. In that way, statistics from large working data sets can be preloaded into newly created application files. } HEADING 2 {Manual Control Of Query Plans Using CROSS JOIN} \ crossjoin {Manual Control Of Query Plans Using CROSS JOIN} {CROSS JOIN} PARAGRAPH { Programmers can force SQLite to use a particular loop nesting order for a join by using the CROSS JOIN operator instead of just JOIN, INNER JOIN, NATURAL JOIN, or a "," join. Though CROSS JOINs are communitive in theory, SQLite chooses to never reorder the tables in a CROSS JOIN. Hence, the left table of a CROSS JOIN will always be in an outer loop relative to the right table. } PARAGRAPH { ^(In the following query, the optimizer is free to reorder the tables of FROM clause anyway it sees fit: } CODE { SELECT * FROM node AS n1, edge AS e, node AS n2 WHERE n1.name = 'alice' AND n2.name = 'bob' AND e.orig = n1.id AND e.dest = n2.id; } PARAGRAPH {)^ ^(But in the following logically equivalent formulation of the same query, the substitution of "CROSS JOIN" for the "," means that the order of tables must be N1, E, N2. } CODE { SELECT * FROM node AS n1 CROSS JOIN edge AS e CROSS JOIN node AS n2 WHERE n1.name = 'alice' AND n2.name = 'bob' AND e.orig = n1.id AND e.dest = n2.id; } PARAGRAPH {)^ In the latter query, the query plan must be <a href="#option2">option 2</a>. 
^Note that you must use the keyword "CROSS" in order to disable the table reordering optimization; INNER JOIN, NATURAL JOIN, JOIN, and other similar combinations work just like a comma join in that the optimizer is free to reorder tables as it sees fit. (Table reordering is also disabled on an outer join, but that is because outer joins are not associative or commutative. Reordering tables in in OUTER JOIN changes the result.) } PARAGRAPH { See "[The Fossil NGQP Upgrade Case Study]" for another real-world example of using CROSS JOIN to manually control the nesting order of a join. The [query planner checklist] found later in the same document provides further guidance on manual control of the query planner. } HEADING 1 {Choosing between multiple indices} multi_index PARAGRAPH { Each table in the FROM clause of a query can use at most one index (except when the <a href="#or_opt">OR-clause optimization</a> comes into play) |
︙
changes so after making significant changes it might be prudent to
rerun [ANALYZE].
^The results of an ANALYZE command are only available to database
connections that are opened after the ANALYZE command completes.
}

PARAGRAPH {
  The various <b>sqlite_stat</b><i>N</i> tables contain information on how
  selective the various indices are. ^(For example, the [sqlite_stat1]
  table might indicate that an equality constraint on column x reduces the
  search space to 10 rows on average, whereas an equality constraint on
  column y reduces the search space to 3 rows on average. In that case,
  SQLite would prefer to use index ex2i2 since that index is more
  selective.)^
}

HEADING 2 {Disqualifying WHERE Clause Terms Using Unary-"+"} \
  {uplus} {*upluscontrol}

PARAGRAPH {
  ^Terms of the WHERE clause can be manually disqualified for use with
  indices by prepending a unary *+* operator to the column name. ^The
  unary *+* is a no-op and will not generate any byte code in the prepared
  statement. But the unary *+* operator will prevent the term from
  constraining an index. ^(So, in the example above, if the query were
  rewritten as:
}

CODE {
SELECT z FROM ex2 WHERE +x=5 AND y=6;
}

PARAGRAPH {
  The *+* operator on the *x* column will prevent that term from
︙
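The effect of the unary "+" disqualification described above can be observed with EXPLAIN QUERY PLAN. The following is an illustrative sketch (not part of the original documentation) using Python's built-in sqlite3 module; the ex2 schema and the index names ex2i1 and ex2i2 follow the example in the text:

```python
import sqlite3

# Sketch of the unary "+" trick, using the ex2 schema from the text.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ex2(x INT, y INT, z TEXT);
    CREATE INDEX ex2i1 ON ex2(x);
    CREATE INDEX ex2i2 ON ex2(y);
""")

def detail(sql):
    # The first EXPLAIN QUERY PLAN row describes how ex2 is accessed.
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Without the unary +, either index may be chosen to satisfy the query.
print(detail("SELECT z FROM ex2 WHERE x=5 AND y=6"))

# With +x, the x=5 term cannot constrain ex2i1, so ex2i2 must be used.
print(detail("SELECT z FROM ex2 WHERE +x=5 AND y=6"))
```

Because the unary "+" is a no-op on values, the result set is unchanged (barring the type-affinity caveat noted in the checklist below); only the index choice is affected.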
reduce the search space by a factor of only 10. So the ex2i1 index should
be preferred.
}

PARAGRAPH {
  ^SQLite will make this determination, but only if it has been compiled
  with [SQLITE_ENABLE_STAT3]. ^The [SQLITE_ENABLE_STAT3] option causes the
  [ANALYZE] command to collect a histogram of column content in the
  [sqlite_stat3] table and to use this histogram to make a better guess at
  the best query to use for range constraints such as the above.
}

PARAGRAPH {
  ^The histogram data is only useful if the right-hand side of the
  constraint is a simple compile-time constant or [parameter] and not
  an expression.
}

PARAGRAPH {
︙
PARAGRAPH {
  Here the inequalities are on columns x and y which are not the left-most
  index columns. ^Hence, the histogram data, which is collected on the
  left-most column of indices, is useless in helping to choose between the
  range constraints on columns x and y.
}

HEADING 1 {Covering Indices}

PARAGRAPH {
  When doing an indexed lookup of a row, the usual procedure is to do a
  binary search on the index to find the index entry, then extract the
  [rowid] from the index and use that [rowid] to do a binary search on
  the original table. Thus a typical indexed lookup involves two binary
  searches. ^If, however, all columns that were to be fetched from the
  table are already available in the index itself, SQLite will use the
  values contained in the index and will never look up the original table
  row. This saves one binary search for each row and can make many
  queries run twice as fast.
}

PARAGRAPH {
  When an index contains all of the data needed for a query and when the
  original table never needs to be consulted, we call that index a
  "covering index".
}

HEADING 1 {ORDER BY optimizations} order_by

PARAGRAPH {
  ^SQLite attempts to use an index to satisfy the ORDER BY clause of a
  query when possible. ^When faced with the choice of using an index to
  satisfy WHERE clause constraints or satisfying an ORDER BY clause,
  SQLite does the same work analysis described above and chooses the
  index that it believes will result in the fastest answer.
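As a quick sketch of the covering-index behavior just described (with an assumed schema, not one from the original text), EXPLAIN QUERY PLAN reports when a lookup is satisfied by the index alone. Using Python's built-in sqlite3 module:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(a INT, b INT, c INT)")
con.execute("CREATE INDEX t_ab ON t(a, b)")

def detail(sql):
    # The EXPLAIN QUERY PLAN detail column for the single table access.
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# b is stored in the index, so the table row is never fetched:
# SQLite reports a COVERING INDEX.
print(detail("SELECT b FROM t WHERE a=1"))

# c is not in the index, so the lookup must also consult the table,
# costing the second binary search described above.
print(detail("SELECT c FROM t WHERE a=1"))
```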
}

PARAGRAPH {
  ^SQLite will also attempt to use indices to help satisfy GROUP BY
  clauses and the DISTINCT keyword. If the nested loops of the join can be
  arranged such that rows that are equivalent for the GROUP BY or for the
  DISTINCT are consecutive, then the GROUP BY or DISTINCT logic can
  determine if the current row is part of the same group or if the current
  row is distinct simply by comparing the current row to the previous row.
  This can be much faster than the alternative of comparing each row to
  all prior rows.
}

HEADING 1 {Subquery flattening} flattening
hd_keywords {flattening optimization}

PARAGRAPH {
  When a subquery occurs in the FROM clause of a SELECT, the simplest
︙
}

HEADING 1 {Automatic Indices} autoindex {automatic indexing} {Automatic indexing}

PARAGRAPH {
  ^(When no indices are available to aid the evaluation of a query, SQLite
  might create an automatic index that lasts only for the duration
  of a single SQL statement.)^ Since the cost of constructing the
  automatic index is O(NlogN) (where N is the number of entries in the
  table) and the cost of doing a full table scan is only O(N), an
  automatic index will only be created if SQLite expects that the lookup
  will be run more than logN times during the course of the SQL statement.
  Consider an example:
}

CODE {
CREATE TABLE t1(a,b);
︙
of the t1.b column. If each table contains N rows, SQLite expects that
the subquery will run N times, and hence it will believe it is faster to
construct an automatic, transient index on t2 first and then use that
index to satisfy the N instances of the subquery.
}

PARAGRAPH {
  The automatic indexing capability can be disabled at run-time using
  the [automatic_index pragma]. Automatic indexing is turned on by
  default, but this can be changed so that automatic indexing is off by
  default using the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option.
  The ability to create automatic indices can be completely disabled by
  compiling with the [SQLITE_OMIT_AUTOMATIC_INDEX] compile-time option.
}

PARAGRAPH {
  In SQLite version 3.8.0, an [SQLITE_WARNING_AUTOINDEX] message is sent
  to the [error log] every time a statement is prepared that uses an
  automatic index. Application developers can and should use these
  warnings to identify the need for new persistent indices in the schema.
}

PARAGRAPH {
  Future releases of SQLite may disable automatic indices by default.
}
</tcl>
Changes to pages/pragma.in.
︙
}

Pragma {automatic_index} {
  <p>^(<b>PRAGMA automatic_index;
  <br>PRAGMA automatic_index = </b><i>boolean</i><b>;</b></p>
  <p>Query, set, or clear the [automatic indexing] capability.)^
  <p>^[Automatic indexing] is enabled by default as of version 3.7.17,
  but this might change in future releases of SQLite.
}

Pragma {auto_vacuum} {
  <p><b>PRAGMA auto_vacuum;<br>
  PRAGMA auto_vacuum = </b>
  <i>0 | NONE | 1 | FULL | 2 | INCREMENTAL</i><b>;</b></p>
︙
Changes to pages/queryplanner-ng.in.
︙
}
proc code {txt} {
  hd_puts "<center><table><tr><td><pre>\n"
  regsub {^[ \n]*\n} $txt {} txt
  hd_puts [string trimright $txt]\n
  hd_puts "</pre></table></center>\n"
}
hd_keywords {next generation query planner} {Next Generation Query Planner} NGQP
</tcl>
<h1 align="center">The Next Generation Query Planner</h1>

<h2>1.0 Introduction</h2>

<p>
The task of the "query planner" is to figure
︙
solves those problems.</p>

<p>The NGQP is almost always better than the legacy query planner.
However, there may exist legacy applications that unknowingly depend on
undefined and/or suboptimal behavior in the legacy query planner, and
upgrading to the NGQP on those legacy applications could cause performance
regressions. This risk is considered and a checklist is provided
for reducing the risk and for fixing any issues that do arise.</p>

<p>This document focuses on the NGQP. For a more general overview of the
SQLite query planner that encompasses the entire history of SQLite, see
"[query planner | The SQLite Query Optimizer Overview]".</p>

<h2>2.0 Background</h2>

<p>For simple queries against a single table with few indices, there is
usually an obvious choice for the best algorithm. But for larger and
more complex queries, such as multi-way joins with many indices and
subqueries, there can be hundreds, thousands, or millions of reasonable
algorithms for computing the result. The job of the query planner is to
choose a single "best" query plan from this multitude of possibilities.</p>

<p>The existence of a query planner is what makes SQL database engines so
amazingly useful and powerful.
(This is true of all SQL database engines, not just SQLite.)
The query planner frees the programmer from the chore of selecting a
particular query plan, and thereby allows the programmer to focus more
mental energy on higher-level application issues and on providing more
value to the end user. For simple queries where the choice of query plan
is obvious, this is convenient but not hugely important. But as
applications and schemas and queries grow more complex, a clever query
planner can greatly speed and simplify the work of application
development. There is amazing power in being able to tell the database
engine what content is desired, and then let the database engine figure
out the best way to retrieve that content.</p>
︙
<ol type="a">
<li>the database schema does not change in significant ways such as
    adding or dropping indices,</li>
<li>the ANALYZE command is not run, </li>
<li>SQLite is not compiled with [SQLITE_ENABLE_STAT3], and</li>
<li>the same version of SQLite is used.</li>
</ol>
The SQLite stability guarantee means that if all of your queries run
efficiently during testing, and if your application does not change the
schema, then SQLite will not suddenly decide to start using a different
query plan, possibly causing a performance problem, after your application
is released to users. If your application works in the lab, it
will continue working the same way after deployment.</p>

<p>Enterprise-class client/server SQL database engines do not normally
︙
for the evolving structure of the data. But sometimes the new query plan
will cause a performance reduction. With a client/server database engine,
there is typically a Database Administrator (DBA) on hand to deal with
these rare problems as they come up. But DBAs are not available to fix
problems in an embedded database like SQLite, and hence SQLite is careful
to ensure that plans do not change unexpectedly after deployment.</p>

<p>The SQLite stability guarantee applies equally to the legacy query
planner and to the NGQP.</p>

<p>It is important to note that changing versions of SQLite might cause
changes in query plans, as new enhancements are added to the query
planner. The same version of SQLite will always pick the same query plan,
but if you relink your application to use a different version of SQLite,
then query plans might change. In rare cases, an SQLite version change
might lead to a performance regression. This is one reason you should
consider statically linking your applications against SQLite rather than
trying to use a system-wide SQLite shared library which might change
without your knowledge or control.</p>

<h2>3.0 A Difficult Case</h2>

<p>
︙
a small risk of introducing performance regressions. The problem here is
not that the NGQP is incorrect or buggy or inferior to the legacy query
planner. Given accurate cost estimates, the NGQP will always pick a better
plan than older query planners. The problem is that cost estimates might
sometimes be inaccurate and the legacy query planner just happened to
stumble over a good plan by stupid luck while the NGQP is not as lucky.</p>

<tcl>hd_fragment fossilcasestudy {The Fossil NGQP Upgrade Case Study}</tcl>
<h3>4.1 Case Study: Upgrading Fossil to the NGQP</h3>

<p>The <a href="http://www.fossil-scm.org/">Fossil DVCS</a> is the version
control system used to track all of the SQLite source code.
A Fossil repository is an SQLite database file.
(Readers are invited to ponder this recursion as an independent exercise.)
Fossil is both the version-control system for SQLite and a test platform
for SQLite. Whenever enhancements are made to SQLite, Fossil is one of
the first applications to test and evaluate those enhancements.
So Fossil was an early adopter of the NGQP.</p> <p>Unfortunately, the NGQP caused a performance regression in Fossil.</p> <p>One of the many reports that Fossil makes available is a timeline of changes to a single branch showing all merges in and out of that branch. See <a href="http://www.sqlite.org/src/timeline?nd&n=200&r=trunk">http://www.sqlite.org/src/timeline?nd&n=200&r=trunk</a> for a typical example of such a report. Generating such a report normally takes just a few milliseconds. But after upgrading to the NGQP we noticed that |
︙
The first condition causes all of the trunk check-ins to be displayed and
the second and third cause check-ins that merge into or are derived from
the trunk to also be included. The three conditions are implemented by
the three OR-connected EXISTS statements in the WHERE clause of the query.
The slowdown that occurred with the NGQP was caused by the second and
third conditions. The problem is the same in each, so we will examine
just the second one. The subquery of the second condition can be rewritten
(with minor and immaterial simplifications) as follows:</p>

<blockquote><pre>
SELECT 1
  FROM plink JOIN tagxref ON tagxref.rid=plink.cid
 WHERE tagxref.tagid=$trunk
︙
so the first field of TAGXREF_I1 will be of little help in narrowing down
the search.
</p>

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database. The [ANALYZE]
command gathers statistics on the quality of the various indices and
stores those statistics in the [SQLITE_STAT1] table. Having access to
this statistical information, the NGQP easily chooses algorithm-1 as the
best algorithm, by a wide margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm never even considered algorithm-2.
Graphs of the planning problem look like this:</p>
︙
<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem. However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been
analyzed. For this reason, the query was modified to use the CROSS JOIN
operator instead of the plain JOIN operator. SQLite will not reorder the
tables of a CROSS JOIN. This is a long-standing feature of SQLite that is
specifically designed to allow knowledgeable programmers to enforce a
particular loop nesting order. Once the join was changed to CROSS JOIN
(the addition of a single keyword) the NGQP was forced to choose the
faster algorithm-1 regardless of whether or not statistical information
had been gathered using ANALYZE.</p>

<p>We say that algorithm-1 is "faster", but this is not strictly true.
Algorithm-1 is faster in common repositories, but it is possible to
construct a repository in which every check-in is on a different
uniquely-named branch and all check-ins are children of the root check-in.
In that case, TAGXREF_I1 would become more selective than PLINK_I1 and
algorithm-2 really would be the faster choice.
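The CROSS JOIN fix above can be sketched as follows (with simplified, assumed versions of the PLINK and TAGXREF schemas, using Python's built-in sqlite3 module). EXPLAIN QUERY PLAN lists the loops outermost first, and with CROSS JOIN the left-hand table is guaranteed to be the outer loop:

```python
import sqlite3

# Simplified, hypothetical versions of the Fossil PLINK/TAGXREF schemas.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE plink(pid INT, cid INT);
    CREATE INDEX plink_i1 ON plink(pid);
    CREATE TABLE tagxref(tagid INT, mtime REAL, rid INT);
    CREATE INDEX tagxref_i1 ON tagxref(tagid, mtime);
""")

def loop_order(sql):
    # One EXPLAIN QUERY PLAN row per loop, outermost first.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

# With a plain JOIN, SQLite is free to put either table in the outer
# loop.  With CROSS JOIN, plink must be the outer loop:
for step in loop_order("""
    SELECT 1 FROM plink CROSS JOIN tagxref ON tagxref.rid=plink.cid
     WHERE tagxref.tagid=5 AND plink.pid=10
"""):
    print(step)
```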
However such repositories are very unlikely to appear in practice and so
hard-coding the loop nesting order using the CROSS JOIN syntax is a
reasonable solution to the problem in this case.</p>

<tcl>hd_fragment howtofix {query planner checklist}</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare. You are unlikely to run across any problems in your application.
If you are not having performance issues, then you do not need to worry
︙
<li><p><b>Avoid creating low-quality indices.</b>
A low-quality index (for the purpose of this checklist) is one where
there are more than 10 or 20 rows in the table that have the same value
for the left-most column of the index. In particular, avoid using
boolean or "enum" columns as the left-most columns of your indices.</p>

<p>The Fossil performance problem described in the previous section of
this document arose because there were over ten thousand entries in the
TAGXREF table with the same value for the left-most column (the TAGID
column) of the TAGXREF_I1 index.</p>

<li><p><b>If you must use a low-quality index, be sure to run
[ANALYZE].</b> Low-quality indices will not confuse the query planner as
long as the query planner knows that the indices are of low quality. And
the way the query planner knows this is by the content of the
[SQLITE_STAT1] table, which is computed by the ANALYZE command.</p>

<p>Of course, ANALYZE only works effectively if you have a significant
amount of content in your database in the first place.
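As an illustration of what ANALYZE records for a low-quality index (hypothetical data modeled on the TAGXREF example, using Python's built-in sqlite3 module):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tagxref(tagid INT, mtime REAL, rid INT)")
con.execute("CREATE INDEX tagxref_i1 ON tagxref(tagid, mtime)")

# Low-quality: every row has the same value in the left-most column.
con.executemany("INSERT INTO tagxref VALUES(?,?,?)",
                [(1, i * 0.5, i) for i in range(100)])

# ANALYZE writes one row per index into sqlite_stat1.  The "stat" column
# holds the table size followed by the average number of rows matched by
# each successive column prefix of the index.
con.execute("ANALYZE")
for tbl, idx, stat in con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"):
    print(tbl, idx, stat)   # tagid alone matches 100 rows on average
```

Seeing a large second number in the stat column (here, an equality constraint on tagid alone still leaves 100 candidate rows) is exactly the signal that tells the planner the index is of low quality.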
When creating a new database that you expect to accumulate a lot of data,
you can run the command "ANALYZE sqlite_master" to create the SQLITE_STAT1
table, then prepopulate the SQLITE_STAT1 table (using ordinary INSERT
statements) with content that describes a typical database for your
application - perhaps content that you extracted after running ANALYZE
on a well-populated template database in the lab.</p>

<li><p><b>Instrument your code.</b>
Add logic that lets you know quickly and easily which queries are taking
too much time. Then work on just those specific queries.</p>

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to
the left to be an outer loop relative to the table on the right.</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application
programmer does not need to get involved with query planning. If you do
use CROSS JOIN, wait until late in your development cycle to do so, and
comment the use of CROSS JOIN carefully so that you can take it out later
if possible. Avoid using CROSS JOIN early in the development cycle as
doing so is a premature optimization, which is well known to be
<a href="http://c2.com/cgi/wiki?PrematureOptimization">the root of all
evil</a>.</p>

<li><p><b>Use unary "+" operators to disqualify WHERE clause terms.</b>
If the query planner insists on selecting a poor-quality index for a
particular query when a much higher-quality index is available, then
[upluscontrol | careful use of unary "+" operators] in the WHERE clause
can force the query planner away from the poor-quality index. Avoid using
this trick if at all possible, and especially avoid it early in the
application development cycle.
Beware that adding a unary "+" operator to an equality expression might change the result of that expression if <a href="datatype3.html#affinity">type affinity</a> is involved.</p> <li><p><b>Use the [INDEXED BY] syntax to enforce the selection of particular indices on problem queries.</b> As with the previous two bullets, avoid this step if possible, and especially avoid doing this early in development as it is clearly a premature optimization.</p> </ol> <h2>6.0 Summary</h2> <p>The query planner in SQLite normally does a terrific job of selecting |
︙