Documentation Source Text

Check-in [9e12c649c9]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Minor enhancements and updates to various documents.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 9e12c649c9ba5aafb8145b387e97d48e055f845049da86c592681f5b9e8b3fb0
User & Date: drh 2018-11-24 20:24:59
Context
2018-11-24
20:58
In the GeoPoly documentation, mention that geopoly_ccw() can be used to correct vertex order after geopoly_xform(). check-in: bfc897c9c0 user: drh tags: trunk
20:24
Minor enhancements and updates to various documents. check-in: 9e12c649c9 user: drh tags: trunk
19:14
Update the speed-and-size spreadsheet and the cpu.html page. Also make minor tweaks to the omitted.html page. check-in: 555bf82e10 user: drh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to pages/amalgamation.in.

     5      5   
     6      6   <h1>Executive Summary</h1>
     7      7   
     8      8   <p>Over 100 separate source files are concatenated into a
     9      9   single large files of C-code named "sqlite3.c" and
    10     10   called "the amalgamation". The amalgamation
    11     11   contains everything an application needs to embed SQLite.
    12         -The amalgamation file is more than 180,000 lines long and over 6
    13         -megabytes in size.
           12  +The amalgamation file is more than 220,000 lines long and over 7.5
           13  +megabytes in size (as of 2018-11-24).
    14     14   
    15     15   <p>Combining all the code for SQLite into one big file makes SQLite
    16     16   easier to deploy &mdash; there is just one file to keep track of.  
    17     17   And because all code is in
    18     18   a single translation unit, compilers can do better inter-procedure
    19     19   optimization resulting in machine code that is between 5% and 10% faster.
    20     20   

Changes to pages/arch.in.

   219    219   Memory allocation, caseless string comparison routines, 
   220    220   portable text-to-number conversion routines, and other utilities
   221    221   are located in <file>util.c</file>.
   222    222   Symbol tables used by the parser are maintained by hash tables found
   223    223   in <file>hash.c</file>.  The <file>utf.c</file> source file contains Unicode
   224    224   conversion subroutines.
   225    225   SQLite has its own private implementation of 
   226         -[sqlite3_mprintf|printf()] (with
          226  +[built-in printf|printf()] (with
   227    227   some extensions) in <file>printf.c</file> and its own
   228    228   pseudo-random number generator (PRNG) in <file>random.c</file>.
   229    229   </p>
   230    230   
   231    231   <h1>Test Code</h1>
   232    232   
   233    233   <p>
   234    234   Files in the "src/" folder of the source tree whose names begin with
   235    235   <b>test</b> are for testing only and are not included in a standard
   236    236   build of the library.
   237    237   </p>
   238    238   }
   239    239   </tcl>

Changes to pages/optoverview.in.

    20     20   
    21     21   <p>
    22     22     Additional background information is available in the
    23     23     [indexing tutorial] document.
    24     24   
    25     25   
    26     26   <p>
    27         -  With release 3.8.0, the SQLite query planner was reimplemented as the
           27  +  With release 3.8.0 ([dateof:3.8.0]),
           28  +  the SQLite query planner was reimplemented as the
    28     29     [Next Generation Query Planner] or "NGQP".  All of the features, techniques,
    29     30     and algorithms described in this document are applicable to both the
    30     31     pre-3.8.0 legacy query planner and to the NGQP.  For further information on
    31     32     how the NGQP differs from the legacy query planner, see the 
    32     33     [NGQP | detailed description of the NGQP].
    33     34   
    34     35   
................................................................................
  1249   1250     CREATE TABLE t2(c,d);
  1250   1251     -- Insert many rows into both t1 and t2
  1251   1252     SELECT * FROM t1, t2 WHERE a=c;
  1252   1253   </codeblock>
  1253   1254   <p>
  1254   1255     In the query above, if both t1 and t2 have approximately N rows, then
  1255   1256     without any indices the query will require O(N*N) time.  On the other
  1256         -  hand, creating an index on table t2 requires O(NlogN) time and then using 
         1257  +  hand, creating an index on table t2 requires O(NlogN) time and then use
  1257   1258     that index to evaluate the query requires an additional O(NlogN) time.
  1258   1259     In the absence of [ANALYZE] information, SQLite guesses that N is one
  1259   1260     million and hence it believes that constructing the automatic index will
  1260   1261     be the cheaper approach.
  1261   1262   
  1262   1263   <p>
  1263   1264     An automatic index might also be used for a subquery:
................................................................................
  1268   1269     -- Insert many rows into both t1 and t2
  1269   1270     SELECT a, (SELECT d FROM t2 WHERE c=b) FROM t1;
  1270   1271   </codeblock>
  1271   1272   <p>
  1272   1273     In this example, the t2 table is used in a subquery to translate values
  1273   1274     of the t1.b column.  If each table contains N rows, SQLite expects that
  1274   1275     the subquery will run N times, and hence it will believe it is faster
  1275         -  to construct an automatic, transient index on t2 first and then using
         1276  +  to construct an automatic, transient index on t2 first and then use
  1276   1277     that index to satisfy the N instances of the subquery.
  1277   1278   
  1278   1279   <p>
  1279   1280     The automatic indexing capability can be disabled at run-time using
  1280   1281     the [automatic_index pragma].  Automatic indexing is turned on by
  1281   1282     default, but this can be changed so that automatic indexing is off
  1282   1283     by default using the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option.

Changes to pages/queryplanner-ng.in.

    36     36   However, there may exist legacy applications that unknowingly depend on 
    37     37   undefined and/or suboptimal behavior in the legacy query planner, and
    38     38   upgrading to the NGQP on those legacy applications could cause performance
    39     39   regressions.  This risk is considered and a checklist is provided
    40     40   for reducing the risk and for fixing any issues that do arise.</p>
    41     41   
    42     42   <p>This document focuses on the NGQP.  For a more general overview of the
    43         -SQLite query planner that encompasses the entire history of SQLite, see
    44         -"[query planner | The SQLite Query Optimizer Overview]".</p>
           43  +SQLite query planner that encompasses the entire history of SQLite, see the
           44  +"[query planner | The SQLite Query Optimizer Overview]" and
           45  +"[indexing | How Indexes Work]" documents.</p>
    45     46   
    46     47   <h1> Background</h1>
    47     48   
    48         -<p>For simple queries against a single table with few indices, there is usually
           49  +<p>For simple queries against a single table with few indexes, there is usually
    49     50   an obvious choice for the best algorithm.
    50     51   But for larger and more complex queries, such as
    51         -multi-way joins with many indices
           52  +multi-way joins with many indexes
    52     53   and subqueries, there can be hundreds, thousands, or millions
    53     54   of reasonable algorithms for computing the result.
    54     55   The job of the query planner is to choose the single "best" query plan from
    55     56   this multitude of possibilities.</p>
    56     57   
    57     58   <p>Query planners are what make SQL database engines so amazingly useful and powerful.
    58     59   (This is true of all SQL database engines, not just SQLite.)
................................................................................
    81     82   <h2> Query Planning In SQLite</h2>
    82     83   
    83     84   <p>SQLite computes joins using nested loops, 
    84     85   one loop for each table
    85     86   in the join.  (Additional loops might be inserted for IN
    86     87   and OR operators in the WHERE clause.  SQLite considers those too,
    87     88   but for simplicity we will ignore them in this essay.)
    88         -One or more indices might be used on each loop to speed the search,
           89  +One or more indexes might be used on each loop to speed the search,
    89     90   or a loop might be a "full table scan" that reads every row in the
    90     91   table.  Thus query planning decomposes into two subtasks:</p>
    91     92   <ol>
    92     93   <li> Picking the nested order of the various loops
    93         -<li> Choosing good indices for each loop
           94  +<li> Choosing good indexes for each loop
    94     95   </ol>
    95     96   <p>Picking the nesting order is generally the more challenging problem.
    96         -Once the nesting order of the join is established, the choice of indices
           97  +Once the nesting order of the join is established, the choice of indexes
    97     98   for each loop is normally obvious.</p>
    98     99   
    99    100   <tcl>hd_fragment qpstab {query planner stability guarantee} QPSG</tcl>
   100    101   <h2> The SQLite Query Planner Stability Guarantee</h2>
   101    102   
   102    103   <p>When the Query Planner Stability Guarantee (QPSG) is enabled
   103    104   SQLite will always pick the same query plan for any
   104    105   given SQL statement as long as:
   105    106   
   106    107   <ol type="a">
   107    108   <li>the database schema does not change in significant ways such as 
   108         -    adding or dropping indices,</li>
          109  +    adding or dropping indexes,</li>
   109    110   <li>the ANALYZE command is not rerun, </li>
   110    111   <li>the same version of SQLite is used.</li>
   111    112   </ol>
   112    113   
   113    114   <p>The QPSG is disabled by default.  It can be enabled at compile-time
   114    115   using the [SQLITE_ENABLE_QPSG] compile-time option, or at run-time by
   115    116   invoking [sqlite3_db_config](db,[SQLITE_DBCONFIG_ENABLE_QPSG],1,0).
................................................................................
   120    121   query plan, possibly causing a performance problem, after your application 
   121    122   is released to users.  If your application works in the lab, it
   122    123   will continue working the same way after deployment.</p>
   123    124   
   124    125   <p>Enterprise-class client/server SQL database engines do not normally 
   125    126   make this guarantee.
   126    127   In client/server SQL database engines, the server keeps track of
   127         -statistics on the sizes of tables and on the quality of indices 
          128  +statistics on the sizes of tables and on the quality of indexes 
   128    129   and the query planner uses those statistics to help select the best plans.
   129    130   As content is added, deleted, or changed in the database, the statistics 
   130    131   will evolve and may cause the query planner to begin using a different 
   131    132   query plan for some particular query.  Usually the new plan will be better 
   132    133   for the evolving structure of the data.  But sometimes the new query plan will
   133    134   cause a performance reduction.  With a client/server database engine, there
   134    135   is typically a Database Administrator (DBA) on hand to deal with these 
................................................................................
   214    215   
   215    216   <h2> Complications</h2>
   216    217   
   217    218   <p>The presentation of the query planner problem above is a simplification.
   218    219   The costs are estimates.  We cannot
   219    220   know what the true cost of running a loop is until we actually run the loop.
   220    221   SQLite makes guesses for the cost of running a loop based on
   221         -the availability of indices and constraints found in the WHERE
          222  +the availability of indexes and constraints found in the WHERE
   222    223   clause.  These guesses are usually pretty good, but they can sometimes be
   223    224   off.  Using the [ANALYZE] command to collect additional statistical
   224    225   information about the database can sometimes enable SQLite to make better
   225    226   guesses about the cost.</p>
   226    227   
   227    228   <p>The costs are comprised of multiple numbers, not a single number as
   228    229   shown in the graph.
................................................................................
   350    351   <p>The initial implementation of NGQP chooses N=1 for simple queries, N=5
   351    352   for two-way joins and N=10 for all joins with three or more tables.  This
   352    353   formula for selecting N might change in subsequent releases.</p>
   353    354   
   354    355   <tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
   355    356   <h1> Hazards Of Upgrading To NGQP</h1>
   356    357   
          358  +<p><i>Update on 2018-11-24: This section was important 
          359  +when the NGQP was new.  But five years have elapsed, the NGQP has been
          360  +deployed successfully to billions of devices, and everyone has upgraded.
          361  +The upgrade hazard has vanished.
          362  +This section is retained for historical reference only.
          363  +Modern reads can skip ahead to the [query planner checklist].</i>
          364  +
   357    365   <p>For most applications, upgrading from the legacy query planner to the NGQP
   358    366   requires little thought or effort.
   359    367   Simply replace the older SQLite version with the newer version of SQLite 
   360    368   and recompile and the application will run faster.  
   361    369   There are no API changes nor modifications
   362    370   to compilation procedures.</p>
   363    371   
   364    372   <p>But as with any query planner change, upgrading to the NGQP does carry
   365    373   a small risk of introducing performance regressions.  The problem here is
   366    374   not that the NGQP is incorrect or buggy or inferior to the legacy query
   367         -planner.  Given reliable information about the selectivity of indices, 
          375  +planner.  Given reliable information about the selectivity of indexes, 
   368    376   the NGQP should always pick a plan that is as good or better than before.
   369    377   The problem is that some applications may be using low-quality and
   370         -low-selectivity indices without having run [ANALYZE].  The older query
          378  +low-selectivity indexes without having run [ANALYZE].  The older query
   371    379   planners look at many fewer possible implementations for each query and 
   372    380   so they may have stumbled over a good plan by stupid luck.  The NGQP, on 
   373    381   the other hand, looks at many more query plan possibilities, and it may 
   374    382   choose a different query plan that
   375         -works better in theory, assuming good indices, but which gives a performance
          383  +works better in theory, assuming good indexes, but which gives a performance
   376    384   regression in practice, because of the shape of the data.</p>
   377    385   
   378    386   <p>Key points:</p>
   379    387   
   380    388   <ul>
   381    389   <li><p>The NGQP will always find an equal or better query plan, compared to
   382    390       prior query planners, as long as it
   383    391       has access to accurate [ANALYZE] data in the [SQLITE_STAT1] file.</p>
   384    392   <li><p>The NGQP will always find a good query plan 
   385         -    as long as the schema does not contain indices that have more than
          393  +    as long as the schema does not contain indexes that have more than
   386    394       about 10 or 20 rows with the same value in the left-most column of the
   387    395       index.</p>
   388    396   </ul>
   389    397   
   390    398   <p>Not all applications meet these conditions.  Fortunately,
   391    399   the NGQP will still usually find good query plans, even without these conditions.
   392    400   However, cases do arise (rarely) where performance regressions can occur.</p>
................................................................................
   519    527   Intuitively, we humans understand that algorithm-1 is best.
   520    528   Each check-in is likely to have few children (one child is
   521    529   the most common case) and each child can be tested for the
   522    530   $trunk tag in logarithmic time.  Indeed, algorithm-1 is the
   523    531   faster choice in practice.  But the NGQP has no intuition.  The
   524    532   NGQP must use hard math, and algorithm-2 is slightly
   525    533   better mathematically.  This is because, in the absence of other information,
   526         -the NGQP must assume that the indices PLINK_I1 and TAGXREF_I1 are of
          534  +the NGQP must assume that the indexes PLINK_I1 and TAGXREF_I1 are of
   527    535   equal quality and are equally selective.  Algorithm-2 uses one field
   528    536   of the TAGXREF_I1 index and both fields of the PLINK_I1 index whereas
   529    537   algorithm-1 only uses the first field of each index.  Since 
   530    538   algorithm-2 uses more index material, the NGQP is correct
   531    539   to judge it to be the better algorithm.  The scores are close and
   532    540   algorithm-2 just barely squeaks ahead of algorithm-1.  But
   533    541   algorithm-2 really is the correct choice here.
................................................................................
   535    543   
   536    544   <p>
   537    545   Unfortunately, algorithm-2 is slower than algorithm-1 in
   538    546   this application.
   539    547   </p>
   540    548   
   541    549   <p>
   542         -The problem is that the indices are not of equal quality.
          550  +The problem is that the indexes are not of equal quality.
   543    551   A check-in is likely to only have one child.  So the first
   544    552   field of PLINK_I1 will usually narrow down the search to just a single
   545    553   row.  But there are thousands and thousands check-ins tagged with "trunk", 
   546    554   so the first field of TAGXREF_I1 will be
   547    555   of little help in narrowing down the search.
   548    556   </p>
   549    557   
   550    558   <p>
   551    559   The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
   552    560   query, unless [ANALYZE] has been run on the database.  The [ANALYZE] command
   553         -gathers statistics on the quality of the various indices and stores those
          561  +gathers statistics on the quality of the various indexes and stores those
   554    562   statistics in [SQLITE_STAT1] table.  
   555    563   Having access to this statistical information,
   556    564   the NGQP easily chooses algorithm-1 as the best algorithm, by a wide
   557    565   margin.</p>
   558    566   
   559    567   <p>Why didn't the legacy query planner choose algorithm-2?
   560    568   Easy: because the NN algorithm
................................................................................
   622    630   <ol>
   623    631   <li><p><b>Don't panic!</b>
   624    632   Cases where the query planner picks an inferior plan are actually quite
   625    633   rare.  You are unlikely to run across any problems in your application.
   626    634   If you are not having performance issues, you do not need to worry
   627    635   about any of this.</p>
   628    636   
   629         -<li><p><b>Create appropriate indices.</b>
          637  +<li><p><b>Create appropriate indexes.</b>
   630    638   Most SQL performance problems arise not because of query planner issues
   631         -but rather due to lack of appropriate indices.  Make sure indices are
          639  +but rather due to lack of appropriate indexes.  Make sure indexes are
   632    640   available to assist all large queries.  Most performance issues can be
   633    641   resolved by one or two CREATE INDEX commands and with no changes to
   634    642   application code.</p>
   635    643   
   636         -<li><p><b>Avoid creating low-quality indices.</b>.
          644  +<li><p><b>Avoid creating low-quality indexes.</b>.
   637    645   A low-quality index (for the purpose of this checklist) is one where
   638    646   there are more than 10 or 20 rows in the table that have the same value
   639    647   for the left-most column of the index.  In particular, avoid using
   640         -boolean or "enum" columns as the left-most columns of your indices.</p>
          648  +boolean or "enum" columns as the left-most columns of your indexes.</p>
   641    649   
   642    650   <p>The Fossil performance problem described in the previous section of
   643    651   this document arose because there were over
   644    652   ten-thousand entries in the TAGXREF table with the same value for the 
   645    653   left-most column (the TAGID column) of the TAGXREF_I1 index.</p>
   646    654   
   647    655   <li><p><b>If you must use a low-quality index, be sure to run [ANALYZE].</b>
   648         -Low-quality indices will not confuse the query planner as long as the
   649         -query planner knows that the indices are of low quality.  And the way
          656  +Low-quality indexes will not confuse the query planner as long as the
          657  +query planner knows that the indexes are of low quality.  And the way
   650    658   the query planner knows this is by the content of the [SQLITE_STAT1] table,
   651    659   which is computed by the ANALYZE command.</p>
   652    660   
   653    661   <p>Of course, ANALYZE only works effectively if you have a significant 
   654    662   amount of content in your database in the first place.  When creating a 
   655    663   new database that you expect to accumulate a lot of data, you can run 
   656    664   the command "ANALYZE sqlite_master" to create the SQLITE_STAT1 table,
................................................................................
   661    669   
   662    670   <li><p><b>Instrument your code.</b>
   663    671   Add logic that lets you know quickly and easily which queries are taking
   664    672   too much time.  Then work on just those specific queries.</p>
   665    673   
   666    674   <li><p><b>Use [unlikely()] and [likelihood()] SQL functions.</b>
   667    675   SQLite normally assumes that terms in the WHERE clause that cannot be used
   668         -by indices have a strong probability of being true.  If this assumption
          676  +by indexes have a strong probability of being true.  If this assumption
   669    677   is incorrect, it could lead to a suboptimal query plan.  The
   670    678   [unlikely()] and [likelihood()] SQL functions can be used to provide
   671    679   hints to the query planner about WHERE clause terms that are probably
   672    680   not true, and thus aid the query planner in selecting the best possible
   673    681   plan.
   674    682   
   675    683   <li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
   676         -loop nesting order on queries that might use low-quality indices in an
          684  +loop nesting order on queries that might use low-quality indexes in an
   677    685   unanalyzed database.</b>
   678    686   SQLite [treats the CROSS JOIN operator specially], forcing the table to 
   679    687   the left to be an outer loop relative to the table on the right.</p>
   680    688   
   681    689   <p>Avoid this step if possible, as it defeats one of the huge advantages
   682    690   of the whole SQL language concept, specifically that the application 
   683    691   programmer does not need to get involved with query planning.  If you
................................................................................
   696    704   Avoid using this trick if at all possible, and especially avoid it
   697    705   early in the application development cycle.  Beware that
   698    706   adding a unary "+" operator to an equality expression might change
   699    707   the result of that expression if 
   700    708   <a href="datatype3.html#affinity">type affinity</a> is involved.</p>
   701    709   
   702    710   <li><p><b>Use the [INDEXED BY] syntax to enforce the selection of
   703         -particular indices on problem queries.</b>
          711  +particular indexes on problem queries.</b>
   704    712   As with the previous two bullets, avoid this step if possible, and 
   705    713   especially avoid doing this early in development as it is clearly a
   706    714   premature optimization.</p>
   707    715   </ol>
   708    716   
   709    717   <h1> Summary</h1>
   710    718