Documentation Source Text

Check-in [ce2a4ec1e7]

Overview
Comment: Fix documentation typos.
SHA1: ce2a4ec1e717264097b2e2487c6683a08d73a728
User & Date: drh 2013-08-23 16:16:12
Context
2013-08-23
16:37 Clarification and typo fixes in the NGQP document. check-in: a5a0ee107d user: drh tags: trunk
16:16 Fix documentation typos. check-in: ce2a4ec1e7 user: drh tags: trunk
01:57 Add documentation for the query_only and defer_foreign_keys pragmas. Update the change log for the 3.8.0 release. check-in: 8e16497ef1 user: drh tags: trunk
Changes

Changes to pages/changes.in.

Old version (lines 43-57 and 85-100):
    incr nChng
    if {$nChng==1 && [file exists $DEST/$filename]} {
      file copy -force $DEST/$filename $DEST/releaselog/current.html
    }
  }
}

chng {2013-08-29 (3.8.0)} {
<li>Add support for [partial indexes]</li>
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>The [EXPLAIN QUERY PLAN] output no longer shows an estimate of the number of 
    rows generated by each loop in a join.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.
<li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()].
<li>Added the [cache_spill pragma].
................................................................................
    HAVE_POSIX_FALLOCATE compile-time option is used.
<li>Update the ".import" command in the [command-line shell] to support multi-line
    fields and correct RFC-4180 quoting and to issue warning and/or error messages
    if the input text is not strictly RFC-4180 compliant.
<li>Bug fix: In the [unicode61] tokenizer of [FTS4], treat all private code points
    as identifier symbols.
<li>Bug fix: Bare identifiers in ORDER BY clauses bind more tightly to output column
    names, but indentifiers in expressions bind more tightly to input column names.
    Indentifers in GROUP BY clauses always prefer output column names, however.
<li>Bug fixes: Multiple problems in the legacy query optimizer were fixed by the 
    move to [NGQP].
}

chng {2013-05-20 (3.7.17)} {
<li>Add support for [memory-mapped I/O].
<li>Add the [sqlite3_strglob()] convenience interface.
New version (lines 43-57 and 85-100):
    incr nChng
    if {$nChng==1 && [file exists $DEST/$filename]} {
      file copy -force $DEST/$filename $DEST/releaselog/current.html
    }
  }
}

chng {2013-08-27 (3.8.0)} {
<li>Add support for [partial indexes]</li>
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>The [EXPLAIN QUERY PLAN] output no longer shows an estimate of the number of 
    rows generated by each loop in a join.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.
<li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()].
<li>Added the [cache_spill pragma].
................................................................................
    HAVE_POSIX_FALLOCATE compile-time option is used.
<li>Update the ".import" command in the [command-line shell] to support multi-line
    fields and correct RFC-4180 quoting and to issue warning and/or error messages
    if the input text is not strictly RFC-4180 compliant.
<li>Bug fix: In the [unicode61] tokenizer of [FTS4], treat all private code points
    as identifier symbols.
<li>Bug fix: Bare identifiers in ORDER BY clauses bind more tightly to output column
    names, but identifiers in expressions bind more tightly to input column names.
    Identifiers in GROUP BY clauses always prefer output column names, however.
<li>Bug fixes: Multiple problems in the legacy query optimizer were fixed by the 
    move to [NGQP].
}

chng {2013-05-20 (3.7.17)} {
<li>Add support for [memory-mapped I/O].
<li>Add the [sqlite3_strglob()] convenience interface.

Changes to pages/cintro.in.

Old version (lines 362-376):
  The [sqlite3_config()] interface is used to make global, process-wide
  configuration changes for SQLite.  The [sqlite3_config()] interface must
  be called before any [database connections] are created.  The
  [sqlite3_config()] interface allows the programmer to do things like:
<ul>
<li>Adjust how SQLite does [memory allocation], including setting up
    alternative memory allocators appropriate for safety-critical
    real-time embedded systems and application-defind memory allocators.
<li>Set up a process-wide [error log].
<li>Specify an application-defined page cache.
<li>Adjust the use of mutexes so that they are appropriate for various
    [threading mode | threading models], or substitute an 
    application-defined mutex system.
</ul> 
<p>
New version (lines 362-376):
  The [sqlite3_config()] interface is used to make global, process-wide
  configuration changes for SQLite.  The [sqlite3_config()] interface must
  be called before any [database connections] are created.  The
  [sqlite3_config()] interface allows the programmer to do things like:
<ul>
<li>Adjust how SQLite does [memory allocation], including setting up
    alternative memory allocators appropriate for safety-critical
    real-time embedded systems and application-defined memory allocators.
<li>Set up a process-wide [error log].
<li>Specify an application-defined page cache.
<li>Adjust the use of mutexes so that they are appropriate for various
    [threading mode | threading models], or substitute an 
    application-defined mutex system.
</ul> 
<p>

Changes to pages/fileio.in.

Old version (lines 1581-1595):

  <h3 id=opening_a_write_transaction>Beginning a Write Transaction</h3>
    <p>
      Before any database pages may be modified within the <i>page cache</i>,
      the <i>database connection</i> must open a <i>write transaction</i>. 
      Opening a <i>write transaction</i> requires that the <i>database
      connection</i> obtains a <i>reserved lock</i> (or greater) on the 
      <i>database file</i>. Because a obtaining a <i>reserved lock</i> on
      a <i>database file</i> guarantees that no other <i>database
      connection</i> may hold or obtain a <i>reserved lock</i> or greater,
      it follows that no other <i>database connection</i> may have an
      open <i>write transaction</i>.

    <p>
      A <i>reserved lock</i> on the <i>database file</i> may be thought of
New version (lines 1581-1595):

  <h3 id=opening_a_write_transaction>Beginning a Write Transaction</h3>
    <p>
      Before any database pages may be modified within the <i>page cache</i>,
      the <i>database connection</i> must open a <i>write transaction</i>. 
      Opening a <i>write transaction</i> requires that the <i>database
      connection</i> obtains a <i>reserved lock</i> (or greater) on the 
      <i>database file</i>. Because obtaining a <i>reserved lock</i> on
      a <i>database file</i> guarantees that no other <i>database
      connection</i> may hold or obtain a <i>reserved lock</i> or greater,
      it follows that no other <i>database connection</i> may have an
      open <i>write transaction</i>.

    <p>
      A <i>reserved lock</i> on the <i>database file</i> may be thought of

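The exclusivity described above (at most one connection can hold a RESERVED lock, hence at most one open write transaction) can be demonstrated from Python's built-in sqlite3 module. This is an editor's illustration, not part of the quoted documentation; the database path is a throwaway temp file.

```python
import os
import sqlite3
import tempfile

# Locking is a property of the database *file*, so use an on-disk db.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn1 = sqlite3.connect(path, isolation_level=None)  # autocommit mode
conn1.execute("CREATE TABLE t(x)")

# BEGIN IMMEDIATE opens a write transaction by taking a RESERVED lock.
conn1.execute("BEGIN IMMEDIATE")

# A second connection cannot obtain its own RESERVED lock meanwhile.
# timeout=0 makes it fail at once instead of retrying.
conn2 = sqlite3.connect(path, timeout=0, isolation_level=None)
try:
    conn2.execute("BEGIN IMMEDIATE")
    locked = False
except sqlite3.OperationalError:
    locked = True
print(locked)

conn1.execute("COMMIT")           # releases the RESERVED lock ...
conn2.execute("BEGIN IMMEDIATE")  # ... so a new writer may now start
conn2.execute("ROLLBACK")
```

Once the first transaction commits, the second connection's BEGIN IMMEDIATE succeeds, matching the description above.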
Changes to pages/lang.in.

Old version (lines 2752-2766):
an existing table. ^If no column-list is specified then the number
of values inserted into each row
must be the same as the number of columns in the table. ^In this case
the result of evaluating the left-most expression in each term of
the VALUES list is inserted into the left-most column of each new row,
and so forth for each subsequent expression. ^If a
column-list is specified, then the number of values in each term of the
VALUS list must match the number of
specified columns. ^Each of the named columns of the new row is populated
with the results of evaluating the corresponding VALUES expression. ^Table
columns that do not appear in the column list are populated with the default
column value (specified as part of the CREATE TABLE statement), or with NULL if
no default value is specified.

<li><p>The second form of the INSERT statement contains a SELECT statement
New version (lines 2752-2766):
an existing table. ^If no column-list is specified then the number
of values inserted into each row
must be the same as the number of columns in the table. ^In this case
the result of evaluating the left-most expression in each term of
the VALUES list is inserted into the left-most column of each new row,
and so forth for each subsequent expression. ^If a
column-list is specified, then the number of values in each term of the
VALUES list must match the number of
specified columns. ^Each of the named columns of the new row is populated
with the results of evaluating the corresponding VALUES expression. ^Table
columns that do not appear in the column list are populated with the default
column value (specified as part of the CREATE TABLE statement), or with NULL if
no default value is specified.

<li><p>The second form of the INSERT statement contains a SELECT statement

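The two INSERT behaviors described above (no column-list vs. an explicit column-list, with defaults and NULL filling omitted columns) can be illustrated as follows. Editor's sketch; table t and its columns are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a, b, c DEFAULT 99)")

# No column-list: one value per table column, assigned left to right.
conn.execute("INSERT INTO t VALUES (1, 2, 3)")

# With a column-list: values pair up with the named columns only.
# Omitted columns get their declared default (c -> 99) or NULL (b).
conn.execute("INSERT INTO t(a) VALUES (4)")

print(list(conn.execute("SELECT a, b, c FROM t ORDER BY a")))
```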
Changes to pages/mmap.in.

Old version (lines 67-81):

<p>To read a page of database content using the legacy xRead() method,
SQLite first allocates a page-size chunk of heap memory then invokes
the xRead() method which causes the database page content to be copied
into the newly allocated heap memory.  This involves (at a minimum)
a copy of the entire page.</p>

<p>But if SQLite wants to access a page of the databse file and
memory mapped I/O is enabled, it first calls the xFetch() method.
The xFetch() method asks the operating system to return a pointer to
the requested page, if possible.  If the requested page has been or
can be mapped into the application address space, then xFetch returns
a pointer to that page for SQLite to use without having to copy anything.
Skipping the copy step is what makes memory mapped I/O faster.</p>

New version (lines 67-81):

<p>To read a page of database content using the legacy xRead() method,
SQLite first allocates a page-size chunk of heap memory then invokes
the xRead() method which causes the database page content to be copied
into the newly allocated heap memory.  This involves (at a minimum)
a copy of the entire page.</p>

<p>But if SQLite wants to access a page of the database file and
memory mapped I/O is enabled, it first calls the xFetch() method.
The xFetch() method asks the operating system to return a pointer to
the requested page, if possible.  If the requested page has been or
can be mapped into the application address space, then xFetch returns
a pointer to that page for SQLite to use without having to copy anything.
Skipping the copy step is what makes memory mapped I/O faster.</p>

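From an application's point of view, the xFetch() path described above is enabled with the mmap_size pragma. The sketch below (editor's addition, using Python's sqlite3 module) shows the knob; whether pages are actually mapped depends on the build and operating system:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)

# Request that up to 64 MiB of the database file be memory-mapped.
# Reads inside the mapped region go through xFetch() (a pointer into
# the mapping) rather than xRead() (a copy into heap memory).  The
# pragma reports the limit actually in effect; 0 means memory-mapped
# I/O is unavailable or disabled in this build.
effective = conn.execute("PRAGMA mmap_size=67108864").fetchone()
print(effective)

conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
print(conn.execute("SELECT x FROM t").fetchone())
```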
Changes to pages/news.in.

Old version (lines 15-35):
  regsub -all "\n( *\n)+" $text "</p>\n\n<p>" txt
  regsub -all {[Tt]icket #(\d+)} $txt \
      {<a href="http://www.sqlite.org/cvstrac/tktview?tn=\1">\0</a>} txt
  hd_resolve "<blockquote>$txt</blockquote>"
  hd_puts "<hr width=\"50%\">"
}

newsitem {2013-08-15} {Release 3.8.0} {
  SQLite [version 3.8.0] features the cutover of the
  [next generation query planner] or [NGQP].
  The NGQP will normally cause applications to run faster.  However,
  there are [hazards of upgrading to the NGQP | hazards] associated
  with this upgrade.  A [query planner checklist] is available to
  application developers to aid in avoiding problems.
}

newsitem {2013-05-20} {Release 3.7.17} {
  SQLite [version 3.7.17] is a regularly scheduled maintenance release.
  Visit the [version 3.7.17 | change log] for a full explanation of the
  changes in this release.

New version (lines 15-48):
  regsub -all "\n( *\n)+" $text "</p>\n\n<p>" txt
  regsub -all {[Tt]icket #(\d+)} $txt \
      {<a href="http://www.sqlite.org/cvstrac/tktview?tn=\1">\0</a>} txt
  hd_resolve "<blockquote>$txt</blockquote>"
  hd_puts "<hr width=\"50%\">"
}

newsitem {2013-08-27} {Release 3.8.0} {
  <b>Do not fear the zero!</b>

  <p>SQLite [version 3.8.0] might easily have been called "3.7.18" instead.
  However, this release features the cutover of the
  [next generation query planner] or [NGQP], and there is a small chance of
  [hazards of upgrading to the NGQP | breaking legacy programs] that
  rely on undefined behavior in previous SQLite releases, and so the
  minor version number was incremented.
  But the risks are low, and a [query planner checklist] is
  available to application developers to aid in avoiding problems.

  <p>SQLite [version 3.8.0] is actually one of the most heavily tested
  SQLite releases ever.  Thousands and thousands of beta copies have
  been downloaded, and presumably tested, and there have been no problem
  reports.

  <p>In addition to the [next generation query planner], the 3.8.0 release
  adds support for [partial indices], as well as several other new features.
  See the [version 3.8.0 | change log] for further detail.
}

newsitem {2013-05-20} {Release 3.7.17} {
  SQLite [version 3.7.17] is a regularly scheduled maintenance release.
  Visit the [version 3.7.17 | change log] for a full explanation of the
  changes in this release.

Changes to pages/optoverview.in.

Old version (lines 641-655):
HEADING 2 {Manual Control Of Query Plans Using CROSS JOIN} \
  crossjoin {Manual Control Of Query Plans Using CROSS JOIN} {CROSS JOIN}

PARAGRAPH {
  Programmers can force SQLite to use a particular loop nesting order
  for a join by using the CROSS JOIN operator instead of just JOIN, 
  INNER JOIN, NATURAL JOIN, or a "," join.  Though CROSS JOINs are
  communitive in theory, SQLite chooses to never reorder the tables in
  a CROSS JOIN.  Hence, the left table of a CROSS JOIN will always be
  in an outer loop relative to the right table.
}
PARAGRAPH {
  ^(In the following query, the optimizer is free to reorder the 
  tables of the FROM clause any way it sees fit:
}
New version (lines 641-655):
HEADING 2 {Manual Control Of Query Plans Using CROSS JOIN} \
  crossjoin {Manual Control Of Query Plans Using CROSS JOIN} {CROSS JOIN}

PARAGRAPH {
  Programmers can force SQLite to use a particular loop nesting order
  for a join by using the CROSS JOIN operator instead of just JOIN, 
  INNER JOIN, NATURAL JOIN, or a "," join.  Though CROSS JOINs are
  commutative in theory, SQLite chooses to never reorder the tables in
  a CROSS JOIN.  Hence, the left table of a CROSS JOIN will always be
  in an outer loop relative to the right table.
}
PARAGRAPH {
  ^(In the following query, the optimizer is free to reorder the 
  tables of the FROM clause any way it sees fit:
}

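The CROSS JOIN ordering guarantee described above is easy to check from Python's sqlite3 module. Editor's sketch; t1 and t2 are hypothetical tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t1(a); CREATE TABLE t2(b);")

def loop_order(sql):
    # EXPLAIN QUERY PLAN lists loops outermost first; the last column
    # of each row is a description such as "SCAN t1".
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# CROSS JOIN pins the nesting: the left table is always the outer loop.
plan = loop_order("SELECT * FROM t1 CROSS JOIN t2")
print(plan)
```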
Changes to pages/partialindex.in.

Old version (lines 8-22, 119-133, and 179-193):

<p>
A partial index is an index over a subset of the rows of a table.
</p>

<p>
^In ordinary indexes, there is exactly one entry in the index for every
row in the table.  ^In partial index, only some subset of the rows in the
table have corresponding index entries.  ^For example, a partial index might
omit entries for which the column being indexed is NULL.  When used 
judiciously, partial indexes can result in smaller database files and
improvements in both query and write performance.
</p>

<h2>2.0 Creating Partial Indexes</h2>
................................................................................

<blockquote>
SELECT person_id FROM person WHERE is_team_leader AND team_id=?1;
</blockquote>)^

<h2>3.0 Queries Using Partial Indexes</h2>

<p>Let X be the expression in the WHERE claues of a partial
index, and let W be the WHERE clause of a query that uses the
table that is indexed.  Then, the query is permitted to use 
the partial index if W&#x21d2;X, where the &#x21d2; operator
(usually pronounced "implies") is the logic operator 
equivalent to "X or not W".
Hence, determining whether or not a partial index
is usable in a particular query reduces to proving a theorem in
................................................................................
SELECT * FROM tab2 WHERE b=456 AND c&lt;&gt;0;  <i>-- uses partial index</i>
</blockquote>)^
^(<p>But the next query can not use the partial index:
<blockquote>
SELECT * FROM tab2 WHERE b=456;  <i>-- cannot use partial index</i>
</blockquote>)^
<p>The latter query can not use the partial index because there might be
rows in the table with b=456 and and where c is NULL.  But those rows would
not be in the partial index.
</ol>

<p>These two rules describe how the query planner for SQLite works as of
this writing (2013-08-01).  And the rules above will always be honored.
However, future versions of SQLite might incorporate a better theorem prover
that can find other cases where W&#x21d2;X is true and thus may
New version (lines 8-22, 119-133, and 179-193):

<p>
A partial index is an index over a subset of the rows of a table.
</p>

<p>
^In ordinary indexes, there is exactly one entry in the index for every
row in the table.  ^In partial indexes, only some subset of the rows in the
table have corresponding index entries.  ^For example, a partial index might
omit entries for which the column being indexed is NULL.  When used 
judiciously, partial indexes can result in smaller database files and
improvements in both query and write performance.
</p>

<h2>2.0 Creating Partial Indexes</h2>
................................................................................

<blockquote>
SELECT person_id FROM person WHERE is_team_leader AND team_id=?1;
</blockquote>)^

<h2>3.0 Queries Using Partial Indexes</h2>

<p>Let X be the expression in the WHERE clause of a partial
index, and let W be the WHERE clause of a query that uses the
table that is indexed.  Then, the query is permitted to use 
the partial index if W&#x21d2;X, where the &#x21d2; operator
(usually pronounced "implies") is the logic operator 
equivalent to "X or not W".
Hence, determining whether or not a partial index
is usable in a particular query reduces to proving a theorem in
................................................................................
SELECT * FROM tab2 WHERE b=456 AND c&lt;&gt;0;  <i>-- uses partial index</i>
</blockquote>)^
^(<p>But the next query can not use the partial index:
<blockquote>
SELECT * FROM tab2 WHERE b=456;  <i>-- cannot use partial index</i>
</blockquote>)^
<p>The latter query can not use the partial index because there might be
rows in the table with b=456 and where c is NULL.  But those rows would
not be in the partial index.
</ol>

<p>These two rules describe how the query planner for SQLite works as of
this writing (2013-08-01).  And the rules above will always be honored.
However, future versions of SQLite might incorporate a better theorem prover
that can find other cases where W&#x21d2;X is true and thus may

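The W&#x21d2;X rule quoted above can be observed directly with the article's own tab2 example. This is an editor's illustration; it inspects EXPLAIN QUERY PLAN text, whose exact wording varies slightly between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab2(a, b, c)")

# Partial index: entries exist only for rows satisfying X = (c<>0).
conn.execute("CREATE INDEX tab2_b ON tab2(b) WHERE c<>0")

def plan(sql):
    # Concatenate the EXPLAIN QUERY PLAN description column.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# W = (b=456 AND c<>0) implies X, so the partial index is usable.
p1 = plan("SELECT * FROM tab2 WHERE b=456 AND c<>0")
# W = (b=456) does not imply X (c could be NULL), so it is not.
p2 = plan("SELECT * FROM tab2 WHERE b=456")
print(p1)
print(p2)
```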
Changes to pages/queryplanner-ng.in.

Old version (lines 192-206, 255-269, 339-353, 545-559, 578-592, 614-628, and 659-673):

<p>The arrows from the small circles labeled with "*" indicate the cost
of running each loop with no dependencies.  The outer loop must use this
*-cost.  Inner loops have the option of using the *-cost or a cost assuming
one of the other terms is in an outer loop, whichever gives the best
result.  One can think of the *-costs as a short-hand notation indicating
multiple arcs, one from each of the other nodes in the
graph.  The graph is is therefore "complete", meaning that there are arcs
(some explicit and some implied) in both directions between every pair of 
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

................................................................................
The general case involves a lot of extra complication, which for clarity
is neglected in the remainder of this article.</p>

<h3>3.3 Finding The Best Query Plan</h3>

<p>Prior to version 3.8.0, SQLite always used the
"Nearest Neighbor" or "NN" heuristic when searching for the best query plan.
The NN heuristic makes a single traversal of the graph, always chosing
the lowest-cost arc as the next step.  
The NN heuristic works surprisingly well in most cases.
And NN is fast, so that SQLite is able to quickly find good plans
for even large 64-way joins.  In contrast, other SQL database engines that
do more extensive searching tend to bog down when the
number of tables in a join goes above 10 or 15.</p>

................................................................................
But that is not enough for the TPC-H Q8
problem.  With N=8 on TPC-H Q8 the N3 algorithm finds 
the solution R-N1-C-O-L-S-N2-P with a cost of 29.78.  
That is a big improvement over NN, but it is still
not optimal.  N3 finds the optimal solution for TPC-H Q8 
when N is 10 or greater.</p>

<p>The initial implementation of NGQP choses N=1 for simple queries, N=5
for two-way joins and N=10 for all joins with three or more tables.  This
formula for selecting N might change in subsequent releases.</p>

<tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
<h2>4.0 Hazards Of Upgrading To NGQP</h2>

<p>For most applications, upgrading from the legacy query planner to the NGQP
................................................................................

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database.  The [ANALYZE] command
gathers statistics on the quality of the various indices and stores those
statistics in the [SQLITE_STAT1] table.  
Having access to this statistical information,
the NGQP easily choses algorithm-1 as the best algorithm, by a wide
margin.</p>

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm
never even considered algorithm-2.  Graphs of the planning
problem look like this:</p>

................................................................................
selected by both NN and N3.
</p>

<p>(Side note:  The cost estimates in the two most recent graphs 
were computed by the NGQP using a base-2 logarithm and slightly different
cost assumptions compared to the legacy query planner.  
Hence, the cost estimates in
these latter two graphs are not directly comparible to the cost estimates
in the TPC-H Q8 graph.)</p>

<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem.  However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been analyzed.
................................................................................

<tcl>hd_fragment howtofix {query planner checklist}</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run accross any problems in your application.
If you are not having performance issues, then you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indices.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indices.  Make sure indices are
available to assist all large queries.  Most performance issues can be
................................................................................
Add logic that lets you know quickly and easily which queries are taking
too much time.  Then work on just those specific queries.</p>

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be an an outer loop relative to the table on the right.</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application 
programmer does not need to get involved with query planning.  If you
do use CROSS JOIN, wait until late in your development cycle to do so,
and comment the use of CROSS JOIN carefully so that you can take it out
later if possible.  Avoid using CROSS JOIN early in the development
New version (lines 192-206, 255-269, 339-353, 545-559, 578-592, 614-628, and 659-673):

<p>The arrows from the small circles labeled with "*" indicate the cost
of running each loop with no dependencies.  The outer loop must use this
*-cost.  Inner loops have the option of using the *-cost or a cost assuming
one of the other terms is in an outer loop, whichever gives the best
result.  One can think of the *-costs as a short-hand notation indicating
multiple arcs, one from each of the other nodes in the
graph.  The graph is therefore "complete", meaning that there are arcs
(some explicit and some implied) in both directions between every pair of 
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

................................................................................
The general case involves a lot of extra complication, which for clarity
is neglected in the remainder of this article.</p>

<h3>3.3 Finding The Best Query Plan</h3>

<p>Prior to version 3.8.0, SQLite always used the
"Nearest Neighbor" or "NN" heuristic when searching for the best query plan.
The NN heuristic makes a single traversal of the graph, always choosing
the lowest-cost arc as the next step.  
The NN heuristic works surprisingly well in most cases.
And NN is fast, so that SQLite is able to quickly find good plans
for even large 64-way joins.  In contrast, other SQL database engines that
do more extensive searching tend to bog down when the
number of tables in a join goes above 10 or 15.</p>
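The "Nearest Neighbor" heuristic described above can be sketched generically. Editor's illustration with made-up costs, not SQLite's actual implementation; it ignores details such as the *-cost choice of starting node:

```python
def nn_plan(costs, start):
    """Greedy "Nearest Neighbor" walk: from the current node, always
    follow the cheapest arc to a node not yet visited."""
    path, total = [start], 0
    while len(path) < len(costs):
        here = path[-1]
        nxt = min((n for n in costs if n not in path),
                  key=lambda n: costs[here][n])
        total += costs[here][nxt]
        path.append(nxt)
    return path, total

# A made-up 3-table join graph; costs[a][b] is the cost of running b
# as the next-inner loop when a is already in an outer loop.
costs = {
    "A": {"A": 0, "B": 1, "C": 5},
    "B": {"A": 4, "B": 0, "C": 2},
    "C": {"A": 3, "B": 6, "C": 0},
}
print(nn_plan(costs, "A"))  # -> (['A', 'B', 'C'], 3)
```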

................................................................................
But that is not enough for the TPC-H Q8
problem.  With N=8 on TPC-H Q8 the N3 algorithm finds 
the solution R-N1-C-O-L-S-N2-P with a cost of 29.78.  
That is a big improvement over NN, but it is still
not optimal.  N3 finds the optimal solution for TPC-H Q8 
when N is 10 or greater.</p>

<p>The initial implementation of NGQP chooses N=1 for simple queries, N=5
for two-way joins and N=10 for all joins with three or more tables.  This
formula for selecting N might change in subsequent releases.</p>

<tcl>hd_fragment hazards {hazards of upgrading to the NGQP}</tcl>
<h2>4.0 Hazards Of Upgrading To NGQP</h2>

<p>For most applications, upgrading from the legacy query planner to the NGQP
................................................................................

<p>
The NGQP has no way of knowing that TAGXREF_I1 is almost useless in this
query, unless [ANALYZE] has been run on the database.  The [ANALYZE] command
gathers statistics on the quality of the various indices and stores those
statistics in the [SQLITE_STAT1] table.  
Having access to this statistical information,
the NGQP easily chooses algorithm-1 as the best algorithm, by a wide
margin.</p>
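The statistics-gathering step described above can be seen from any SQLite binding. Editor's sketch with hypothetical table and index names echoing the TAGXREF example; the exact stat strings depend on the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagxref(tagid, srcid)")
conn.execute("CREATE INDEX tagxref_i1 ON tagxref(tagid)")
conn.executemany("INSERT INTO tagxref VALUES (?, ?)",
                 [(i % 2, i) for i in range(100)])

# ANALYZE measures how selective each index is and records the result
# in sqlite_stat1, which the query planner consults when ranking plans.
conn.execute("ANALYZE")
rows = list(conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1"))
print(rows)
```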

<p>Why didn't the legacy query planner choose algorithm-2?
Easy: because the NN algorithm
never even considered algorithm-2.  Graphs of the planning
problem look like this:</p>

................................................................................
selected by both NN and N3.
</p>

<p>(Side note:  The cost estimates in the two most recent graphs 
were computed by the NGQP using a base-2 logarithm and slightly different
cost assumptions compared to the legacy query planner.  
Hence, the cost estimates in
these latter two graphs are not directly comparable to the cost estimates
in the TPC-H Q8 graph.)</p>

<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem.  However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been analyzed.
................................................................................

<tcl>hd_fragment howtofix {query planner checklist}</tcl>
<h2>5.0 Checklist For Avoiding Or Fixing Query Planner Problems</h2>

<ol>
<li><p><b>Don't panic!</b>
Cases where the query planner picks an inferior plan are actually quite
rare.  You are unlikely to run across any problems in your application.
If you are not having performance issues, then you do not need to worry
about any of this.</p>

<li><p><b>Create appropriate indices.</b>
Most SQL performance problems arise not because of query planner issues
but rather due to lack of appropriate indices.  Make sure indices are
available to assist all large queries.  Most performance issues can be
................................................................................
Add logic that lets you know quickly and easily which queries are taking
too much time.  Then work on just those specific queries.</p>

<li><p><b>Use the [CROSS JOIN] syntax to enforce a particular
loop nesting order on queries that might use low-quality indices in an
unanalyzed database.</b>
SQLite [treats the CROSS JOIN operator specially], forcing the table to 
the left to be an outer loop relative to the table on the right.</p>

<p>Avoid this step if possible, as it defeats one of the huge advantages
of the whole SQL language concept, specifically that the application 
programmer does not need to get involved with query planning.  If you
do use CROSS JOIN, wait until late in your development cycle to do so,
and comment the use of CROSS JOIN carefully so that you can take it out
later if possible.  Avoid using CROSS JOIN early in the development

Changes to pages/tclsqlite.in.

Old version (lines 75-89 and 744-758):
This provides a small performance boost in single-threaded applications.
</p></dd>

<dt><b>-readonly</b> <i>BOOLEAN</i></dt>
<dd><p>
If true, then open the database file read-only.  If false, then the
database is opened for both reading and writing if filesystem permissions
allow, or for reading only if filesystem write permission is denigned
by the operating system.  The default setting is "false".  Note that
if the previous process to have the database did not exits cleanly
and left behind a [hot journal], then the write permission is required
to recover the database after opening, and the database cannot be
opened read-only.
</p></dd>

................................................................................
METHOD wal_hook {

<p>This method registers a callback routine that is invoked after transaction
commit when the database is in [WAL mode].  Two arguments are appended to the
callback command before it is invoked:</p>

<ul>
<li>The name of the datebase on which the transaction was committed
<li>The number of entries in the write-ahead log (WAL) file for that database
</ul>

<p>This method might decide to run a [checkpoint] either itself or as a
subsequent idle callback.  Note that SQLite only allows a single WAL hook.
By default this single WAL hook is used for the auto-checkpointing.  If you
set up an explicit WAL hook, then that one WAL hook must ensure that checkpoints

This provides a small performance boost in single-threaded applications.
</p></dd>

<dt><b>-readonly</b> <i>BOOLEAN</i></dt>
<dd><p>
If true, then open the database file read-only.  If false, then the
database is opened for both reading and writing if filesystem permissions
allow, or for reading only if filesystem write permission is denied
by the operating system.  The default setting is "false".  Note that
if the previous process to have the database did not exit cleanly
and left behind a [hot journal], then the write permission is required
to recover the database after opening, and the database cannot be
opened read-only.
</p></dd>
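<p>The behavior described above can be sketched with Python's standard
sqlite3 module, which expresses read-only opens through a URI filename
(analogous to <b>-readonly</b> <i>true</i> in the Tcl interface); the file
path here is illustrative only.</p>

```python
import os, sqlite3, tempfile

# Create a database file normally, then reopen it read-only via a
# URI filename (mode=ro). Reads succeed; writes are rejected.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE t(x)")
db.commit()
db.close()

ro = sqlite3.connect("file:" + path + "?mode=ro", uri=True)
n = ro.execute("SELECT count(*) FROM t").fetchone()[0]  # reads succeed
writable = True
try:
    ro.execute("INSERT INTO t VALUES(1)")
except sqlite3.OperationalError:
    writable = False  # "attempt to write a readonly database"
print(n, writable)
```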

................................................................................
METHOD wal_hook {

<p>This method registers a callback routine that is invoked after transaction
commit when the database is in [WAL mode].  Two arguments are appended to the
callback command before it is invoked:</p>

<ul>
<li>The name of the database on which the transaction was committed
<li>The number of entries in the write-ahead log (WAL) file for that database
</ul>
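<p>Python's sqlite3 module does not expose the WAL hook itself, but the
checkpoint that such a hook typically schedules can be sketched with pragmas.
This is an illustration only, not part of the Tcl interface being documented.</p>

```python
import os, sqlite3, tempfile

# WAL-mode sketch: commit in WAL mode, then run the checkpoint that a
# WAL hook would normally schedule. (sqlite3 has no wal_hook binding;
# this shows only the checkpoint side.)
path = os.path.join(tempfile.mkdtemp(), "wal.db")
db = sqlite3.connect(path, isolation_level=None)  # autocommit mode
mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]

db.execute("CREATE TABLE t(x)")
db.execute("INSERT INTO t VALUES(1)")  # frames now sit in wal.db-wal

# Transfer the WAL frames into the main database and reset the log.
busy, log_frames, ckpt_frames = db.execute(
    "PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
print(mode, busy)  # 'wal' and 0 (checkpoint ran to completion)
```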

<p>This method might decide to run a [checkpoint] either itself or as a
subsequent idle callback.  Note that SQLite only allows a single WAL hook.
By default this single WAL hook is used for the auto-checkpointing.  If you
set up an explicit WAL hook, then that one WAL hook must ensure that checkpoints

Changes to pages/testing.in.

# facilitate that, all the size values are defined by variables here
# which are then used as needed through the document.
#
# NOTE:  Also update the version number in the text!!!
#

# sloc sqlite3.c
set stat(coreSLOC)    81272  ;# Non-comment lines of amalgamation code 
# sloc test*.c
set stat(tclcSLOC)    25111  ;# Non-comment lines of test C code
# ls test*.c tclsqlite.c | wc
set stat(tclcNfile)      44  ;# Number of files of TCL C testcode + tclsqlite.c
# ls -l test*.c tclsqlite.c | awk '{sum+=$5}END{print sum}'
set stat(tclcNByte) 1171344  ;# Number of bytes of TCL C testcode + tclsqlite.c
# sloc *.test *.tcl
set stat(tclsSLOC)   238184  ;# Non-comment lines of TCL test script
# ls *.test *.tcl | wc
set stat(tclsNFile)     711  ;# Number of files of TCL test script
# ls -l *.test *.tcl | awk '{sum+=$5}END{print sum}'
set stat(tclsNByte) 10497097 ;# Number of bytes of TCL test script
# grep do_test *.test | wc; grep do_execsql_test *.test | wc
set stat(tclNTest)    29226  ;# Number of test cases in the TCL test suite
set stat(tclNEval)  3115385  ;# Number of test case evaluations
set stat(nSqlFuzz)   126327  ;# Number of SQL fuzz tests
set stat(vqNEval)    129072  ;# Number of test evaluations for veryquick.test
#  set stat(vqStmtCov)   97.23  ;# veryquick statement coverage
#  set stat(vqBrCov)     92.57  ;# veryquick branch coverage
#  set stat(allStmtCov)  99.50  ;# all.test statement coverage
#  set stat(allBrCov)    97.41  ;# all.test condition/decision coverage
# tclsh mkth3.tcl cfg/*.cfg */*.test >th3.c; sloc th3.c
set stat(th3SLOC)    666907  ;# Non-comment lines in full th3.c
# ls -l th3.c
set stat(th3NByte) 50050842  ;# Number of bytes in full th3.c
# grep th3testBegin */*.test
# grep th3oomBegin */*.test
# grep th3ioerrBegin */*.test
# grep '^--testcase' */*.test
set stat(th3NTest)      34229  ;# Number of test cases
# from output of a min.rc test run.
set stat(th3NECov)     836678  ;# Number of test case evals for coverage
#set stat(th3NETest)  1504866  ;# Number of test case evaluations
#set stat(th3NEExt) 589175483  ;# Number of test case evals extended
#set stat(th3NERel) 2500000000 ;# Number of test case evals release
set stat(th3StmtCov) 100.00  ;# TH3 statement coverage
set stat(th3BrCov)   100.00  ;# TH3 branch coverage
# wc `find . -name '*.test'` | awk '{x+=$1}END{print x}'
set stat(sltsSLOC)  90489494 ;# Non-comment lines of SLT test script
# ls -l `find . -name '*.test'` | awk '{sum+=$5}END{print sum}'
set stat(sltsNByte) 1120693457 ;# Bytes of SLT test script
# find . -name '*.test' | wc
set stat(sltsNFile)        629 ;# Files of SLT test script
# sloc md5.c slt_*.c sqllogictest.c
set stat(sltcSLOC)        1403 ;# Non-comment lines of SLT C code
# grep '^query' `fossil ls | awk '/\.test$/{print $2}'` | wc
set stat(sltNTest)     7200478 ;# Number of test cases in SLT
# grep 'assert(' sqlite3.c | wc
set stat(nAssert)         3531 ;# Number of assert statements
# grep 'testcase(' sqlite3.c | grep -v define | wc
set stat(nTestcase)        667 ;# Number of testcase statements

set stat(totalSLOC) [expr {$stat(tclcSLOC)+$stat(tclsSLOC)+
                           $stat(th3SLOC)+$stat(sltcSLOC)+$stat(sltsSLOC)}]

proc GB {expr} {
  set n [uplevel #0 expr $expr]
  hd_puts [format %.2f [expr {$n/(1000.0*1000.0*1000.0)}]]
................................................................................
<h1 align="center">How SQLite Is Tested</h1>

<h2>1.0 Introduction</h2>

<p>The reliability and robustness of SQLite is achieved in part
by thorough and careful testing.</p>

<p>As of [version 3.7.14],
the SQLite library consists of approximately
<tcl>KB {$stat(coreSLOC)}</tcl> KSLOC of C code.
(KSLOC means thousands of "Source Lines Of Code" or, in other words,
lines of code excluding blank lines and comments.)
By comparison, the project has
<tcl>
hd_puts "[expr {int($stat(totalSLOC)/$stat(coreSLOC))}] times as much"
................................................................................
[http://www.tcl.tk/ | TCL scripting language].
The TCL test harness itself consists of <tcl>KB {$stat(tclcSLOC)}</tcl> KSLOC 
of C code used to create the TCL interface.  The test scripts are contained
in <tcl>N {$stat(tclsNFile)}</tcl> files totaling 
<tcl>MiB {$stat(tclsNByte)}</tcl>MB in size.  There are
<tcl>N {$stat(tclNTest)}</tcl> distinct test cases, but many of the test
cases are parameterized and run multiple times (with different parameters)
so that on a full test run, about <tcl>MB {$stat(tclNEval)}</tcl> million 
separate tests are performed.
</p>
</li>

<li><p>
The <b>[TH3]</b> test harness is a set of proprietary tests, written in
C that provide 100% branch test coverage (and 100% MC/DC test coverage) to

# facilitate that, all the size values are defined by variables here
# which are then used as needed through the document.
#
# NOTE:  Also update the version number in the text!!!
#

# sloc sqlite3.c
set stat(coreSLOC)    84298  ;# Non-comment lines of amalgamation code 
# sloc test*.c
set stat(tclcSLOC)    24160  ;# Non-comment lines of test C code
# ls test*.c tclsqlite.c | wc
set stat(tclcNfile)      44  ;# Number of files of TCL C testcode + tclsqlite.c
# ls -l test*.c tclsqlite.c | awk '{sum+=$5}END{print sum}'
set stat(tclcNByte) 1060069  ;# Number of bytes of TCL C testcode + tclsqlite.c
# sloc *.test *.tcl
set stat(tclsSLOC)   246201  ;# Non-comment lines of TCL test script
# ls *.test *.tcl | wc
set stat(tclsNFile)     760  ;# Number of files of TCL test script
# ls -l *.test *.tcl | awk '{sum+=$5}END{print sum}'
set stat(tclsNByte) 10803982 ;# Number of bytes of TCL test script
# grep do_test *.test | wc; grep do_execsql_test *.test | wc
set stat(tclNTest)    30295  ;# Number of test cases in the TCL test suite
set stat(tclNEval)  1298536  ;# Number of test case evaluations
set stat(nSqlFuzz)   114899  ;# Number of SQL fuzz tests
set stat(vqNEval)    202234  ;# Number of test evaluations for veryquick.test
#  set stat(vqStmtCov)   97.23  ;# veryquick statement coverage
#  set stat(vqBrCov)     92.57  ;# veryquick branch coverage
#  set stat(allStmtCov)  99.50  ;# all.test statement coverage
#  set stat(allBrCov)    97.41  ;# all.test condition/decision coverage
# tclsh mkth3.tcl cfg/*.cfg */*.test >th3.c; sloc th3.c
set stat(th3SLOC)    691250  ;# Non-comment lines in full th3.c
# ls -l th3.c
set stat(th3NByte) 51188527  ;# Number of bytes in full th3.c
# grep th3testBegin */*.test
# grep th3oomBegin */*.test
# grep th3ioerrBegin */*.test
# grep '^--testcase' */*.test
set stat(th3NTest)      35211  ;# Number of test cases
# from output of a min.rc test run.
set stat(th3NECov)     836678  ;# Number of test case evals for coverage
#set stat(th3NETest)  7247055  ;# Number of test case evaluations
#set stat(th3NEExt) 589175483  ;# Number of test case evals extended
#set stat(th3NERel) 2500000000 ;# Number of test case evals release
set stat(th3StmtCov) 100.00  ;# TH3 statement coverage
set stat(th3BrCov)   100.00  ;# TH3 branch coverage
# wc `find . -name '*.test'` | awk '{x+=$1}END{print x}'
set stat(sltsSLOC)  90489494 ;# Non-comment lines of SLT test script
# ls -l `find . -name '*.test'` | awk '{sum+=$5}END{print sum}'
set stat(sltsNByte) 1116800308 ;# Bytes of SLT test script
# find . -name '*.test' | wc
set stat(sltsNFile)        622 ;# Files of SLT test script
# sloc md5.c slt_*.c sqllogictest.c
set stat(sltcSLOC)        1404 ;# Non-comment lines of SLT C code
# grep '^query' `fossil ls | awk '/\.test$/{print $2}'` | wc
set stat(sltNTest)     7195342 ;# Number of test cases in SLT
# grep 'assert(' sqlite3.c | wc
set stat(nAssert)         3691 ;# Number of assert statements
# grep 'testcase(' sqlite3.c | grep -v define | wc
set stat(nTestcase)        695 ;# Number of testcase statements

set stat(totalSLOC) [expr {$stat(tclcSLOC)+$stat(tclsSLOC)+
                           $stat(th3SLOC)+$stat(sltcSLOC)+$stat(sltsSLOC)}]

proc GB {expr} {
  set n [uplevel #0 expr $expr]
  hd_puts [format %.2f [expr {$n/(1000.0*1000.0*1000.0)}]]
................................................................................
<h1 align="center">How SQLite Is Tested</h1>

<h2>1.0 Introduction</h2>

<p>The reliability and robustness of SQLite is achieved in part
by thorough and careful testing.</p>

<p>As of [version 3.8.0],
the SQLite library consists of approximately
<tcl>KB {$stat(coreSLOC)}</tcl> KSLOC of C code.
(KSLOC means thousands of "Source Lines Of Code" or, in other words,
lines of code excluding blank lines and comments.)
By comparison, the project has
<tcl>
hd_puts "[expr {int($stat(totalSLOC)/$stat(coreSLOC))}] times as much"
................................................................................
[http://www.tcl.tk/ | TCL scripting language].
The TCL test harness itself consists of <tcl>KB {$stat(tclcSLOC)}</tcl> KSLOC 
of C code used to create the TCL interface.  The test scripts are contained
in <tcl>N {$stat(tclsNFile)}</tcl> files totaling 
<tcl>MiB {$stat(tclsNByte)}</tcl>MB in size.  There are
<tcl>N {$stat(tclNTest)}</tcl> distinct test cases, but many of the test
cases are parameterized and run multiple times (with different parameters)
so that on a full test run millions of
separate tests are performed.
</p>
</li>

<li><p>
The <b>[TH3]</b> test harness is a set of proprietary tests, written in
C, that provide 100% branch test coverage (and 100% MC/DC test coverage) to

Changes to pages/whentouse.in.

     32-bit and 64-bit and big- and little-endian architectures.
<li> The application only has to load as much data as it needs, rather
     than reading the entire application file and holding a complete parse
     in memory.  Startup time and memory consumption are reduced.
<li> Small edits only overwrite the parts of the file that change,
     not the entire file, thus improving performance
     and reducing wear on SSD drives.
<li> Content is updated continously and atomically so 
     that there is no work lost in the event of a power failure or crash.
<li> Applications can leverage the
     [full-text search] and [RTREE] capablities that are built into SQLite.
<li> Performance problems can often be resolved using [CREATE INDEX]
     rather than redesigning, rewriting, and retesting application code.
<li> A federation of programs, perhaps written in different programming
     languages, can all access the same application file with no 
     compatibility concerns.
<li> Multiple processes can attach to the same application
     file and can read and write without interfering with each another.

     32-bit and 64-bit and big- and little-endian architectures.
<li> The application only has to load as much data as it needs, rather
     than reading the entire application file and holding a complete parse
     in memory.  Startup time and memory consumption are reduced.
<li> Small edits only overwrite the parts of the file that change,
     not the entire file, thus improving performance
     and reducing wear on SSD drives.
<li> Content is updated continuously and atomically so 
     that there is no work lost in the event of a power failure or crash.
<li> Applications can leverage the
     [full-text search] and [RTREE] capabilities that are built into SQLite.
<li> Performance problems can often be resolved using [CREATE INDEX]
     rather than redesigning, rewriting, and retesting application code.
<li> A federation of programs, perhaps written in different programming
     languages, can all access the same application file with no 
     compatibility concerns.
<li> Multiple processes can attach to the same application
     file and can read and write without interfering with one another.