Documentation Source Text

Check-in [4e286cee85]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Added a 3.8.1 change log. Added documentation on SQLITE_MINIMUM_FILE_DESCRIPTOR.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 4e286cee85a1fc3626375ba1932fcfa3203fdaef
User & Date: drh 2013-08-30 14:28:53.067
Context
2013-08-31
14:32
Add documentation for the fts4 unicode61 tokenizer option "remove_diacritics=0". (check-in: f5d1d7106f user: dan tags: trunk)
2013-08-30
14:28
Added a 3.8.1 change log. Added documentation on SQLITE_MINIMUM_FILE_DESCRIPTOR. (check-in: 4e286cee85 user: drh tags: trunk)
14:03
Include all changes since 3.7.17 in the 3.8.0.1 change log. Typos fixed in the NGQP document. (check-in: 3f23dd6ba9 user: drh tags: branch-3.8.0)
2013-08-29
16:51
If the user tries a download hyperlink without javascript, send them to an explanation page. (check-in: 32aeca2648 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/changes.in.
42
43
44
45
46
47
48


































































49
50
51
52
53
54
55
    hd_enable_main 1
    incr nChng
    if {$nChng==1 && [file exists $DEST/$filename]} {
      file copy -force $DEST/$filename $DEST/releaselog/current.html
    }
  }
}



































































chng {2013-08-26 (3.8.0)} {
<li>Add support for [partial indexes]</li>
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>The [EXPLAIN QUERY PLAN] output no longer shows an estimate of the number of 
    rows generated by each loop in a join.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
    hd_enable_main 1
    incr nChng
    if {$nChng==1 && [file exists $DEST/$filename]} {
      file copy -force $DEST/$filename $DEST/releaselog/current.html
    }
  }
}

chng {2013-10-?? (3.8.1)} {
<li>Add support for SQLITE_ENABLE_STAT4
<li>Add the [SQLITE_MINIMUM_FILE_DESCRIPTOR] compile-time option
<li>Add the win32-longpath VFS on windows.
}

chng {2013-08-29 (3.8.0.1)} {
<li>Add support for [partial indexes]</li>
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>The [EXPLAIN QUERY PLAN] output no longer shows an estimate of the number of 
    rows generated by each loop in a join.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.
<li>Added the [SQLITE_STMTSTATUS_VM_STEP] option to [sqlite3_stmt_status()].
<li>Added the [cache_spill pragma].
<li>Added the [query_only pragma].
<li>Added the [defer_foreign_keys pragma] and the
    [sqlite3_db_status](db, [SQLITE_DBSTATUS_DEFERRED_FKS],...) C-language interface.
<li>Added the "percentile()" function as a [loadable extension] in the ext/misc
    subdirectory of the source tree.
<li>Added the [SQLITE_ALLOW_URI_AUTHORITY] compile-time option.
<li>Add the [sqlite3_cancel_auto_extension(X)] interface.
<li>A running SELECT statement that lacks a FROM clause (or any other statement that
    never reads or writes from any database file) will not prevent a read
    transaction from closing.
<li>Add the [SQLITE_DEFAULT_AUTOMATIC_INDEX] compile-time option.  Setting this option
    to 0 disables automatic indices by default.
<li>Issue an [SQLITE_WARNING_AUTOINDEX] warning on the [SQLITE_CONFIG_LOG] whenever
    the query planner uses an automatic index.
<li>Added the [SQLITE_FTS3_MAX_EXPR_DEPTH] compile-time option.
<li>Added an optional 5th parameter defining the collating sequence to the 
    next_char() extension SQL function.
<li>The [SQLITE_BUSY_SNAPSHOT] extended error code is returned in WAL mode when
    a read transaction cannot be upgraded to a write transaction because the read is
    on an older snapshot.
<li>Enhancements to the sqlite3_analyzer utility program to provide size
    information separately for each individual index of a table, in addition to
    the aggregate size.
<li>Allow read transactions to be freely opened and closed by SQL statements run 
    from within the implementation of [application-defined SQL functions] if the
    function is called by a SELECT statement that does not access any database table.
<li>Disable the use of posix_fallocate() on all (unix) systems unless the
    HAVE_POSIX_FALLOCATE compile-time option is used.
<li>Update the ".import" command in the [command-line shell] to support multi-line
    fields and correct RFC-4180 quoting and to issue warning and/or error messages
    if the input text is not strictly RFC-4180 compliant.
<li>Bug fix: In the [unicode61] tokenizer of [FTS4], treat all private code points
    as identifier symbols.
<li>Bug fix: Bare identifiers in ORDER BY clauses bind more tightly to output column
    names, but identifiers in expressions bind more tightly to input column names.
    Identifiers in GROUP BY clauses always prefer output column names, however.
<li>Bug fixes: Multiple problems in the legacy query optimizer were fixed by the 
    move to [NGQP].
</ul><p>The above are changes since [version 3.7.17].  The differences
between 3.8.0 and 3.8.0.1 are as follows:</p><ul>
<li>Fix an off-by-one error that caused quoted empty string at the end of a 
CRNL-terminated line of CSV input to be misread by the command-line shell.
<li>Fix a query planner bug involving a LEFT JOIN with a BETWEEN or LIKE/GLOB
constraint and then another INNER JOIN to the right that involves an OR constraint.
<li>Fix a query planner bug that could result in a segfault when querying tables
with a UNIQUE or PRIMARY KEY constraint with more than four columns.

<li>SQLITE_SOURCE_ID: 
    "2013-08-29 17:35:01 352362bc01660edfbda08179d60f09e2038a2f49"
<li>SHA1 for sqlite3.c: 99906bf63e6cef63d6f3d7f8526ac4a70e76559e
}

chng {2013-08-26 (3.8.0)} {
<li>Add support for [partial indexes]</li>
<li>Cut-over to the [next generation query planner] for faster and better query plans.
<li>The [EXPLAIN QUERY PLAN] output no longer shows an estimate of the number of 
    rows generated by each loop in a join.
<li>Added the [FTS4 notindexed option], allowing non-indexed columns in an FTS4 table.
Changes to pages/compile.in.
191
192
193
194
195
196
197













198
199
200
201
202
203
204
  in that if one thread is constantly changing the schema, another thread
  might spin on reparses and repreparations of a prepared statement and
  never get any real work done.  This parameter prevents an infinite loop
  by forcing the spinning thread to give up after a fixed number of attempts
  at recompiling the prepared statement.  The default setting is 50 which is
  more than adequate for most applications.
}














COMPILE_OPTION {SQLITE_POWERSAFE_OVERWRITE=<i>&lt;0 or 1&gt;</i>} {
  This option changes the default assumption about [powersafe overwrite]
  for the underlying filesystems for the unix and windows [VFSes].
  Setting SQLITE_POWERSAFE_OVERWRITE to 1 causes SQLite to assume that
  application-level writes cannot changes bytes outside the range of
  bytes written even if the write occurs just before a power loss.







>
>
>
>
>
>
>
>
>
>
>
>
>







191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
  in that if one thread is constantly changing the schema, another thread
  might spin on reparses and repreparations of a prepared statement and
  never get any real work done.  This parameter prevents an infinite loop
  by forcing the spinning thread to give up after a fixed number of attempts
  at recompiling the prepared statement.  The default setting is 50 which is
  more than adequate for most applications.
}

COMPILE_OPTION {SQLITE_MINIMUM_FILE_DESCRIPTOR=<i>N</i>} {
  The unix [VFS] will never use a file descriptor less than <i>N</i>.  The
  default value of <i>N</i> is 3.
  <p>
  Avoiding the use of low-numbered file descriptors is a defense against
  accidental database corruption.  If a database file was opened using
  file descriptor 2, for example, and then an assert() failed and invoked
  write(2,...), that would likely cause database corruption.  Using only
  higher-valued file descriptors avoids that problem.  The protection against
  using low-numbered file descriptiors can be disabled by setting this
  compile-time option to 0.
}

COMPILE_OPTION {SQLITE_POWERSAFE_OVERWRITE=<i>&lt;0 or 1&gt;</i>} {
  This option changes the default assumption about [powersafe overwrite]
  for the underlying filesystems for the unix and windows [VFSes].
  Setting SQLITE_POWERSAFE_OVERWRITE to 1 causes SQLite to assume that
  application-level writes cannot changes bytes outside the range of
  bytes written even if the write occurs just before a power loss.
Changes to pages/index.in.
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107

</td>
<td width="20"></td><td bgcolor="#044a64" width="1"></td><td width="20"></td>
<td valign="top">
<h3>Current Status</h3>

<p><ul>
<li><a href="releaselog/3_8_0.html">Version 3.8.0</a>
of SQLite is recommended for all new development.
Upgrading from version 3.7.17 is optional.
Upgrading from all other prior versions of SQLite
is recommended.</li>
</ul></p>

<h3>Common Links</h3>

<p><ul>







|

|







91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107

</td>
<td width="20"></td><td bgcolor="#044a64" width="1"></td><td width="20"></td>
<td valign="top">
<h3>Current Status</h3>

<p><ul>
<li><a href="releaselog/3_8_1.html">Version 3.8.1</a>
of SQLite is recommended for all new development.
Upgrading from version 3.7.17 and 3.8.0.1 is optional.
Upgrading from all other prior versions of SQLite
is recommended.</li>
</ul></p>

<h3>Common Links</h3>

<p><ul>
Changes to pages/news.in.
14
15
16
17
18
19
20





21
22
23
24
25
26
27
  hd_puts "<h3>$date - $title</h3>"
  regsub -all "\n( *\n)+" $text "</p>\n\n<p>" txt
  regsub -all {[Tt]icket #(\d+)} $txt \
      {<a href="http://www.sqlite.org/cvstrac/tktview?tn=\1">\0</a>} txt
  hd_resolve "<blockquote>$txt</blockquote>"
  hd_puts "<hr width=\"50%\">"
}






newsitem {2013-08-26} {Release 3.8.0} {
  <b>Do not fear the zero!</b>

  <p>SQLite [version 3.8.0] might easily have been called "3.7.18" instead.
  However, this release features the cutover of the
  [next generation query planner] or [NGQP], and there is a small chance of







>
>
>
>
>







14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
  hd_puts "<h3>$date - $title</h3>"
  regsub -all "\n( *\n)+" $text "</p>\n\n<p>" txt
  regsub -all {[Tt]icket #(\d+)} $txt \
      {<a href="http://www.sqlite.org/cvstrac/tktview?tn=\1">\0</a>} txt
  hd_resolve "<blockquote>$txt</blockquote>"
  hd_puts "<hr width=\"50%\">"
}

newsitem {2013-08-29} {Release 3.8.0.1} {
  <p>SQLite [version 3.8.0.1] fixes some obscure bugs that were uncovered by
  users in the 3.8.0 release.  Changes from 3.8.0 are minimal.
}

newsitem {2013-08-26} {Release 3.8.0} {
  <b>Do not fear the zero!</b>

  <p>SQLite [version 3.8.0] might easily have been called "3.7.18" instead.
  However, this release features the cutover of the
  [next generation query planner] or [NGQP], and there is a small chance of
Changes to pages/queryplanner-ng.in.
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

<p>(Side note:  The costs estimates in the TPC-H Q8 graph were computed
by the query planner in SQLite 3.7.16 and converted using a base-10 logarithm.)
</p>

<h3>3.2 Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates.  We cannot
know what the true cost of running a loop is until we actually run the loop.







|







202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
nodes in the graph.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph that visits each node
exactly once.</p>

<p>(Side note:  The costs estimates in the TPC-H Q8 graph were computed
by the query planner in SQLite 3.7.16 and converted using a natural logarithm.)
</p>

<h3>3.2 Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
The costs are estimates.  We cannot
know what the true cost of running a loop is until we actually run the loop.
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
GROUP BY, or DISTINCT clause. So for TPC-H Q8,
the graph above is a reasonable representation of what needs to be computed.
The general case involves a lot of extra complication, which for clarity
is neglected in the remainder of this article.</p>

<h3>3.3 Finding The Best Query Plan</h3>

<p>Prior to version 3.8.0, SQLite always used the
the "Nearest Neighbor" or "NN" heuristic when searching for the best query plan.
The NN heuristic makes a single traversal of the graph, always choosing
the lowest-cost arc as the next step.  
The NN heuristic works surprisingly well in most cases.
And NN is fast, so that SQLite is able to quickly find good plans
for even large 64-way joins.  In contrast, other SQL database engines that
do more extensive searching tend to bog down when the
number of tables in a join goes above 10 or 15.</p>

<p>Unfortunately, the query plan computed by NN for TPC-H Q8 is not optimal.
The plan computed using NN is R-N1-N2-S-C-O-L-P with a cost of 36.92.  
The notation
in the previous sentence means that the R table is run in the outer loop,
N1 is in the next inner loop, N2 is in the third loop, and so forth down
to P which is in the inner-most loop.  The shortest path through the
graph (as found via exhaustive search) is  P-L-O-C-N1-R-S-N2
with a cost of 27.38.  The difference might not seem like much, but 
remember that
the costs are logarithmic, so the shortest path is nearly 14,000 times
faster than that path found using the NN heuristic.</p>

<p>One solution to this problem is to change SQLite to do an exhaustive
search for the best path.  But an exhaustive search requires time 
proportional to
K! (where K is the number of tables in the join) and so when you get 
beyond a 10-way join, the time







|


















|







254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
GROUP BY, or DISTINCT clause. So for TPC-H Q8,
the graph above is a reasonable representation of what needs to be computed.
The general case involves a lot of extra complication, which for clarity
is neglected in the remainder of this article.</p>

<h3>3.3 Finding The Best Query Plan</h3>

<p>Prior to version 3.8.0, SQLite always used
the "Nearest Neighbor" or "NN" heuristic when searching for the best query plan.
The NN heuristic makes a single traversal of the graph, always choosing
the lowest-cost arc as the next step.  
The NN heuristic works surprisingly well in most cases.
And NN is fast, so that SQLite is able to quickly find good plans
for even large 64-way joins.  In contrast, other SQL database engines that
do more extensive searching tend to bog down when the
number of tables in a join goes above 10 or 15.</p>

<p>Unfortunately, the query plan computed by NN for TPC-H Q8 is not optimal.
The plan computed using NN is R-N1-N2-S-C-O-L-P with a cost of 36.92.  
The notation
in the previous sentence means that the R table is run in the outer loop,
N1 is in the next inner loop, N2 is in the third loop, and so forth down
to P which is in the inner-most loop.  The shortest path through the
graph (as found via exhaustive search) is  P-L-O-C-N1-R-S-N2
with a cost of 27.38.  The difference might not seem like much, but 
remember that
the costs are logarithmic, so the shortest path is nearly 750 times
faster than that path found using the NN heuristic.</p>

<p>One solution to this problem is to change SQLite to do an exhaustive
search for the best path.  But an exhaustive search requires time 
proportional to
K! (where K is the number of tables in the join) and so when you get 
beyond a 10-way join, the time
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
<center>
<img src="images/qp/fqp1.gif">
</center>

<p>
In the "without ANALYZE" case on the left, the NN algorithm chooses 
loop P (PLINK) as the outer loop because 4.9 is less than 5.2, resulting
in path P-T which is algorithm-1. NN only looks a the single best choice
at each step so it completely misses the fact that 
5.2+4.4 makes a slightly cheaper plan than 4.9+4.8. But the N3 algorithm
keeps track of the 5 best paths for a 2-way join, so it ends up
selecting path T-P because of its slightly lower overall cost.
Path T-P is algorithm-2.
</p>








|







561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
<center>
<img src="images/qp/fqp1.gif">
</center>

<p>
In the "without ANALYZE" case on the left, the NN algorithm chooses 
loop P (PLINK) as the outer loop because 4.9 is less than 5.2, resulting
in path P-T which is algorithm-1. NN only looks at the single best choice
at each step so it completely misses the fact that 
5.2+4.4 makes a slightly cheaper plan than 4.9+4.8. But the N3 algorithm
keeps track of the 5 best paths for a 2-way join, so it ends up
selecting path T-P because of its slightly lower overall cost.
Path T-P is algorithm-2.
</p>

587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
in the TPC-H Q8 graph.)</p>

<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem.  However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been analyzed.
For this reason, the query was modify to use the CROSS JOIN operator 
instead of the plain JOIN operator.
SQLite will not reorder the tables of a CROSS JOIN.
This is a long-standing feature of SQLite that is specifically designed
to allow knowledgeable programmers
to enforce a particular loop nesting order.  Once the join
was changed to CROSS JOIN (the addition of a single keyword) the NGQP was
forced to chose the faster algorithm-1 regardless of whether or not







|







587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
in the TPC-H Q8 graph.)</p>

<h3>4.2 Fixing The Problem</h3>

<p>Running [ANALYZE] on the repository database immediately fixed the
performance problem.  However, we want Fossil to be robust and to always
work quickly regardless of whether or not its repository has been analyzed.
For this reason, the query was modified to use the CROSS JOIN operator 
instead of the plain JOIN operator.
SQLite will not reorder the tables of a CROSS JOIN.
This is a long-standing feature of SQLite that is specifically designed
to allow knowledgeable programmers
to enforce a particular loop nesting order.  Once the join
was changed to CROSS JOIN (the addition of a single keyword) the NGQP was
forced to chose the faster algorithm-1 regardless of whether or not