Documentation Source Text

Check-in [67d025e82c]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Documentation updates for version 3.7.17.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 67d025e82c1bf21577d81b95c0778072cf7ee087
User & Date: drh 2013-05-18 17:45:49.933
Context
2013-05-19
21:21
Tweaks to the "application file format" use case description. (check-in: 89971860f6 user: drh tags: trunk)
2013-05-18
17:45
Documentation updates for version 3.7.17. (check-in: 67d025e82c user: drh tags: trunk)
2013-05-15
18:58
Add recent bug fixes to the 3.7.17 change log. (check-in: 2b8ea61f51 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/changes.in.
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
      <a href="http://www.sqlite.org/src/timeline">
      http://www.sqlite.org/src/timeline</a>.</p>
    }
    hd_close_aux
    hd_enable_main 1
  }
}
chng {2013-05-22 (3.7.17)} {
<li>Add support for [memory-mapped I/O].
<li>Add the [sqlite3_strglob()] convenience interface.
<li>Assigned the integer at offset 68 in the [database header] as the
    [Application ID] for when SQLite is used as an [application file-format].
    Added the [PRAGMA application_id] command to query and set the Application ID.
<li>Report rollback recovery in the [error log] as SQLITE_NOTICE_RECOVER_ROLLBACK.
    Change the error log code for WAL recover from 







|







37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
      <a href="http://www.sqlite.org/src/timeline">
      http://www.sqlite.org/src/timeline</a>.</p>
    }
    hd_close_aux
    hd_enable_main 1
  }
}
chng {2013-05-20 (3.7.17)} {
<li>Add support for [memory-mapped I/O].
<li>Add the [sqlite3_strglob()] convenience interface.
<li>Assigned the integer at offset 68 in the [database header] as the
    [Application ID] for when SQLite is used as an [application file-format].
    Added the [PRAGMA application_id] command to query and set the Application ID.
<li>Report rollback recovery in the [error log] as SQLITE_NOTICE_RECOVER_ROLLBACK.
    Change the error log code for WAL recover from 
71
72
73
74
75
76
77
78



79
80
81
82
83
84

85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
        suffix (".so", ".dylib", or ".dll") will be added automatically.
    </ol>
<li>Added many new loadable extensions to the source tree, including
    amatch, closure, fuzzer, ieee754, nextchar, regexp, spellfix,
    and wholenumber.  See header comments on each extension source file 
    for further information about what that extension does.
<li>Enhance [FTS3] to avoid using excess stack space when there are a huge
    number of terms on the right-hand side of the MATCH operator.



<li>Added the [fts3tokenize virtual table] to the [full-text search] logic.
<li>Query planner enhancement: Use the transitive property of constraints
    to move constraints into the outer loops of a join whenever possible,
    thereby reducing the amount of work that needs to occur in inner loops.
<li>Discontinue the use of posix_fallocate() on unix, as it does not work on all
    filesystems.

<li>Bug fix: Fix a potential <b>database corruption bug</b>
    in [shared cache mode] when one
    [database connection] is closed while another is in the middle of a write
    transaction.
    Ticket [http://www.sqlite.org/src/info/e636a050b7 | e636a050b7]
<li>Bug fix:
    Only consider AS names from the result set as candidates for resolving
    identifiers in the WHERE clause if there are no other matches. In the 
    ORDER BY clause, AS names take priority over any column names.
    Ticket [http://www.sqlite.org/src/info/2500cdb9be05 | 2500cdb9be05]
<li>Bug fix: 
    Prevent excessive stack usage in the [full-text search] engine when processing
    full-text queries with many thousands of search terms.
<li>Bug fix: Do not allow a virtual table to cancel the ORDER BY clause unless 
    all outer loops are guaranteed to return no more than one row result.
    Ticket [http://www.sqlite.org/src/info/ba82a4a41eac1 | ba82a4a41eac1].
<li>Bug fix: Do not suppress the ORDER BY clause on a virtual table query if
    an IN constraint is used.
    Ticket [http://www.sqlite.org/src/info/f69b96e3076e | f69b96e3076e].
<li>Bug fix: The [command-line shell] gives an exit code of 0 when terminated







|
>
>
>






>










<
<
<







71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98



99
100
101
102
103
104
105
        suffix (".so", ".dylib", or ".dll") will be added automatically.
    </ol>
<li>Added many new loadable extensions to the source tree, including
    amatch, closure, fuzzer, ieee754, nextchar, regexp, spellfix,
    and wholenumber.  See header comments on each extension source file 
    for further information about what that extension does.
<li>Enhance [FTS3] to avoid using excess stack space when there are a huge
    number of terms on the right-hand side of the MATCH operator.  A side-effect
    of this change is that the MATCH operator can only accommodate 12 NEAR
    operators at a time.
<li>Enhance the [fts4aux] virtual table so that it can be a TEMP table.
<li>Added the [fts3tokenize virtual table] to the [full-text search] logic.
<li>Query planner enhancement: Use the transitive property of constraints
    to move constraints into the outer loops of a join whenever possible,
    thereby reducing the amount of work that needs to occur in inner loops.
<li>Discontinue the use of posix_fallocate() on unix, as it does not work on all
    filesystems.
<li>Improved tracing and debugging facilities in the Windows [VFS].
<li>Bug fix: Fix a potential <b>database corruption bug</b>
    in [shared cache mode] when one
    [database connection] is closed while another is in the middle of a write
    transaction.
    Ticket [http://www.sqlite.org/src/info/e636a050b7 | e636a050b7]
<li>Bug fix:
    Only consider AS names from the result set as candidates for resolving
    identifiers in the WHERE clause if there are no other matches. In the 
    ORDER BY clause, AS names take priority over any column names.
    Ticket [http://www.sqlite.org/src/info/2500cdb9be05 | 2500cdb9be05]



<li>Bug fix: Do not allow a virtual table to cancel the ORDER BY clause unless 
    all outer loops are guaranteed to return no more than one row result.
    Ticket [http://www.sqlite.org/src/info/ba82a4a41eac1 | ba82a4a41eac1].
<li>Bug fix: Do not suppress the ORDER BY clause on a virtual table query if
    an IN constraint is used.
    Ticket [http://www.sqlite.org/src/info/f69b96e3076e | f69b96e3076e].
<li>Bug fix: The [command-line shell] gives an exit code of 0 when terminated
Changes to pages/loadext.in.
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
need to supply SQLite with the name of the file containing the
shared library or DLL and an entry point to initialize the extension.
In C code, this information is supplied using the
[sqlite3_load_extension()] API.  See the documentation on that
routine for additional information.</p>

<p>Note that different operating systems use different filename
suffixes for their shared librarys.  Windows use ".dll", Mac uses
".dylib", and most unixes other than mac use ".so".  If you want to
make your code portable, you can omit the suffix from the shared
library filename and the appropriate suffix will be added automatically
by the [sqlite3_load_extension()] interface.</p>

<p>There is also an SQL function that can be used to load extensions:
[load_extension(X,Y)].  It works just like the [sqlite3_load_extension()]







|







24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
need to supply SQLite with the name of the file containing the
shared library or DLL and an entry point to initialize the extension.
In C code, this information is supplied using the
[sqlite3_load_extension()] API.  See the documentation on that
routine for additional information.</p>

<p>Note that different operating systems use different filename
suffixes for their shared libraries.  Windows use ".dll", Mac uses
".dylib", and most unixes other than mac use ".so".  If you want to
make your code portable, you can omit the suffix from the shared
library filename and the appropriate suffix will be added automatically
by the [sqlite3_load_extension()] interface.</p>

<p>There is also an SQL function that can be used to load extensions:
[load_extension(X,Y)].  It works just like the [sqlite3_load_extension()]
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195

#ifdef _WIN32
__declspec(dllexport)
#endif
/* TODO: Change the entry point name so that "extension" is replaced by
** text derived from the shared library filename as follows:  Copy every
** ASCII alphabetic character from the filename after the last "/" through
** the next following ".", converting each character to to lowercase, and
** discarding the first three characters if they are "lib".
*/
int sqlite3_extension_init(
  sqlite3 *db, 
  char **pzErrMsg, 
  const sqlite3_api_routines *pApi
){







|







181
182
183
184
185
186
187
188
189
190
191
192
193
194
195

#ifdef _WIN32
__declspec(dllexport)
#endif
/* TODO: Change the entry point name so that "extension" is replaced by
** text derived from the shared library filename as follows:  Copy every
** ASCII alphabetic character from the filename after the last "/" through
** the next following ".", converting each character to lowercase, and
** discarding the first three characters if they are "lib".
*/
int sqlite3_extension_init(
  sqlite3 *db, 
  char **pzErrMsg, 
  const sqlite3_api_routines *pApi
){
Changes to pages/news.in.
14
15
16
17
18
19
20














21
22
23
24
25
26
27
  hd_puts "<h3>$date - $title</h3>"
  regsub -all "\n( *\n)+" $text "</p>\n\n<p>" txt
  regsub -all {[Tt]icket #(\d+)} $txt \
      {<a href="http://www.sqlite.org/cvstrac/tktview?tn=\1">\0</a>} txt
  hd_resolve "<blockquote>$txt</blockquote>"
  hd_puts "<hr width=\"50%\">"
}















newsitem {2013-04-12} {Release 3.7.16.2} {
  SQLite [version 3.7.16.2] fixes a long-standing flaw in the Windows
  OS interface that
  can result in database corruption under a rare race condition.
  See [http://www.sqlite.org/src/info/7ff3120e4f] for a full description
  of the problem.







>
>
>
>
>
>
>
>
>
>
>
>
>
>







14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
  hd_puts "<h3>$date - $title</h3>"
  regsub -all "\n( *\n)+" $text "</p>\n\n<p>" txt
  regsub -all {[Tt]icket #(\d+)} $txt \
      {<a href="http://www.sqlite.org/cvstrac/tktview?tn=\1">\0</a>} txt
  hd_resolve "<blockquote>$txt</blockquote>"
  hd_puts "<hr width=\"50%\">"
}

newsitem {2013-05-20} {Release 3.7.17} {
  SQLite [version 3.7.17] is a regularly schedule maintenance release.
  Visit the [version 3.7.17 | change log] for a full explanation of the
  changes in this release.

  There are many bug fixes in version 3.7.17.  But this does not indicate
  that 3.7.16 was a problematic release.  All of the bugs in 3.7.17 are 
  obscure and are unlikely to impact any particular application.  And most 
  of the bugs that are fixed in 3.7.17 predate 3.7.16 and have been in 
  the code for years without ever before being noticed.
  Nevertheless, due to the large number of fixes,
  all users are encouraged to upgrade when possible.
}

newsitem {2013-04-12} {Release 3.7.16.2} {
  SQLite [version 3.7.16.2] fixes a long-standing flaw in the Windows
  OS interface that
  can result in database corruption under a rare race condition.
  See [http://www.sqlite.org/src/info/7ff3120e4f] for a full description
  of the problem.
Changes to pages/queryplanner-ng.in.
8
9
10
11
12
13
14

15
16
17
18
19
20
21
}
proc code {txt} {
  hd_puts "<center><table><tr><td><pre>\n"
  regsub {^[ \n]*\n} $txt {} txt
  hd_puts [string trimright $txt]\n
  hd_puts "</pre></table></center>\n"
}

</tcl>
<h1 align="center">The Next Generation Query Planner</h1>

<blockquote>
<p style="border:1px solid black; color: red;"><i><b>
This article is about proposed enhancements to SQLite that have not yet
been implemented.  Everything mentioned here is preliminary and subject







>







8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
}
proc code {txt} {
  hd_puts "<center><table><tr><td><pre>\n"
  regsub {^[ \n]*\n} $txt {} txt
  hd_puts [string trimright $txt]\n
  hd_puts "</pre></table></center>\n"
}
hd_keywords {next generation query planner}
</tcl>
<h1 align="center">The Next Generation Query Planner</h1>

<blockquote>
<p style="border:1px solid black; color: red;"><i><b>
This article is about proposed enhancements to SQLite that have not yet
been implemented.  Everything mentioned here is preliminary and subject
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
query plan for TPC-H Q8.  This article explains why that is and what
we intend to do about it for SQLite version 3.7.18 and later.
</p>

<h2>The Query</h2>

<p>
TPC-H Q8 is a eight-way join.
As SQLite uses only loop joins, the main task of the query planner is
to figure out the best nesting order of the 8 loops in order to minimize
the amount of CPU time and I/O required to complete the join.
A simplified model of this problem for the case of TPC-H Q8 is shown
by the following (hand drawn) diagram:
</p>








|







34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
query plan for TPC-H Q8.  This article explains why that is and what
we intend to do about it for SQLite version 3.7.18 and later.
</p>

<h2>The Query</h2>

<p>
TPC-H Q8 is an eight-way join.
As SQLite uses only loop joins, the main task of the query planner is
to figure out the best nesting order of the 8 loops in order to minimize
the amount of CPU time and I/O required to complete the join.
A simplified model of this problem for the case of TPC-H Q8 is shown
by the following (hand drawn) diagram:
</p>

70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
of running each loop with no dependencies.  The outer loop must use this
*-cost.  Inner loops have the option of using the *-cost or a cost assuming
one of the other terms is in an outer loop, whichever gives the best
result.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph shown above that visits each node
exactly once.  In other words, 
the problem of finding the best query plan is equivalent to solving the
<a href="http://en.wikipedia.org/wiki/Travelling_salesman_problem">Travelling
Saleman Problem</a> (TSP).  And the TSP is 
<a href="http://en.wikipedia.org/wiki/NP-hard">NP-complete</a>.</p>

<h3>Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
Keep in mind, first of all, that the costs are merely estimates.  We cannot
know what the true cost of running a loop is until we actually run the loop.
SQLite makes guesses for the cost of running a loop based on







|
<
<
<
<







71
72
73
74
75
76
77
78




79
80
81
82
83
84
85
of running each loop with no dependencies.  The outer loop must use this
*-cost.  Inner loops have the option of using the *-cost or a cost assuming
one of the other terms is in an outer loop, whichever gives the best
result.</p>

<p>The problem of finding the best query plan is equivalent to finding
a minimum-cost path through the graph shown above that visits each node
exactly once.</p>





<h3>Complications</h3>

<p>The presentation of the query planner problem above is a simplification.
Keep in mind, first of all, that the costs are merely estimates.  We cannot
know what the true cost of running a loop is until we actually run the loop.
SQLite makes guesses for the cost of running a loop based on
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
have an index.  There is an additional startup cost which is the cost of
initializing the loop for each iteration of the outer loop.  Then there
is the cost of running each step of the loop.  Finally, there is an estimate
of the number rows generated by the loop, which is information needed in
estimating the costs of inner loops.</p>

<p>A general query need not be a simple graph as shown above.
Dependences need not be on a single other loop.  For example, you might
have a query where a particular loop depends on two or more other terms
being in outer loops, instead of just a single term.  We cannot really
draw a graph for these kinds of queries since there is no way for an
arc to originate at two or more nodes at once.</p>

<p>In the TPC-H Q8 query, the setup and startup costs are all negligible
and all dependences are from a single other node. So for this case, at least,
the graph above is a reasonable representation of what needs to be computed.
The general case involves a lot of extra complication.
But the point of this article is to explain what is going on, and that
extra complication does not aid in the explanation, and is therefore
omitted.</p>

<h2>Finding The Best Query Plan</h2>

<p>Since its inception, SQLite has always used the rough equilvalent of
the "Nearest Neighbor" or "NN" heuristic when searching for the best query plan
for a particular query.  The NN heuristic choses the lowest-cost arc
to follow next.  The NN heuristic works surprisingly well in most cases.
And NN is fast, so that SQLite is able to quickly find good query plans
for even large 64-way joins.  In contrast, other SQL database engines such
as Oracle, PostgreSQL, MySQL, and SQL Server tend to bog down when the
number of tables in a join goes above 10 or 15.</p>

<p>However, the query plan computed by NN for TPC-H Q8 is not optimal.
The plan computed using NN is R-N1-N2-S-C-O-L-P with a cost of 36.92.  
The notation
in the previous sentence means that the R table is run in the outer loop,
N1 is in the next inner loop, N2 is in the third loop, and so forth down
to P which is in the inner-most loop.  But the shortest path through the
graph (as found via exhaustive search) is  P-L-O-C-N1-R-S-N2
with a cost of 27.38.  This might not seem like much, but remember that
the costs are logarithmic, so the shortest path is nearly 14,000 times
faster than that path found using the NN heuristic.</p>

<p>One solution to this problem is to change SQLite to do an exhaustive
search for the best query plan.  But the SQLite developers are unwilling
to do this because an exhaustive search requires time proportial to
K! (where K is the number of tables in the join) and so when you get 
beyond a 10-way join, then time
to run [sqlite3_prepare()] becomes very large.</p>

<h3>N Nearest Neighbors</h3>

<p>The proposed solution to this dilemma 







|






|










|



















|







97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
have an index.  There is an additional startup cost which is the cost of
initializing the loop for each iteration of the outer loop.  Then there
is the cost of running each step of the loop.  Finally, there is an estimate
of the number rows generated by the loop, which is information needed in
estimating the costs of inner loops.</p>

<p>A general query need not be a simple graph as shown above.
Dependencies need not be on a single other loop.  For example, you might
have a query where a particular loop depends on two or more other terms
being in outer loops, instead of just a single term.  We cannot really
draw a graph for these kinds of queries since there is no way for an
arc to originate at two or more nodes at once.</p>

<p>In the TPC-H Q8 query, the setup and startup costs are all negligible
and all dependencies are from a single other node. So for this case, at least,
the graph above is a reasonable representation of what needs to be computed.
The general case involves a lot of extra complication.
But the point of this article is to explain what is going on, and that
extra complication does not aid in the explanation, and is therefore
omitted.</p>

<h2>Finding The Best Query Plan</h2>

<p>Since its inception, SQLite has always used the rough equilvalent of
the "Nearest Neighbor" or "NN" heuristic when searching for the best query plan
for a particular query.  The NN heuristic chooses the lowest-cost arc
to follow next.  The NN heuristic works surprisingly well in most cases.
And NN is fast, so that SQLite is able to quickly find good query plans
for even large 64-way joins.  In contrast, other SQL database engines such
as Oracle, PostgreSQL, MySQL, and SQL Server tend to bog down when the
number of tables in a join goes above 10 or 15.</p>

<p>However, the query plan computed by NN for TPC-H Q8 is not optimal.
The plan computed using NN is R-N1-N2-S-C-O-L-P with a cost of 36.92.  
The notation
in the previous sentence means that the R table is run in the outer loop,
N1 is in the next inner loop, N2 is in the third loop, and so forth down
to P which is in the inner-most loop.  But the shortest path through the
graph (as found via exhaustive search) is  P-L-O-C-N1-R-S-N2
with a cost of 27.38.  This might not seem like much, but remember that
the costs are logarithmic, so the shortest path is nearly 14,000 times
faster than that path found using the NN heuristic.</p>

<p>One solution to this problem is to change SQLite to do an exhaustive
search for the best query plan.  But the SQLite developers are unwilling
to do this because an exhaustive search requires time proportional to
K! (where K is the number of tables in the join) and so when you get 
beyond a 10-way join, then time
to run [sqlite3_prepare()] becomes very large.</p>

<h3>N Nearest Neighbors</h3>

<p>The proposed solution to this dilemma