Documentation Source Text

Check-in [885e891911]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Merge updates from the 3.10 branch.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 885e891911288a83e2d63dacd85f664519f2b3f9
User & Date: drh 2016-01-14 18:26:16
Context
2016-01-15
08:33
Update fts5.html to node that the unicode61 tokenizer is compatible with the fts3 tokenizer of the same name. check-in: 3cc1ff72a4 user: dan tags: trunk
2016-01-14
18:26
Merge updates from the 3.10 branch. check-in: 885e891911 user: drh tags: trunk
17:58
Fix another typo in the query planner document. check-in: 47810e77ef user: drh tags: branch-3.10
15:52
Remove the AWK dependency from the amalgamation page. check-in: fbbbdbca17 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/queryplanner.in.

1
2
3
4




5
6
7
8
9
10
11
...
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
...
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
...
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
...
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
<title>Query Planning</title>
<tcl>
hd_keywords {indexing} {indexing tutorial}
proc figure {fignum tag img title} {




  hd_puts "<p><center>\n"
  hd_puts "<img src=\"images/qp/$img\" alt=\"figure $fignum\"><br>\n"
  hd_puts "Figure $fignum: $title\n"
  hd_puts "</center></p>\n"
}
proc code {txt} {
  hd_puts "<center><table><tr><td><pre>\n"
................................................................................

<tcl>code {
SELECT price FROM fruitsforsale WHERE rowid=4;
}</tcl>

<p>
Since the information is stored in the table in rowid order, SQLite
can find the correct row using doing a binary search on the rowid.
If the table contains N element, the time required to look up the
desired row is proportional to logN rather than being proportional
to N as in a full table scan.  If the table contains 10 million elements,
that means the query will be on the order of N/logN or about 1 million
times faster!
</p>

................................................................................

<p>
To make the original query more efficient, we can add an index on the
"fruit" column of the "fruitsforsale" table like this:
</p>

<tcl>code {
CREATE INDEX idx1 ON fruitsforsale(fruit);
}</tcl>

<p>
An index is another table similar to the original "fruitsforsale" table
but with the content (the fruit column in this case) stored in front of the
rowid and with all rows in content order.
<a href="#fig4">Figure 4</a> gives a logical view of the Idx1 index.
................................................................................
</tcl>

<p>
The "state" index works just like the "fruit" index in that it is a
new table with an extra column in front of the rowid and sorted by
that extra column as the primary key.  The only difference is that
in Idx2, the first column is "state" instead of "fruit" as it is with
Idx1.  In our example data set, the is more redundancy in the "state"
column and so they are more duplicate entries.  The ties are still
resolved using the rowid.
</p>

<p>
Using the new Idx2 index on "state", SQLite has another option for
lookup up the price of California oranges:  it can look up every row
................................................................................
<tcl>figure 18 #fig18 obfruitidx1.gif {Sorting With An Index}</tcl>

<p>
The Idx1 index is scanned from top to bottom (or from bottom to top if
"ORDER BY fruit DESC" is used) in order to find the rowids for each item
in order by fruit.  Then for each rowid, a binary search is done to lookup
and output that row.  In this way, the output appears in the requested order
with the need to gather then entire output and sort it using a separate step.
</p>

<p>
But does this really save time?  The number of steps in the 
<a href="#fig16">original indexless sort</a> is proportional to NlogN since
that is how much time it takes to sort N rows.  But when we use Idx1 as
shown here, we have to do N rowid lookups which take logN time each, so




>
>
>
>







 







|







 







|







 







|







 







|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
...
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
...
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
...
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
...
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
<title>Query Planning</title>
<tcl>
hd_keywords {indexing} {indexing tutorial}
proc figure {fignum tag img title} {
  if {$tag!=""} {
    set tag [string trim $tag #]
    hd_puts "<a name='$tag'></a>\n"
  }
  hd_puts "<p><center>\n"
  hd_puts "<img src=\"images/qp/$img\" alt=\"figure $fignum\"><br>\n"
  hd_puts "Figure $fignum: $title\n"
  hd_puts "</center></p>\n"
}
proc code {txt} {
  hd_puts "<center><table><tr><td><pre>\n"
................................................................................

<tcl>code {
SELECT price FROM fruitsforsale WHERE rowid=4;
}</tcl>

<p>
Since the information is stored in the table in rowid order, SQLite
can find the correct row using a binary search on the rowid.
If the table contains N element, the time required to look up the
desired row is proportional to logN rather than being proportional
to N as in a full table scan.  If the table contains 10 million elements,
that means the query will be on the order of N/logN or about 1 million
times faster!
</p>

................................................................................

<p>
To make the original query more efficient, we can add an index on the
"fruit" column of the "fruitsforsale" table like this:
</p>

<tcl>code {
CREATE INDEX Idx1 ON fruitsforsale(fruit);
}</tcl>

<p>
An index is another table similar to the original "fruitsforsale" table
but with the content (the fruit column in this case) stored in front of the
rowid and with all rows in content order.
<a href="#fig4">Figure 4</a> gives a logical view of the Idx1 index.
................................................................................
</tcl>

<p>
The "state" index works just like the "fruit" index in that it is a
new table with an extra column in front of the rowid and sorted by
that extra column as the primary key.  The only difference is that
in Idx2, the first column is "state" instead of "fruit" as it is with
Idx1.  In our example data set, there is more redundancy in the "state"
column and so they are more duplicate entries.  The ties are still
resolved using the rowid.
</p>

<p>
Using the new Idx2 index on "state", SQLite has another option for
lookup up the price of California oranges:  it can look up every row
................................................................................
<tcl>figure 18 #fig18 obfruitidx1.gif {Sorting With An Index}</tcl>

<p>
The Idx1 index is scanned from top to bottom (or from bottom to top if
"ORDER BY fruit DESC" is used) in order to find the rowids for each item
in order by fruit.  Then for each rowid, a binary search is done to lookup
and output that row.  In this way, the output appears in the requested order
without the need to gather the entire output and sort it using a separate step.
</p>

<p>
But does this really save time?  The number of steps in the 
<a href="#fig16">original indexless sort</a> is proportional to NlogN since
that is how much time it takes to sort N rows.  But when we use Idx1 as
shown here, we have to do N rowid lookups which take logN time each, so