Documentation Source Text

Check-in [33d6d684cb]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to the queryplanner.html document.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 33d6d684cbaef7a12ab2278a3068c764d8005d6c
User & Date: drh 2010-08-30 00:13:59.000
Context
2010-08-30
00:42
Copy the spell-check exception list into the buid directory. Add "SQLite" to the spell-check exception list. (check-in: 99a59a73ff user: drh tags: trunk)
00:13
Updates to the queryplanner.html document. (check-in: 33d6d684cb user: drh tags: trunk)
2010-08-28
05:48
Fix documentation typos. (check-in: 78fe166463 user: shaneh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Added images/qp/fruitobstate0.gif.

cannot compute difference between binary files

Changes to pages/changes.in.
37
38
39
40
41
42
43










44
45
46
47
48
49
50
      <a href="http://www.sqlite.org/src/timeline">
      http://www.sqlite.org/src/timeline</a>.</p>
    }
    hd_close_aux
    hd_enable_main 1
  }
}











chng {2010 August 24 (3.7.2)} {
<li> Fix an <a href="http://www.sqlite.org/src/info/5e10420e8d">
     old and very obscure bug</a> that can lead to corruption of the
     database [free-page list] when [incremental_vacuum] is used.
}








>
>
>
>
>
>
>
>
>
>







37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
      <a href="http://www.sqlite.org/src/timeline">
      http://www.sqlite.org/src/timeline</a>.</p>
    }
    hd_close_aux
    hd_enable_main 1
  }
}

chng {2010 October 15 (3.7.3)} {
<li> Added the [sqlite3_create_function_v2()] interface that includes a
     destructor callback.
<li> The default page cache strives more diligently to avoid using memory
     beyond what is allocated to it by [SQLITE_CONFIG_PAGECACHE].  Or if
     using page cache is allocating from the heap, it strives to avoid
     going over the [sqlite3_soft_heap_limit()], even if
     [SQLITE_ENABLE_MEMORY_MANAGEMENT] is not set.
}

chng {2010 August 24 (3.7.2)} {
<li> Fix an <a href="http://www.sqlite.org/src/info/5e10420e8d">
     old and very obscure bug</a> that can lead to corruption of the
     database [free-page list] when [incremental_vacuum] is used.
}

Changes to pages/queryplanner.in.
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
<p>
The best feature of SQL (in <u>all</u> its implementations, not just SQLite)
is that it is a <i>declarative</i> language, not a <i>procedural</i>
language.  When programming in SQL you tell the system <i>what</i> you
want to compute, not <i>how</i> to compute it.  The task of figuring out
the <i>how</i> is delegated to the <i>query planner</i> subsystem within 
the SQL database engine.
Releaving the programmer from the chore of designing specific
query algorithms reduces the workload on the programmer and 
reduces the number of opportunities for the
programmer to make mistakes.
</p>

<p>
Most of the time, the query planner in SQLite does a good job on its
own and without outside help.
However, the query planner needs indices to
work with and it usually falls to the programmer to add indices to the
schema that are sufficient for the query planner to accomplish its task.
</p>

<p>
This document is intended to provide programmers who are new to SQL
and databases with the background information to help them understand
what is going on behind the scenes with SQLite, which in turn should make
it easier for programmers to create the indices that will help the SQLite
query planner to pick the best plans.
</p>

<h2>1.0 Querying</h2>

<h3>1.1 Tables Without Indices</h3>

<p>
Every table in SQLite consists of zero or more rows with a unique integer
key (the [rowid] or [INTEGER PRIMARY KEY]) followed by content.  The rows
are logically stored in order of increasing rowid.  As an example, this







|















|





|







19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
<p>
The best feature of SQL (in <u>all</u> its implementations, not just SQLite)
is that it is a <i>declarative</i> language, not a <i>procedural</i>
language.  When programming in SQL you tell the system <i>what</i> you
want to compute, not <i>how</i> to compute it.  The task of figuring out
the <i>how</i> is delegated to the <i>query planner</i> subsystem within 
the SQL database engine.
Relieving the programmer from the chore of designing specific
query algorithms reduces the workload on the programmer and 
reduces the number of opportunities for the
programmer to make mistakes.
</p>

<p>
Most of the time, the query planner in SQLite does a good job on its
own and without outside help.
However, the query planner needs indices to
work with and it usually falls to the programmer to add indices to the
schema that are sufficient for the query planner to accomplish its task.
</p>

<p>
This document is intended to provide programmers who are new to SQL
with background information to help them understand
what is going on behind the scenes with SQLite, which in turn should make
it easier for programmers to create the indices that will help the SQLite
query planner to pick the best plans.
</p>

<h2>1.0 Searching</h2>

<h3>1.1 Tables Without Indices</h3>

<p>
Every table in SQLite consists of zero or more rows with a unique integer
key (the [rowid] or [INTEGER PRIMARY KEY]) followed by content.  The rows
are logically stored in order of increasing rowid.  As an example, this
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
</tcl>

<p>
In this case, SQLite still does a single binary search to find the first
entry of the index where fruit='Orange'.  Then it extracts the rowid from
the index and uses that rowid to lookup the original table entry via
binary search and output the price from the original table.  But instead
of quiting, the database engine then advances to the next row of index
to repeat the process for next fruit='Orange' entry.  Advancing to the
next row of an index (or table) is much less costly than doing a binary
search since the next row is often located on the same database page as
the current row.  In fact, the cost of advancing to the next row is so
cheap in comparison to a binary search that we usually ignore it.  So
our estimate for the total cost of this query is 3 binary searches.
If the number of rows of output is K and the number of rows in the table







|







206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
</tcl>

<p>
In this case, SQLite still does a single binary search to find the first
entry of the index where fruit='Orange'.  Then it extracts the rowid from
the index and uses that rowid to lookup the original table entry via
binary search and output the price from the original table.  But instead
of quitting, the database engine then advances to the next row of index
to repeat the process for next fruit='Orange' entry.  Advancing to the
next row of an index (or table) is much less costly than doing a binary
search since the next row is often located on the same database page as
the current row.  In fact, the cost of advancing to the next row is so
cheap in comparison to a binary search that we usually ignore it.  So
our estimate for the total cost of this query is 3 binary searches.
If the number of rows of output is K and the number of rows in the table
359
360
361
362
363
364
365

366
367
368
369
370
371
372
<p>
Hence, a good rule of thumb is that your database schema should never
contain two indices where one index is a prefix of the other.  Drop the
index with fewer columns.  SQLite will still be able to do efficient
lookups with the longer index.
</p>


<h3>1.7 Covering Indices</h3>

<p>
The "price of California oranges" query was made more efficient through
the use of a two-column index.  But SQLite can do even better with a
three-column index that also includes the "price" column:
</p>







>







359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
<p>
Hence, a good rule of thumb is that your database schema should never
contain two indices where one index is a prefix of the other.  Drop the
index with fewer columns.  SQLite will still be able to do efficient
lookups with the longer index.
</p>

<tcl>hd_fragment covidx {covering index} {covering indices}</tcl>
<h3>1.7 Covering Indices</h3>

<p>
The "price of California oranges" query was made more efficient through
the use of a two-column index.  But SQLite can do even better with a
three-column index that also includes the "price" column:
</p>
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
by AND, by using an intersect operator in place of union.  Many SQL
database engines will do just that.  But the performance gain over using
just a single index is slight and so SQLite does not implement that technique
at this time.  However, a future version SQLite might be enhanced to support
AND-by-INTERSECT.
</p>


<h2>2.0 Sorting</h2>

<p>
SQLite (like all other SQL database engines) can also use indices to
satisfy the ORDER BY clauses in a query, in addition to expediting
lookup.  In other words, indices can be used to speed up sorting as
well as searching.







|







455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
by AND, by using an intersect operator in place of union.  Many SQL
database engines will do just that.  But the performance gain over using
just a single index is slight and so SQLite does not implement that technique
at this time.  However, a future version SQLite might be enhanced to support
AND-by-INTERSECT.
</p>

<tcl>hd_fragment sorting sorting</tcl>
<h2>2.0 Sorting</h2>

<p>
SQLite (like all other SQL database engines) can also use indices to
satisfy the ORDER BY clauses in a query, in addition to expediting
lookup.  In other words, indices can be used to speed up sorting as
well as searching.
579
580
581
582
583
584
585
586

























































































587
588
589
590
591
592
593
594
595
596
597
598
599
600
</tcl>

<p>
With a covering index, SQLite can simply walk the index from one end to the
other and deliver the output in time proportional to N and without having
allocate a large buffer to hold the result set.
</p>


























































































<h2>To Be Continued...</h2>

<p>Additional topics to discuss include:</b></p>

<p><ul>
<li>Search and sorting at the same time
<li>How to construct a good index
<li>Joins and join-order
<li>Automatic transient indices
<li>How to determine what query plan SQLite is using
<li>Automatic detection of inefficient query plans
<li>Techniques for influcing what query plan SQLite chooses
</ul>
</p>








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


|


<





|


580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681

682
683
684
685
686
687
688
689
</tcl>

<p>
With a covering index, SQLite can simply walk the index from one end to the
other and deliver the output in time proportional to N and without having
allocate a large buffer to hold the result set.
</p>

<h2>3.0 Searching And Sorting At The Same Time</h2>

<p>
The previous discussion has treated searching and sorting as separate
topics.  But in practice, it is often the case that one wants to search
and sort at the same time.  Fortunately, it is possible to do this
using a single index.
</p>

<h3>3.1 Searching And Sorting With A Multi-Column Index</h3>

<p>
Suppose we want to find the prices of all kinds of oranges sorted in
order of the state where they are grown.  The query is this:
</p>

<tcl>
code {SELECT price FROM fruitforsale WHERE fruit='Orange' ORDER BY state}
</tcl>

<p>
The query contains both a search restriction in the WHERE clause
and a sort order in the ORDER BY clause.  Both the search and the sort
can be accomplished at the same time using the two-column index Idx3.
</p>

<tcl>
figure 20 #fig20 fruitobstate0.gif {Search And Sort By Multi-Column Index}
</tcl>

<p>
The query does a binary search on the index to find the subset of rows
that have fruit='Orange'.  (Because the fruit column is the left-most column
of the index and the rows of the index are in sorted order, all such 
rows will be adjacent.)  Then it scans the matching index rows from top to
bottom to get the rowids for the original table, and for each rowid does
a binary search on the original table to find the price.
</p>

<p>
You will notice that there is no "sort" box anywhere in the above diagram.
The ORDER BY clause of the query has become a no-op.  No sorting has to be
done here because the output order is by the state column and the state
column also happens to be the first column after the fruit column in the
index.  So, if we scan entries of the index that have the same value for
the fruit column from top to bottom, those index entries are guaranteed to
be ordered by the state column.
</p>

<tcl>hd_fragment {srchsortcovidx}</tcl>
<h3>3.2 Searching And Sorting With A Covering Index</h3>

<p>
A [covering index] can also be used to search and sort at the same time.
Consider the following:
</p>

<tcl>
code {SELECT * FROM fruitforsale WHERE fruit='Orange' ORDER BY state}
figure 21 #fig21 fruitobstate.gif {Search And Sort By Covering Index}
</tcl>

<p>
As before, SQLite does single binary search
for the range of rows in the covering
index that satisfy the WHERE clause, the scans that range from top to 
bottom to get the desired results.  
The rows that satisfy the WHERE clause are guaranteed to be adjacent
since the WHERE clause is an equality constraint on the left-most
column of the index.  And by scanning the matching index rows from
top to bottom, the output is guaranteed to be order by state since the
state column is the very next column to the right of the fruit column.
And so the resulting query is very efficient.
</p>

<p>
SQLite can pull a similar trick for a descending ORDER BY:
</p>

<tcl>
code {SELECT * FROM fruitforsale WHERE fruit='Orange' ORDER BY state DESC}
</tcl>

<p>
The same basic algorithm is followed, except this time the matching rows
of the index are scanned from bottom to top instead of from top to bottom,
so that the states will appear in descending order.
</p>

<h2>To Be Continued...</h2>

<p>Further topics:</p>

<p><ul>

<li>How to construct a good index
<li>Joins and join-order
<li>Automatic transient indices
<li>How to determine what query plan SQLite is using
<li>Automatic detection of inefficient query plans
<li>Techniques for influencing what query plan SQLite chooses
</ul>
</p>
Changes to pages/rtree.in.
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
command-line.
</p>

<h1>3.0 Using the R*Tree Module</h1>

<p>
The SQLite R*Tree module is implemented as a
[sqlite3_create_module | virtual table].  Each R*Tree index is a
virtual table with an odd number of columns between 3 and 11.
The first column is always a 64-bit signed integer primary key.
The other columns are minimum- and maximum-value pairs (stored as
32-bit floating point numbers) for each
dimension.  A 1-dimensional R*Tree thus has 3 columns.  
A 2-dimensional R*Tree (the most common case) has 5 columns.
A 5-dimensional R*Tree has 11 columns.  The SQLite R*Tree implementation
does not support R*Trees wider than 5 dimensions.
</p>

<p>
The first column of an SQLite R*Tree must always be an integer
primary key.
The min/max-value pair columns are always stored as
32-bit floating point values.  Unlike regular SQLite tables which
can store data in a variety of datatypes and formats, the R*Tree
indices rigidly enforce these two storage types.  Attempts to insert
something other than an integer into the first column, or something
other than a floating point value into the other columns, will result
in an error.
</p>

<h2>3.1 Creating An R*Tree Index</h2>

<p>
A new R*Tree index is created as follows:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE <em>&lt;name&gt;</em> USING rtree(<em>&lt;column-names&gt;</em>);
</pre></blockquote>

<p>
The <em>&lt;name&gt;</em> is the name your application chooses for the
R*Tree index and <em>&lt;column-names&gt;</em> is a comma separated list
of between 3 and 11 columns.
The virtual &lt;name&gt; table creates three "shadow" tables to actually
store its content.  The names of these shadow tables are:
</p>

<blockquote>
<em>&lt;name&gt;</em><strong>_node</strong><br>
<em>&lt;name&gt;</em><strong>_rowid</strong><br>
<em>&lt;name&gt;</em><strong>_parent</strong>
</blockquote>

<p>
The shadow tables are ordinary SQLite data tables.  You can query them 
directly if you like, though this unlikely to to reveal anything particularly
useful. 
And you can [UPDATE], [DELETE], [INSERT] or even [DROP TABLE | DROP] 
the shadow tables, though doing so will corrupt your R*Tree index.
So it is best to simply ignore the shadow tables.  Recognize that they
are there to hold your R*Tree index information and let it go as that.
</p>

<p>
As an example, consider creating a two-dimensional R*Tree index for use in 
spatial queries:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE demo_index USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY       -- Minimum and maximum Y coordinate
);
</pre></blockquote>

<h2>3.2 Populating An R*Tree Index</h2>

<p>
The usual [INSERT], [UPDATE], and [DELETE] commands work on an R*Tree
index just like on regular tables.  So to insert some data into our sample
R*Tree index, we can do something like this:
</p>

<blockquote><pre>
INSERT INTO demo_index VALUES(
    1,                   -- Primary key
    -80.7749, -80.7747,  -- Longitude range
    30.3776, 30.3778     -- Latitude range
);
INSERT INTO demo_index VALUES(
    2,
    -81.0, -79.6,
    35.0, 36.2
);
</pre></blockquote>

<p>
The entries above might represent (for example) a bounding box around
the offices for SQLite.org and bounding box around the
12th Congressional District of North Carolina in which SQLite.org is located.  
</p>

<h2>3.3 Querying An R*Tree Index</h2>

<p>
Any valid query will work against an R*Tree index.  But the R*Tree
implementation is designed to make two kinds of queries especially
efficient.  First, queries against the primary key are efficient:
</p>

<blockquote><pre>
SELECT * FROM demo_index WHERE id=1;
</pre></blockquote>

<p>
Of course, an ordinary SQLite table will do a query against its
integer primary key efficiently, so the previous is no big deal.
The real reason for using an R*Tree in the first place is so that
you can efficiently do inequality queries against the coordinate
ranges.  To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE minX>=-81.08 AND maxX<=-80.58
   AND minY>=35.00  AND maxY<=35.44;
</pre></blockquote>

<p>
The query above would very quickly locate id of 1 even if the
R*Tree contained millions of entries.  The previous is an example
of a "contained-within" query.  The R*Tree also supports "overlapping"
queries.  For example, to find all bounding boxes that overlap the
Charlotte area:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxX>=-81.08 AND minX<=-80.58
   AND maxY>=35.00  AND minY<=35.44;
</pre></blockquote>

<p>
This second query would find both entry 1 (the SQLite.org office) which
is entirely contained within the query box and also
Mel Watt's Congressional District which extends well outside the
query box but still overlaps the query box.
</p>

<p>
Note that is not necessary for all coordinates in an R*Tree index
to be constrained in order for the index search to be efficient.
One might, for example, want to query all objects that overlap with
the 35th parallel:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxY>=35.0  AND minY<=35.0;
</pre></blockquote>

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>

<h1>4.0 Using R*Trees Effectively</h1>

<p>
The only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  For the example above, one might create an auxiliary
table as follows:
</p>

<blockquote><pre>
CREATE TABLE demo_data(
  id INTEGER PRIMARY KEY,  -- primary key
  objname TEXT,            -- name of the object
  objtype TEXT,            -- object type
  boundary BLOB            -- detailed boundary of object
);
</pre></blockquote>

<p>
In this example, the demo_data.boundary field is intended to hold some
kind of binary representation of the precise boundaries of the object.
The R*Tree index only holds an axis-aligned rectangular boundary for the
object.  The R*Tree boundary is just an approximation of the true object
boundary.  So what typically happens is that the R*Tree index is used to







|

|
|

|
|
|




|

|
|

|







|





|





|







|


|


|






|









|




|
|














|










|

|




|






|







|


|


|







|


|

|
|



|

|






|













|










|







43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
command-line.
</p>

<h1>3.0 Using the R*Tree Module</h1>

<p>
The SQLite R*Tree module is implemented as a
[sqlite3_create_module | virtual table].  ^Each R*Tree index is a
virtual table with an odd number of columns between 3 and 11.
^The first column is always a 64-bit signed integer primary key.
^The other columns are minimum- and maximum-value pairs (stored as
32-bit floating point numbers) for each
dimension.  ^A 1-dimensional R*Tree thus has 3 columns.  
^A 2-dimensional R*Tree (the most common case) has 5 columns.
^A 5-dimensional R*Tree has 11 columns.  The SQLite R*Tree implementation
does not support R*Trees wider than 5 dimensions.
</p>

<p>
^The first column of an SQLite R*Tree must always be an integer
primary key.
^The min/max-value pair columns are always stored as
32-bit floating point values.  ^Unlike regular SQLite tables which
can store data in a variety of datatypes and formats, the R*Tree
indices rigidly enforce these two storage types.  ^Attempts to insert
something other than an integer into the first column, or something
other than a floating point value into the other columns, will result
in an error.
</p>

<h2>3.1 Creating An R*Tree Index</h2>

^(<p>
A new R*Tree index is created as follows:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE <em>&lt;name&gt;</em> USING rtree(<em>&lt;column-names&gt;</em>);
</pre></blockquote>)^

<p>
The <em>&lt;name&gt;</em> is the name your application chooses for the
R*Tree index and <em>&lt;column-names&gt;</em> is a comma separated list
of between 3 and 11 columns.
^(The virtual &lt;name&gt; table creates three "shadow" tables to actually
store its content.  The names of these shadow tables are:
</p>

<blockquote>
<em>&lt;name&gt;</em><strong>_node</strong><br>
<em>&lt;name&gt;</em><strong>_rowid</strong><br>
<em>&lt;name&gt;</em><strong>_parent</strong>
</blockquote>)^

<p>
^The shadow tables are ordinary SQLite data tables.  You can query them 
directly if you like, though this unlikely to to reveal anything particularly
useful. 
^And you can [UPDATE], [DELETE], [INSERT] or even [DROP TABLE | DROP] 
the shadow tables, though doing so will corrupt your R*Tree index.
So it is best to simply ignore the shadow tables.  Recognize that they
are there to hold your R*Tree index information and let it go as that.
</p>

<p>
^(As an example, consider creating a two-dimensional R*Tree index for use in 
spatial queries:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE demo_index USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY       -- Minimum and maximum Y coordinate
);
</pre></blockquote>)^

<h2>3.2 Populating An R*Tree Index</h2>

<p>
^The usual [INSERT], [UPDATE], and [DELETE] commands work on an R*Tree
index just like on regular tables.  ^(So to insert some data into our sample
R*Tree index, we can do something like this:
</p>

<blockquote><pre>
INSERT INTO demo_index VALUES(
    1,                   -- Primary key
    -80.7749, -80.7747,  -- Longitude range
    30.3776, 30.3778     -- Latitude range
);
INSERT INTO demo_index VALUES(
    2,
    -81.0, -79.6,
    35.0, 36.2
);
</pre></blockquote>)^

<p>
The entries above might represent (for example) a bounding box around
the offices for SQLite.org and bounding box around the
12th Congressional District of North Carolina in which SQLite.org is located.  
</p>

<h2>3.3 Querying An R*Tree Index</h2>

<p>
^Any valid query will work against an R*Tree index.  But the R*Tree
implementation is designed to make two kinds of queries especially
efficient.  ^(First, queries against the primary key are efficient:
</p>

<blockquote><pre>
SELECT * FROM demo_index WHERE id=1;
</pre></blockquote>)^

<p>
Of course, an ordinary SQLite table will do a query against its
integer primary key efficiently, so the previous is no big deal.
The real reason for using an R*Tree in the first place is so that
you can efficiently do inequality queries against the coordinate
ranges.  ^(To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE minX>=-81.08 AND maxX<=-80.58
   AND minY>=35.00  AND maxY<=35.44;
</pre></blockquote>)^

<p>
^The query above would very quickly locate the id of 1 even if the
R*Tree contained millions of entries.  The previous is an example
of a "contained-within" query.  The R*Tree also supports "overlapping"
queries.  ^(For example, to find all bounding boxes that overlap the
Charlotte area:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxX>=-81.08 AND minX<=-80.58
   AND maxY>=35.00  AND minY<=35.44;
</pre></blockquote>)^

<p>
^(This second query would find both entry 1 (the SQLite.org office) which
is entirely contained within the query box and also
the 12th Congressional District which extends well outside the
query box but still overlaps the query box.)^
</p>

<p>
^Note that is not necessary for all coordinates in an R*Tree index
to be constrained in order for the index search to be efficient.
^(One might, for example, want to query all objects that overlap with
the 35th parallel:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxY>=35.0  AND minY<=35.0;
</pre></blockquote>)^

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>

<h1>4.0 Using R*Trees Effectively</h1>

<p>
The only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  ^(For the example above, one might create an auxiliary
table as follows:
</p>

<blockquote><pre>
CREATE TABLE demo_data(
  id INTEGER PRIMARY KEY,  -- primary key
  objname TEXT,            -- name of the object
  objtype TEXT,            -- object type
  boundary BLOB            -- detailed boundary of object
);
</pre></blockquote>)^

<p>
In this example, the demo_data.boundary field is intended to hold some
kind of binary representation of the precise boundaries of the object.
The R*Tree index only holds an axis-aligned rectangular boundary for the
object.  The R*Tree boundary is just an approximation of the true object
boundary.  So what typically happens is that the R*Tree index is used to
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
</p></blockquote>

<p>
Suppose the demo_data.boundary field holds some proprietary data description
of a complex two-dimensional boundary for an object and suppose that the
application has used the [sqlite3_create_function()] interface to
created application-defined functions "contained_in" and
"overlaps" accept two demo_data.boundary objects and return true or false.
One may assume that "contained_in" and "overlaps" are relatively slow
functions that we do not want to invoke too frequently.
Then an efficient way to find the name of all objects located within
the North Carolina 12th District, one may run a query like this:
</p>

<blockquote><pre>
SELECT objname FROM demo_data, demo_index
 WHERE demo_data.id=demo_index.id
   AND contained_in(demo_data.boundary, :boundary)
   AND minX>=-81.0 AND max<=-79.6
   AND minY>=35.0 AND maxY<=36.2;
</pre></blockquote>

<p>In the query above, one would presumably bind the binary BLOB 
description of the precise boundary of the 12th district to the
":boundary" parameter.</p>

<p>Notice how the query above works:  The R*Tree index runs in the outer
loop to find entries that are contained within the bounding box







|


|
|








|







247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
</p></blockquote>

<p>
Suppose the demo_data.boundary field holds some proprietary data description
of a complex two-dimensional boundary for an object and suppose that the
application has used the [sqlite3_create_function()] interface to
created application-defined functions "contained_in" and
"overlaps" accepting two demo_data.boundary objects and return true or false.
One may assume that "contained_in" and "overlaps" are relatively slow
functions that we do not want to invoke too frequently.
^(Then an efficient way to find the name of all objects located within
the North Carolina 12th District, one may be to run a query like this:
</p>

<blockquote><pre>
SELECT objname FROM demo_data, demo_index
 WHERE demo_data.id=demo_index.id
   AND contained_in(demo_data.boundary, :boundary)
   AND minX>=-81.0 AND max<=-79.6
   AND minY>=35.0 AND maxY<=36.2;
</pre></blockquote>)^

<p>In the query above, one would presumably bind the binary BLOB 
description of the precise boundary of the 12th district to the
":boundary" parameter.</p>

<p>Notice how the query above works:  The R*Tree index runs in the outer
loop to find entries that are contained within the bounding box