Documentation Source Text

Check-in [cb9d13f931]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Convert the RTree documentation to fancy-format.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: cb9d13f9313bb61e01e8f81df58c51b150156ba4
User & Date: drh 2016-07-13 14:17:01.539
Context
2016-07-15
02:59
Fix typo in the when-to-use document. (check-in: faf1f46a56 user: drh tags: trunk)
2016-07-13
14:17
Convert the RTree documentation to fancy-format. (check-in: cb9d13f931 user: drh tags: trunk)
13:57
Add documentation on the dbhash.exe utility and an entry in the change-log. (check-in: f80207b028 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/rtree.in.
1
2
3
4
5
6
7
8
9
10
11
12
<title>The SQLite R*Tree Module</title>
<tcl>hd_keywords *rtree *RTREE {R-Tree extension} {R-Trees}</tcl>
<h1 align="center">The SQLite R*Tree Module</h1>

<h2>1.0 Overview</h2>

<p>
An [http://en.wikipedia.org/wiki/R-tree | R-Tree] is a special
index that is designed for doing range queries.  R-Trees are most commonly
used in geospatial systems where each entry is a rectangle with minimum and
maximum X and Y coordinates.  Given a query rectangle, an R-Tree is able
to quickly find all entries that are contained within the query rectangle


|

|







1
2
3
4
5
6
7
8
9
10
11
12
<title>The SQLite R*Tree Module</title>
<tcl>hd_keywords *rtree *RTREE {R-Tree extension} {R-Trees}</tcl>
<table_of_contents>

<h1>Overview</h1>

<p>
An [http://en.wikipedia.org/wiki/R-tree | R-Tree] is a special
index that is designed for doing range queries.  R-Trees are most commonly
used in geospatial systems where each entry is a rectangle with minimum and
maximum X and Y coordinates.  Given a query rectangle, an R-Tree is able
to quickly find all entries that are contained within the query rectangle
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
The implementation found in SQLite is a refinement of Guttman's original
idea, commonly called "R*Trees", that was described by
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
<em>The R*-Tree: An Efficient and Robust Access Method for Points
and Rectangles.</em> SIGMOD Conference 1990: 322-331.
</p>

<h2>2.0 Compiling The R*Tree Module</h2>

<p>
The source code to the SQLite R*Tree module is included as part
of the [amalgamation] but is disabled by default.  To enable the
R*Tree module, simply compile with the [SQLITE_ENABLE_RTREE] 
C-preprocessor macro defined.  With many compilers, this is accomplished
by adding the option "-DSQLITE_ENABLE_RTREE=1" to the compiler
command-line.
</p>

<h2>3.0 Using the R*Tree Module</h2>

<p>
The SQLite R*Tree module is implemented as a
[sqlite3_create_module | virtual table].  ^Each R*Tree index is a
virtual table with an odd number of columns between 3 and 11.
^The first column is always a 64-bit signed integer primary key.
^The other columns are pairs, one pair per dimension, containing the







|










|







29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
The implementation found in SQLite is a refinement of Guttman's original
idea, commonly called "R*Trees", that was described by
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger:
<em>The R*-Tree: An Efficient and Robust Access Method for Points
and Rectangles.</em> SIGMOD Conference 1990: 322-331.
</p>

<h1>Compiling The R*Tree Module</h1>

<p>
The source code to the SQLite R*Tree module is included as part
of the [amalgamation] but is disabled by default.  To enable the
R*Tree module, simply compile with the [SQLITE_ENABLE_RTREE] 
C-preprocessor macro defined.  With many compilers, this is accomplished
by adding the option "-DSQLITE_ENABLE_RTREE=1" to the compiler
command-line.
</p>

<h1>Using the R*Tree Module</h1>

<p>
The SQLite R*Tree module is implemented as a
[sqlite3_create_module | virtual table].  ^Each R*Tree index is a
virtual table with an odd number of columns between 3 and 11.
^The first column is always a 64-bit signed integer primary key.
^The other columns are pairs, one pair per dimension, containing the
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
"rtree" virtual tables or as 32-bit signed integers in "rtree_i32" virtual
tables.  ^Unlike regular SQLite tables which can store data in a variety of
datatypes and formats, the R*Tree rigidly enforce these storage types. 
If any other type of value is inserted into such a column, the r-tree
module silently converts it to the required type before writing the
new record to the database.

<h3>3.1 Creating An R*Tree Index</h3>

^(<p>
A new R*Tree index is created as follows:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE <em>&lt;name&gt;</em> USING rtree(<em>&lt;column-names&gt;</em>);







|







74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
"rtree" virtual tables or as 32-bit signed integers in "rtree_i32" virtual
tables.  ^Unlike regular SQLite tables which can store data in a variety of
datatypes and formats, the R*Tree rigidly enforce these storage types. 
If any other type of value is inserted into such a column, the r-tree
module silently converts it to the required type before writing the
new record to the database.

<h2>Creating An R*Tree Index</h2>

^(<p>
A new R*Tree index is created as follows:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE <em>&lt;name&gt;</em> USING rtree(<em>&lt;column-names&gt;</em>);
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
CREATE VIRTUAL TABLE demo_index USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY       -- Minimum and maximum Y coordinate
);
</pre></blockquote>)^

<h3>3.2 Populating An R*Tree Index</h3>

<p>
^The usual [INSERT], [UPDATE], and [DELETE] commands work on an R*Tree
index just like on regular tables.  ^(So to insert some data into our sample
R*Tree index, we can do something like this:
</p>








|







121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
CREATE VIRTUAL TABLE demo_index USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY       -- Minimum and maximum Y coordinate
);
</pre></blockquote>)^

<h2>Populating An R*Tree Index</h2>

<p>
^The usual [INSERT], [UPDATE], and [DELETE] commands work on an R*Tree
index just like on regular tables.  ^(So to insert some data into our sample
R*Tree index, we can do something like this:
</p>

149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
<p>
The entries above might represent (for example) a bounding box around
the main office for SQLite.org and bounding box around the
12th Congressional District of North Carolina (prior to the 2011
redistricting) in which SQLite.org was located.  
</p>

<h3>3.3 Querying An R*Tree Index</h3>

<p>
^Any valid query will work against an R*Tree index.  But the R*Tree
implementation is designed to make two kinds of queries especially
efficient.  ^(First, queries against the primary key are efficient:
</p>








|







149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
<p>
The entries above might represent (for example) a bounding box around
the main office for SQLite.org and bounding box around the
12th Congressional District of North Carolina (prior to the 2011
redistricting) in which SQLite.org was located.  
</p>

<h2>Querying An R*Tree Index</h2>

<p>
^Any valid query will work against an R*Tree index.  But the R*Tree
implementation is designed to make two kinds of queries especially
efficient.  ^(First, queries against the primary key are efficient:
</p>

215
216
217
218
219
220
221
222
223
224
225
226
227
228
229

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>

<h3>3.3 Roundoff Error</h3>

<p>
By default, coordinates are stored in an R*Tree using 32-bit floating
point values.  When a coordinate cannot be exactly represented by a
32-bit floating point number, the lower-bound coordinates are rounded down
and the upper-bound coordinates are rounded up.  Thus, bounding boxes might
be slightly larger than specified, but will never be any smaller.  This







|







215
216
217
218
219
220
221
222
223
224
225
226
227
228
229

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>

<h2>Roundoff Error</h2>

<p>
By default, coordinates are stored in an R*Tree using 32-bit floating
point values.  When a coordinate cannot be exactly represented by a
32-bit floating point number, the lower-bound coordinates are rounded down
and the upper-bound coordinates are rounded up.  Thus, bounding boxes might
be slightly larger than specified, but will never be any smaller.  This
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
<p>However, for a "contained-within" style query, rounding the bounding
boxes outward might cause some entries to be excluded from the result set
if the edge of the entry bounding box corresponds to the edge of the query
bounding box.  To guard against this, applications should expand their
contained-within query boxes slightly (by 0.000012%) by rounding down the
lower coordinates and rounding up the top coordinates, in each dimension.

<h2>4.0 Using R*Trees Effectively</h2>

<p>
The only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  ^(For the example above, one might create an auxiliary
table as follows:







|







237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
<p>However, for a "contained-within" style query, rounding the bounding
boxes outward might cause some entries to be excluded from the result set
if the edge of the entry bounding box corresponds to the edge of the query
bounding box.  To guard against this, applications should expand their
contained-within query boxes slightly (by 0.000012%) by rounding down the
lower coordinates and rounding up the top coordinates, in each dimension.

<h1>Using R*Trees Effectively</h1>

<p>
The only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  ^(For the example above, one might create an auxiliary
table as follows:
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
contained_in() function to millions of entries in the demo_data table.
The use of the R*Tree in the penultimate query reduces the number of
calls to contained_in() function to a small subset of the entire table.
The R*Tree index did not find the exact answer itself, it merely
limited the search space.</p>

<tcl>hd_fragment {intrtree} {integer-valued r-trees}</tcl>
<h2>5.0 Integer-Valued R-Trees</h2>

<p>
The default virtual table ("rtree") normally stores coordinates as
single-precision (4-byte) floating point numbers.  If integer coordinates
are desired, declare the table using "rtree_i32" instead:

<blockquote><pre>







|







322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
contained_in() function to millions of entries in the demo_data table.
The use of the R*Tree in the penultimate query reduces the number of
calls to contained_in() function to a small subset of the entire table.
The R*Tree index did not find the exact answer itself, it merely
limited the search space.</p>

<tcl>hd_fragment {intrtree} {integer-valued r-trees}</tcl>
<h1>Integer-Valued R-Trees</h1>

<p>
The default virtual table ("rtree") normally stores coordinates as
single-precision (4-byte) floating point numbers.  If integer coordinates
are desired, declare the table using "rtree_i32" instead:

<blockquote><pre>
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
it might be desirable to have a pure integer implementation of r-trees.
This is accomplished by compiling with the [SQLITE_RTREE_INT_ONLY] option.
When SQLITE_RTREE_INT_ONLY is used, both the "rtree" and the "rtree_i32"
virtual tables store 32-bit integers and only integer values are used for
internal computations.

<tcl>hd_fragment {customquery} {custom r-tree queries}</tcl>
<h2>6.0 Custom R-Tree Queries</h2>

<p>By using standard SQL expressions in the WHERE clause of a SELECT query,
a programmer can query for all R*Tree entries that have a spatial relationship
(they intersect with or are contained within) a particular bounding-box.
Custom R*Tree queries, using the MATCH
operator in the WHERE clause of a SELECT, allow the programmer to query for 
the set of R*Tree entries that intersect any arbitrary region or shape, not 







|







344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
it might be desirable to have a pure integer implementation of r-trees.
This is accomplished by compiling with the [SQLITE_RTREE_INT_ONLY] option.
When SQLITE_RTREE_INT_ONLY is used, both the "rtree" and the "rtree_i32"
virtual tables store 32-bit integers and only integer values are used for
internal computations.

<tcl>hd_fragment {customquery} {custom r-tree queries}</tcl>
<h1>Custom R-Tree Queries</h1>

<p>By using standard SQL expressions in the WHERE clause of a SELECT query,
a programmer can query for all R*Tree entries that have a spatial relationship
(they intersect with or are contained within) a particular bounding-box.
Custom R*Tree queries, using the MATCH
operator in the WHERE clause of a SELECT, allow the programmer to query for 
the set of R*Tree entries that intersect any arbitrary region or shape, not 
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
</blockquote></pre>)^

<p>^The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or sqlite3_rtree_query_callback(),
is used to register the SQL function.  However, the newer query-style
callbacks give the application greater control over how the query proceeds.

<h3>6.1 The Legacy xGeom Callback</h3>

<p>^The legacy xGeom callback is invoked with four arguments.  ^The first
argument is a pointer to an sqlite3_rtree_geometry structure which provides
information about how the SQL function was invoked.  ^The second argument
is the number of coordinates in each r-tree entry, and is always the same
for any given R*Tree.  ^The number of coordinates is 2 for a 1-dimensional R*Tree,
4 for a 2-dimensional R*Tree, 6 for a 3-dimensional R*Tree, and so forth.







|







401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
</blockquote></pre>)^

<p>^The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or sqlite3_rtree_query_callback(),
is used to register the SQL function.  However, the newer query-style
callbacks give the application greater control over how the query proceeds.

<h2>The Legacy xGeom Callback</h2>

<p>^The legacy xGeom callback is invoked with four arguments.  ^The first
argument is a pointer to an sqlite3_rtree_geometry structure which provides
information about how the SQL function was invoked.  ^The second argument
is the number of coordinates in each r-tree entry, and is always the same
for any given R*Tree.  ^The number of coordinates is 2 for a 1-dimensional R*Tree,
4 for a 2-dimensional R*Tree, 6 for a 3-dimensional R*Tree, and so forth.
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
value of the pUser variable as the only argument. In other words, xDelUser
may be set to a destructor function for the pUser value.

<p>^The xGeom callback always does a depth-first search of the r-tree.

<tcl>hd_fragment xquery {xQueryFunc R*Tree callback} {sqlite3_rtree_query_callback}
</tcl>
<h3>6.2 The New xQueryFunc Callback</h3>

<p>The newer xQueryFunc callback receives more information from the r-tree
query engine on each call, and it sends more information back to the query engine
before it returns.
To help keep the interface manageable, the xQueryFunc callback sends and receives
information from the query engine as fields in the
sqlite3_rtree_query_info structure:







|







460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
value of the pUser variable as the only argument. In other words, xDelUser
may be set to a destructor function for the pUser value.

<p>^The xGeom callback always does a depth-first search of the r-tree.

<tcl>hd_fragment xquery {xQueryFunc R*Tree callback} {sqlite3_rtree_query_callback}
</tcl>
<h2>The New xQueryFunc Callback</h2>

<p>The newer xQueryFunc callback receives more information from the r-tree
query engine on each call, and it sends more information back to the query engine
before it returns.
To help keep the interface manageable, the xQueryFunc callback sends and receives
information from the query engine as fields in the
sqlite3_rtree_query_info structure:
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
is the rowid (the first of the 3 to 11 columns in the R*Tree) for the element
being considered.  iRowid is only valid for leaves.)^  ^The eParentWithin and
rParentScore values are copies of the eWithin and rScore values from the
containing subtree of the current row.  ^The anQueue field is an array
of mxLevel+1 unsigned integers that tell the current number of elements in
the priority queue at each level.

<h3>6.3 Additional Considerations for Custom Queries</h3>

<p>
^The MATCH operator of a custom R*Tree query function must be a top-level
AND-connected term of the WHERE clause, or else it will not be usable
by the R*Tree query optimizer and the query will not be runnable.
^If the MATCH operator is connected to other terms of the WHERE clause 
via an OR operator, for example, the query will fail with an error.







|







552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
is the rowid (the first of the 3 to 11 columns in the R*Tree) for the element
being considered.  iRowid is only valid for leaves.)^  ^The eParentWithin and
rParentScore values are copies of the eWithin and rScore values from the
containing subtree of the current row.  ^The anQueue field is an array
of mxLevel+1 unsigned integers that tell the current number of elements in
the priority queue at each level.

<h2>Additional Considerations for Custom Queries</h2>

<p>
^The MATCH operator of a custom R*Tree query function must be a top-level
AND-connected term of the WHERE clause, or else it will not be usable
by the R*Tree query optimizer and the query will not be runnable.
^If the MATCH operator is connected to other terms of the WHERE clause 
via an OR operator, for example, the query will fail with an error.