Documentation Source Text

Check-in [a007133016]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:First cut at documentation for the Geopoly module.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: a00713301688ce5921899f3653e85f45249c277f1b90edd75933c231ea125771
User & Date: drh 2018-08-27 17:15:11.158
Context
2018-08-27
20:52
Enhancements to the virtual table documentation to describe the operation of SQLITE_INDEX_CONSTRAINT_FUNCTION. (check-in: f3d8866e42 user: drh tags: trunk)
17:15
First cut at documentation for the Geopoly module. (check-in: a007133016 user: drh tags: trunk)
2018-08-25
14:56
Fix a duplicate fragment name in lang_expr.html (check-in: 8ffd68dac7 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/changes.in.
19
20
21
22
23
24
25

26
27
28
29
30
31
32
  set aChng($nChng) [list $date $desc $options]
  set xrefChng($date) $nChng
  incr nChng
}

chng {2018-09-00 (3.25.0)} {
<li> Add support for [window functions]

<li> Query optimizer improvements:
<ol type="a">
  <li> Avoid unnecessary loads of columns in an aggregate query that
       are not within an aggregate function and that are not part
       of the GROUP BY clause.
  <li> The IN-early-out optimization: When doing a look-up on a
       multi-column index and an IN operator is used on a column







>







19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
  set aChng($nChng) [list $date $desc $options]
  set xrefChng($date) $nChng
  incr nChng
}

chng {2018-09-00 (3.25.0)} {
<li> Add support for [window functions]
<li> Added the [Geopoly module]
<li> Query optimizer improvements:
<ol type="a">
  <li> Avoid unnecessary loads of columns in an aggregate query that
       are not within an aggregate function and that are not part
       of the GROUP BY clause.
  <li> The IN-early-out optimization: When doing a look-up on a
       multi-column index and an IN operator is used on a column
Added pages/geopoly.in.




















































































































































































































































































































































































































































































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
<title>The Geopoly Interface To The SQLite R*Tree Module</title>
<tcl>hd_keywords {geopoly} {Geopoly module}</tcl>
<table_of_contents>

<h1>Overview</h1>

<p>
The Geopoly module is an alternative interface to the [R-Tree extension] that uses
the [http://geojson.org | GeoJSON] notation
([https://tools.ietf.org/html/rfc7946 | RFC-7946]) to describe two-dimensional
polygons.  Geopoly includes functions for detecting when one polygons is
contained within or overlaps with another, for computing the
area contained within a polygon, for doing linear trasformations of polygons,
for rendering polygons as
[https://en.wikipedia.org/wiki/Scalable_Vector_Graphics | SVG], and other
similar operations.

<p>
Geopoly operates on "simple" polygons - that is, polygons for which
the boundary does not intersect itself.  Geopoly thus extends the capabilities
of the [R-Tree extension] which can only deal with rectangular areas.
On the other hand, the [R-Tree extension] is
able to handle between 1 and 5 coordinate dimensions, whereas Geopoly is restricted
to 2-dimensional shapes only.

<p>
Each polygon in the Geopoly module can be associated with an arbitrary
number of auxiliary data fields.

<h2>GeoJSON</h2>

<p>The [https://tools.ietf.org/html/rfc7946| GeoJSON standard] is syntax for
exchanging geospatial information using JSON.  GeoJSON is a rich standard
that can describe nearly any kind of geospatial content.

<p>The Geopoly module only understands
a small subset of GeoJSON, but a critical subset.  
In particular, GeoJSON understands
the JSON array of vertexes that describes a simple polygon.

<p>A polygon is defined by its vertexes.
Each vertex is a JSON array of two numeric values which are the
X and Y coordinates of the vertex.
A polygon is a JSON array of these vertexes, and hence is an array
of arrays.
The first and last vertex in the array must be the same.
The polygon follows the right-hand rule:  When tracing a line from
one vertex to the next, the area to the right of the line is outside
of the polygon and the area to the left is inside the polygon.
In other words, the net rotation of the vertexes is counter-clockwise.

<p>
For example, the following JSON describes an isosceles triangle, sitting
on the X access and with an area of 0.5:

<codeblock>
&#91;&#91;0,0],&#91;1,0],&#91;0.5,1],&#91;0,0]]
</codeblock>

<p>
A triangle has three vertexes, but the GeoJSON description of the triangle
has 4 vertexes because the first and last vertex are duplicates.

<h2>Binary storage format</h2>

<p>
Internally, Geopoly stores polygons in a binary format - an SQL BLOB.
Details of the binary format are given below.
All of the Geopoly interfaces are able to accept polygons in either the
GeoJSON format or in the binary format.

<h1>Using The Geopoly Extension</h1>

<p>
A geopoly table is created as follows:

<codeblock>
CREATE VIRTUAL TABLE newtab USING geopoly(a,b,c);
</codeblock>

<p>
The statement above creates a new geopoly table named "newtab".
Every geopoly table contains a built-in integer "rowid" column
and a "_shape" column that contains
the polygon associated with that row of the table.
The example above also defines three auxiliary data columns 
named "a", "b", and "c" that can store whatever additional
information the application needs to associate
with each polygon.  If there is no need to store auxiliary
information, the list of auxiliary columns can be omitted.

<p>
Store new polygons in the table using ordinary INSERT statements:

<codeblock>
INSERT INTO newtab(_shape) VALUES('&#91;&#91;0,0],&#91;1,0],&#91;0.5,1],&#91;0,0]]');
</codeblock>

<p>
UPDATE and DELETE statements work similarly.

<h2>Queries</h2>

<p>
To query the geopoly table using an indexed geospatial search, 
use one of the functions geopoly_overlap()
or geopoly_within() as a boolean function in the WHERE clause,
with the "_shape" column as the first argument to the function.
For example:

<codeblock>
SELECT * FROM newtab WHERE geopoly_overlap(_shape, $query_polygon);
</codeblock>

<p>
The previous example will return every row for which the _shape
overlaps the polygon in the $query_polygon parameter.  The
geopoly_within() function works similarly, but only returns rows for
which the _shape is completely contained within $query_polygon.

<p>
Querys (and also DELETE and UPDATE statements) in which the WHERE
clause contains a bare geopoly_overlap() or geopoly_within() function
make use of the underly R*Tree data structures for a fast lookup that
only has to examine a subset of the rows in the table.  The number of
rows examines depends, of course, on the size of the $query_polygon.
Large $query_polygons will normally need to look at more rows than small
ones.

<p>
Queries against the rowid of a geopoly table are also very quick, even
for tables with a vast number of rows.
However, none of the auxiliary data columns are indexes, and so queries
against the auxiliary data columns will involve a full table scan.

<h1>Special Functions</h1>

<p>
The geopoly module defines several new SQL functions that are useful for
dealing with polygons.  All polygon arguments to these functions can be
either the GeoJSON format or the internal binary format.

<h2>The geopoly_overlap(P1,P2) Function</h2>

<p>
If P1 and P2 are both polygons, then the geopoly_overlap(P1,P2) function returns
true if there is any overlap between P1 and P2, or it returns false if P1 and P2
completely disjoint.
If either P1 or P2 is not a polygon, this routine returns NULL.

<p>
The geopoly_overlap(P1,p2) function is special in that the geopoly virtual
table knows how to use R*Tree indexes to optimize queries in which the 
WHERE clause uses geopoly_overlap() as a boolean function.  Only the
geopoly_overlap(P1,P2) and geopoly_within(P1,P2) functions have this
capability.

<h2>The geopoly_within(P1,P2) Function</h2>

<p>
If P1 and P2 are both polygons, then the geopoly_within(P1,P2) function returns
true if P2 is completely contained within P1, or it returns false if any part of
P2 is outside of P1.  If P1 and P2 are the same polygon, this routine returns true.
If either P1 or P2 is not a polygon, this routine returns NULL.

<p>
The geopoly_within(P1,p2) function is special in that the geopoly virtual
table knows how to use R*Tree indexes to optimize queries in which the 
WHERE clause uses geopoly_within() as a boolean function.  Only the
geopoly_within(P1,P2) and geopoly_overlap(P1,P2) functions have this
capability.

<h2>The geopoly_area(P) Function</h2>

<p>
If P is a polygon, then geopoly_area(P) returns the area enclosed by
that polygon.  If P is not a polygon, geopoly_area(P) returns NULL.

<h2>The geopoly_blob(P) Function</h2>

<p>
If P is a polygon, then geopoly_blob(P) returns the binary encoding
of that polygon as a BLOB.
If P is not a polygon, geopoly_blob(P) returns NULL.

<h2>The geopoly_json(P) Function</h2>

<p>
If P is a polygon, then geopoly_json(P) returns the GeoJSON representation
of that polygon as a TEPT string.
If P is not a polygon, geopoly_json(P) returns NULL.

<h2>The geopoly_svg(P,...) Function</h2>

<p>
If P is a polygon, then geopoly_svg(P,...) returns a text string which is a
[https://en.wikipedia.org/wiki/Scalable_Vector_Graphics|Scalable Vector Graphics (SVG)]
representation of that polygon.  If there is more one argument, then second
and subsequent arguments are added as attributes to each SVG glyph.  For example:

<codeblock>
SELECT geopoly_svg($polygon,'class="poly"','style="fill:blue;"');
</codeblock>

<p>
If P is not a polygon, geopoly_svg(P,...) returns NULL.

<p>
Note that geopoly uses a traditional right-handed cartesian coordinate system
with the origin at the lower left, whereas SVG uses a left-handed coordinate
system with the origin at the upper left.  The geopoly_svg() routine makes no
attempt to transform the coordinate system, so the displayed images are shown
in mirror image and rotated.  If that is undesirable, the geopoly_xform() routine
can be used to transform the output from cartesian to SVG coordinates prior to
passing the polygons into geopoly_svg().

<h2>The geopoly_bbox(P) Function</h2>

<p>
If P is a polygon, then geopoly_bbox(P) returns the a new polygon that is
the smallest rectangle completely enclosing P.
If P is not a polygon, geopoly_bbox(P) returns NULL.

<h2>The geopoly_contains_point(P,X,Y) Function</h2>

<p>
If P is a polygon, then geopoly_contains_point(P,X,Y) returns true if and only
if the coordinate X,Y is inside or on the boundary of the polygon P.
If P is not a polygon, geopoly_contains_point(P,X,Y) returns NULL.

<h2>The geopoly_xform(P,A,B,C,D,E,F) Function</h2>

<p>
The geopoly_xform(P,A,B,C,D,E,F) returns a new polygon that is a
linear transformation of the polygon P and where the transformation
is defined by values A,B,C,D,E,F. If P is not a valid polygon, this
routine returns NULL.

<p>
The transformation converts each vertex of the polygon according to the
following formula:

<codeblock>
x1 = A*x0 + B*y0 + E
y1 = C*x0 + D*y0 + F
</codeblock>

<p>
So, for example, to move a polygon by some amount DX, DY without changing
its shape, use:

<codeblock>
geopoly_xform($polygon, 1, 0, 0, 1, $DX, $DY)
</codeblock>

<p>
To rotate a polygon by R radians around the point 0, 0:

<codeblock>
geopoly_xform($polygon, cos($R), sin($R), -sin($R), cos($R), 0, 0)
</codeblock>

<h1>Implementation Details</h1>

<p>The geopoly module is an extension to the [R-Tree extension].  Geopoly
uses the same underlying logic and shadow tables as the [R-Tree extension].
Geopoly merely presents a different interface, and provides some extra logic
to compute polygon decoding, overlap, and containment.

<h2>Binary Encoding of Polygons</h2>

<p>
Geopoly stores all polygons internally using a binary format.  A binary
polygon consists of a 4-byte header following by an array coordinate
pairs which each dimension of each coordinate is of 32-byte floating point
number.

<p>
The first byte of the header is a flag byte.  The least significant bit
of the flag byte determines whether the coordinate pairs the follow the
header are stored big-endian or little-endian.  A value of 0 for the least
significant bit means big-endian and a value of 1 means little endian.
Other bits of the first byte in the header are reserved for future expansion.

<p>
The next three bytes in the header record the number of vertexes in the polygon
as a big-endian integer.  Thus there is an upper bound of about 16 million
vertexes per polygon.

<p>
Following the header is the array of coordinate pairs.  Each coordinate is
a 32-bit floating point number.  The use of 32-bit floating point values for
coordinates means that any point on the earth's surface can be mapped with
a resolution of approximately 2.5 meters.  Higher resolutions are of course
possible if the map is restricted to a single continent or country.
Note that the resolution of coordinates in the geopoly module is similar
in magnitude to daily movement of points on the earth's surface due to
tidal forces.

<p>
The list of coordinates in the binary format contains no redundancy.  
The last coordinate is not a repeat of the first as it is with GeoJSON.  
Hence, there is always one fewer coordinate pair in the binary representation of
a polygon compared to the GeoJSON representation.

<h2>Shadow Tables</h2>

<p>
The geopoly module is built on top of the [R-Tree extension] and uses the
same underlying shadow tables and algorithms.  For indexing purposes, each
polygon is represented in the shadow tables as a rectangular bounding box.
The underlying R-Tree implementation uses bounding boxes to limit the search
space.  Then the geoploy_overlap() and/or geopoly_within() routines further
refine the search to the exact answer.
Changes to pages/rtree.in.
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221

<h2>Creating An R*Tree Index</h2>

^(<p>
A new R*Tree index is created as follows:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE <em>&lt;name&gt;</em> USING rtree(<em>&lt;column-names&gt;</em>);
</pre></blockquote>)^

<p>
The <em>&lt;name&gt;</em> is the name your application chooses for the
R*Tree index and <em>&lt;column-names&gt;</em> is a comma separated list
of between 3 and 11 columns.
^(The virtual &lt;name&gt; table creates three "shadow" tables to actually
store its content.  The names of these shadow tables are:
</p>

<blockquote>
<em>&lt;name&gt;</em><strong>_node</strong><br>
<em>&lt;name&gt;</em><strong>_rowid</strong><br>
<em>&lt;name&gt;</em><strong>_parent</strong>
</blockquote>)^

<p>
^The shadow tables are ordinary SQLite data tables.  You can query them 
directly if you like, though this unlikely to reveal anything particularly
useful. 
^And you can [UPDATE], [DELETE], [INSERT] or even [DROP TABLE | DROP] 
the shadow tables, though doing so will corrupt your R*Tree index.
So it is best to simply ignore the shadow tables.  Recognize that they
hold your R*Tree index information and let it go as that.
</p>

<p>
^(As an example, consider creating a two-dimensional R*Tree index for use in 
spatial queries:
</p>

<blockquote><pre>
CREATE VIRTUAL TABLE demo_index USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY       -- Minimum and maximum Y coordinate
);
</pre></blockquote>)^

<h2>Populating An R*Tree Index</h2>

<p>
^The usual [INSERT], [UPDATE], and [DELETE] commands work on an R*Tree
index just like on regular tables.  ^(So to insert some data into our sample
R*Tree index, we can do something like this:
</p>

<blockquote><pre>
INSERT INTO demo_index VALUES(
    1,                   -- Primary key -- SQLite.org headquarters
    -80.7749, -80.7747,  -- Longitude range
    35.3776, 35.3778     -- Latitude range
);
INSERT INTO demo_index VALUES(
    2,                   -- NC 12th Congressional District in 2010
    -81.0, -79.6,
    35.0, 36.2
);
</pre></blockquote>)^

<p>
The entries above might represent (for example) a bounding box around
the main office for SQLite.org and bounding box around the
12th Congressional District of North Carolina (prior to the 2011
redistricting) in which SQLite.org was located.  
</p>

<h2>Querying An R*Tree Index</h2>

<p>
^Any valid query will work against an R*Tree index.  But the R*Tree
implementation is designed to make two kinds of queries especially
efficient.  ^(First, queries against the primary key are efficient:
</p>

<blockquote><pre>
SELECT * FROM demo_index WHERE id=1;
</pre></blockquote>)^

<p>
Of course, an ordinary SQLite table will also do a query against its
integer primary key efficiently, so the previous is no big deal.
The real reason for using an R*Tree is so that
you can efficiently do inequality queries against the coordinate
ranges.  ^(To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE minX&gt;=-81.08 AND maxX&lt;=-80.58
   AND minY&gt;=35.00  AND maxY&lt;=35.44;
</pre></blockquote>)^

<p>
^The query above would very quickly locate the id of 1 even if the
R*Tree contained millions of entries.  The previous is an example
of a "contained-within" query.  The R*Tree also supports "overlapping"
queries.  ^(For example, to find all bounding boxes that overlap the
Charlotte area:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxX&gt;=-81.08 AND minX&lt;=-80.58
   AND maxY&gt;=35.00  AND minY&lt;=35.44;
</pre></blockquote>)^

<p>
^(This second query would find both entry 1 (the SQLite.org office) which
is entirely contained within the query box and also
the 12th Congressional District which extends well outside the
query box but still overlaps the query box.)^
</p>

<p>
^Note that it is not necessary for all coordinates in an R*Tree index
to be constrained in order for the index search to be efficient.
^(One might, for example, want to query all objects that overlap with
the 35th parallel:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxY&gt;=35.0  AND minY&lt;=35.0;
</pre></blockquote>)^

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>








|

|









|



|
















|





|









|










|
















|

|










|



|









|



|















|


|







80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221

<h2>Creating An R*Tree Index</h2>

^(<p>
A new R*Tree index is created as follows:
</p>

<codeblock>
CREATE VIRTUAL TABLE <em>&lt;name&gt;</em> USING rtree(<em>&lt;column-names&gt;</em>);
</codeblock>

<p>
The <em>&lt;name&gt;</em> is the name your application chooses for the
R*Tree index and <em>&lt;column-names&gt;</em> is a comma separated list
of between 3 and 11 columns.
^(The virtual &lt;name&gt; table creates three "shadow" tables to actually
store its content.  The names of these shadow tables are:
</p>

<codeblock>
<em>&lt;name&gt;</em><strong>_node</strong><br>
<em>&lt;name&gt;</em><strong>_rowid</strong><br>
<em>&lt;name&gt;</em><strong>_parent</strong>
</codeblock>

<p>
^The shadow tables are ordinary SQLite data tables.  You can query them 
directly if you like, though this unlikely to reveal anything particularly
useful. 
^And you can [UPDATE], [DELETE], [INSERT] or even [DROP TABLE | DROP] 
the shadow tables, though doing so will corrupt your R*Tree index.
So it is best to simply ignore the shadow tables.  Recognize that they
hold your R*Tree index information and let it go as that.
</p>

<p>
^(As an example, consider creating a two-dimensional R*Tree index for use in 
spatial queries:
</p>

<codeblock>
CREATE VIRTUAL TABLE demo_index USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY       -- Minimum and maximum Y coordinate
);
</codeblock>

<h2>Populating An R*Tree Index</h2>

<p>
^The usual [INSERT], [UPDATE], and [DELETE] commands work on an R*Tree
index just like on regular tables.  ^(So to insert some data into our sample
R*Tree index, we can do something like this:
</p>

<codeblock>
INSERT INTO demo_index VALUES(
    1,                   -- Primary key -- SQLite.org headquarters
    -80.7749, -80.7747,  -- Longitude range
    35.3776, 35.3778     -- Latitude range
);
INSERT INTO demo_index VALUES(
    2,                   -- NC 12th Congressional District in 2010
    -81.0, -79.6,
    35.0, 36.2
);
</codeblock>

<p>
The entries above might represent (for example) a bounding box around
the main office for SQLite.org and bounding box around the
12th Congressional District of North Carolina (prior to the 2011
redistricting) in which SQLite.org was located.  
</p>

<h2>Querying An R*Tree Index</h2>

<p>
^Any valid query will work against an R*Tree index.  But the R*Tree
implementation is designed to make two kinds of queries especially
efficient.  ^(First, queries against the primary key are efficient:
</p>

<codeblock>
SELECT * FROM demo_index WHERE id=1;
</codeblock>

<p>
Of course, an ordinary SQLite table will also do a query against its
integer primary key efficiently, so the previous is no big deal.
The real reason for using an R*Tree is so that
you can efficiently do inequality queries against the coordinate
ranges.  ^(To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<codeblock>
SELECT id FROM demo_index
 WHERE minX&gt;=-81.08 AND maxX&lt;=-80.58
   AND minY&gt;=35.00  AND maxY&lt;=35.44;
</codeblock>

<p>
^The query above would very quickly locate the id of 1 even if the
R*Tree contained millions of entries.  The previous is an example
of a "contained-within" query.  The R*Tree also supports "overlapping"
queries.  ^(For example, to find all bounding boxes that overlap the
Charlotte area:
</p>

<codeblock>
SELECT id FROM demo_index
 WHERE maxX&gt;=-81.08 AND minX&lt;=-80.58
   AND maxY&gt;=35.00  AND minY&lt;=35.44;
</codeblock>

<p>
^(This second query would find both entry 1 (the SQLite.org office) which
is entirely contained within the query box and also
the 12th Congressional District which extends well outside the
query box but still overlaps the query box.)^
</p>

<p>
^Note that it is not necessary for all coordinates in an R*Tree index
to be constrained in order for the index search to be efficient.
^(One might, for example, want to query all objects that overlap with
the 35th parallel:
</p>

<codeblock>
SELECT id FROM demo_index
 WHERE maxY&gt;=35.0  AND minY&lt;=35.0;
</codeblock>

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>

248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
the only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  ^(For the example above, one might create an auxiliary
table as follows:
</p>

<blockquote><pre>
CREATE TABLE demo_data(
  id INTEGER PRIMARY KEY,  -- primary key
  objname TEXT,            -- name of the object
  objtype TEXT,            -- object type
  boundary BLOB            -- detailed boundary of object
);
</pre></blockquote>)^

<p>
In this example, the demo_data.boundary field is intended to hold some
kind of binary representation of the precise boundaries of the object.
The R*Tree index only holds an axis-aligned rectangular boundary for the
object.  The R*Tree boundary is just an approximation of the true object
boundary.  So what typically happens is that the R*Tree index is used to







|






|







248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
the only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  ^(For the example above, one might create an auxiliary
table as follows:
</p>

<codeblock>
CREATE TABLE demo_data(
  id INTEGER PRIMARY KEY,  -- primary key
  objname TEXT,            -- name of the object
  objtype TEXT,            -- object type
  boundary BLOB            -- detailed boundary of object
);
</codeblock>

<p>
In this example, the demo_data.boundary field is intended to hold some
kind of binary representation of the precise boundaries of the object.
The R*Tree index only holds an axis-aligned rectangular boundary for the
object.  The R*Tree boundary is just an approximation of the true object
boundary.  So what typically happens is that the R*Tree index is used to
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
One may assume that "contained_in" and "overlaps" are relatively slow
functions that we do not want to invoke too frequently.
^(Then an efficient way to find the name of all objects located within
the North Carolina 12th District, one may be to run a query like this:
</p>

<a name="diquery"></a>
<blockquote><pre>
SELECT objname FROM demo_data, demo_index
 WHERE demo_data.id=demo_index.id
   AND contained_in(demo_data.boundary, :boundary)
   AND minX&gt;=-81.0 AND maxX&lt;=-79.6
   AND minY&gt;=35.0 AND maxY&gt;=36.2;
</pre></blockquote>)^

<p>In the query above, one would presumably bind the binary BLOB 
description of the precise boundary of the 12th district to the
":boundary" parameter.</p>

<p>Notice how the query above works:  The R*Tree index runs in the outer
loop to find entries that are contained within the bounding box
of longitude -81..-79.6 and latitude 35.0..36.2. 
For each object identifier found, SQLite looks up
the corresponding entry in the demo_data table.  It then uses the boundary
field from the demo_data table as a parameter to the contained_in()
function and if that function returns true, the objname field from
the demo_data table is returned as the next row of query result.</p>

<p>One would get the same answer without the use of the R*Tree index
using the following simpler query:</p>

<blockquote><pre>
SELECT objname FROM demo_data
 WHERE contained_in(demo_data.boundary, :boundary);
</pre></blockquote>

<p>The problem with this latter query is that it must apply the
contained_in() function to millions of entries in the demo_data table.
The use of the R*Tree in the penultimate query reduces the number of
calls to contained_in() function to a small subset of the entire table.
The R*Tree index did not find the exact answer itself, it merely
limited the search space.</p>







|





|

















|


|







287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
One may assume that "contained_in" and "overlaps" are relatively slow
functions that we do not want to invoke too frequently.
^(Then an efficient way to find the name of all objects located within
the North Carolina 12th District, one may be to run a query like this:
</p>

<a name="diquery"></a>
<codeblock>
SELECT objname FROM demo_data, demo_index
 WHERE demo_data.id=demo_index.id
   AND contained_in(demo_data.boundary, :boundary)
   AND minX&gt;=-81.0 AND maxX&lt;=-79.6
   AND minY&gt;=35.0 AND maxY&gt;=36.2;
</codeblock>

<p>In the query above, one would presumably bind the binary BLOB 
description of the precise boundary of the 12th district to the
":boundary" parameter.</p>

<p>Notice how the query above works:  The R*Tree index runs in the outer
loop to find entries that are contained within the bounding box
of longitude -81..-79.6 and latitude 35.0..36.2. 
For each object identifier found, SQLite looks up
the corresponding entry in the demo_data table.  It then uses the boundary
field from the demo_data table as a parameter to the contained_in()
function and if that function returns true, the objname field from
the demo_data table is returned as the next row of query result.</p>

<p>One would get the same answer without the use of the R*Tree index
using the following simpler query:</p>

<codeblock>
SELECT objname FROM demo_data
 WHERE contained_in(demo_data.boundary, :boundary);
</codeblock>

<p>The problem with this latter query is that it must apply the
contained_in() function to millions of entries in the demo_data table.
The use of the R*Tree in the penultimate query reduces the number of
calls to contained_in() function to a small subset of the entire table.
The R*Tree index did not find the exact answer itself, it merely
limited the search space.</p>
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
<p>
Auxiliary columns are marked with a "+" symbol before the column name.
Auxiliary columns must come after all of the coordinate boundary columns.
There is a limit of no more than 100 auxiliary columns.
The following example shows an r-tree table with auxiliary columns that
is equivalent to the two tables "demo_index" and "demo_data" above:

^(<blockquote><pre>
CREATE VIRTUAL TABLE demo_index2 USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY,      -- Minimum and maximum Y coordinate
   +objname TEXT,   -- name of the object
   +objtype TEXT,   -- object type
   +boundary BLOB   -- detailed boundary of object
);
</pre></blockquote>)^

<p>
By combining location data and related information into the same
table, auxiliary columns can provide a cleaner model
and reduce the need to joins.
For example, the earlier
<a href="#diquery">join between demo_index and demo_data</a> can now
be written as a simple query, like this:

^(<blockquote><pre>
SELECT objname FROM demo_index2
 WHERE contained_in(boundary, :boundary)
   AND minX&gt;=-81.0 AND maxX&lt;=-79.6
   AND minY&gt;=35.0 AND maxY&gt;=36.2;
</pre></blockquote>)^

<tcl>hd_fragment {intrtree} {integer-valued r-trees}</tcl>
<h1>Integer-Valued R-Trees</h1>

<p>
The default virtual table ("rtree") normally stores coordinates as
single-precision (4-byte) floating point numbers.  If integer coordinates
are desired, declare the table using "rtree_i32" instead:

<blockquote><pre>
CREATE VIRTUAL TABLE intrtree USING rtree_i32(id,x0,x1,y0,y1,z0,z1);
</pre></blockquote>

<p>
An rtree_i32 stores coordinates as 32-bit signed integers.  But it still
using floating point computations internally as part of the r-tree algorithm.

<tcl>hd_fragment {customquery} {custom r-tree queries}</tcl>
<h1>Custom R-Tree Queries</h1>







|








|









|




|









|

|







339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
<p>
Auxiliary columns are marked with a "+" symbol before the column name.
Auxiliary columns must come after all of the coordinate boundary columns.
There is a limit of no more than 100 auxiliary columns.
The following example shows an r-tree table with auxiliary columns that
is equivalent to the two tables "demo_index" and "demo_data" above:

^(<codeblock>
CREATE VIRTUAL TABLE demo_index2 USING rtree(
   id,              -- Integer primary key
   minX, maxX,      -- Minimum and maximum X coordinate
   minY, maxY,      -- Minimum and maximum Y coordinate
   +objname TEXT,   -- name of the object
   +objtype TEXT,   -- object type
   +boundary BLOB   -- detailed boundary of object
);
</codeblock>)^

<p>
By combining location data and related information into the same
table, auxiliary columns can provide a cleaner model
and reduce the need to joins.
For example, the earlier
<a href="#diquery">join between demo_index and demo_data</a> can now
be written as a simple query, like this:

^(<codeblock>
SELECT objname FROM demo_index2
 WHERE contained_in(boundary, :boundary)
   AND minX&gt;=-81.0 AND maxX&lt;=-79.6
   AND minY&gt;=35.0 AND maxY&gt;=36.2;
</codeblock>)^

<tcl>hd_fragment {intrtree} {integer-valued r-trees}</tcl>
<h1>Integer-Valued R-Trees</h1>

<p>
The default virtual table ("rtree") normally stores coordinates as
single-precision (4-byte) floating point numbers.  If integer coordinates
are desired, declare the table using "rtree_i32" instead:

<codeblock>
CREATE VIRTUAL TABLE intrtree USING rtree_i32(id,x0,x1,y0,y1,z0,z1);
</codeblock>

<p>
An rtree_i32 stores coordinates as 32-bit signed integers.  But it still
using floating point computations internally as part of the r-tree algorithm.

<tcl>hd_fragment {customquery} {custom r-tree queries}</tcl>
<h1>Custom R-Tree Queries</h1>
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
subset of objects in the R*Tree that are visible from a camera positioned 
in 3-D space.

<p>Regions for custom R*Tree queries are defined by R*Tree geometry callbacks
implemented by the application and registered with SQLite via a call to one
of the following two APIs:

<blockquote><pre>
int sqlite3_rtree_query_callback(
  sqlite3 *db,
  const char *zQueryFunc,
  int (*xQueryFunc)(sqlite3_rtree_query_info*),
  void *pContext,
  void (*xDestructor)(void*)
);
int sqlite3_rtree_geometry_callback(
  sqlite3 *db,
  const char *zGeom,
  int (*xGeom)(sqlite3_rtree_geometry *, int nCoord, double *aCoord, int *pRes),
  void *pContext
);
</pre></blockquote>

<p>The sqlite3_rtree_query_callback() became available with SQLite
[version 3.8.5] ([dateof:3.8.5]) and is the preferred interface.
The sqlite3_rtree_geometry_callback() is an older and less flexible
interface that is supported for backwards compatibility.

<p>^A call to one of the above APIs creates a new SQL function named by the
second parameter (zQueryFunc or zGeom).  ^When that SQL function appears
on the right-hand side of the MATCH operator and the left-hand side of the
MATCH operator is any column in the R*Tree virtual table, then the callback 
defined by the third argument (xQueryFunc or xGeom) is invoked to determine
if a particular object or subtree overlaps the desired region.

<p>^(For example, a query like the following might be used to find all
R*Tree entries that overlap with a circle centered a 45.3,22.9 with a
radius of 5.0:

<blockquote><pre>
SELECT id FROM demo_index WHERE id MATCH circle(45.3, 22.9, 5.0)
</blockquote></pre>)^

<p>^The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or sqlite3_rtree_query_callback(),
is used to register the SQL function.  However, the newer query-style
callbacks give the application greater control over how the query proceeds.







|













|

















|







398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
subset of objects in the R*Tree that are visible from a camera positioned 
in 3-D space.

<p>Regions for custom R*Tree queries are defined by R*Tree geometry callbacks
implemented by the application and registered with SQLite via a call to one
of the following two APIs:

<codeblock>
int sqlite3_rtree_query_callback(
  sqlite3 *db,
  const char *zQueryFunc,
  int (*xQueryFunc)(sqlite3_rtree_query_info*),
  void *pContext,
  void (*xDestructor)(void*)
);
int sqlite3_rtree_geometry_callback(
  sqlite3 *db,
  const char *zGeom,
  int (*xGeom)(sqlite3_rtree_geometry *, int nCoord, double *aCoord, int *pRes),
  void *pContext
);
</codeblock>

<p>The sqlite3_rtree_query_callback() became available with SQLite
[version 3.8.5] ([dateof:3.8.5]) and is the preferred interface.
The sqlite3_rtree_geometry_callback() is an older and less flexible
interface that is supported for backwards compatibility.

<p>^A call to one of the above APIs creates a new SQL function named by the
second parameter (zQueryFunc or zGeom).  ^When that SQL function appears
on the right-hand side of the MATCH operator and the left-hand side of the
MATCH operator is any column in the R*Tree virtual table, then the callback 
defined by the third argument (xQueryFunc or xGeom) is invoked to determine
if a particular object or subtree overlaps the desired region.

<p>^(For example, a query like the following might be used to find all
R*Tree entries that overlap with a circle centered a 45.3,22.9 with a
radius of 5.0:

<codeblock>
SELECT id FROM demo_index WHERE id MATCH circle(45.3, 22.9, 5.0)
</blockquote></pre>)^

<p>^The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or sqlite3_rtree_query_callback(),
is used to register the SQL function.  However, the newer query-style
callbacks give the application greater control over how the query proceeds.
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
sqlite3_rtree_geometry
structure is used for every callback for same MATCH operator in the same
query.  ^The contents of the sqlite3_rtree_geometry
structure are initialized by SQLite but are
not subsequently modified.  The callback is free to make changes to the
pUser and xDelUser elements of the structure if desired.

<blockquote><pre>
typedef struct sqlite3_rtree_geometry sqlite3_rtree_geometry;
struct sqlite3_rtree_geometry {
  void *pContext;                 /* Copy of pContext passed to s_r_g_c() */
  int nParam;                     /* Size of array aParam */
  double *aParam;                 /* Parameters passed to SQL geom function */
  void *pUser;                    /* Callback implementation user data */
  void (*xDelUser)(void *);       /* Called by SQLite to clean up pUser */
};
</pre></blockquote>

<p>^The pContext member of the sqlite3_rtree_geometry
structure is always set to a copy of the pContext
argument passed to sqlite3_rtree_geometry_callback() when the
callback is registered. ^The aParam[] array (size nParam) contains the parameter
values passed to the SQL function on the right-hand side of the MATCH operator.
In the example "circle" query above, nParam would be set to 3 and the aParam[]







|








|







465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
sqlite3_rtree_geometry
structure is used for every callback for same MATCH operator in the same
query.  ^The contents of the sqlite3_rtree_geometry
structure are initialized by SQLite but are
not subsequently modified.  The callback is free to make changes to the
pUser and xDelUser elements of the structure if desired.

<codeblock>
typedef struct sqlite3_rtree_geometry sqlite3_rtree_geometry;
struct sqlite3_rtree_geometry {
  void *pContext;                 /* Copy of pContext passed to s_r_g_c() */
  int nParam;                     /* Size of array aParam */
  double *aParam;                 /* Parameters passed to SQL geom function */
  void *pUser;                    /* Callback implementation user data */
  void (*xDelUser)(void *);       /* Called by SQLite to clean up pUser */
};
</codeblock>

<p>^The pContext member of the sqlite3_rtree_geometry
structure is always set to a copy of the pContext
argument passed to sqlite3_rtree_geometry_callback() when the
callback is registered. ^The aParam[] array (size nParam) contains the parameter
values passed to the SQL function on the right-hand side of the MATCH operator.
In the example "circle" query above, nParam would be set to 3 and the aParam[]
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
<p>The newer xQueryFunc callback receives more information from the r-tree
query engine on each call, and it sends more information back to the query engine
before it returns.
To help keep the interface manageable, the xQueryFunc callback sends and receives
information from the query engine as fields in the
sqlite3_rtree_query_info structure:

<blockquote><pre>
struct sqlite3_rtree_query_info {
  void *pContext;                   /* pContext from when function registered */
  int nParam;                       /* Number of function parameters */
  sqlite3_rtree_dbl *aParam;        /* value of function parameters */
  void *pUser;                      /* callback can use this, if desired */
  void (*xDelUser)(void*);          /* function to free pUser */
  sqlite3_rtree_dbl *aCoord;        /* Coordinates of node or entry to check */
  unsigned int *anQueue;            /* Number of pending entries in the queue */
  int nCoord;                       /* Number of coordinates */
  int iLevel;                       /* Level of current node or entry */
  int mxLevel;                      /* The largest iLevel value in the tree */
  sqlite3_int64 iRowid;             /* Rowid for current entry */
  sqlite3_rtree_dbl rParentScore;   /* Score of parent node */
  int eParentWithin;                /* Visibility of parent node */
  int eWithin;                      /* OUT: Visiblity */
  sqlite3_rtree_dbl rScore;         /* OUT: Write the score here */
  /* The following fields are only available in 3.8.11 and later */
  sqlite3_value **apSqlParam;       /* Original SQL values of parameters */
};
</pre></blockquote>

<p>The first five fields of the sqlite3_rtree_query_info structure are identical
to the sqlite3_rtree_geometry structure, and have exactly the same meaning.
The sqlite3_rtree_query_info structure also contains nCoord and aCoord fields 
which have the same meaning as the parameter of the same name in the xGeom callback.

<p>The xQueryFunc must set the eWithin field of sqlite3_rtree_query_info to







|



















|







507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
<p>The newer xQueryFunc callback receives more information from the r-tree
query engine on each call, and it sends more information back to the query engine
before it returns.
To help keep the interface manageable, the xQueryFunc callback sends and receives
information from the query engine as fields in the
sqlite3_rtree_query_info structure:

<codeblock>
struct sqlite3_rtree_query_info {
  void *pContext;                   /* pContext from when function registered */
  int nParam;                       /* Number of function parameters */
  sqlite3_rtree_dbl *aParam;        /* value of function parameters */
  void *pUser;                      /* callback can use this, if desired */
  void (*xDelUser)(void*);          /* function to free pUser */
  sqlite3_rtree_dbl *aCoord;        /* Coordinates of node or entry to check */
  unsigned int *anQueue;            /* Number of pending entries in the queue */
  int nCoord;                       /* Number of coordinates */
  int iLevel;                       /* Level of current node or entry */
  int mxLevel;                      /* The largest iLevel value in the tree */
  sqlite3_int64 iRowid;             /* Rowid for current entry */
  sqlite3_rtree_dbl rParentScore;   /* Score of parent node */
  int eParentWithin;                /* Visibility of parent node */
  int eWithin;                      /* OUT: Visiblity */
  sqlite3_rtree_dbl rScore;         /* OUT: Write the score here */
  /* The following fields are only available in 3.8.11 and later */
  sqlite3_value **apSqlParam;       /* Original SQL values of parameters */
};
</codeblock>

<p>The first five fields of the sqlite3_rtree_query_info structure are identical
to the sqlite3_rtree_geometry structure, and have exactly the same meaning.
The sqlite3_rtree_query_info structure also contains nCoord and aCoord fields 
which have the same meaning as the parameter of the same name in the xGeom callback.

<p>The xQueryFunc must set the eWithin field of sqlite3_rtree_query_info to