Documentation Source Text

Check-in [010e1a52d7]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Additional work on the R-Tree documentation.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 010e1a52d7a452b87be8a26708b4695180c1f0f5
User & Date: drh 2008-06-27 18:56:49
Context
2008-06-28
13:31
Copy the date+time function documentation out of the wiki. check-in: 5178724f27 user: drh tags: trunk
2008-06-27
18:56
Additional work on the R-Tree documentation. check-in: 010e1a52d7 user: drh tags: trunk
17:20
Initial check-in of the rtree documentation. Very incomplete. check-in: a7b39daebb user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/35to36.in.

257
258
259
260
261
262
263
264
265
266
267
268
269

  <li><p>The [sqlite3_next_stmt()] interface allows an application to discover
  all [prepared statements] associated with a [database connection].</p></li>

  <li><p>Added the [page_count] PRAGMA for returning the size of the underlying
  database file in pages.</p></li>

  <li><p>Added a new <a href="http://en.wikipedia.org/wiki/R%2a_tree">R*Tree</a>
  virtual table.</p></li>

  </ol>
}
</tcl>







|
<




257
258
259
260
261
262
263
264

265
266
267
268

  <li><p>The [sqlite3_next_stmt()] interface allows an application to discover
  all [prepared statements] associated with a [database connection].</p></li>

  <li><p>Added the [page_count] PRAGMA for returning the size of the underlying
  database file in pages.</p></li>

  <li><p>Added a new [rtree | R*Tree index extension].</p></li>


  </ol>
}
</tcl>

Changes to pages/amalgamation.in.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
<title>The SQLite Amalgamation</title>
<tcl>hd_keywords {amalgamation} {the amalgamation}</tcl>

<h2>The SQLite Amalgamation</h2>

<p>The core SQLite library consists of about 76 files of C code
(as of version 3.6.0) in the core with 11 additional files
in the FTS3 and RTREE extensions.
Most of these are "source" files in the sense that they are stored 
in the configuration management system and are maintained directly. 
But 6 of the core C files are generated automatically during the 
compilation process.  Of the 87 code files, 66 are C code and 
21 are C header files.</p>

<p>The standard makefiles for SQLite have a target for building
an object we call the "amalgamation".  
The amalgamation is a single C code file, named "sqlite3.c",
that contains all C code 
for the core SQLite library and the FTS3 and RTREE extensions.
This file contains about 91K lines of code 
(55K if you omit blank lines and comments) and is over 3 megabytes
in size.</p>

<p>The amalgamation contains everything you need to integrate SQLite 
into a larger project.  Just copy the amalgamation into your source 
directory and compile it along with the other C code files in your project.







|










|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
<title>The SQLite Amalgamation</title>
<tcl>hd_keywords {amalgamation} {the amalgamation}</tcl>

<h2>The SQLite Amalgamation</h2>

<p>The core SQLite library consists of about 76 files of C code
(as of version 3.6.0) in the core with 11 additional files
in the FTS3 and [RTREE] extensions.
Most of these are "source" files in the sense that they are stored 
in the configuration management system and are maintained directly. 
But 6 of the core C files are generated automatically during the 
compilation process.  Of the 87 code files, 66 are C code and 
21 are C header files.</p>

<p>The standard makefiles for SQLite have a target for building
an object we call the "amalgamation".  
The amalgamation is a single C code file, named "sqlite3.c",
that contains all C code 
for the core SQLite library and the FTS3 and [RTREE] extensions.
This file contains about 91K lines of code 
(55K if you omit blank lines and comments) and is over 3 megabytes
in size.</p>

<p>The amalgamation contains everything you need to integrate SQLite 
into a larger project.  Just copy the amalgamation into your source 
directory and compile it along with the other C code files in your project.

Changes to pages/changes.in.

64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
<li>Added the [sqlite3_config()] interface for doing run-time configuration
    of the entire SQLite library.</li>
<li>Added the [sqlite3_status()] interface used for querying run-time status
    information about the overall SQLite library and its subsystems.</li>
<li>Added the [sqlite3_initialize()] and [sqlite3_shutdown()] interfaces.</li>
<li>Added the [PRAGMA page_count] command.</li>
<li>Added the [sqlite3_next_stmt()] interface.</li>
<li>Added a new R*Tree virtual table</li>
}

chng {2008 May 14 (3.5.9)} {
<li>Added <em>experimental</em>
    support for the [journal_mode] PRAGMA and persistent journal.</li>
<li>[journal_mode | Journal mode PERSIST] is the default behavior in
    [locking_mode | exclusive locking mode].</li>







|







64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
<li>Added the [sqlite3_config()] interface for doing run-time configuration
    of the entire SQLite library.</li>
<li>Added the [sqlite3_status()] interface used for querying run-time status
    information about the overall SQLite library and its subsystems.</li>
<li>Added the [sqlite3_initialize()] and [sqlite3_shutdown()] interfaces.</li>
<li>Added the [PRAGMA page_count] command.</li>
<li>Added the [sqlite3_next_stmt()] interface.</li>
<li>Added a new [rtree | R*Tree virtual table]</li>
}

chng {2008 May 14 (3.5.9)} {
<li>Added <em>experimental</em>
    support for the [journal_mode] PRAGMA and persistent journal.</li>
<li>[journal_mode | Journal mode PERSIST] is the default behavior in
    [locking_mode | exclusive locking mode].</li>

Changes to pages/compile.in.

225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
  option is not used, the [sqlite3_release_memory()] interface is a 
  no-op.  Since [sqlite3_soft_heap_limit()] depends on
  [sqlite3_release_memory()], this option is also necessary for
  the correct operation of [sqlite3_soft_heap_limit()].
}

COMPILE_OPTION {SQLITE_ENABLE_RTREE} {
  This option causes SQLite to include support for the R*Tree index
  extension.
}
</tcl>

<a name="omitfeatures"></a>
<h2>1.5 Options To Omit Features</h2>

<p>The following options can used to reduce the size of the compiled







|
|







225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
  option is not used, the [sqlite3_release_memory()] interface is a 
  no-op.  Since [sqlite3_soft_heap_limit()] depends on
  [sqlite3_release_memory()], this option is also necessary for
  the correct operation of [sqlite3_soft_heap_limit()].
}

COMPILE_OPTION {SQLITE_ENABLE_RTREE} {
  This option causes SQLite to include support for the
  [rtree | R*Tree index extension].
}
</tcl>

<a name="omitfeatures"></a>
<h2>1.5 Options To Omit Features</h2>

<p>The following options can used to reduce the size of the compiled

Changes to pages/rtree.in.

1
2
3
4
5
6
7
8
9
...
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193






































































































<title>The SQLite R*Tree Module</title>
<tcl>hd_keywords rtree</tcl>

<h1>1.0 Overview</h1>

<p>
An <a href="http://en.wikipedia.org/wiki/R-tree">R-Tree</a> is a special
index that is designed for doing range queries.  R-Trees are most commonly
used in geospatial systems where each entry is a rectangle with minimum and
................................................................................
you can efficiently do inequality queries against the coordinate
ranges.  To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index
WHERE minX>=-81.08 AND maxX<=-80.58
  AND minY>=35.00 AND maxY<=35.44;
</pre></blockquote>

<p>
The query above would very quickly locate id of 1 even if the
R*Tree contained millions of entries.  The previous is an example
of a "contained-within" query.  The R*Tree also supports "overlapping"
queries.  For example, to find all bounding boxes that overlap the
Charlotte area:
</p>

<blockquote><pre>
SELECT id FROM demo_index
WHERE maxX>=-81.08 AND minX<=-80.58
  AND maxY>=35.00 AND minY<=35.44;
</pre></blockquote>

<p>
This second query would find both entry 1 (the SQLite.org office) which
is entirely contained within the query box and also
Mel Watt's Congressional District which extends well outside the
query box but still overlaps the query box.
</p>







































































































|







 







|
|












|
|








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
...
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
<title>The SQLite R*Tree Module</title>
<tcl>hd_keywords rtree RTREE</tcl>

<h1>1.0 Overview</h1>

<p>
An <a href="http://en.wikipedia.org/wiki/R-tree">R-Tree</a> is a special
index that is designed for doing range queries.  R-Trees are most commonly
used in geospatial systems where each entry is a rectangle with minimum and
................................................................................
you can efficiently do inequality queries against the coordinate
ranges.  To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE minX>=-81.08 AND maxX<=-80.58
   AND minY>=35.00  AND maxY<=35.44;
</pre></blockquote>

<p>
The query above would very quickly locate id of 1 even if the
R*Tree contained millions of entries.  The previous is an example
of a "contained-within" query.  The R*Tree also supports "overlapping"
queries.  For example, to find all bounding boxes that overlap the
Charlotte area:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxX>=-81.08 AND minX<=-80.58
   AND maxY>=35.00  AND minY<=35.44;
</pre></blockquote>

<p>
This second query would find both entry 1 (the SQLite.org office) which
is entirely contained within the query box and also
Mel Watt's Congressional District which extends well outside the
query box but still overlaps the query box.
</p>

<p>
Note that is not necessary for all coordinates in an R*Tree index
to be constrained in order for the index search to be efficient.
One might, for example, want to query all objects that overlap with
the 35th parallel:
</p>

<blockquote><pre>
SELECT id FROM demo_index
 WHERE maxY>=35.0  AND minY<=35.0;
</pre></blockquote>

<p>
But, generally speaking, the more constraints that the R*Tree module
has to work with, and the smaller the bounding box, the faster the
results will come back.
</p>

<h1>4.0 Using R*Trees Effectively</h1>

<p>
The only information that an R*Tree index stores about an object is
its integer ID and its bounding box.  Additional information needs to
be stored in separate tables and related to the R*Tree index using
the primary key.  For the example above, one might create an auxiliary
table as follows:
</p>

<blockquote><pre>
CREATE TABLE demo_data(
  id INTEGER PRIMARY KEY,  -- primary key
  objname TEXT,            -- name of the object
  objtype TEXT,            -- object type
  boundary BLOB            -- detailed boundary of object
);
</pre></blockquote>

<p>
In this example, the demo_data.boundary field is intended to hold some
kind of binary representation of the precise boundaries of the object.
The R*Tree index only holds an axis-aligned rectanglar boundary for the
object.  The R*Tree boundary is just an approximation of the true object
boundary.  So what typically happens is that the R*Tree index is used to
narrow a search down to a list of candidate objects and then more detailed
and expensive computations are done on each candidate to find if the
candidate truely meets the search criteria.
</p>

<blockquote><p>
<b>Key Point:</b>
An R*Tree index does not normally provide the exact answer but merely
reduces the set of potential answers from millions to dozens.
</p></blockquote>

<p>
Suppose the demo_data.boundary field holds some proprietary data description
of a complex two-dimensional boundary for an object and suppose that the
application has used the [sqlite3_create_function()] interface to
created application-defined functions "contained_in" and
"overlaps" accept two demo_data.boundary objects and return true or false.
One may assume that "contained_in" and "overlaps" are relatively slow
functions that we do not want to invoke too frequently.
Then an efficient way to find the name of all objects located within
the North Carolina 12th District, one my run a query like this:
</p>

<blockquote><pre>
SELECT objname FROM demo_data, demo_index
 WHERE demo_data.id=demo_index.id
   AND contained_in(demo_data.boundary, :boundary)
   AND minX>=-81.0 AND max<=-79.6
   AND minY>=35.0 AND maxY<=36.2;
</pre></blockquote>

<p>In the query above, one would presumably bind the binary BLOB 
description of the precise boundary of the 12th district to the
":boundary" parameter.</p>

<p>Notice how the query above works:  The R*Tree index runs in the outer
loop to find entries that are contained within the bounding box
of longitude -81..-79.6 and latitude 35.0..36.2. 
For each object identifer found, SQLite looks up
the corresponding entry in the demo_data table.  It then uses the boundary
field from the demo_data table as a parameter to the contained_in()
function and if that function returns true, the objname field from
the demo_data table is returned as the next row of query result.</p>

<p>One would get the same answer without the use of the R*Tree index
using the following simpler query:</p>

<blockquote><pre>
SELECT objname FROM demo_data
 WHERE contained_in(demo_data.boundary, :boundary);
</pre></blockquote>

<p>The problem with this latter query is that it must apply the
contained_in() function to millions of entires in the demo_data table.
The use of the R*Tree in the penultimate query reduces the number of
calls to contained_in() function to a small subset of the entire table.
The R*Tree index did not find the exact answer itself, it merely
limited the search space.</p>