Documentation Source Text

Check-in [bd0fe1c3e1]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updates to the R*Tree documentation.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: bd0fe1c3e1097519f695e45476672108f3382aed
User & Date: drh 2014-05-01 14:44:03.929
Context
2014-05-07
16:00
Add notes on the SQLITE_IOCAP_IMMUTABLE and "nolock" and "immutable" query parameter changes. (check-in: 7aa71cbb0c user: drh tags: trunk)
2014-05-01
14:44
Updates to the R*Tree documentation. (check-in: bd0fe1c3e1 user: drh tags: trunk)
2014-04-29
15:08
Added documentation for the sqlite3_rtree_query_callback() enhancement. (check-in: 89557fb903 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/rtree.in.
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67


68
69
70
71
72
73
74
75
76
77
78
79
<h2>3.0 Using the R*Tree Module</h2>

<p>
The SQLite R*Tree module is implemented as a
[sqlite3_create_module | virtual table].  ^Each R*Tree index is a
virtual table with an odd number of columns between 3 and 11.
^The first column is always a 64-bit signed integer primary key.
^The other columns are minimum- and maximum-value pairs (stored as
32-bit floating point numbers) for each
dimension.  ^A 1-dimensional R*Tree thus has 3 columns.  
^A 2-dimensional R*Tree has 5 columns.
^A 3-dimensional R*Tree has 7 columns.
^A 4-dimensional R*Tree has 9 columns.
^And a 5-dimensional R*Tree has 11 columns.  The SQLite R*Tree implementation
does not support R*Trees wider than 5 dimensions.
</p>

<p>
^The first column of an SQLite R*Tree must always be an integer
primary key.
^The min/max-value pair columns are always stored as


32-bit floating point values.  ^Unlike regular SQLite tables which
can store data in a variety of datatypes and formats, the R*Tree
indices rigidly enforce these two storage types.  ^Attempts to insert
something other than an integer into the first column, or something
other than a floating point value into the other columns, will result
in an error.
</p>

<h3>3.1 Creating An R*Tree Index</h3>

^(<p>
A new R*Tree index is created as follows:







|
|
|



|






|
>
>
|

|

|







47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
<h2>3.0 Using the R*Tree Module</h2>

<p>
The SQLite R*Tree module is implemented as a
[sqlite3_create_module | virtual table].  ^Each R*Tree index is a
virtual table with an odd number of columns between 3 and 11.
^The first column is always a 64-bit signed integer primary key.
^The other columns are pairs, one pair per dimension, containing the
minimum and maximum values for that dimension, respectively.
^A 1-dimensional R*Tree thus has 3 columns.  
^A 2-dimensional R*Tree has 5 columns.
^A 3-dimensional R*Tree has 7 columns.
^A 4-dimensional R*Tree has 9 columns.
^And a 5-dimensional R*Tree has 11 columns.  ^The SQLite R*Tree implementation
does not support R*Trees wider than 5 dimensions.
</p>

<p>
^The first column of an SQLite R*Tree must always be an integer
primary key.
^The min/max-value pair columns are stored as
32-bit floating point values for "rtree" virtual tables or as
32-bit signed integers in "rtree_i32" virtual tables.
^Unlike regular SQLite tables which
can store data in a variety of datatypes and formats, the R*Tree
indices rigidly enforce these storage types.  ^Attempts to insert
something other than an integer into the first column, or something
other than a numeric value into the other columns, will result
in an error.
</p>

<h3>3.1 Creating An R*Tree Index</h3>

^(<p>
A new R*Tree index is created as follows:
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
</p>

<blockquote><pre>
SELECT * FROM demo_index WHERE id=1;
</pre></blockquote>)^

<p>
Of course, an ordinary SQLite table will do a query against its
integer primary key efficiently, so the previous is no big deal.
The real reason for using an R*Tree in the first place is so that
you can efficiently do inequality queries against the coordinate
ranges.  ^(To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index







|

|







159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
</p>

<blockquote><pre>
SELECT * FROM demo_index WHERE id=1;
</pre></blockquote>)^

<p>
Of course, an ordinary SQLite table will also do a query against its
integer primary key efficiently, so the previous is no big deal.
The real reason for using an R*Tree is so that
you can efficiently do inequality queries against the coordinate
ranges.  ^(To find all elements of the index that are contained within
the vicinity of Charlotte, North Carolina, one might do:
</p>

<blockquote><pre>
SELECT id FROM demo_index
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421

422

423
424
425
426
427
428
429
430
431
432
433
434
435
436
437

438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
</pre></blockquote>

<p>The sqlite3_rtree_query_callback() became available with SQLite
[version 3.8.5] and is the preferred interface.
The sqlite3_rtree_geometry_callback() is an older and less flexible
interface that is supported for backwards compatiblity.

<p>A call to one of the above APIs creates a new SQL function named by the
second parameter (zQueryFunc or zGeom).  When that SQL function appears
on the right-hand side of the MATCH operator and the left-hand side of the
MATCH operator is any column in the R*Tree virtual table, then the callback 
defined by the third argument (xQueryFunc or xGeom) is invoked to determine
if a particular object or subtree overlaps the desired region.

<p>For example, a query like the following might be used to find all
R*Tree entries that overlap with a circle centered a 45.3,22.9 with a
radius of 5.0:

<blockquote><pre>
SELECT id FROM demo_index WHERE id MATCH circle(45.3, 22.9, 5.0)
</blockquote></pre>

<p>The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or sqlite3_rtree_query_callback(),
is used to register the SQL function.  However, the newer query-style
callbacks give the application greater control over how the query proceeds.

<h3>6.1 The Legacy xGeom Callback</h3>

<p>The legacy xGeom callback is invoked with four arguments.  The first
argument is a pointer to an sqlite3_rtree_geometry structure which provides
information about how the SQL function was invoked.  The second argument
is the number of coordinates in each r-tree entry, and is always the same
for any given R*Tree.  The number coordinates is 2 for a 1-dimensional R*Tree,
4 for a 2-dimensional R*Tree, 6 for a 3-dimensional R*Tree, and so forth.
The third argument, aCoord[], is an array of nCoord coordinates that defines
a bounding box to be tested.  The last argument is a pointer into which 
the callback result should be written.  The result is zero
if the bounding-box defined by aCoord[] is completely outside
the region defined by the xGeom callback and the result is non-zero if
the bounding-box is inside or overlaps with the xGeom region.  The xGeom
callback should normally return SQLITE_OK.  If xGeom returns anything other
than SQLITE_OK, then the r-tree query will abort with an error.

<p>The sqlite3_rtree_geometry structure that the first argument to the
xGeom callback points to has a structure shown below.  The exact same

structure is used for every callback for same MATCH operator in the same

query.  The contents of the structure are initialized by SQLite but are
not subsequently modifed.  The callback is free to make changes to the
pUser and xDelUser elements of the structure if desired.

<blockquote><pre>
typedef struct sqlite3_rtree_geometry sqlite3_rtree_geometry;
struct sqlite3_rtree_geometry {
  void *pContext;                 /* Copy of pContext passed to s_r_g_c() */
  int nParam;                     /* Size of array aParam */
  double *aParam;                 /* Parameters passed to SQL geom function */
  void *pUser;                    /* Callback implementation user data */
  void (*xDelUser)(void *);       /* Called by SQLite to clean up pUser */
};
</pre></blockquote>


<p>The pContext member of the structure is always set to a copy of the pContext
argument passed to sqlite3_rtree_geometry_callback() when the
callback is registered. The aParam[] array (size nParam) contains the parameter
values passed to the SQL function on the right-hand side of the MATCH operator.
In the example "circle" query above, nParam would be set to 3 and the aParam[]
array would contain the three values 45.3, 22.9 and 5.0.

<p>The pUser and xDelUser members of the sqlite3_rtree_geometry structure are
initially set to NULL. The pUser variable may be set by the callback
implementation to any arbitrary value that may be useful to subsequent
invocations of the callback within the same query (for example, a
pointer to a complicated data structure used to test for region intersection).
If the xDelUser variable is set to a non-NULL value, then after the
query has finished running SQLite automatically invokes it with the
value of the pUser variable as the only argument. In other words, xDelUser
may be set to a destructor function for the pUser value.

<p>The xGeom callback always does a depth-first search of the r-tree.

<tcl>hd_fragment xquery {xQueryFunc R*Tree callback} {sqlite3_rtree_query_callback}
</tcl>
<h3>6.2 The New xQueryFunc Callback</h3>

<p>The newer xQueryFunc callback receives more information from the r-tree
query engine on each call, and it send more information back to the query engine







|
|





|





|

|






|

|

|

|
|




|



|
>

>
|














>
|

|




|
|



|




|







378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
</pre></blockquote>

<p>The sqlite3_rtree_query_callback() became available with SQLite
[version 3.8.5] and is the preferred interface.
The sqlite3_rtree_geometry_callback() is an older and less flexible
interface that is supported for backwards compatiblity.

<p>^A call to one of the above APIs creates a new SQL function named by the
second parameter (zQueryFunc or zGeom).  ^When that SQL function appears
on the right-hand side of the MATCH operator and the left-hand side of the
MATCH operator is any column in the R*Tree virtual table, then the callback 
defined by the third argument (xQueryFunc or xGeom) is invoked to determine
if a particular object or subtree overlaps the desired region.

<p>^(For example, a query like the following might be used to find all
R*Tree entries that overlap with a circle centered a 45.3,22.9 with a
radius of 5.0:

<blockquote><pre>
SELECT id FROM demo_index WHERE id MATCH circle(45.3, 22.9, 5.0)
</blockquote></pre>)^

<p>^The SQL syntax for custom queries is the same regardless of which
interface, sqlite3_rtree_geometry_callback() or sqlite3_rtree_query_callback(),
is used to register the SQL function.  However, the newer query-style
callbacks give the application greater control over how the query proceeds.

<h3>6.1 The Legacy xGeom Callback</h3>

<p>^The legacy xGeom callback is invoked with four arguments.  ^The first
argument is a pointer to an sqlite3_rtree_geometry structure which provides
information about how the SQL function was invoked.  ^The second argument
is the number of coordinates in each r-tree entry, and is always the same
for any given R*Tree.  ^The number of coordinates is 2 for a 1-dimensional R*Tree,
4 for a 2-dimensional R*Tree, 6 for a 3-dimensional R*Tree, and so forth.
^The third argument, aCoord[], is an array of nCoord coordinates that defines
a bounding box to be tested.  ^The last argument is a pointer into which 
the callback result should be written.  The result is zero
if the bounding-box defined by aCoord[] is completely outside
the region defined by the xGeom callback and the result is non-zero if
the bounding-box is inside or overlaps with the xGeom region.  The xGeom
callback should normally return SQLITE_OK.  ^If xGeom returns anything other
than SQLITE_OK, then the r-tree query will abort with an error.

<p>The sqlite3_rtree_geometry structure that the first argument to the
xGeom callback points to has a structure shown below.  ^The exact same
sqlite3_rtree_geometry
structure is used for every callback for same MATCH operator in the same
query.  ^The contents of the sqlite3_rtree_geometry
structure are initialized by SQLite but are
not subsequently modifed.  The callback is free to make changes to the
pUser and xDelUser elements of the structure if desired.

<blockquote><pre>
typedef struct sqlite3_rtree_geometry sqlite3_rtree_geometry;
struct sqlite3_rtree_geometry {
  void *pContext;                 /* Copy of pContext passed to s_r_g_c() */
  int nParam;                     /* Size of array aParam */
  double *aParam;                 /* Parameters passed to SQL geom function */
  void *pUser;                    /* Callback implementation user data */
  void (*xDelUser)(void *);       /* Called by SQLite to clean up pUser */
};
</pre></blockquote>

<p>^The pContext member of the sqlite3_rtree_geometry
structure is always set to a copy of the pContext
argument passed to sqlite3_rtree_geometry_callback() when the
callback is registered. ^The aParam[] array (size nParam) contains the parameter
values passed to the SQL function on the right-hand side of the MATCH operator.
In the example "circle" query above, nParam would be set to 3 and the aParam[]
array would contain the three values 45.3, 22.9 and 5.0.

<p>^The pUser and xDelUser members of the sqlite3_rtree_geometry structure are
initially set to NULL. ^The pUser variable may be set by the callback
implementation to any arbitrary value that may be useful to subsequent
invocations of the callback within the same query (for example, a
pointer to a complicated data structure used to test for region intersection).
^If the xDelUser variable is set to a non-NULL value, then after the
query has finished running SQLite automatically invokes it with the
value of the pUser variable as the only argument. In other words, xDelUser
may be set to a destructor function for the pUser value.

<p>^The xGeom callback always does a depth-first search of the r-tree.

<tcl>hd_fragment xquery {xQueryFunc R*Tree callback} {sqlite3_rtree_query_callback}
</tcl>
<h3>6.2 The New xQueryFunc Callback</h3>

<p>The newer xQueryFunc callback receives more information from the r-tree
query engine on each call, and it send more information back to the query engine
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506

<p>The xQueryFunc must set the eWithin field of sqlite3_rtree_query_info to
on of the values NOT_WITHIN, PARTLY_WITHIN, or FULLY_WITHIN depending on whether
or not the bounding box defined by aCoord[] is completely outside the region,
overlaps the region, or is completely inside the region, respectively.  In
addition, the xQueryFunc must set the rScore field to a non-negative value that
indicates the order in which subtrees and entries of the query should be analyzed
and returned.  Smaller scores are processed first.

<p>As its name implies, an R*Tree is organized as a tree.  Each node of the
tree is a bounding box.  The root of the tree is a bounding box that encapsulates
all elements of the tree.  Beneath the root are a number of subtrees (typically
20 or more) each with their own smaller bounding boxes and each containing some
subset of the R*Tree entries.  The subtrees may have sub-subtrees, and so forth
until finally one reaches the leaves of the tree which are the actual R*Tree







|







497
498
499
500
501
502
503
504
505
506
507
508
509
510
511

<p>The xQueryFunc must set the eWithin field of sqlite3_rtree_query_info to
on of the values NOT_WITHIN, PARTLY_WITHIN, or FULLY_WITHIN depending on whether
or not the bounding box defined by aCoord[] is completely outside the region,
overlaps the region, or is completely inside the region, respectively.  In
addition, the xQueryFunc must set the rScore field to a non-negative value that
indicates the order in which subtrees and entries of the query should be analyzed
and returned.  ^Smaller scores are processed first.

<p>As its name implies, an R*Tree is organized as a tree.  Each node of the
tree is a bounding box.  The root of the tree is a bounding box that encapsulates
all elements of the tree.  Beneath the root are a number of subtrees (typically
20 or more) each with their own smaller bounding boxes and each containing some
subset of the R*Tree entries.  The subtrees may have sub-subtrees, and so forth
until finally one reaches the leaves of the tree which are the actual R*Tree
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
















then sub-subtrees or leaves contained within that entry are passed to the
xQueryFunc callback, one by one.  Those subelements for which the xQueryFunc
callback sets eWithin to PARTLY_WITHIN or FULLY_WITHIN are added to the priority
queue using the score supplied by the callback.  Subelements that return
NOT_WITHIN are discarded.  The query runs until the priority queue is empty.

<p>Every leaf entry and node (subtree) within the R*Tree has an integer "level".
The leaves have a level of 0.  The first containing subtree of the leaves has
a level of 1.  The root of the R*Tree has the largest level value.  The
mxLevel entry in the sqlite3_rtree_query_info structure is the level value for
the root of the R*Tree.  The iLevel entry in sqlite3_rtree_query_info gives the
level for the object being interrogated.

<p>Most R*Tree queries use a depth-first search.  This is accomplished by setting
the rScore equal to iLevel.  A depth-first search is usually preferred since it
minimizes the number of elements in the priority queue, which reduces memory
requirements and speeds processing.  However, some application may prefer a
breadth-first search, which can be accomplished by setting rScore to mxLevel-iLevel.
By creating more complex formulas for rScore, applications can exercise
detailed control over the order in which subtree are searched and leaf
R*Tree entries are returned.  For example, in an application with many
millions of R*Tree entries, the rScore might be arranged so that the
largest or most significant entries are returned first, allowing the
application to display the most important information quickly, and
filling in smaller and less important details as they become available.

<p>Other information fields of the sqlite3_rtree_query_info structure are
available for use by the xQueryFunc callback, if desired.  The iRowid field
is the rowid (the first of the 3 to 11 columns in the R*Tree) for the element
being considered.  iRowid is only valid for leaves.  The eParentWithin and
rParentScore values are copies of the eWithin and rScore values from the
containing subtree of the current row.  The anQueue field is an array
of mxLevel+1 unsigned integers that tell the current number of elements in
the priority queue at each level.























|
|




|
|

|










|

|

|


>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
then sub-subtrees or leaves contained within that entry are passed to the
xQueryFunc callback, one by one.  Those subelements for which the xQueryFunc
callback sets eWithin to PARTLY_WITHIN or FULLY_WITHIN are added to the priority
queue using the score supplied by the callback.  Subelements that return
NOT_WITHIN are discarded.  The query runs until the priority queue is empty.

<p>Every leaf entry and node (subtree) within the R*Tree has an integer "level".
^The leaves have a level of 0.  The first containing subtree of the leaves has
a level of 1.  The root of the R*Tree has the largest level value.  ^The
mxLevel entry in the sqlite3_rtree_query_info structure is the level value for
the root of the R*Tree.  The iLevel entry in sqlite3_rtree_query_info gives the
level for the object being interrogated.

<p>^(Most R*Tree queries use a depth-first search.  This is accomplished by setting
the rScore equal to iLevel.)^  A depth-first search is usually preferred since it
minimizes the number of elements in the priority queue, which reduces memory
requirements and speeds processing.  ^However, some application may prefer a
breadth-first search, which can be accomplished by setting rScore to mxLevel-iLevel.
By creating more complex formulas for rScore, applications can exercise
detailed control over the order in which subtree are searched and leaf
R*Tree entries are returned.  For example, in an application with many
millions of R*Tree entries, the rScore might be arranged so that the
largest or most significant entries are returned first, allowing the
application to display the most important information quickly, and
filling in smaller and less important details as they become available.

<p>Other information fields of the sqlite3_rtree_query_info structure are
available for use by the xQueryFunc callback, if desired.  ^(The iRowid field
is the rowid (the first of the 3 to 11 columns in the R*Tree) for the element
being considered.  iRowid is only valid for leaves.)^  ^The eParentWithin and
rParentScore values are copies of the eWithin and rScore values from the
containing subtree of the current row.  ^The anQueue field is an array
of mxLevel+1 unsigned integers that tell the current number of elements in
the priority queue at each level.

<h3>6.3 Additional Considerations for Custom Queries</h3>

<p>
^The MATCH operator of a custom R*Tree query function must be a top-level
AND-connected term of the WHERE clause, or else it will not be usable
by the R*Tree query optimizer and the query will not be runable.
^If the MATCH operator is connected to other terms of the WHERE clause 
via an OR operator, for example, the query will fail with an error.

<p>
^Two or more MATCH operators are allowed in the same WHERE clause, as long
as they are connected by AND operators.  However,
the R*Tree query engine only contains a single priority queue.  ^The priority
assigned to each node in the search is the lowest priority returned by any
of the MATCH operators.