Documentation Source Text

Artifact Content
Login

Artifact 200d7ce97f9e370064d256a4e44ff471ec7b31e24cbf51dd5a1c68f37d49ff43:


<title>The Geopoly Interface To The SQLite R*Tree Module</title>
<tcl>hd_keywords {geopoly} {Geopoly module} {Geopoly extension}</tcl>
<table_of_contents>

<h1>Overview</h1>

<p>
The Geopoly module is an alternative interface to the [R-Tree extension] that uses
the [http://geojson.org | GeoJSON] notation
([https://tools.ietf.org/html/rfc7946 | RFC-7946]) to describe two-dimensional
polygons.  Geopoly includes functions for detecting when one polygon is
contained within or overlaps with another, for computing the
area enclosed by a polygon, for doing linear transformations of polygons,
for rendering polygons as
[https://en.wikipedia.org/wiki/Scalable_Vector_Graphics | SVG], and other
similar operations.

<p>
The source code for Geopoly is included in the [amalgamation] but is not
included in the library unless the [-DSQLITE_ENABLE_GEOPOLY] compile-time option
is used.

<p>
Geopoly operates on "simple" polygons - that is, polygons for which
the boundary does not intersect itself.  Geopoly thus extends the capabilities
of the [R-Tree extension] which can only deal with rectangular areas.
On the other hand, the [R-Tree extension] is
able to handle between 1 and 5 coordinate dimensions, whereas Geopoly is restricted
to 2-dimensional shapes only.

<p>
Each polygon in the Geopoly module can be associated with an arbitrary
number of auxiliary data fields.

<h2>GeoJSON</h2>

<p>The [https://tools.ietf.org/html/rfc7946| GeoJSON standard] is syntax for
exchanging geospatial information using JSON.  GeoJSON is a rich standard
that can describe nearly any kind of geospatial content.

<p>The Geopoly module only understands
a small subset of GeoJSON, but a critical subset.  
In particular, GeoJSON understands
the JSON array of vertexes that describes a simple polygon.

<p>A polygon is defined by its vertexes.
Each vertex is a JSON array of two numeric values which are the
X and Y coordinates of the vertex.
A polygon is a JSON array of at least four of these vertexes, 
and hence is an array of arrays.
The first and last vertex in the array must be the same.
The polygon follows the right-hand rule:  When tracing a line from
one vertex to the next, the area to the right of the line is outside
of the polygon and the area to the left is inside the polygon.
In other words, the net rotation of the vertexes is counter-clockwise.

<p>
For example, the following JSON describes an isosceles triangle, sitting
on the X axis and with an area of 0.5:

<codeblock>
&#91;&#91;0,0],&#91;1,0],&#91;0.5,1],&#91;0,0]]
</codeblock>

<p>
A triangle has three vertexes, but the GeoJSON description of the triangle
has 4 vertexes because the first and last vertex are duplicates.

<h2>Binary storage format</h2>

<p>
Internally, Geopoly stores polygons in a binary format - an SQL BLOB.
Details of the binary format are given below.
All of the Geopoly interfaces are able to accept polygons in either the
GeoJSON format or in the binary format.

<h1>Using The Geopoly Extension</h1>

<p>
A geopoly table is created as follows:

<codeblock>
CREATE VIRTUAL TABLE newtab USING geopoly(a,b,c);
</codeblock>

<p>
The statement above creates a new geopoly table named "newtab".
Every geopoly table contains a built-in integer "rowid" column
and a "_shape" column that contains
the polygon associated with that row of the table.
The example above also defines three auxiliary data columns 
named "a", "b", and "c" that can store whatever additional
information the application needs to associate
with each polygon.  If there is no need to store auxiliary
information, the list of auxiliary columns can be omitted.

<p>
Store new polygons in the table using ordinary INSERT statements:

<codeblock>
INSERT INTO newtab(_shape) VALUES('&#91;&#91;0,0],&#91;1,0],&#91;0.5,1],&#91;0,0]]');
</codeblock>

<p>
UPDATE and DELETE statements work similarly.

<h2>Queries</h2>

<p>
To query the geopoly table using an indexed geospatial search, 
use one of the functions geopoly_overlap()
or geopoly_within() as a boolean function in the WHERE clause,
with the "_shape" column as the first argument to the function.
For example:

<codeblock>
SELECT * FROM newtab WHERE geopoly_overlap(_shape, $query_polygon);
</codeblock>

<p>
The previous example will return every row for which the _shape
overlaps the polygon in the $query_polygon parameter.  The
geopoly_within() function works similarly, but only returns rows for
which the _shape is completely contained within $query_polygon.

<p>
Queries (and also DELETE and UPDATE statements) in which the WHERE
clause contains a bare geopoly_overlap() or geopoly_within() function
make use of the underly R*Tree data structures for a fast lookup that
only has to examine a subset of the rows in the table.  The number of
rows examines depends, of course, on the size of the $query_polygon.
Large $query_polygons will normally need to look at more rows than small
ones.

<p>
Queries against the rowid of a geopoly table are also very quick, even
for tables with a vast number of rows.
However, none of the auxiliary data columns are indexes, and so queries
against the auxiliary data columns will involve a full table scan.

<h1>Special Functions</h1>

<p>
The geopoly module defines several new SQL functions that are useful for
dealing with polygons.  All polygon arguments to these functions can be
either the GeoJSON format or the internal binary format.

<tcl>hd_fragment goverlap geopoly_overlap</tcl>
<h2>The geopoly_overlap(P1,P2) Function</h2>

<p>
If P1 and P2 are both polygons, then the geopoly_overlap(P1,P2) function returns
true if there is any overlap between P1 and P2, or it returns false if P1 and P2
completely disjoint.
If either P1 or P2 is not a polygon, this routine returns NULL.

<p>
The geopoly_overlap(P1,P2) function is special in that the geopoly virtual
table knows how to use R*Tree indexes to optimize queries in which the 
WHERE clause uses geopoly_overlap() as a boolean function.  Only the
geopoly_overlap(P1,P2) and geopoly_within(P1,P2) functions have this
capability.

<tcl>hd_fragment gwithin geopoly_within</tcl>
<h2>The geopoly_within(P1,P2) Function</h2>

<p>
If P1 and P2 are both polygons, then the geopoly_within(P1,P2) function returns
true if P2 is completely contained within P1, or it returns false if any part of
P2 is outside of P1.  If P1 and P2 are the same polygon, this routine returns true.
If either P1 or P2 is not a polygon, this routine returns NULL.

<p>
The geopoly_within(P1,P2) function is special in that the geopoly virtual
table knows how to use R*Tree indexes to optimize queries in which the 
WHERE clause uses geopoly_within() as a boolean function.  Only the
geopoly_within(P1,P2) and geopoly_overlap(P1,P2) functions have this
capability.

<tcl>hd_fragment garea geopoly_area</tcl>
<h2>The geopoly_area(P) Function</h2>

<p>
If P is a polygon, then geopoly_area(P) returns the area enclosed by
that polygon.  If P is not a polygon, geopoly_area(P) returns NULL.

<tcl>hd_fragment gblob geopoly_blob</tcl>
<h2>The geopoly_blob(P) Function</h2>

<p>
If P is a polygon, then geopoly_blob(P) returns the binary encoding
of that polygon as a BLOB.
If P is not a polygon, geopoly_blob(P) returns NULL.

<tcl>hd_fragment gjson geopoly_json</tcl>
<h2>The geopoly_json(P) Function</h2>

<p>
If P is a polygon, then geopoly_json(P) returns the GeoJSON representation
of that polygon as a TEXT string.
If P is not a polygon, geopoly_json(P) returns NULL.

<tcl>hd_fragment gsvg geopoly_svg</tcl>
<h2>The geopoly_svg(P,...) Function</h2>

<p>
If P is a polygon, then geopoly_svg(P,...) returns a text string which is a
[https://en.wikipedia.org/wiki/Scalable_Vector_Graphics|Scalable Vector Graphics (SVG)]
representation of that polygon.  If there is more one argument, then second
and subsequent arguments are added as attributes to each SVG glyph.  For example:

<codeblock>
SELECT geopoly_svg($polygon,'class="poly"','style="fill:blue;"');
</codeblock>

<p>
If P is not a polygon, geopoly_svg(P,...) returns NULL.

<p>
Note that geopoly uses a traditional right-handed cartesian coordinate system
with the origin at the lower left, whereas SVG uses a left-handed coordinate
system with the origin at the upper left.  The geopoly_svg() routine makes no
attempt to transform the coordinate system, so the displayed images are shown
in mirror image and rotated.  If that is undesirable, the geopoly_xform() routine
can be used to transform the output from cartesian to SVG coordinates prior to
passing the polygons into geopoly_svg().

<tcl>hd_fragment gbbox geopoly_bbox geopoly_group_bbox</tcl>
<h2>The geopoly_bbox(P) and geopoly_group_bbox(P) Functions</h2>

<p>
If P is a polygon, then geopoly_bbox(P) returns a new polygon that is
the smallest (axis-aligned) rectangle completely enclosing P.
If P is not a polygon, geopoly_bbox(P) returns NULL.

<p>
The geopoly_group_bbox(P) function is an aggregate version of geopoly_bbox(P).
The geopoly_group_bbox(P) function returns the smallest rectangle that will
enclose all P values seen during aggregation.

<tcl>hd_fragment gpoint geopoly_constains_point</tcl>
<h2>The geopoly_contains_point(P,X,Y) Function</h2>

<p>
If P is a polygon, then geopoly_contains_point(P,X,Y) returns true if and only
if the coordinate X,Y is inside or on the boundary of the polygon P.
If P is not a polygon, geopoly_contains_point(P,X,Y) returns NULL.

<tcl>hd_fragment xform geopoly_xform</tcl>
<h2>The geopoly_xform(P,A,B,C,D,E,F) Function</h2>

<p>
The geopoly_xform(P,A,B,C,D,E,F) function returns a new polygon that is an
affine transformation of the polygon P and where the transformation
is defined by values A,B,C,D,E,F. If P is not a valid polygon, this
routine returns NULL.

<p>
The transformation converts each vertex of the polygon according to the
following formula:

<codeblock>
x1 = A*x0 + B*y0 + E
y1 = C*x0 + D*y0 + F
</codeblock>

<p>
So, for example, to move a polygon by some amount DX, DY without changing
its shape, use:

<codeblock>
geopoly_xform($polygon, 1, 0, 0, 1, $DX, $DY)
</codeblock>

<p>
To rotate a polygon by R radians around the point 0, 0:

<codeblock>
geopoly_xform($polygon, cos($R), sin($R), -sin($R), cos($R), 0, 0)
</codeblock>


<tcl>hd_fragment regpoly geopoly_regular</tcl>
<h2>The geopoly_regular(X,Y,R,N) Function</h2>

<p>
The geopoly_regular(X,Y,R,N) function returns a convex, simple, regular,
equilateral, equiangular polygon with N sides, centered at X,Y, and with
a circumradius of R.  Or, if R is negative or if N is less than 3, the
function returns NULL.  The N value is capped at 1000 so that the routine
will never render a polygon with more than 1000 sides even if the N value
is larger than 1000.

<p>
As an example, the following graphic:

<blockquote>
<svg width="600" height="300" style="border:1px solid black">
<polyline points="140.003,100 80.0019,134.644 80.0019,65.3565 140.003,100" style="fill:none;stroke:red;stroke-width:2"></polyline> <text x="100" y="106" alignment-baseline="central" text-anchor="middle">3</text>
<polyline points="240.003,100 200,140.003 159.997,100 200,59.9973 240.003,100" style="fill:none;stroke:orange;stroke-width:2"></polyline> <text x="200" y="106" alignment-baseline="central" text-anchor="middle">4</text>
<polyline points="340.003,100 312.358,138.042 267.637,123.511 267.637,76.4893 312.358,61.9583 340.003,100" style="fill:none;stroke:green;stroke-width:2"></polyline> <text x="300" y="106" alignment-baseline="central" text-anchor="middle">5</text>
<polyline points="440.003,100 419.998,134.644 380.002,134.644 359.997,100 380.002,65.3565 419.998,65.3565 440.003,100" style="fill:none;stroke:blue;stroke-width:2"></polyline> <text x="400" y="106" alignment-baseline="central" text-anchor="middle">6</text>
<polyline points="540.003,100 524.94,131.276 491.101,138.995 463.959,117.353 463.959,82.6471 491.101,61.005 524.94,68.7243 540.003,100" style="fill:none;stroke:purple;stroke-width:2"></polyline> <text x="500" y="106" alignment-baseline="central" text-anchor="middle">7</text>
<polyline points="140.003,200 128.286,228.286 100,240.003 71.7143,228.286 59.9973,200 71.7143,171.714 100,159.997 128.286,171.714 140.003,200" style="fill:none;stroke:red;stroke-width:2"></polyline> <text x="100" y="206" alignment-baseline="central" text-anchor="middle">8</text>
<polyline points="240.003,200 232.363,223.511 212.358,238.042 187.642,238.042 167.637,223.511 159.997,200 167.637,176.489 187.642,161.958 212.358,161.958 232.363,176.489 240.003,200" style="fill:none;stroke:orange;stroke-width:2"></polyline> <text x="200" y="206" alignment-baseline="central" text-anchor="middle">10</text>
<polyline points="340.003,200 334.644,219.998 319.998,234.644 300,240.003 280.002,234.644 265.356,219.998 259.997,200 265.356,180.002 280.002,165.356 300,159.997 319.998,165.356 334.644,180.002 340.003,200" style="fill:none;stroke:green;stroke-width:2"></polyline> <text x="300" y="206" alignment-baseline="central" text-anchor="middle">12</text>
<polyline points="440.003,200 436.956,215.305 428.286,228.286 415.305,236.956 400,240.003 384.695,236.956 371.714,228.286 363.044,215.305 359.997,200 363.044,184.695 371.714,171.714 384.695,163.044 400,159.997 415.305,163.044 428.286,171.714 436.956,184.695 440.003,200" style="fill:none;stroke:blue;stroke-width:2"></polyline> <text x="400" y="206" alignment-baseline="central" text-anchor="middle">16</text>
<polyline points="540.003,200 538.042,212.358 532.363,223.511 523.511,232.363 512.358,238.042 500,240.003 487.642,238.042 476.489,232.363 467.637,223.511 461.958,212.358 459.997,200 461.958,187.642 467.637,176.489 476.489,167.637 487.642,161.958 500,159.997 512.358,161.958 523.511,167.637 532.363,176.489 538.042,187.642 540.003,200" style="fill:none;stroke:purple;stroke-width:2"></polyline> <text x="500" y="206" alignment-baseline="central" text-anchor="middle">20</text>
</svg>
</blockquote>

<p>Was generated by this script:

<codeblock>
SELECT '&lt;svg width="600" height="300">';
WITH t1(x,y,n,color) AS (VALUES
   (100,100,3,'red'),
   (200,100,4,'orange'),
   (300,100,5,'green'),
   (400,100,6,'blue'),
   (500,100,7,'purple'),
   (100,200,8,'red'),
   (200,200,10,'orange'),
   (300,200,12,'green'),
   (400,200,16,'blue'),
   (500,200,20,'purple')
)
SELECT
   geopoly_svg(geopoly_regular(x,y,40,n),
        printf('style="fill:none;stroke:%s;stroke-width:2"',color))
   || printf(' &lt;text x="%d" y="%d" alignment-baseline="central" text-anchor="middle">%d&lt;/text>',x,y+6,n)
  FROM t1;
SELECT '&lt;/svg>';
</codeblock>

<tcl>hd_fragment ccw geopoly_ccw</tcl>
<h2>The geopoly_ccw(J) Function</h2>

<p>The geopoly_ccw(J) function returns the polygon J with counter-clockwise (CCW) rotation.

<p>
[https://tools.ietf.org/html/rfc7946 | RFC-7946] requires that polygons use CCW rotation.
But the spec also observes that many legacy GeoJSON files do not following the spec and
contain polygons with clockwise (CW) rotation.  The geopoly_ccw() function is useful for
applications that are reading legacy GeoJSON scripts.  If the input to geopoly_ccw() is
a correctly-formatted polygon, then no changes are made.  However, if the circulation of
the input polygon is backwards, then geopoly_ccw() reverses the circulation order so that
it conforms to the spec and so that it will work correctly with the Geopoly module.



<h1>Implementation Details</h1>

<p>The geopoly module is an extension to the [R-Tree extension].  Geopoly
uses the same underlying logic and shadow tables as the [R-Tree extension].
Geopoly merely presents a different interface, and provides some extra logic
to compute polygon decoding, overlap, and containment.

<h2>Binary Encoding of Polygons</h2>

<p>
Geopoly stores all polygons internally using a binary format.  A binary
polygon consists of a 4-byte header following by an array of coordinate
pairs in which each dimension of each coordinate is a 32-bit floating point
number.

<p>
The first byte of the header is a flag byte.  The least significant bit
of the flag byte determines whether the coordinate pairs that follow the
header are stored big-endian or little-endian.  A value of 0 for the least
significant bit means big-endian and a value of 1 means little endian.
Other bits of the first byte in the header are reserved for future expansion.

<p>
The next three bytes in the header record the number of vertexes in the polygon
as a big-endian integer.  Thus there is an upper bound of about 16 million
vertexes per polygon.

<p>
Following the header is the array of coordinate pairs.  Each coordinate is
a 32-bit floating point number.  The use of 32-bit floating point values for
coordinates means that any point on the earth's surface can be mapped with
a resolution of approximately 2.5 meters.  Higher resolutions are of course
possible if the map is restricted to a single continent or country.
Note that the resolution of coordinates in the geopoly module is similar
in magnitude to daily movement of points on the earth's surface due to
tidal forces.

<p>
The list of coordinates in the binary format contains no redundancy.  
The last coordinate is not a repeat of the first as it is with GeoJSON.  
Hence, there is always one fewer coordinate pair in the binary representation of
a polygon compared to the GeoJSON representation.

<h2>Shadow Tables</h2>

<p>
The geopoly module is built on top of the [R-Tree extension] and uses the
same underlying shadow tables and algorithms.  For indexing purposes, each
polygon is represented in the shadow tables as a rectangular bounding box.
The underlying R-Tree implementation uses bounding boxes to limit the search
space.  Then the geoploy_overlap() and/or geopoly_within() routines further
refine the search to the exact answer.