Overview

<title>The Geopoly Interface To The SQLite R*Tree Module</title>
<tcl>hd_keywords {geopoly} {Geopoly module}</tcl>
<table_of_contents>

<h1>Overview</h1>

<p>
The Geopoly module is an alternative interface to the [R-Tree extension] that uses
the [http://geojson.org | GeoJSON] notation
([https://tools.ietf.org/html/rfc7946 | RFC-7946]) to describe two-dimensional
polygons.  Geopoly includes functions for detecting when one polygons is
contained within or overlaps with another, for computing the
area contained within a polygon, for doing linear trasformations of polygons,
for rendering polygons as
[https://en.wikipedia.org/wiki/Scalable_Vector_Graphics | SVG], and other
similar operations.

<p>
Geopoly operates on "simple" polygons - that is, polygons for which
the boundary does not intersect itself.  Geopoly thus extends the capabilities
of the [R-Tree extension] which can only deal with rectangular areas.
On the other hand, the [R-Tree extension] is
able to handle between 1 and 5 coordinate dimensions, whereas Geopoly is restricted
to 2-dimensional shapes only.

<p>
Each polygon in the Geopoly module can be associated with an arbitrary
number of auxiliary data fields.

<h2>GeoJSON</h2>

<p>The [https://tools.ietf.org/html/rfc7946| GeoJSON standard] is syntax for
exchanging geospatial information using JSON.  GeoJSON is a rich standard
that can describe nearly any kind of geospatial content.

<p>The Geopoly module only understands
a small subset of GeoJSON, but a critical subset.  
In particular, GeoJSON understands
the JSON array of vertexes that describes a simple polygon.

<p>A polygon is defined by its vertexes.
Each vertex is a JSON array of two numeric values which are the
X and Y coordinates of the vertex.
A polygon is a JSON array of these vertexes, and hence is an array
of arrays.
The first and last vertex in the array must be the same.
The polygon follows the right-hand rule:  When tracing a line from
one vertex to the next, the area to the right of the line is outside
of the polygon and the area to the left is inside the polygon.
In other words, the net rotation of the vertexes is counter-clockwise.

<p>
For example, the following JSON describes an isosceles triangle, sitting
on the X access and with an area of 0.5:

<codeblock>
&#91;&#91;0,0],&#91;1,0],&#91;0.5,1],&#91;0,0]]
</codeblock>

<p>
A triangle has three vertexes, but the GeoJSON description of the triangle
has 4 vertexes because the first and last vertex are duplicates.

<h2>Binary storage format</h2>

<p>
Internally, Geopoly stores polygons in a binary format - an SQL BLOB.
Details of the binary format are given below.
All of the Geopoly interfaces are able to accept polygons in either the
GeoJSON format or in the binary format.

<h1>Using The Geopoly Extension</h1>

<p>
A geopoly table is created as follows:

<codeblock>
CREATE VIRTUAL TABLE newtab USING geopoly(a,b,c);
</codeblock>

<p>
The statement above creates a new geopoly table named "newtab".
Every geopoly table contains a built-in integer "rowid" column
and a "_shape" column that contains
the polygon associated with that row of the table.
The example above also defines three auxiliary data columns 
named "a", "b", and "c" that can store whatever additional
information the application needs to associate
with each polygon.  If there is no need to store auxiliary
information, the list of auxiliary columns can be omitted.

<p>
Store new polygons in the table using ordinary INSERT statements:

<codeblock>
INSERT INTO newtab(_shape) VALUES('&#91;&#91;0,0],&#91;1,0],&#91;0.5,1],&#91;0,0]]');
</codeblock>

<p>
UPDATE and DELETE statements work similarly.

<h2>Queries</h2>

<p>
To query the geopoly table using an indexed geospatial search, 
use one of the functions geopoly_overlap()
or geopoly_within() as a boolean function in the WHERE clause,
with the "_shape" column as the first argument to the function.
For example:

<codeblock>
SELECT * FROM newtab WHERE geopoly_overlap(_shape, $query_polygon);
</codeblock>

<p>
The previous example will return every row for which the _shape
overlaps the polygon in the $query_polygon parameter.  The
geopoly_within() function works similarly, but only returns rows for
which the _shape is completely contained within $query_polygon.

<p>
Querys (and also DELETE and UPDATE statements) in which the WHERE
clause contains a bare geopoly_overlap() or geopoly_within() function
make use of the underly R*Tree data structures for a fast lookup that
only has to examine a subset of the rows in the table.  The number of
rows examines depends, of course, on the size of the $query_polygon.
Large $query_polygons will normally need to look at more rows than small
ones.

<p>
Queries against the rowid of a geopoly table are also very quick, even
for tables with a vast number of rows.
However, none of the auxiliary data columns are indexes, and so queries
against the auxiliary data columns will involve a full table scan.

<h1>Special Functions</h1>

<p>
The geopoly module defines several new SQL functions that are useful for
dealing with polygons.  All polygon arguments to these functions can be
either the GeoJSON format or the internal binary format.

<h2>The geopoly_overlap(P1,P2) Function</h2>

<p>
If P1 and P2 are both polygons, then the geopoly_overlap(P1,P2) function returns
true if there is any overlap between P1 and P2, or it returns false if P1 and P2
completely disjoint.
If either P1 or P2 is not a polygon, this routine returns NULL.

<p>
The geopoly_overlap(P1,p2) function is special in that the geopoly virtual
table knows how to use R*Tree indexes to optimize queries in which the 
WHERE clause uses geopoly_overlap() as a boolean function.  Only the
geopoly_overlap(P1,P2) and geopoly_within(P1,P2) functions have this
capability.

<h2>The geopoly_within(P1,P2) Function</h2>

<p>
If P1 and P2 are both polygons, then the geopoly_within(P1,P2) function returns
true if P2 is completely contained within P1, or it returns false if any part of
P2 is outside of P1.  If P1 and P2 are the same polygon, this routine returns true.
If either P1 or P2 is not a polygon, this routine returns NULL.

<p>
The geopoly_within(P1,p2) function is special in that the geopoly virtual
table knows how to use R*Tree indexes to optimize queries in which the 
WHERE clause uses geopoly_within() as a boolean function.  Only the
geopoly_within(P1,P2) and geopoly_overlap(P1,P2) functions have this
capability.

<h2>The geopoly_area(P) Function</h2>

<p>
If P is a polygon, then geopoly_area(P) returns the area enclosed by
that polygon.  If P is not a polygon, geopoly_area(P) returns NULL.

<h2>The geopoly_blob(P) Function</h2>

<p>
If P is a polygon, then geopoly_blob(P) returns the binary encoding
of that polygon as a BLOB.
If P is not a polygon, geopoly_blob(P) returns NULL.

<h2>The geopoly_json(P) Function</h2>

<p>
If P is a polygon, then geopoly_json(P) returns the GeoJSON representation
of that polygon as a TEPT string.
If P is not a polygon, geopoly_json(P) returns NULL.

<h2>The geopoly_svg(P,...) Function</h2>

<p>
If P is a polygon, then geopoly_svg(P,...) returns a text string which is a
[https://en.wikipedia.org/wiki/Scalable_Vector_Graphics|Scalable Vector Graphics (SVG)]
representation of that polygon.  If there is more one argument, then second
and subsequent arguments are added as attributes to each SVG glyph.  For example:

<codeblock>
SELECT geopoly_svg($polygon,'class="poly"','style="fill:blue;"');
</codeblock>

<p>
If P is not a polygon, geopoly_svg(P,...) returns NULL.

<p>
Note that geopoly uses a traditional right-handed cartesian coordinate system
with the origin at the lower left, whereas SVG uses a left-handed coordinate
system with the origin at the upper left.  The geopoly_svg() routine makes no
attempt to transform the coordinate system, so the displayed images are shown
in mirror image and rotated.  If that is undesirable, the geopoly_xform() routine
can be used to transform the output from cartesian to SVG coordinates prior to
passing the polygons into geopoly_svg().

<h2>The geopoly_bbox(P) Function</h2>

<p>
If P is a polygon, then geopoly_bbox(P) returns the a new polygon that is
the smallest rectangle completely enclosing P.
If P is not a polygon, geopoly_bbox(P) returns NULL.

<h2>The geopoly_contains_point(P,X,Y) Function</h2>

<p>
If P is a polygon, then geopoly_contains_point(P,X,Y) returns true if and only
if the coordinate X,Y is inside or on the boundary of the polygon P.
If P is not a polygon, geopoly_contains_point(P,X,Y) returns NULL.

<h2>The geopoly_xform(P,A,B,C,D,E,F) Function</h2>

<p>
The geopoly_xform(P,A,B,C,D,E,F) returns a new polygon that is a
linear transformation of the polygon P and where the transformation
is defined by values A,B,C,D,E,F. If P is not a valid polygon, this
routine returns NULL.

<p>
The transformation converts each vertex of the polygon according to the
following formula:

<codeblock>
x1 = A*x0 + B*y0 + E
y1 = C*x0 + D*y0 + F
</codeblock>

<p>
So, for example, to move a polygon by some amount DX, DY without changing
its shape, use:

<codeblock>
geopoly_xform($polygon, 1, 0, 0, 1, $DX, $DY)
</codeblock>

<p>
To rotate a polygon by R radians around the point 0, 0:

<codeblock>
geopoly_xform($polygon, cos($R), sin($R), -sin($R), cos($R), 0, 0)
</codeblock>

<h1>Implementation Details</h1>

<p>The geopoly module is an extension to the [R-Tree extension].  Geopoly
uses the same underlying logic and shadow tables as the [R-Tree extension].
Geopoly merely presents a different interface, and provides some extra logic
to compute polygon decoding, overlap, and containment.

<h2>Binary Encoding of Polygons</h2>

<p>
Geopoly stores all polygons internally using a binary format.  A binary
polygon consists of a 4-byte header following by an array coordinate
pairs which each dimension of each coordinate is of 32-byte floating point
number.

<p>
The first byte of the header is a flag byte.  The least significant bit
of the flag byte determines whether the coordinate pairs the follow the
header are stored big-endian or little-endian.  A value of 0 for the least
significant bit means big-endian and a value of 1 means little endian.
Other bits of the first byte in the header are reserved for future expansion.

<p>
The next three bytes in the header record the number of vertexes in the polygon
as a big-endian integer.  Thus there is an upper bound of about 16 million
vertexes per polygon.

<p>
Following the header is the array of coordinate pairs.  Each coordinate is
a 32-bit floating point number.  The use of 32-bit floating point values for
coordinates means that any point on the earth's surface can be mapped with
a resolution of approximately 2.5 meters.  Higher resolutions are of course
possible if the map is restricted to a single continent or country.
Note that the resolution of coordinates in the geopoly module is similar
in magnitude to daily movement of points on the earth's surface due to
tidal forces.

<p>
The list of coordinates in the binary format contains no redundancy.  
The last coordinate is not a repeat of the first as it is with GeoJSON.  
Hence, there is always one fewer coordinate pair in the binary representation of
a polygon compared to the GeoJSON representation.

<h2>Shadow Tables</h2>

<p>
The geopoly module is built on top of the [R-Tree extension] and uses the
same underlying shadow tables and algorithms.  For indexing purposes, each
polygon is represented in the shadow tables as a rectangular bounding box.
The underlying R-Tree implementation uses bounding boxes to limit the search
space.  Then the geoploy_overlap() and/or geopoly_within() routines further
refine the search to the exact answer.