1. Technical Field
The present disclosure relates generally to manipulation of spatial data in relational databases, and more specifically to efficiently finding entities based on both spatial and entity filtering criteria.
2. Background Information
Relational databases are electronic databases that store related data in tables of rows and columns, and allow links to be established between tables that have matching fields, such that multiple tables may be simultaneously queried. Many relational databases utilize a version of the SQL language, a special-purpose programming language adapted to managing data. SQL code may be implemented by a relational database system that executes as a separate process and is accessed by a client application. For example, SQL code may be executed by a MySQL® database system available from open source, the Oracle Database available from Oracle Corp, or a Microsoft SQL Server database system available from Microsoft Corp. Alternatively, SQL code may be implemented by a relational database system implemented by a self-contained programming library that may be integrated within the client application itself. For example, SQL code may be executed by the SQLite® embedded SQL database system, available in the public domain.
Among other uses, relational databases may be utilized to store spatial data that describes the position, shape and orientation of entities in multi-dimensional space. The position, shape and orientation of entities are typically represented using a spatial reference system. Among other functions, the spatial reference system provides a coordinate system (e.g., a Cartesian coordinate system) through which spatial information may be defined. For example, a point on an entity may be defined by x-axis, y-axis, and z-axis coordinates.
Spatial data stored in a relational database is typically indexed using a spatial index that is designed to speed spatial queries. Often, the spatial index is structured as an R-tree. An R-tree is a type of balanced tree data structure that is premised upon the use of bounding boxes. Bounding boxes are regions of multi-dimensional space that span from a minimum to a maximum extent of a related entity or group of entities. Typically, bounding boxes are aligned with the axes of the coordinate system being employed, and are therefore are referred to as axis-aligned bounding boxes (AABBs). For example, in 2-D spatial data, the bounding box of a particular entity may be a rectangle that spans from the entity's minimum x-axis coordinate to its maximum x-axis coordinate, and from its minimum y-axis coordinate to its maximum y-axis coordinate. In the case of 3-D spatial data, the bounding box of a particular entity may be a rectangular prism that spans from the entity's minimum x-axis coordinate to its maximum x-axis coordinate, from its minimum y-axis coordinate to its maximum y-axis coordinate, and from its minimum z-axis coordinate to its maximum z-axis coordinate.
An R-tree at its lowest level has leaf nodes that include one or more individual entities. If the leaf node includes a single entity, the bounding box of the leaf node is simply the bounding box of the individual entity itself. If the leaf node includes multiple entities, the bounding box of the leaf node is a minimum sized box that encloses all the bounding boxes of the individual entities included in the leaf node. At higher levels, an R-tree includes internal nodes that represent groups of leaf nodes or other internal nodes. The bounding box of each internal node is a minimum sized box that encloses all the bounding boxes of the group. At the very highest level of an R-tree is a root node. The bounding box of the root node is a minimum sized box that encloses all the bounding boxes of all the entities indexed by the R-tree.
A search using an R-tree generally begins at the root node, and progress down-ward through the levels of the tree. For example, if a user desires to find entities located within a region of interest (i.e., a query box), a search query is initiated based on the query box. At the root node, a determination is made which lower-level internal nodes have bounding boxes that overlap the query box. The search then proceeds to these lower-level internal nodes, where it is determined which still-lower-level internal nodes have bounding boxes that overlap the query box. The process continues down the R-tree, through as many levels as may be present, until leaf nodes are eventually encountered. The bounding boxes of the individual entities included in the encountered leaf nodes are tested against the query box, and the corresponding entities are placed into a result set for the search query if their bounding box overlaps the query box.
In addition to simple search queries, a variety of other types of operations may be performed using a spatial index in connection with a relational database including spatial data. For example, it is sometimes desirable to find a particular number (“X”) of “best” entities that satisfy both spatial and entity filtering criteria, where “best” is determined based on upon a score. Once found, such “X best” entries may be loaded and used for a variety of purposes.
While it is possible to find a particular number of “best” entities (“X best”) that satisfy both spatial and entity filtering criteria using operators provided in many relational database systems (e.g., SQL database systems), typically such an operation is quite inefficient. A “brute force” approach may be to return all entities that satisfy both the spatial criteria and the entity filtering criteria, calculate a score for each retuned entity, sort the list on score, and truncate the list after the first “X” entries. One way to implement this brute force approach in SQL is via “ORDER BY” and “LIMIT” clauses, combined with a user-function for calculating score. It should be obvious that this approach can be very inefficient and consume large amounts of memory and processing resources. An alternative technique may be to employ an iterative “trial-and-error” approach. In a typical “trial-and-error” approach, an arbitrary threshold is selected for the score. A query is executed in which every entity is compared against the spatial and entity filtering criteria, and its score is compared against the arbitrary threshold. Entities whose score is less than the threshold are rejected. The query result will be a certain number of entities (“Y”) that meet the spatial and entity filtering criteria. The number of entities returned (“Y”) is compared with the desired number of entities (“X”). If more entities are return than desired, the threshold is increased by an increment, and the query re-executed. The process may be repeated until the number of entities returned (“Y”) is less than or equal to the desired number of entities (“X”).
Compounding the general inefficiency of such a “trial-and-error” approach is that its accuracy and inefficiency are not predictable. Different numbers of iterations may be required in different circumstances. The number of iteration may depend on the starting threshold, the increment for the threshold, and the data itself. If the increment is too large, the result may be a number of entities “Y” that is less than “X”, but by an arbitrary amount. In that case, “Y-X” entities are “missed” in the result. If the increment is too small, the number of iterations can be unpredictably large. Many client applications are poorly adapted to dealing with the unpredictable performance resulting from the number of iterations varying.
Accordingly, there is a need for improved techniques that address these and other shortcomings.