A geospatial join is a costly operation is typically handled like a product join in a relational database system. For example, in the geospatial join query below:
SELECT*
FROM Store, State
WHERE Store.location.ST_WITHIN (State.geo)=1;
The typical approach is to do a full product join on the Store and State table, then filter the result with the geospatial join predicate.
Some existing techniques have a geospatial nested join algorithm that works as follows: assuming that there is a geospatial index on the store location (Store.location), a state table on all Application Module Processors (AMPs) of a multiprocessor architecture, the geospatial index on Store.location is processed to narrow down the rows that would satisfy the geospatial join predicate.
However, this approach is still costly. One of the tables (state in the case above) must be fully duplicated on all AMPs. This operation is very costly in terms of both spool space and transmission cost. A lot of time elapses when both tables are large, and the system may not have enough spool space to accommodate such a join. The problem is exacerbated as the number of AMPs in a multiprocessor relational database management (RDBMS) architecture increases.
Additionally, the approach mentioned above has to execute other single table predicates on Store table AFTER the index search on the Store.location. This is despite the possibility of more restrictive single table predicates that could occur on the store table (for instance via a different index).
Therefore, there is a need for improved geospatial join processing within a RDBMS.