1. Field of the Invention
This invention relates to executing a distributed spatial data query. More specifically, the invention relates to creating a homogeneous, uniform spatial data source from distributed, heterogeneous spatial and non-spatial data sources and executing a spatial data query in such an environment.
2. Description of the Related Art
Spatial data (or, as it is sometimes known, geoinformation) is data that provides real-world position information. In many instances, it is one piece of data in a set related to a particular object. For example, a restaurant object may include data concerning the restaurant name, type of food served, menu, and hours of operation. It may additionally contain spatial data that indicates its position relative to other objects in the real world. Spatial data may include point data (e.g., the restaurant may be represented as a point), line data (e.g., a highway or street), polygon data (e.g., a lake or a park), or others. Spatial data constitutes a distinct data type which can be the object of, and result from, spatial functions and predicates.
A spatial database is capable of storing and interpreting spatial data and of providing a user with information concerning how different spatial objects relate to one another. For example, typical spatial database functions allow a user to determine the distance between two points, determine the area of a polygon object, create new spatial objects, and pose true-or-false queries concerning the data (e.g., is there a residence for sale within one mile of my workplace?).
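The kinds of functions and predicates described above can be illustrated with a minimal sketch. It assumes planar coordinates in a single arbitrary unit, and the names `distance`, `polygon_area`, and `within_distance` are hypothetical; they are not the functions of any particular spatial database.

```python
import math
from typing import List, Tuple

# A point is an (x, y) pair in planar coordinates (an assumption of this sketch).
Point = Tuple[float, float]


def distance(a: Point, b: Point) -> float:
    """Spatial function: straight-line distance between two points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def polygon_area(vertices: List[Point]) -> float:
    """Spatial function: area of a simple polygon via the shoelace formula.

    The vertices are given in order around the boundary; the closing edge
    from the last vertex back to the first is implied.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def within_distance(a: Point, b: Point, limit: float) -> bool:
    """Spatial predicate: true if a and b are no more than `limit` apart."""
    return distance(a, b) <= limit
```

A production spatial database would typically also account for coordinate reference systems and use spatial indexes; the sketch omits both.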
The importance of spatial data and of effective methods for providing spatial information to clients continues to grow. Services providing users with spatial information rely heavily on the ability of a user to interact with the data, often in the form of queries. For example, a client may want to know how many Chinese restaurants are within a one-mile radius of a current location. As services making use of spatial data grow, the need to provide effective, quick answers to queries also grows.
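As an illustration only, the restaurant query mentioned above might be sketched as follows. The sketch assumes that each restaurant record carries a latitude and longitude in decimal degrees and uses the haversine great-circle formula with an approximate mean Earth radius; the `Restaurant` type, the `count_nearby` function, and the sample records are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius


@dataclass(frozen=True)
class Restaurant:
    """A restaurant object with one non-spatial attribute set and a point position."""
    name: str
    cuisine: str
    lat: float  # decimal degrees
    lon: float  # decimal degrees


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def count_nearby(restaurants: Iterable[Restaurant], cuisine: str,
                 lat: float, lon: float, radius_miles: float) -> int:
    """Count restaurants of the given cuisine within radius_miles of (lat, lon)."""
    return sum(
        1
        for r in restaurants
        if r.cuisine == cuisine
        and haversine_miles(lat, lon, r.lat, r.lon) <= radius_miles
    )


# Hypothetical sample records, for illustration only.
SAMPLE = [
    Restaurant("A", "Chinese", 40.005, -75.0),  # roughly a third of a mile north
    Restaurant("B", "Chinese", 40.100, -75.0),  # several miles north
    Restaurant("C", "Italian", 40.001, -75.0),  # nearby, but a different cuisine
]

nearby = count_nearby(SAMPLE, "Chinese", 40.0, -75.0, 1.0)
```

This sketch scans every record, so its cost grows with the size of the data set.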
In many instances, the spatial data is not located in a single database; rather, it is located in a number of different databases which are not necessarily of the same type. This poses a significant problem to an entity which wants to provide clients with a service that requires access to data from various data sources. For example, a real estate service may wish to provide clients with information about homes listed in a particular city and also provide relevant information such as the proximity of schools, school district information, and the proximity of registered sex offenders. However, the real estate listings may be stored as spatial information in an Oracle database, the school information as spatial information in a Sybase database, and the sex offender registry as a flat file which contains addresses, but does not have a computer-recognizable spatial data type within it.
Building a solution that allows a client to access these three data sources and use them to run searches presents a problem. One possible solution is to extract, transform, and load (ETL) the data into a single spatial database and then allow the client to run queries against that single database. This approach, however, creates a number of problems. First, ETL wastes resources: the solution requires storing locally information that is already stored elsewhere. Second, storing the same information again creates a significant processing burden on the database, and that burden recurs with each necessary update to the consolidated database. Third, the data becomes stale: because the consolidated database does not reflect the real-time status of the data in the source databases, the likelihood that the data is no longer correct increases as time passes. Finally, extracting, transforming, and loading data is not a trivial task, particularly where the data is stored in distributed, heterogeneous databases.