Databases that contain geographically oriented data are becoming increasingly common, fueled in large part by the growth of the Internet and the World Wide Web. Examples include Electronic Yellow Pages (EYP) and classified ad directories that allow users to search for businesses based on some combination of location and non-geographic attributes. In the EYP case, a user might want to locate businesses of a specific type, or businesses with specific words in their names, within a given area. An online automobile classified ad system needs to locate cars with specific characteristics within a given distance of a user's home.
As spatial data becomes more common, the requirement that such data be storable and accessible with off-the-shelf hardware and software becomes mandatory. In particular, Relational Database Management Systems (RDBMS) are commonly used to store and access large datasets of many varieties, including data with spatial content. RDBMS systems provide security, safety, transactional control, high speed, and multi-user access, all of which are important for information systems, including World Wide Web-based information systems. RDBMS systems are also pervasive.
Research has explored access methods that efficiently support the retrieval of spatial data. Such research usually confines itself to exploring data structures and algorithms that efficiently handle the creation and retrieval of data based on spatial attributes, but not spatial attributes in conjunction with non-spatial attributes. Further, most research involves the definition of specialized storage structures and access methods without particular concern over how well the structures and access methods can be overlaid onto those provided by commercial RDBMS systems.
Quadtrees are commonly used to represent spatial data. A quadtree region is represented as a single attribute by interleaving the base-two representations of the X and Y coordinates of a two-dimensional space (more generally an n-dimensional space). Efficient retrieval of spatial data requires the mapping of an n-dimensional space into a single attribute. Without such a mapping, it would be impractical to use modern database systems, including relational database systems, as an access mechanism for large spatial databases. In some circumstances, the performance of database systems can degrade dramatically when concatenated index keys are used. Because a point or a region in space can be represented as a single attribute, quadtrees make effective use of the indexing structures supplied by modern database systems.
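By way of illustration only, the bit interleaving described above can be sketched as follows. The function names, the 16-bit default coordinate width, and the choice to place X bits in the even positions and Y bits in the odd positions are assumptions made for this sketch; they are not taken from any particular prior art system.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single integer key.

    Assumption for this sketch: x occupies the even bit positions and
    y occupies the odd bit positions of the resulting key.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)        # bit i of x -> position 2i
        key |= ((y >> i) & 1) << (2 * i + 1)    # bit i of y -> position 2i + 1
    return key


def quadtree_prefix(key: int, bits: int, depth: int) -> int:
    """Return the key of the quadtree cell at `depth` that contains the point.

    Each quadtree level consumes two bits of the interleaved key, so the
    cell at a given depth is identified by the leading 2 * depth bits.
    """
    return key >> (2 * (bits - depth))


# Two nearby points in an 8 x 8 grid (3 bits per coordinate).
a = interleave_bits(3, 5, bits=3)
b = interleave_bits(2, 5, bits=3)
# Both points fall in the same depth-2 cell, so they share a key prefix.
same_cell = quadtree_prefix(a, 3, 2) == quadtree_prefix(b, 3, 2)
```

Because every point within a quadtree cell shares the same leading bits, a cell corresponds to one contiguous range of key values, which is what allows a conventional single-attribute index to serve the spatial lookup.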
In at least one prior art system, when a proximity search is initiated, a search engine determines the set of regions it is prepared to search to locate the entities within the geographic search radius. In addition to proximity, an entity must match other search criteria specified by a user. The search engine examines regions in increasing distance from the search center until enough matching entities are located or all regions in the search radius have been examined. By searching regions in increasing distance from the search center, closer entities will be examined before more distant ones. If the requisite number of entities is found before all the regions have been examined, any remaining region whose closest point is farther from the search center than the entities already found need not be examined at all.
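The search order and the early termination condition described above can be sketched as follows. This is a simplified, in-memory illustration under stated assumptions, not the implementation of any prior art system: the `Entity` and `Region` classes, the rectangular region shape, and the `matches` predicate are all hypothetical, and a real system would read regions through a database index rather than from a list.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Entity:
    name: str
    x: float
    y: float


@dataclass
class Region:
    # Hypothetical axis-aligned rectangular region holding its entities.
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    entities: List[Entity] = field(default_factory=list)


def min_distance(region: Region, cx: float, cy: float) -> float:
    """Distance from the search center to the closest point of the region."""
    dx = max(region.xmin - cx, 0.0, cx - region.xmax)
    dy = max(region.ymin - cy, 0.0, cy - region.ymax)
    return math.hypot(dx, dy)


def proximity_search(
    regions: List[Region],
    cx: float,
    cy: float,
    radius: float,
    wanted: int,
    matches: Callable[[Entity], bool],
) -> Tuple[List[str], int]:
    """Return the names of up to `wanted` closest matches and the number
    of regions that were actually examined."""
    # Only regions that intersect the search radius, nearest first.
    candidates = sorted(
        (r for r in regions if min_distance(r, cx, cy) <= radius),
        key=lambda r: min_distance(r, cx, cy),
    )
    found: List[Tuple[float, str]] = []
    examined = 0
    for region in candidates:
        # Stop once enough entities are found and this region (and every
        # later one) is farther away than the farthest entity kept so far.
        if len(found) >= wanted and min_distance(region, cx, cy) > found[wanted - 1][0]:
            break
        examined += 1
        for e in region.entities:
            d = math.hypot(e.x - cx, e.y - cy)
            if d <= radius and matches(e):
                found.append((d, e.name))
        found.sort()
    return [name for _, name in found[:wanted]], examined


regions = [
    Region(0, 0, 1, 1, [Entity("Central Supplies", 0.5, 0.5), Entity("Joe's", 0.2, 0.2)]),
    Region(1, 0, 2, 1, [Entity("Central Hardware", 1.5, 0.5)]),
    Region(5, 5, 6, 6, [Entity("Central Park", 5.5, 5.5)]),
]
names, examined = proximity_search(
    regions, 0.5, 0.5, radius=10.0, wanted=2, matches=lambda e: "Central" in e.name
)
```

In this example the third region is skipped because two matching entities have already been found closer than its nearest point. Note that the sketch has no way to skip a region that lies nearby but contains no matching entity; every such region is still examined in full.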
Because the regions are examined solely on the basis of proximity, without knowledge of whether a given region contains any relevant entities, the search engine spends considerable time doing unproductive work. In the case of a relational database system, searching a region involves examining an index, and perhaps some associated data buffers to examine attributes in more detail. Index searching is a relatively efficient operation, but the cumulative cost of such searching adds up. For Electronic Yellow Pages, one analysis of query logs consistently shows that for keyword queries (e.g., find businesses in this area with "Central" and "Supplies" in their names), more than 85% of the searched regions do not have any businesses with matching keywords; 58% of the total search time is wasted examining these regions. The numbers are only slightly better for category queries (e.g., find restaurants in this area).
What is needed is an improved proximity searching technique that reduces, or even minimizes, the amount of wasted searching.