1. Field of the Invention
The embodiments of the invention relate to location-based applications, and more particularly to a method for associating user related data with spatial hierarchy identifiers for efficient location-based processing. Although embodiments of the invention are suitable for a wide scope of applications, they are particularly suitable for improving the performance and scalability of location-based search and location-based analytical processing.
2. Discussion of the Related Art
In general, existing location-based applications incur excessive processing overhead because of the calculations required by the geospatial indexes that such applications typically use.
A location can be represented by a street address. Before a location-based search can be performed, the street address must be converted into a form that can be used by the search. Typically, a location-based search algorithm would expect to receive a latitude/longitude pair or possibly a set of latitude/longitude points that make up some shape. Converting a street address into a multi-dimensional coordinate system such as latitude/longitude is inefficient because of the slow geo-coding process required to convert the address. The geo-coding process requires searching through a very large data set of address range information indexed by street name, city, state and/or zip code. Once a matching street segment and address range are found, the multi-dimensional position is interpolated from the start and end coordinates stored with the street segment. Performance of online and mobile applications would not be acceptable to users if this geo-coding process were required for every location-based operation. Typically, the geo-coding process is performed once and the resulting multi-dimensional coordinate representation is used for any subsequent location-based operations.
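By way of illustration only, the interpolation step of the geo-coding process described above might be sketched as follows. The record layout, the field names, and the function name `interpolate_address` are hypothetical and are not taken from any particular geo-coding system. The sketch assumes a straight street segment and evenly spaced house numbers, and it omits the costly lookup of the matching segment in the address range data set.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """Hypothetical address-range record; all field names are assumptions."""
    street: str
    start_number: int   # lowest house number on the segment
    end_number: int     # highest house number on the segment
    start_lat: float    # coordinates stored for the start of the segment
    start_lon: float
    end_lat: float      # coordinates stored for the end of the segment
    end_lon: float


def interpolate_address(segment: StreetSegment, house_number: int) -> tuple[float, float]:
    """Estimate (lat, lon) for a house number by linear interpolation
    between the start and end coordinates of the matching segment."""
    if not segment.start_number <= house_number <= segment.end_number:
        raise ValueError("house number is outside the segment's address range")
    span = segment.end_number - segment.start_number
    # Fraction of the way along the segment; a single-address segment maps to its start.
    t = 0.0 if span == 0 else (house_number - segment.start_number) / span
    lat = segment.start_lat + t * (segment.end_lat - segment.start_lat)
    lon = segment.start_lon + t * (segment.end_lon - segment.start_lon)
    return lat, lon


# Example: house number 150 lies halfway along a segment numbered 100 to 200.
segment = StreetSegment("Main St", 100, 200, 40.000, -75.000, 40.002, -75.004)
lat, lon = interpolate_address(segment, 150)
```

The interpolation itself is inexpensive. The cost noted above arises primarily from locating the matching street segment within the very large address range data set.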
Locations can also be represented in several multi-dimensional coordinate systems, the most prominent example being the geographic coordinate system represented by latitude and longitude. The common approach in the art is to represent all data and input parameters using a multi-dimensional coordinate system. This includes location-based data such as business locations, user locations, results presented to a user and associated with a location, and actions taken by users at a location. A geospatial index such as an Rtree or QuadTree is typically used to perform operations such as location-based search on geospatial data sets. Searches of the tree indexes associated with Rtree and QuadTree begin at the root node of the associated tree data structure. The boundaries of each child node are compared with the incoming point or shape to determine which child nodes must be searched to find all the possible matches. Each child node that matches the incoming point or shape is searched recursively. The calculations to determine whether a child node matches an input point or shape are computationally expensive. While the index provides the primary filtering, the secondary filtering required to check each data item held in the matching child nodes against the input point or shape is also very computationally expensive. In order to gain some amount of scalability, geospatial indexes can simply be replicated across multiple computers. However, replication is not an effective method of scaling because as the data sets get large, the performance of the index deteriorates. Each replicated index has the same poor performance. Existing indexes typically store the data on disk, access to which can be a thousand times or more slower than memory access. Unfortunately, existing indexes do not support efficient distribution, partitioning and load-balancing across multiple computers and therefore must use disk storage to handle a large amount of data.
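By way of illustration only, the recursive search described above might be sketched for a simple point QuadTree as follows. The class names `Box` and `QuadTree`, the node capacity, and the rectangular query shape are assumptions chosen for brevity and do not represent any particular existing index. The bounds comparison at each node corresponds to the primary filtering, and the per-item containment check corresponds to the secondary filtering.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned rectangle; used for both node boundaries and queries."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y

    def intersects(self, other: "Box") -> bool:
        return not (other.min_x > self.max_x or other.max_x < self.min_x
                    or other.min_y > self.max_y or other.max_y < self.min_y)


class QuadTree:
    """Minimal point quadtree; each node keeps up to `capacity` items."""

    def __init__(self, bounds: Box, capacity: int = 4):
        self.bounds = bounds
        self.capacity = capacity
        self.points = []      # (x, y, item) tuples held at this node
        self.children = []    # empty until the node is split

    def insert(self, x: float, y: float, item) -> bool:
        if not self.bounds.contains(x, y):
            return False
        if not self.children:
            if len(self.points) < self.capacity:
                self.points.append((x, y, item))
                return True
            self._split()
        # any() stops at the first child that accepts the point.
        return any(child.insert(x, y, item) for child in self.children)

    def _split(self) -> None:
        b = self.bounds
        mid_x, mid_y = (b.min_x + b.max_x) / 2, (b.min_y + b.max_y) / 2
        self.children = [
            QuadTree(Box(b.min_x, b.min_y, mid_x, mid_y), self.capacity),
            QuadTree(Box(mid_x, b.min_y, b.max_x, mid_y), self.capacity),
            QuadTree(Box(b.min_x, mid_y, mid_x, b.max_y), self.capacity),
            QuadTree(Box(mid_x, mid_y, b.max_x, b.max_y), self.capacity),
        ]

    def search(self, query: Box) -> list:
        # Primary filter: skip any node whose boundary misses the query shape.
        if not self.bounds.intersects(query):
            return []
        # Secondary filter: test every item held at a matching node.
        matches = [item for x, y, item in self.points if query.contains(x, y)]
        for child in self.children:
            matches.extend(child.search(query))   # recurse into child nodes
        return matches
```

Every search starts at the root node and performs a boundary comparison at each node visited, followed by an item-by-item check. Accordingly, a request processor cannot determine which nodes hold candidates without knowledge of the node boundaries.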
To handle a large number of users, the only existing solution is to replicate the entire index and all of its data held in disk storage across multiple computers. Even if the indexes could be efficiently distributed, partitioned, and load-balanced across multiple computers, the input point or shape would have to be initially processed to determine which child nodes in the index contained candidates. If the input point or shape were represented using multi-dimensional coordinates, the initial request processor would have to have a copy of the index so that it could calculate which child nodes contained candidates. The initial request processor's copy of the index would not have to hold the actual data, but the initial request processor would have to know the boundaries of each child node. Given the difficulties described, existing indexing methods cannot be efficiently distributed, partitioned, and load-balanced across multiple computers and therefore cannot efficiently scale to large data sets and a large number of users.
After a location-based search has taken place and a user has received the results, the user may perform some action upon one or more of the results. Capturing the results and the actions associated with the location(s) initially provided to the search is desirable for analysis as to the relevance of a result at a specific location. However, capturing the location associated with the results and subsequent actions using only multi-dimensional coordinates is not desirable because of the inefficient indexing already described. Capturing all of the results and actions taken by users over time will generate very large data sets and require more efficient processing than can be provided by existing geospatial indexing methods. Multi-dimensional coordinate representations of location are not sufficient for efficient processing because of the limitations of the existing indexing methods, including the inability to distribute, partition or load-balance the indexes across multiple computers. The association of user locations with the actions taken by a user at those locations must be sufficient to allow for efficient processing of location-based operations as well as for timely analytical processing of a very large number of collected user actions.