The following relates to the information processing arts, information storage and retrieval arts, spatial mapping arts, and related arts.
Spatial databases store content together with its spatial information. As used herein, the term “spatial” and the like encompasses any of two-dimensional, three-dimensional, four-dimensional, five-dimensional, or more generally d-dimensional space. The term “multidimensional space” is used herein to denote the same range of spaces. The term “record” is used herein as a general term encompassing any information having spatial localization respective to a point, area, or other portion of the multidimensional space.
By storing content organized as records with the spatial information maintained, it is possible to retrieve records within a selected area of space, or records that are within a defined distance of a point of interest, or so forth. For example, a spatial database may be used in a geographical information system (GIS), with each record representing a point of interest such as a city, a hotel, a country, a state, a restaurant, or so forth. Information can be retrieved, such as: the identity of all restaurants within a two-mile radius of a current location; the nearest city to a given city; or so forth. Spatial databases are also used in other applications, such as computer graphics, computational geometry, peer-to-peer computing, time series processing, efficient feature selection for clustering and categorization, and the efficient storage and indexing of deeply nested XML documents or other structured documents.
Spatial databases employ spatial indices that enable the content to be retrieved in a systematic fashion. A variety of spatial indices has been developed, such as quadtrees, octrees, UB-trees, R-trees, k-d trees, nested interpolation-based grid (NIBG) indices, and so forth. These spatial indices partition a multidimensional space into spatial regions each containing no more than b points, where b is the bucket capacity. Each partition region containing b or fewer points is also sometimes referred to as a “data bucket”. As more points are added to the spatial database, further partitioning may be employed to accommodate the new data points, with each data bucket still containing no more than b points. Conversely, if data points are removed, then a “reverse” partitioning or region-joining process may optionally be employed to combine partitions. Region joining may also be employed for other tasks, such as to simplify the indexing structure. The spatial index enables rapid identification of the records in a selected region or regions, and hence rapid retrieval of records defined at least in part by spatial location.
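By way of illustration only, the bucket-partitioning idea described above can be sketched for the two-dimensional case as a simple bucket quadtree. The sketch is not taken from any particular spatial index named above. The class name Region, the parameter b, the constant MIN_SIZE, and the methods insert, leaves, and query are hypothetical names chosen for this example, and the points are assumed to lie in the half-open unit square.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]
MIN_SIZE = 1e-9  # assumed guard: stop splitting below this size (e.g. duplicates)


@dataclass
class Region:
    """Square region [x0, x0+size) x [y0, y0+size) acting as a data bucket."""
    x0: float
    y0: float
    size: float
    b: int  # bucket capacity: a leaf holds at most b points
    points: List[Point] = field(default_factory=list)
    children: Optional[List["Region"]] = None

    def _child_for(self, p: Point) -> "Region":
        half = self.size / 2
        ix = 1 if p[0] >= self.x0 + half else 0
        iy = 1 if p[1] >= self.y0 + half else 0
        return self.children[2 * iy + ix]

    def _split(self) -> None:
        # Partition this region into four quadrants and redistribute its points.
        half = self.size / 2
        self.children = [
            Region(self.x0 + dx * half, self.y0 + dy * half, half, self.b)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pending, self.points = self.points, []
        for p in pending:
            self._child_for(p).insert(p)

    def insert(self, p: Point) -> None:
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Further partitioning is triggered only when the bucket overflows.
        if len(self.points) > self.b and self.size > MIN_SIZE:
            self._split()

    def leaves(self) -> List["Region"]:
        if self.children is None:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        """Return the points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        # Skip regions that cannot overlap the query rectangle.
        if (qx1 < self.x0 or qx0 >= self.x0 + self.size
                or qy1 < self.y0 or qy0 >= self.y0 + self.size):
            return []
        if self.children is None:
            return [p for p in self.points
                    if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        return [p for c in self.children for p in c.query(qx0, qy0, qx1, qy1)]
```

In this sketch, a region is split only when an insertion would leave it with more than b points, and a rectangular query descends only into regions that overlap the query rectangle, which is the source of the retrieval speed-up relative to scanning every record. Region joining on deletion is omitted for brevity.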
The efficiency of content retrieval using a spatial database is dependent upon the choice of spatial index. Different spatial indices may be more or less efficient for different spatial databases. Further, the computational complexity of a given retrieval operation may be strongly dependent upon the specific spatial locale from which the content is to be retrieved.
Regardless of the choice of spatial index, however, content retrieval is highly computationally intensive for a large spatial database. Moreover, the initial spatial indexing is also computationally complex, making it inconvenient and sometimes impractical to switch or convert to a new type of spatial index.
Accordingly, it is advantageous to choose an efficient spatial index for generating a given spatial database. An intuitive definition of “efficiency” is the average retrieval complexity for a retrieval operation. By choosing a spatial index providing low (ideally, lowest) average retrieval complexity, the efficiency of the resulting spatial database is enhanced.
Unfortunately, existing techniques for estimating or measuring average retrieval complexity are less than ideal. Average case complexity has been estimated based on first principles for a few spatial indices, including k-d trees and quadtrees. See Devroye et al., “An Analysis of Random d-Dimensional Quad Trees”, SIAM J. Comput. vol. 19 no. 5 pp. 821-32 (1990); Flajolet et al., “Analytic Variations on Quadtrees”, Algorithmica vol. 10 pp. 473-500 (1993). For other types of spatial indices, the general solution has heretofore been to execute a (hopefully representative) series of simulations. See Nakamura et al., “A Balanced Hierarchical Data Structure for Multidimensional Data with Highly Efficient Dynamic Characteristics”, IEEE Trans. Knowl. Data Engineering vol. 5 no. 4 pp. 682-94 (1993). Simulations are computationally intensive, however, and, being empirical, they neither provide conceptual insight nor assure that the average retrieval complexity has been reasonably approximated.
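As a minimal sketch of the simulation-based approach, and not a reproduction of the method of any reference cited above, the following estimates the average node depth of a two-dimensional point quadtree built from uniformly random points. Node depth is used here as an assumed proxy for the cost of retrieving a record, and the function names insert_depth and average_depth, the dictionary-based node layout, and the seed parameter are hypothetical choices made for this example.

```python
import random
from statistics import mean


def insert_depth(root, p):
    """Insert point p into a point quadtree and return the depth where it lands."""
    node, depth = root, 0
    while True:
        px, py = node["point"]
        # Quadrant index 0..3 from comparisons against the node's own point.
        q = (1 if p[0] >= px else 0) + (2 if p[1] >= py else 0)
        child = node["children"][q]
        if child is None:
            node["children"][q] = {"point": p, "children": [None] * 4}
            return depth + 1
        node, depth = child, depth + 1


def average_depth(n, trials, seed=0):
    """Average node depth over `trials` random quadtrees of n points each."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    per_tree = []
    for _ in range(trials):
        root = {"point": (rng.random(), rng.random()), "children": [None] * 4}
        total = 0  # the root itself sits at depth 0
        for _ in range(n - 1):
            total += insert_depth(root, (rng.random(), rng.random()))
        per_tree.append(total / n)
    return mean(per_tree)
```

A call such as average_depth(1000, 20) builds twenty trees of one thousand points each. The cost of the estimate grows with both arguments, and the result applies only to the assumed uniform distribution of points, which illustrates why such simulations are expensive and offer limited assurance for other data.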