1. Field of the Invention
The present invention relates to the field of database management, and more particularly to a method and an apparatus for achieving efficient database indexing structures which permit high-speed access to high-dimensional data points from a large repository of points stored in memory.
2. Description of Related Art
Database management systems are widely accepted as a standard tool for manipulating large volumes of data in secondary storage media. To enable fast access to stored data according to its content, databases typically use structures known as indexes. Although indexes are optional because data can always be located by an exhaustive search, indexes are the primary means of reducing the volume of data that must be retrieved and processed in response to a query. Therefore, in practice, large database files must be indexed to satisfy performance requirements.
Recent years have seen an explosive growth in the use of new database applications such as CAD/CAM systems, spatial information systems, and multimedia information systems. The needs of these applications are far more complex than those of traditional business applications. In particular, data objects are typically represented as high-dimensional points. Traditional indexing techniques such as the B-tree and its variants, which are single-dimensional indexing structures, do not efficiently support such new database applications, thereby requiring the design of new and more complex indexing mechanisms.
Consequently, many indexing methods for multi-dimensional data have been developed, including hierarchical tree structures (such as R-trees), linear quad-trees, and grid-files. Hierarchical tree structures perform well when the tree nodes exhibit a large degree of fan-out. As the number of dimensions increases, however, the degree of fan-out falls, which contributes to increased overlap between node entries and increased tree height, resulting in rapid deterioration in performance. Linear quad-trees and grid-files also work well for low dimensionalities, but their response time grows exponentially for high dimensionalities. In fact, for high dimensionality, sequential scanning becomes more efficient than these index structures.
Recent efforts have sought to address these problems by reducing the dimensionality of the indexing attribute. One such direction projects the high-dimensional points onto a hyperplane containing the axis. One such method (e.g., that disclosed by Friedman, et al., "An Algorithm for Finding Nearest Neighbors," IEEE Transactions on Computers, Vol. C-24, pp. 1000–1006) truncates high-dimensional data. Searching on projections, however, produces false drops, i.e., points that satisfy the query in the projection but not in the full-dimensional space, which can reduce the effectiveness of the technique. Another recent method groups high-dimensional data into smaller buckets so that a search can be performed by sequentially scanning the smaller number of buckets. This approach is not expected to scale for large amounts of high-dimensional data, as the number of buckets will be too large to allow efficient searching.
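By way of illustration only, the false-drop problem of projection-based searching may be sketched as follows. The sketch is not taken from the cited reference; the function name `projection_search`, the choice of a single coordinate axis as the projection, and the sample points are all assumptions made for the example. Candidates are selected on one coordinate and then verified against the full Euclidean distance; candidates that fail the verification are the false drops.

```python
import math

def projection_search(points, query, radius, axis=0):
    """Range search that filters on one coordinate, then verifies.

    Hypothetical helper for illustration. Filtering on a single
    coordinate cannot miss a true match, because the difference in
    any one coordinate never exceeds the full Euclidean distance.
    """
    # Step 1: cheap filter on the projected (single) coordinate.
    candidates = [p for p in points if abs(p[axis] - query[axis]) <= radius]
    # Step 2: verify each candidate in the full-dimensional space.
    hits = [p for p in candidates if math.dist(p, query) <= radius]
    false_drops = [p for p in candidates if math.dist(p, query) > radius]
    return hits, false_drops

# Sample data (assumed): the third point is close to the query on
# axis 0 only, so it passes the filter but fails verification.
points = [(0.0, 0.0, 0.0), (0.5, 0.1, 0.2), (0.4, 9.0, 9.0), (5.0, 0.0, 0.0)]
hits, false_drops = projection_search(points, query=(0.0, 0.0, 0.0), radius=1.0)
print("hits:", hits)
print("false drops:", false_drops)
```

In this sketch every false drop costs one full-dimensional distance computation that yields no result, so the benefit of the projection shrinks as the proportion of false drops among the candidates rises.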
Therefore, there is a need for an indexing technique which reduces the dimensionality of a high-dimensional database, while at the same time ensuring that objects are not missed and false drops do not frequently occur when answering a query.