1. Technical Field
The present invention relates generally to managing data in a multi-dimensional space and, more specifically, to a system and method for indexing data by partitioning the data in each dimension into a plurality of grids and searching the indexed data to find similarity candidates of target data based on matching grid values.
2. Description of Related Art
With the increasing availability of large repositories of high-dimensional data in numerous application domains, techniques for indexing such data and for providing efficient, real-time query processing and searching of such data are becoming increasingly important. Various techniques are utilized, or have been proposed, for indexing and searching multi-dimensional data. One such technique is similarity indexing and searching which, in general, involves finding the record closest to a given target record in a multi-dimensional space based on some predefined similarity measure. This technique may be employed in numerous domains such as spatial databases, multimedia systems, data mining, and image retrieval.
By way of example, in the World Wide Web (WWW) environment, document similarity searches may be implemented wherein each document is represented as a vector of distinct words with an associated frequency count for each word. Moreover, in an E-commerce environment, a collaborative filtering technique may be employed to identify peer groups in order to make more effective product recommendations. The general idea of collaborative filtering is to have each user provide ratings on a set of products and services, and then to identify other users with similar tastes to form peer groups, so that the collective knowledge of a peer group can be shared among its members. With this method, each user can be represented by a vector of products with an associated rating for each product. The above examples illustrate the need for high-dimensional indexing and searching, since the number of distinct words or products can be in the tens of thousands. Thus, similarity indexing and searching techniques which can handle a high-dimensional data space are important for applications requiring real-time processing.
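By way of illustration only, the vector representations described above may be sketched as follows. The sketch assumes cosine similarity as the predefined similarity measure and a linear scan over all records; neither is specified by the foregoing description, and the function names (to_vector, cosine_similarity, closest_record) and the sample data are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent a document as a sparse vector: word -> frequency count."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts.

    Assumption: cosine similarity stands in for the 'predefined
    similarity measure'; other measures could be substituted.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_record(target, records):
    """Linear scan: return (record_id, similarity) of the closest record."""
    best_id, best_sim = None, -1.0
    for record_id, vector in records.items():
        sim = cosine_similarity(target, vector)
        if sim > best_sim:
            best_id, best_sim = record_id, sim
    return best_id, best_sim


# Hypothetical documents, each a vector of distinct words with counts.
documents = {
    "doc_a": to_vector("grid index for high dimensional data"),
    "doc_b": to_vector("product ratings from similar users"),
}

# Hypothetical users, each a vector of products with ratings.
ratings = {
    "user_2": {"book": 4, "lamp": 2, "desk": 3},
    "user_3": {"lamp": 5},
}

doc_match = closest_record(to_vector("high dimensional index"), documents)
peer_match = closest_record({"book": 5, "lamp": 1}, ratings)
```

Only the dimensions actually present in a record are stored, so a record need not be specified in every dimension. The linear scan, however, compares the target against every record, which is the cost an index is intended to avoid.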
Traditional indexing techniques generally work well for low-dimensional problems, but degrade rapidly with increasing dimensionality, so that each query requires accessing almost all of the data. Accordingly, an efficient and accurate similarity indexing and searching technique which can handle high-dimensional data without having to fully specify each data point in every dimension, and which provides real-time searching capability, is highly desirable.