Natural information can best be described by multi-dimensional feature vectors. For example, identifying objects, video sequences, or bio-molecular structures, or detecting actions and behavior, requires a multi-dimensional search over measurements or features of the object, structure, or sequence being detected. Some video identification approaches use motion signatures derived from motion detected between frames of a video sequence, or descriptions of patches in each frame that are analogous to visual words. Motion signatures for a video sequence can be extracted using statistical data or object tracking. Another popular method uses a bag-of-words approach to describe an image or sequence. Such an approach treats the regions around keypoints, or selected patches in a frame, as words, so the information in a frame or video sequence may be indexed on a word-by-word basis. This approach uses a keypoint detection algorithm to detect points of interest and then describes a patch around each keypoint. A well-known implementation is the scale invariant feature transform (SIFT) algorithm, which uses scale invariant keypoint detection and computes signature values for an area around each keypoint. Another, more recent algorithm for detecting keypoints or points of interest is the “Speeded Up Robust Features” (SURF) algorithm. In some implementations, selected patches may be tracked between frames and connected by visual tubes, which are abstract tubes connecting the same object across multiple frames. Other video search approaches use color histograms to describe an image or image sequence. However, such approaches do not capture information unique to each video and are generally not accurate. Other drawbacks of conventional video search approaches are the size and complexity of the individual signatures they generally use, and the absence of an indexing system for these complex signatures.
Together, these drawbacks affect the size of databases and the performance of searches for video sequences in multi-dimensional databases.
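The word-by-word indexing idea behind the bag-of-words approach can be illustrated with a minimal sketch. It assumes each frame has already been reduced to a list of patch descriptors (real SIFT or SURF descriptors are much longer vectors; the 2-D values here are toy data), and that a codebook of visual words is given. The codebook values, function names, and Euclidean nearest-codeword assignment are hypothetical choices for illustration, not the method of any particular system.

```python
import math
from collections import defaultdict


def nearest_word(descriptor, codebook):
    """Return the index of the codeword closest (Euclidean) to the descriptor."""
    best_index, best_dist = -1, math.inf
    for index, codeword in enumerate(codebook):
        dist = math.dist(descriptor, codeword)
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def build_word_index(frames, codebook):
    """Map each visual word to the set of frame ids whose patches quantize to it.

    frames: dict of frame id -> list of patch descriptors.
    """
    index = defaultdict(set)
    for frame_id, descriptors in frames.items():
        for descriptor in descriptors:
            index[nearest_word(descriptor, codebook)].add(frame_id)
    return index


def query_frames(descriptors, index, codebook):
    """Rank frames by how many distinct query words they share with the query."""
    words = {nearest_word(d, codebook) for d in descriptors}
    votes = defaultdict(int)
    for word in words:
        for frame_id in index.get(word, ()):
            votes[frame_id] += 1
    # Highest vote count first; ties broken by frame id for a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy codebook and frames (not real descriptor data).
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = {
    "f1": [(0.1, 0.0), (0.9, 1.1)],
    "f2": [(4.8, 5.2)],
    "f3": [(1.2, 0.8), (5.1, 4.9)],
}
word_index = build_word_index(frames, codebook)
print(query_frames([(1.0, 0.9), (5.0, 5.1)], word_index, codebook))
```

Here the query quantizes to words 1 and 2, so frame `f3`, which contains both, ranks first. In practice the codebook would typically be learned, for example by clustering a large sample of descriptors.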
Current retrieval systems are generally based on massive parallelization. Documents are organized as one-dimensional inverted lists. In a large database with 100 billion (B) documents, a single one-dimensional inverted list may contain as many as 1 to 10 B documents. Further, a multi-dimensional query with 10 inputs requires analysis of all the documents listed under each of those inputs. This complexity affects the time needed to add new entries to the database, query performance, and the thoroughness of querying. For practical reasons, current systems usually need to limit the number of associated documents they examine; as a consequence, not all documents in a database are generally evaluated. To limit the impact of this issue on accuracy and performance, most current solutions divide the database into smaller sections and evaluate only a few of these sections. This improves accuracy and performance, but such techniques are still affected by the size of the inverted lists, and their accuracy remains limited.
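The cost and accuracy trade-off of one-dimensional inverted lists can be sketched with a toy example. This is a simplified illustration under stated assumptions, not a model of any specific retrieval system. Each document is a set of feature terms, and a multi-term query scores documents by how many query terms they contain. The `max_postings` parameter is hypothetical: it stands in for the practical cap on how many documents per list are examined.

```python
from collections import Counter


def build_inverted_index(documents):
    """Build term -> sorted list of doc ids from doc id -> iterable of terms."""
    index = {}
    for doc_id, terms in documents.items():
        for term in set(terms):
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort()
    return index


def multi_term_query(index, query_terms, max_postings=None):
    """Score documents by the number of query terms they contain.

    Every posting list touched by the query is scanned, so the cost grows
    with the total length of those lists. If max_postings (hypothetical) is
    set, each list is truncated, and documents beyond the cap are never
    evaluated. Returns (ranked results, number of postings scanned).
    """
    scores = Counter()
    scanned = 0
    for term in dict.fromkeys(query_terms):  # de-duplicate, keep order
        postings = index.get(term, [])
        if max_postings is not None:
            postings = postings[:max_postings]
        scanned += len(postings)
        scores.update(postings)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked, scanned


# Hypothetical toy collection: document 4 matches every query term.
documents = {1: ["a", "b"], 2: ["a", "c"], 3: ["b", "c"], 4: ["a", "b", "c"]}
index = build_inverted_index(documents)
print(multi_term_query(index, ["a", "b", "c"]))
print(multi_term_query(index, ["a", "b", "c"], max_postings=2))
```

Without a cap, all nine postings are scanned and document 4 ranks first. With `max_postings=2`, only six postings are scanned, but document 4, the best match, is never evaluated. At the scale described above, where a single list may hold billions of entries, this is the trade-off between thoroughness and query cost.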