The advent of highly distributable, high-volume data storage has made it possible to store vast amounts of information on a variety of topics and in a variety of forms, such as text, images, music, and videos.
The availability and ease of communications continue to increase. Accompanying this increase is an interest in combining various communications with additional information. For example, an individual may hear a communication (e.g., a song) and want to know more about what was heard, such as the song title or artist.
The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity. In addition, capturing an item can introduce anomalies that differ across capture mechanisms and that can be affected by the capture environment, adding further complexity.