The advent of highly distributable, high-volume data storage has made it possible to keep vast amounts of information on a variety of topics and in a variety of forms, such as text, images, music, and videos.
The problem of efficiently finding similar items in a large corpus of high-dimensional data points arises in many real-world tasks, such as music, image, and video retrieval. Beyond the scaling difficulties that arise with lookups in large data sets, the complexity in these domains is exacerbated by an imprecise definition of similarity.