Vectors are commonly used to represent the feature space of various phenomena. For example, vectors are used to represent the features of images, videos, audio clips, and other media. The utility of vector space operations is not limited to digital media; such operations may also be applied to other data, to physical objects, or to any other entity capable of feature representation. In the media space, features include color distributions (using, for example, 4×4 pixel hue and saturation histograms), the mean and variance of color intensities across color channels, color intensity differences inside and outside of pixel rectangles, edges, mean edge energy, texture, video motion, audio volume, audio spectrogram features, the presence of words or faces in images, or any other suitable media property.
Vector space representations are particularly useful in the classification, indexing, and determination of similarity in digital media; determining the distance between digital media feature vectors is fundamental to these operations. Manual classification and indexing of digital media requires a human operator and, for large media collections, results in prohibitively expensive and expansive operations. Further, similarity search within a large media library requires analysis of all entries in the library, and even automated library analysis requires resource-intensive processing capabilities. Unfortunately, high-dimensional feature vectors of digital media are also prone to noise, which reduces both the effectiveness of vector distance determinations on such vectors and the ability to detect vector distance differences resulting from changes to a small number of vector features.
Many data classification tasks rely on vector space representations to represent the particular data of interest. One common data classification operation involves determining the similarity between two data objects. Using a vector space representation of the data objects allows a determination of similarity to be made based on the distance, such as the Euclidean distance, between the two vectors, such as coordinate vectors, representing the data objects. A change in the value of a single vector component has an effect on the distance between the vectors that is inversely proportional to the number of dimensions of the vectors. Thus, the larger the number of dimensions in a vector, the smaller the effect that a change in a single vector component has on the distance between the vectors.
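The shrinking influence of a single component can be illustrated with a minimal sketch. The setup below is an assumption chosen for illustration, not taken from the text above: one vector is all zeros, the other is all ones, and a single component of the second vector is shifted by a hypothetical amount `delta`. The function names are likewise illustrative.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length coordinate vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def relative_change_from_one_component(num_dims, delta=1.0):
    """Relative change in distance when one component shifts by delta.

    Assumed setup: u is all zeros and v is all ones, so every dimension
    initially contributes equally to the distance.
    """
    u = [0.0] * num_dims
    v = [1.0] * num_dims
    before = euclidean_distance(u, v)
    v[0] += delta  # change a single component only
    after = euclidean_distance(u, v)
    return (after - before) / before


if __name__ == "__main__":
    # The relative change falls as the number of dimensions grows.
    for n in (4, 64, 1024):
        print(n, relative_change_from_one_component(n))
```

Under these assumptions the relative change in distance falls off roughly in inverse proportion to the number of dimensions; other vector contents or distance measures may behave differently.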
In use, the elements of vectors in vector space operations are susceptible to noise, whether naturally occurring or otherwise. As the number of dimensions in a vector space increases, the determination of the distance between two vectors is increasingly affected by the compounding of noise affecting individual elements of the vectors. In high-dimensional vector spaces, the magnitude of the compounded noise in a distance determination may exceed the magnitude of the change in distance that results from a change to a single vector dimension. This is problematic in instances where it is desirable to measure the change in distance between vectors caused by a change to a small number of elements in the vectors.
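A small simulation can make this comparison concrete. It is a sketch under stated assumptions rather than a description of any particular system: the noise is modeled as independent Gaussian perturbations with a hypothetical standard deviation `noise_std` added to every element of both vectors, and the "signal" is the distance change caused by shifting one component by a hypothetical `delta`.

```python
import math
import random


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def signal_and_noise(num_dims, delta=1.0, noise_std=0.25, trials=100, seed=0):
    """Return (signal, noise_error) for vectors with num_dims dimensions.

    signal: noise-free change in distance when one component shifts by delta.
    noise_error: mean absolute error in the measured distance when Gaussian
    noise (an assumed noise model) perturbs every element of both vectors.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    u = [0.0] * num_dims
    v = [1.0] * num_dims
    clean = euclidean_distance(u, v)

    changed = list(v)
    changed[0] += delta  # change a single component only
    signal = euclidean_distance(u, changed) - clean

    total_error = 0.0
    for _ in range(trials):
        noisy_u = [x + rng.gauss(0.0, noise_std) for x in u]
        noisy_v = [x + rng.gauss(0.0, noise_std) for x in v]
        total_error += abs(euclidean_distance(noisy_u, noisy_v) - clean)
    return signal, total_error / trials


if __name__ == "__main__":
    for n in (4, 64, 1024):
        signal, noise_error = signal_and_noise(n)
        print(n, signal, noise_error)
```

With these assumed parameters, the single-component signal shrinks as dimensions are added while the noise-induced error in the distance grows, so at high dimensionality the noise dominates the change being measured.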