1. Technical Field
The invention relates to identifying similarity between music objects, and in particular, to a system and method that uses a set of music similarities, expressed as a graph with weighted links, to construct a set of coordinate vectors, with the coordinate vectors then being used to approximate a similarity distance between any two or more music objects.
2. Related Art
One of the most reliable methods for determining similarity between two or more pieces of music is for a human listener to listen to each piece of music and then manually rate or classify its similarity to other pieces of music. Unfortunately, such methods are very time consuming and are limited by the library of music available to the listener.
This problem has been at least partially addressed by a number of conventional schemes that use collaborative filtering techniques to combine the preferences of many users or listeners to generate composite similarity lists. In general, such techniques rely on individual users to provide one or more lists of music or songs that they like. The lists of many individual users are then combined using statistical techniques to generate lists of statistically similar music or songs. Unfortunately, one drawback of such schemes is that lesser-known music or songs rarely appear in the user lists. Consequently, even where such songs are very similar to other well-known songs, the lesser-known songs are not likely to be identified as being similar to anything. As a result, such lists tend to be weighted more heavily towards popular songs, thereby presenting a skewed similarity profile.
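By way of illustration only, the list-combining step described above can be sketched as a co-occurrence count over user lists. The function name `cooccurrence_similarity`, the cosine-style normalization, and the sample lists are assumptions introduced here for illustration and do not correspond to any particular conventional scheme.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_similarity(user_lists):
    """Score song pairs by how often they appear together in user lists.

    The score is the co-occurrence count divided by the geometric mean of
    the two songs' individual counts (an assumed, cosine-style normalization).
    """
    song_counts = Counter()
    pair_counts = Counter()
    for songs in user_lists:
        unique = sorted(set(songs))  # ignore duplicates within one list
        song_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        pair: count / sqrt(song_counts[pair[0]] * song_counts[pair[1]])
        for pair, count in pair_counts.items()
    }


# Hypothetical user lists; "obscure_x" appears in only one list, by itself.
user_lists = [
    ["hit_a", "hit_b"],
    ["hit_a", "hit_b", "hit_c"],
    ["hit_a", "hit_c"],
    ["obscure_x"],
]
similarity = cooccurrence_similarity(user_lists)
```

In this sketch the song that appears in only one list receives no similarity score at all, which is one simple way the popularity skew noted above can arise.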
Other conventional schemes for determining similarity between two or more pieces of music rely on a comparison of metadata associated with each individual song. For example, many music media files or media streams provide embedded metadata which indicates the artist, title, genre, etc. of the music being streamed. Consequently, in the simplest case, this metadata is used to select one or more matching songs, based on artist, genre, style, etc. Unfortunately, not all media streams include metadata. Further, even songs or other media objects within the same genre, or by the same artist, may be sufficiently different that using metadata alone to measure similarity sometimes identifies media objects as similar when a human listener would consider them to be substantially dissimilar. Another problem with the use of metadata is the reliability of that data. For example, when relying on the metadata alone, if that data is entered incorrectly, or is otherwise inaccurate, then any similarity analysis based on that metadata will also be inaccurate.
Still other conventional schemes for determining similarity between two or more pieces of music rely on an analysis of the beat structure of particular pieces of music. For example, in the case of heavily beat-oriented music, such as dance or techno type music, one commonly used technique for providing similar music is to compute a beats-per-minute (BPM) count of media objects and then find other media objects that have a similar BPM count. Such techniques have been successfully used to identify similar songs. However, conventional schemes based on such techniques tend to perform poorly where the music being compared is not heavily beat-oriented. Further, such schemes also sometimes identify songs as being similar that a human listener would consider to be substantially dissimilar.
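A minimal sketch of such BPM matching is shown below, assuming that a BPM count has already been computed for each media object. The function name `similar_by_bpm`, the tolerance of 3 BPM, and the track names and BPM values are hypothetical.

```python
def similar_by_bpm(bpm_by_track, query, tolerance=3.0):
    """Return tracks whose BPM lies within `tolerance` of the query track,
    ordered from the closest BPM to the farthest."""
    target = bpm_by_track[query]
    matches = [
        (abs(bpm - target), name)
        for name, bpm in bpm_by_track.items()
        if name != query and abs(bpm - target) <= tolerance
    ]
    return [name for _, name in sorted(matches)]


# Hypothetical BPM counts for four tracks.
bpm_by_track = {
    "techno_1": 128.0,
    "techno_2": 130.0,
    "ballad": 72.0,
    "punk": 129.0,
}
matches = similar_by_bpm(bpm_by_track, "techno_1")
```

Here the hypothetical punk track is returned ahead of the second techno track simply because its BPM is closer, illustrating how tempo alone can pair songs that a listener might consider dissimilar.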
Another conventional technique for inferring or computing audio similarity includes computing similarity measures based on statistical characteristics of temporal or spectral features of one or more frames of an audio signal. The computed statistics are then used to describe the properties of a particular audio clip or media object. Similar objects are then identified by comparing the statistical properties of two or more media objects to find media objects having matching or similar statistical properties. Related techniques for inferring or computing audio similarity include the use of Mel Frequency Cepstral Coefficients (MFCCs) for modeling music spectra. Some of these methods then correlate Mel-spectral vectors to identify media objects having similar audio characteristics.
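The following sketch illustrates the general idea of frame-level statistics using NumPy. As an assumption made for brevity, it summarizes each clip by the mean and standard deviation of a single spectral feature, the spectral centroid, rather than by MFCCs. The function names, frame sizes, and synthetic sine-wave clips are all hypothetical.

```python
import numpy as np


def frame_signal(signal, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def spectral_stats(signal, sample_rate, frame_len=1024, hop=512):
    """Mean and standard deviation of the per-frame spectral centroid (Hz)."""
    frames = frame_signal(signal, frame_len, hop) * np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    centroid = (magnitude * freqs).sum(axis=1) / (magnitude.sum(axis=1) + 1e-12)
    return np.array([centroid.mean(), centroid.std()])


def stats_distance(a, b):
    """Euclidean distance between two statistics vectors."""
    return float(np.linalg.norm(a - b))


# Synthetic one-second clips standing in for real audio.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
low = np.sin(2 * np.pi * 220 * t)
low_2 = np.sin(2 * np.pi * 230 * t)
high = np.sin(2 * np.pi * 2000 * t)

d_close = stats_distance(spectral_stats(low, sample_rate),
                         spectral_stats(low_2, sample_rate))
d_far = stats_distance(spectral_stats(low, sample_rate),
                       spectral_stats(high, sample_rate))
```

Clips with nearby spectral content yield a smaller distance between their statistics (`d_close`) than clips with very different spectral content (`d_far`).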
Still other conventional methods for inferring or computing audio similarity involve having human editors produce graphs of similarity, and then using conventional clustering or multidimensional scaling (MDS) techniques to identify similar media objects. Unfortunately, such schemes tend to be expensive to implement because they require a large amount of editorial time. Further, these conventional MDS-based techniques also typically incur substantial computational overhead.
For example, well-known conventional MDS algorithms, such as “ALSCAL” or “Isomap,” to name only two of many, typically apply an MDS algorithm to a sparse matrix of dissimilarities and then use the results to find vectors whose inter-vector distances are well matched to the dissimilarities. In other words, treating a matrix of dissimilarities among artists and/or music as the sparse matrix, and then using conventional MDS techniques to embed the artists/music into a low-dimensional space, allows the similarity between any two or more artists/music to be determined. Unfortunately, the computational complexity of the embedding techniques employed by these methods typically inhibits their use on large data sets, which can potentially include many thousands of music artists and potentially millions of songs.
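To make the embedding idea concrete, the sketch below applies classical (Torgerson) MDS to a small, dense dissimilarity matrix using NumPy. Classical MDS is used here only as a simple stand-in; it is neither ALSCAL nor Isomap, and the function name `classical_mds` and the four-artist dissimilarity values are hypothetical.

```python
import numpy as np


def classical_mds(dissimilarities, n_dims=2):
    """Embed N objects into n_dims coordinates via classical (Torgerson) MDS."""
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Full eigendecomposition of an N x N matrix; the cost grows roughly as N^3.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Hypothetical dissimilarities among four artists.
d = np.array([
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
])
coords = classical_mds(d, n_dims=2)

# Inter-vector distances of the embedding, for comparison against d.
embedded = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The dense N x N matrix and its full eigendecomposition in this sketch suggest why embeddings of this general kind become costly as the number of artists or songs grows.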
Therefore, what is needed is a system and method for efficiently identifying similar media objects such as songs or music. Further, such a system and method should be capable of operation without the need to perform computationally expensive audio matching analyses. Finally, this system and method should be capable of quickly embedding potentially very large sparse graphs of music similarity (i.e., large data sets of artists and songs) into a multi-dimensional space while reducing computational overhead.