1. Field of the Invention
The present invention relates to audio file representation and processing. More specifically, it relates to creating a representation of an audio file using Gabor vectors and comparing those vectors with the Gabor vectors representing another audio file to determine the degree of similarity between the two audio files and to calculate what are referred to herein as observables.
2. Description of the Related Art
In certain industries, there is often a need to compare an audio file with other audio files in order to evaluate the first file. However, comparing two audio files can be time-consuming, error-prone, and expensive, often requiring human intervention. In one scenario, a company has a database of, for example, 30,000 (or considerably more) audio files that vary in length, generally from 15 seconds to 120 seconds, although some can be longer. In a typical process, the company receives a sample audio file from a third party. The first thing the company needs to do is determine whether its database contains an audio file that is similar to, or exactly the same as, the sample audio file, or whether every file in the database is different from it (leaving open for the moment what it means to be exactly the same or different). One reason to do this is so that the company does not have to replicate analysis of the audio file, that is, essentially repeat work already done. If the audio file is in the database, it has already been studied. Another reason is to determine the degree of similarity between the new file and an existing file in the database (e.g., is one a subset of the other?) in order to detect whether one is a derivative and/or complementary work of the other. This is particularly useful for categorization and attribution of the new audio file.
However, as noted, comparing the audio file to a large number of audio files in a database in a meaningful and accurate way is not a trivial task. Those skilled in the art of audio recording and engineering are aware of the numerous issues that can arise when comparing two or more audio files. One of the more common issues is that two audio signals start at different times. For example, two audio files may have the same content, but one starts after x seconds of silence (or “hiss”) and the other starts after y seconds of silence. Other issues arise when: 1) the two files are of vastly different audio quality; 2) one of the files has missing or completely corrupted sections; 3) one of the files has extra audio at the beginning or the end; 4) the two signals have different amplitudes (e.g., the record levels are different such that one plays louder than the other on the same system); 5) one has significantly more noise (hiss, background sound, distortion, etc.) than the other; or 6) any combination of the above occurs. All of these are very common when the same commercial is acquired from two different sources. Comparing the two audio files and arriving at the correct conclusion, that is, that they are the same or that they are meaningfully similar to a degree s, where s is expressible as a number between zero and one, requires sophisticated processing or human intervention. This is especially important in another typical example, in which two audio files are the same except for one, two, or just a few words that are different (very typical of automotive commercials). These two audio files should likely be considered duplicates, in that any analysis done on one may be used for the other, but it should also be made explicit that they are indeed different and by how much.
However, if the processing is not intelligent, has insufficient sophistication, uses an insufficiently rich representation, or is overly sensitive to differences that a human would ignore, then even the best “brute force” type algorithm will most likely fail, and the two files may be regarded as two different audio files. In all of these cases, a simple comparison of the audio files will likely not show that they are duplicates. The processing or comparison would have to be more sophisticated, or would require manual intervention, in order to determine that the files are duplicates or near-duplicates. Clearly, performing a more sophisticated comparison, or one that requires manual intervention, on a mass scale (e.g., with 30,000 audio files) would be infeasible for the company in many situations, especially if such comparisons or scans had to be done often.
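By way of illustration only, the following minimal Python sketch shows why a simple sample-by-sample comparison tends to fail in the situations described above. It is not the method of the present invention. The function name naive_similarity, the tolerance value, the 37-sample offset, the gain of 0.5, and the use of seeded pseudo-random numbers as a stand-in for audio samples are all assumptions made for this example.

```python
import random

def naive_similarity(a, b, tol=1e-3):
    """Hypothetical 'simple comparison': the fraction of sample positions
    at which the two signals agree to within tol, relative to the longer
    signal. Returns a number between zero and one."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)
    return matches / longest

# Stand-in for the samples of an audio file (assumption: seeded noise in [-1, 1]).
rng = random.Random(0)
original = [rng.uniform(-1.0, 1.0) for _ in range(8000)]

# Same content, but starting after a short stretch of leading silence.
shifted = [0.0] * 37 + original

# Same content, but recorded at a lower level (half the amplitude).
quieter = [0.5 * x for x in original]

same_score = naive_similarity(original, original)
shifted_score = naive_similarity(original, shifted)
quieter_score = naive_similarity(original, quieter)

print("identical copy:  ", same_score)
print("leading silence: ", shifted_score)
print("lower amplitude: ", quieter_score)
```

Under these assumptions, the identical copy scores 1.0, while the time-shifted and lower-amplitude copies score close to zero even though a human listener would regard all three as the same content.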
As noted, one industry or area where such audio comparisons may be needed is the advertising field. In one specific area, an advertiser or ad agency has a commercial that it wants tested or evaluated. The commercial has an audio portion, which may comprise voices, music, sound effects, and so on. The advertiser sends this file to a testing company (“tester”), which has a database of audio files of a large number of commercials, all of which have already been tested. Currently, there is no naming convention or standard with respect to commercials (or their audio segments), so comparing the audio files using labels or names to see if there are duplicates is not presently feasible. To save resources and increase efficiency, neither the advertiser nor the tester wants to evaluate a commercial (or nearly the same commercial) twice. The tester needs to determine whether it already has the commercial, or something close to it, in its database; that is, it wants to know whether there is a duplicate commercial or a near-duplicate commercial. However, as described above, comparing audio files to make this determination, especially on a large scale, is a difficult engineering challenge. This is due in part to some of the factors noted earlier, such as timing, small differences in words, noise, and so on. In many cases, an advertiser or ad agency may have numerous ads that need to be tested. Having a human watch or listen to each one and determine whether any of the ads already exists in a database is a laborious task and in many cases is not feasible. It is possible to have a machine, such as a computer, listen to the ads, store an audio representation of each in the database, and make quick audio comparisons to commercials in the database. However, using current technology, comparing audio files of commercials to find duplicates and near-duplicates does not produce accurate results.