The field of audio signal classification is well developed and has many commercial applications. Audio classifiers are used to recognize or discriminate among different types of sounds. Classifiers are used to organize sounds in a database based on common attributes, and to recognize types of sounds in audio scenes. Classifiers are used to pre-process audio so that certain desired sounds are distinguished from other sounds, enabling the distinguished sounds to be extracted and processed further. Examples include distinguishing a voice among background noise, for improving communication over a network, or for performing speech recognition.
Additionally, there are various forms of audio signal recognition and identification in commercial use. Particular examples include audio watermarking and audio fingerprinting. Audio watermarking is a signal processing field encompassing techniques for embedding and then detecting that embedded data in audio signals. The embedded data serves as an auxiliary data channel within the audio. This auxiliary channel can be used for many applications, and has the benefit of not requiring a separate channel outside the audio information.
Audio fingerprinting is another signal processing field encompassing techniques for content based identification or classification. This form of signal processing includes an enrollment process and a recognition process. Enrollment is the process of entering a reference feature set or sets (e.g., sound fingerprints) for a sound into a database along with metadata for the sound. Recognition is the process of computing features and then querying the database to find corresponding features. Feature sets can be used to organize similar sounds based on a clustering of similar features. They can also provide more granular recognition, such as identifying a particular song or audio track of an audio visual program, by matching the feature set with a corresponding reference feature set of a particular song or program. Of course, with such systems, there is a potential for false positive or false negative recognition, which is caused by variety of factors. Systems are designed with trade-offs of accuracy, speed, database size and scalability, etc. in mind.
This document describes a variety of inventions in audio watermarking and audio signal recognition that reach across these fields. The inventions include electronic audio signal processing methods, as well as implementations of these methods in devices, such as computers (including various computer configurations in mobile devices like mobile phones or tablet PCs).
One category of invention is the use of audio classifiers to optimize audio watermark embedding and detecting. For example, audio classifiers are used to determine the type of audio in an audio segment. Based on the audio type, the watermark embedder is adapted to optimize the insertion of a watermark signal in terms of audio perceptual quality, watermark robustness, or watermark data capacity. The watermark embedder is adapted by selecting a configuration of watermark type, perceptual model, watermark protocol and insertion function that is best suited for the audio type. In some embodiments, the classifier determines noise or other types of distortion that are present in the incoming audio signal (“detected noise”), or that are anticipated to be incurred by the watermarked audio after it is distributed (“anticipated noise”). These detected and anticipated noise types are used in selecting the configurations of the watermark embedder. Similar classifiers are used in the detector to provide an efficient means to predict the watermark embedding that has been applied, as well as detected noise in the signal for noise mitigation in the watermark detector. Alternatively or additionally, the watermark may convey information about the variable watermark protocol in a component of the watermark signal.
Another category of invention is watermark signal design, which provides a variety of different watermarking embedding methods, each of which can be adapted for the application or audio type. These watermark signal designs employ novel modulations schemes, support variable protocols, and operate in conjunction with novel perceptual modeling techniques. They also, in some implementations, are integrated with audio fingerprinting.
Other categories of invention are novel watermark embedder and detector processing flows and modular designs enabling adaptive configuration of the embedder and detector. These categories include inventions where objective quality metrics are integrated to simulate subjective quality evaluation, and robustness evaluation is used to tune the insertion of the watermark. Various embedding techniques are described that take advantage of perceptual audio features (e.g., harmonics) or data modulation or insertion methods (e.g., reversing polarity, pairwise and pairwise informed embedding, OFDM watermark designs).
Another category of invention is detector design. Examples include rake receiver configurations to deal with multipath in ambient detection, compensating for time scale modifications, and applying a variety of pre-filters and signal accumulation to increase watermark signal to noise ratio.
Another category of invention is signal pre-conditioning in which an audio signal is evaluated and then adaptively pre-conditioned (e.g., boosted and/or equalized to improve signal content for watermark insertion).
Some of these inventions are recited in claim sets at the end of this document. Further inventions, and various configurations for combining them, are described in more detail in the description that follows. As such, further inventive features will become apparent with reference to the following detailed description and accompanying drawings.