1. Field of the Invention
The present invention generally relates to a system and method for audio signal processing. More particularly, the present invention relates to a system and method for extracting characteristic audio signal values, calculating signature metrics therefor, and associating custom markers with signal sources for matching purposes.
2. Brief Description of the Prior Art
As is noted in U.S. Pat. No. 7,516,074 (the '074 Patent), which issued to Bilobrov, given variations in file formats, compression technologies, and other methods of representing data, the ability to quickly identify an otherwise uniform data signal for comparison purposes often raises certain technical difficulties. The relatively recent prior art developed in this field of inquiry is somewhat well-developed, and typically requires a great deal of data extraction in order to properly match audio files with one another. Some of the more pertinent prior art references relating to (audio) file matching techniques and the like are briefly described hereinafter.
U.S. Pat. No. 6,990,453 ('453 Patent), which issued to Wang et al., discloses a System and Method of Recognizing Sound and Music Signals in High Noise and Distortion. The '453 Patent describes a method for recognizing an audio sample that locates the audio file most closely matching the audio sample from a database indexing a large set of original recordings. Each indexed audio file is represented in the database index by a set of landmark time points and associated fingerprints. Landmarks occur at reproducible locations within the file, while fingerprints represent features of the signal at or near the landmark time points. To perform recognition, landmarks and fingerprints are computed for the unknown sample and used to retrieve matching fingerprints from the database.
For each file containing matching fingerprints, the landmarks are compared with landmarks of the sample at which the same fingerprints were computed. If a large number of corresponding landmarks are linearly related, i.e., if equivalent fingerprints of the sample and retrieved file have the same time evolution, then the file is identified with the sample.
The method can be used for any type of sound or music, and is particularly effective for audio signals subject to linear and nonlinear distortion such as background noise, compression artifacts, or transmission dropouts. The sample can be identified in a time proportional to the logarithm of the number of entries in the database; given sufficient computational power, recognition can be performed in nearly real time as the sound is being sampled.
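By way of illustration only, the landmark-comparison step described above may be sketched as follows. This is not the patented implementation; the function names, data layout, and vote threshold are all illustrative assumptions. The key idea is that a large cluster of equal time offsets between matched fingerprints indicates that the landmarks are linearly related.

```python
# Illustrative sketch (not the '453 Patent's implementation): two
# samples match when landmarks sharing the same fingerprint exhibit
# the same time evolution, detected here via a histogram of offsets.
from collections import Counter

def match_by_landmarks(sample, reference, min_votes=3):
    """sample/reference: lists of (landmark_time, fingerprint) pairs."""
    # Index reference landmarks by fingerprint for fast lookup.
    ref_index = {}
    for t, fp in reference:
        ref_index.setdefault(fp, []).append(t)
    # For each matching fingerprint, record the time offset between
    # the reference landmark and the sample landmark.
    offsets = Counter()
    for t, fp in sample:
        for rt in ref_index.get(fp, []):
            offsets[rt - t] += 1
    if not offsets:
        return False
    # A large cluster of identical offsets means the corresponding
    # landmarks are linearly related, i.e., the files correspond.
    best_offset, votes = offsets.most_common(1)[0]
    return votes >= min_votes
```

Because only the most common offset is inspected, the check remains robust when some fingerprints collide spuriously: spurious matches scatter across many offsets, while true matches concentrate in one.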
U.S. Pat. No. 7,277,766 ('766 Patent), which issued to Khan et al., discloses a Method and System for Analyzing Digital Audio Files. The '766 Patent describes a method and system for analyzing audio files wherein plural audio file feature vector values, based on an audio file's content, are determined, and the audio file feature vectors are stored in a database that also stores other pre-computed audio file features. The process determines whether the audio file's feature vectors match the stored audio file vectors. The process also associates a plurality of known attributes with the audio file.
The '074 Patent describes a technique for extracting an audio fingerprint from an audio sample, where the fingerprint contains information that is characteristic of the content in the sample. The fingerprint may be generated by computing an energy spectrum for the audio sample, resampling the energy spectrum logarithmically in the time dimension, transforming the resampled energy spectrum to produce a series of feature vectors, and computing the fingerprint using differential coding of the feature vectors. The generated fingerprint can be compared to a set of reference fingerprints in a database to identify the original audio content.
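The '074-style pipeline may be illustrated, in greatly simplified form, as follows. The sketch assumes that a per-frame band-energy matrix has already been computed upstream (the spectral analysis itself is omitted), and the grid sizes are arbitrary assumptions; only the logarithmic time resampling and differential coding steps are shown.

```python
# Hypothetical sketch of the '074-style idea: take a per-frame
# band-energy matrix (rows = frames, columns = bands), resample the
# rows on a logarithmic time grid, then differentially code the
# features into bits (1 when a band's energy rises between
# consecutive log-spaced frames).
import math

def fingerprint(energy, n_frames_out=8):
    n = len(energy)
    # Log-spaced row indices: denser near the start of the sample.
    idx = sorted({min(n - 1, int(math.exp(math.log(n) * k / (n_frames_out - 1))) - 1)
                  for k in range(n_frames_out)})
    feats = [energy[i] for i in idx]
    # Differential coding of successive feature vectors.
    return [[1 if b2 > b1 else 0 for b1, b2 in zip(r1, r2)]
            for r1, r2 in zip(feats, feats[1:])]
```

The resulting bit matrix is compact and tolerant of uniform gain changes, since only the sign of the energy difference survives the coding.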
U.S. Pat. No. 7,549,052 ('052 Patent), which issued to Haitsma et al., discloses a method for Generating and Matching Hashes of Multimedia Content. The disclosed method generates robust hashes for multimedia content, for example, audio clips. The audio clip is divided into successive frames. For each frame, the frequency spectrum is divided into bands. A robust property of each band is computed and represented by a respective hash bit.
An audio clip is thus represented by a concatenation of binary hash words, one for each frame. To identify a possibly compressed audio signal, a block of hash words derived therefrom is matched by a computer with a large database. Such matching strategies are also disclosed. In an advantageous embodiment, the extraction process also provides information as to which of the hash bits are the least reliable. Flipping these bits considerably improves the speed and performance of the matching process.
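The bit-flipping strategy noted above may be sketched as follows. The representation of the database as a simple dictionary, and the function names, are illustrative assumptions rather than the '052 Patent's disclosed structures; the sketch shows only how flipping the least reliable bits of a query hash word generates additional candidate lookups.

```python
# Illustrative sketch of the '052-style matching aid: the least
# reliable bits of a query hash word are flipped to generate extra
# candidate lookups into a database of hash words.
from itertools import combinations

def candidates(word, weak_bits):
    """Yield the word plus variants with any subset of weak bits flipped."""
    yield word
    for r in range(1, len(weak_bits) + 1):
        for bits in combinations(weak_bits, r):
            w = word
            for b in bits:
                w ^= 1 << b  # flip one unreliable bit position
            yield w

def lookup(db, word, weak_bits):
    """db maps hash words to clip identifiers; try exact, then variants."""
    for w in candidates(word, weak_bits):
        if w in db:
            return db[w]
    return None
```

Restricting the flips to a few marked-unreliable positions keeps the candidate set small (2^k variants for k weak bits) while recovering matches that an exact lookup would miss.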
U.S. Pat. No. 7,624,012 ('012 Patent), which issued to Pachet et al., discloses a Method and Apparatus for Automatically Generating a General Extraction Function. The '012 Patent describes a general function generator operative on an input signal to extract from the latter a global characteristic value expressing a feature of the information conveyed by that signal.
It operates by generating at least one compound function, said compound function being generated from at least one of a set of elementary functions by considering the elementary functions as symbolic objects, operating said compound function on at least one reference signal having a pre-attributed global characteristic value serving for evaluation, by processing the elementary functions as executable operators, determining the matching between: i) the value(s) extracted by said compound function as a result of operating on said reference signal and, ii) the pre-attributed global characteristic value of said reference signal, and selecting at least one compound function on the basis of the matching to produce the general extraction function. The invention can be used, for instance, for the automatic extraction of audio/music descriptors from their signals contained as music file data. Notably, the method utilizes means and variances.
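The selection loop described above may be illustrated, very loosely, as follows. The elementary functions chosen here (mean, variance, peak), the error measure, and the absence of any symbolic composition step are all simplifying assumptions; the sketch shows only the evaluate-against-references-and-select pattern.

```python
# Loose sketch of the '012-style selection: candidate extraction
# functions are executed on reference signals having pre-attributed
# characteristic values, and the best-matching candidate is selected
# as the general extraction function. Composition of elementary
# functions into compounds is omitted for brevity.
import statistics

elementary = {
    "mean": statistics.mean,
    "variance": statistics.pvariance,
    "peak": max,
}

def select_extractor(references):
    """references: list of (signal, known_characteristic_value) pairs."""
    def error(fn):
        # Total deviation between extracted and pre-attributed values.
        return sum(abs(fn(sig) - known) for sig, known in references)
    name = min(elementary, key=lambda n: error(elementary[n]))
    return name, elementary[name]
```

A fuller implementation would also generate compound functions symbolically, as the patent describes, rather than choosing among fixed elementary functions.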
U.S. Pat. No. 7,627,477 ('477 Patent), which also issued to Wang et al., discloses a Robust and Invariant Pattern Matching technique. The '477 Patent describes a technique for rapidly and accurately determining whether two audio samples match, which technique is also immune to various kinds of transformations, such as playback speed variation. The relationship between the two audio samples is characterized by first matching certain fingerprint objects derived from the respective samples. A set of fingerprint objects, based on audio sample amplitude information, is generated for each audio sample, each fingerprint object occurring at a particular location within the sample.
Each location is determined in dependence upon the content of the respective audio sample and each fingerprint object characterizes one or more local features at or near the respective particular location. A relative value is next determined for each pair of matched fingerprint objects. A histogram of the relative values is then generated. If a statistically significant peak is found, the two audio samples can be characterized as substantially matching.
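The histogram-peak decision step may be sketched as follows. The bin width, the significance threshold, and the background-statistics estimate are illustrative assumptions, not values from the '477 Patent; only the histogram-then-peak-test pattern is taken from the description above.

```python
# Hypothetical sketch of the '477-style decision step: histogram the
# relative values computed from matched fingerprint-object pairs and
# declare a match when the tallest bin stands out from the rest.
from collections import Counter

def samples_match(relative_values, bin_width=0.05, sigma=3.0):
    bins = Counter(round(v / bin_width) for v in relative_values)
    counts = list(bins.values())
    if len(counts) < 2:
        return bool(counts)  # a single populated bin is a trivial peak
    peak = max(counts)
    rest = [c for c in counts if c != peak] or [0]
    mean = sum(rest) / len(rest)
    var = sum((c - mean) ** 2 for c in rest) / len(rest)
    # "Statistically significant": the peak exceeds the background
    # mean by several standard deviations (threshold is an assumption).
    return peak > mean + sigma * max(var ** 0.5, 1.0)
```

Because a uniform relative value (e.g., a constant time-offset ratio) survives playback-speed changes, a sharp histogram peak signals a match even when the samples are not time-aligned.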
U.S. Pat. No. 7,707,425 ('425 Patent), which was issued to Mihcak et al. (and assigned to Microsoft), discloses a Recognizer of Content of Digital Signals. The '425 Patent describes a computer-implemented method facilitating identification of a digital signal, comprising the steps of obtaining a digital signal and deriving an identification value representative of the digital signal, such that perceptually distinct digital signals result in identification values that are approximately independent of one another and perceptually similar digital signals result in identical identification values, wherein the deriving comprises a series of steps.
The digital signal is transformed into a digital signal transform, whereafter the digital signal transform is randomly divided into multiple chunks, each chunk containing signal data, wherein the dividing is carried out recursively to form hierarchical levels of overlapping chunks. The signal data in each of the chunks are averaged to produce corresponding chunk averages. An exponential distribution having multiple distinct quantization levels is generated based, in part, on the chunk averages. Each of the chunk averages is randomly rounded to one of the quantization levels to produce rounded values; a composite of the rounded values is then hashed; and the digital signal is indexed using the identification value.
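The chunk-average-and-round flow may be sketched as follows. The flat (non-recursive) chunking, the fixed quantization levels, and the use of a seeded pseudo-random generator in place of a secret key are all simplifying assumptions; only the average, randomized-round, and hash-the-composite steps mirror the description above.

```python
# Illustrative sketch (assumptions throughout) of the '425-style flow:
# chunk a transformed signal, average each chunk, randomly round the
# averages to quantization levels, and hash the composite of rounded
# values into an identification value.
import hashlib
import random

def signal_id(transform, n_chunks=8, levels=(0.25, 0.5, 1.0, 2.0), seed=0):
    rng = random.Random(seed)  # keyed randomness stands in for a secret key
    size = max(1, len(transform) // n_chunks)
    chunks = [transform[i:i + size] for i in range(0, len(transform), size)]
    rounded = []
    for c in chunks:
        avg = sum(c) / len(c)
        # Randomized rounding: pick one of the two nearest quantization
        # levels, with probability proportional to proximity.
        lo = max((l for l in levels if l <= avg), default=levels[0])
        hi = min((l for l in levels if l >= avg), default=levels[-1])
        p = 0.5 if hi == lo else (avg - lo) / (hi - lo)
        rounded.append(hi if rng.random() < p else lo)
    # Hash the composite of rounded values into the identification value.
    return hashlib.sha256(repr(rounded).encode()).hexdigest()
```

Rounding to coarse levels is what makes perceptually similar signals (whose chunk averages differ only slightly) hash to identical identification values.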
U.S. Pat. No. 7,715,934 ('934 Patent), which was issued to Bland et al., discloses a technique for Identification of Input Files using Reference File. The '934 Patent describes an input profile which is generated from an input audio file using a measurable attribute that was also used to generate reference profiles from reference audio files.
The input profile is then subjected to a process that was also used to generate a reference profiles tree, which is structured as a sparse binary tree, from the reference profiles. As a result of the process, information of reference profiles having similar characteristics to the input profile, with respect to the measurable attribute, is retrieved from resulting nodes of the reference profiles tree. The input profile is then compared with this subset of the reference profiles, representing potential matches, to determine either that it matches one of the reference profiles, that it is a spoof, or that it does not match any of the reference profiles.
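The tree-based narrowing step may be sketched as follows. The bit-string keying, the node layout, and the candidate-collection policy are illustrative assumptions; the sketch shows only how a query profile descending a sparse binary tree retrieves a small subset of reference profiles for detailed comparison.

```python
# Illustrative sketch (details assumed) of the '934-style narrowing
# step: reference profiles are inserted into a sparse binary tree
# keyed by a bit string derived from the measurable attribute, and a
# query profile descends the same path to collect nearby candidates.
class Node:
    def __init__(self):
        self.children = {}   # bit ('0'/'1') -> child Node
        self.profiles = []   # reference profiles stored at this node

def insert(root, bits, profile):
    node = root
    for b in bits:
        node = node.children.setdefault(b, Node())
    node.profiles.append(profile)

def candidates(root, bits):
    """Collect profiles along the query path (coarse-to-fine matches)."""
    out, node = [], root
    for b in bits:
        out.extend(node.profiles)
        if b not in node.children:
            break
        node = node.children[b]
    else:
        out.extend(node.profiles)
    return out
```

Only the retrieved candidates are then compared in full against the input profile, which is what reduces the matching cost relative to a linear scan of all references.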
United States Patent Application Publication No. 2003/0195851, which was authored by Ong, describes a system and method for managing distribution of digital audio content employing vector encoding of audio content representing segments of the audio waveform. High frequency vectors are discriminated by their amplitude increment per short traversal times exceeding a predetermined level, and are flagged in the audio data file. A distributor or host ID code is embedded in the audio data file designating the authorized source or host environment for playback of the audio data file. A vector-decoding-enabled player associated with the authorized host is allowed to playback the audio data file with full quality and an unlimited number of times.
If the audio data file is copied or downloaded to a new host environment, then the player associated with the new host will detect that the host ID code embedded in the audio data file does not match the new host, and will playback only the low frequency vectors of the audio data file and only for a limited number of times. The recipient of the audio data file is required to log-on to an online registration site and pay a license fee in order to obtain a host-ID code for the audio data file matching the current host environment in order to have full usage rights. The system allows multiple users to sample or share copies of the vector-encoded audio data files on peer-to-peer networks without infringing the rights of copyright holders. An improvement for flattening out noisy input signals is also provided in the method of vector encoding of the audio waveforms.
United States Patent Application Publication No. 2004/0215447, which was authored by Sundareson, describes an audio file which is divided into frames in the time domain, wherein each frame is compressed, according to a psycho-acoustic algorithm, into a file in the frequency domain. Each frame is divided into sub-bands and each sub-band is further divided into split sub-bands. The spectral energy over each split sub-band is averaged for all frames. The resulting quantity for each split sub-band provides a parameter.
The set of parameters can be compared to a corresponding set of parameters generated from a different audio file to determine whether the audio files are similar. In order to provide for the higher sensitivity of the auditory response, the comparison of individual split sub-bands of the lower order sub-bands can be performed. Selected constants can be used in the comparison process to improve further the sensitivity of the comparison. In the side-information generated by the psycho-acoustic compression, data related to the rhythm, i.e., related percussive effects, is present. The data known as attack flags can also be used as part of the audio frame comparison.
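The parameter-averaging and comparison steps above may be sketched as follows. The relative-difference score, the tolerance, and the weighting scheme are illustrative assumptions standing in for the publication's "selected constants"; only the average-per-split-sub-band and band-by-band comparison pattern is taken from the description.

```python
# Hypothetical sketch of the 2004/0215447-style comparison: each file
# is reduced to one averaged-energy parameter per split sub-band, and
# two files are compared band by band, optionally weighting low-order
# sub-bands more heavily to mirror auditory sensitivity.
def band_parameters(frames):
    """frames: per-frame lists of split sub-band energies."""
    n = len(frames)
    return [sum(f[b] for f in frames) / n for b in range(len(frames[0]))]

def similar(params_a, params_b, weights=None, tol=0.1):
    weights = weights or [1.0] * len(params_a)
    # Weighted mean relative difference across split sub-bands.
    score = sum(w * abs(a - b) / max(a, b, 1e-9)
                for w, a, b in zip(weights, params_a, params_b))
    return score / sum(weights) < tol
```

Passing larger weights for the first few entries of `weights` emphasizes the lower-order sub-bands, per the sensitivity point made above.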
United States Patent Application Publication No. 2007/0220592, which was authored by Muehlbauer, describes a computer system and method executing artificial intelligence that audits media files (audio, video and graphical image, and/or other content) submitted for a Universal Media Code (UMC) database cataloging to minimize duplicate claims of ownership.
In some embodiments, during the cataloging of media files into the UMC database, the system performs a comparison of the description, location, file format and fingerprint with other UMC database content. Once the new UMC media file is declared unique by the system, the UMC record is enabled for Internet distribution. If a question of duplication arises, the system notifies the audit administrator who will manually take over the investigation and enabling process.
United States Patent Application Publication Number 2007/0276668, which was authored by Xu et al., describes a method and apparatus for accessing at least one audio file from a collection comprising more than one audio file stored within or accessible with an electronic device. The method includes generating one index comprising information entries obtained from each of the more than one audio file in the collection, with each audio file in the collection being linked to at least one information entry; receiving a vocal input during a voice reception mode; converting the vocal input into a digital signal using an analog-to-digital converter; analyzing the digital signal into discrete portions using frequency spectrum analysis; and comparing the discrete portions with the entries in the index. It is advantageous that the audio file is accessed when the discrete portions substantially match at least one of the information entries in the index. It is preferable that the discrete portions are either musical notes or waveforms.
United States Patent Application Publication Number 2008/0249982, which was authored by Lakowske, describes certain systems and methods for identifying audio files (e.g., music files) with user-established search criteria. The systems and methods allow a user to use an audio file to search for audio files having similar audio characteristics. The audio characteristics are identified by an automated system using statistical comparison of audio files. The searches are preferably based on audio characteristics inherent in the audio file submitted by the user.
United States Patent Application Publication Number 2009/0205483, which was authored by Kim, describes a music recognition method based on harmonic features and a motion generation method for a mobile robot. It is cited here for its aspect of extracting harmonic peaks for further usage. The music recognition method preferably includes: extracting harmonic peaks from an audio signal of an input song; computing a harmonic feature related to the average of distances between extracted harmonic peaks; and recognizing the input song by harmonic component analysis based on the computed harmonic feature.
The motion generation method for a mobile robot includes: extracting a musical feature from an audio signal of an input song; generating an initial musical score after identifying the input song on the basis of the extracted musical feature; generating a final musical score by synchronizing the initial musical score and musical feature together; and generating robot motions or a motion script file by matching a motion pattern of the mobile robot with the final musical score.
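The harmonic-feature computation cited above may be sketched as follows. The simple local-maximum peak picker is an illustrative assumption (a practical detector would threshold and interpolate); only the average-distance-between-peaks feature follows the description.

```python
# Hypothetical sketch of the cited harmonic feature: locate spectral
# peaks as local maxima and compute the average distance between
# successive peaks. For a harmonic signal this distance approximates
# the fundamental-frequency spacing.
def harmonic_feature(spectrum):
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] > spectrum[i + 1]]
    if len(peaks) < 2:
        return 0.0  # too few peaks to measure a spacing
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps)
```

The averaged spacing is a compact scalar that can then feed the harmonic-component analysis mentioned above.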
From a consideration of the foregoing, it will be noted that the prior art appears to be silent on a technique for extracting summary data from audio file amplitude information comprising twenty-two (22) characteristic matching metrics for providing distinct audio file signatures, thereby significantly decreasing the time associated with proper file matching. Accordingly, there is a perceived need in the art for a fast-match method of the foregoing type, which method or technique is described and/or summarized in more detail hereinafter.