This invention relates to apparatus and method for recognizing signals, and in particular to apparatus and method for recognizing signals by utilizing statistical moments of sampled signal values to produce feature vectors, and to quantization of the feature vectors in order to compare the signal to a predetermined signal data base, and to derive the signal data base.
While the present invention will be described with respect to a system for recognizing broadcast signals such as music, it is to be understood that the teachings of this application are applicable to a broad spectrum of signal recognition fields.
The accurate recognition of broadcast signals is important to marketing executives, royalty collection agencies, music promoters, etc. It is well known that a wide variety of legal, economic, and social concerns require the regular monitoring of broadcast information. All such requirements share a common need for certain information, such as which information was broadcast and when. In the prior art, broadcast stations were monitored manually by a plurality of listeners who would physically monitor the broadcast program and manually tabulate which information was broadcast at what time. Problems of reliability and cost have stimulated the effort toward realizing automated broadcast signal recognition systems. An initial automated method included encoding a unique cue signal in each song, and then monitoring each broadcast station to detect the cue signal. However, the associated encoding and decoding circuitry is expensive and complicated, and government regulatory agencies are averse to providing the additional bandwidth necessary for a large number of unique cue signals.
A further advance in the field of automated broadcast signal recognition is disclosed in U.S. Pat. No. 3,919,479 to Moon et al. In Moon et al., an audio signal is digitally sampled to provide a reference signal segment which is stored in a reference library. Then, when the audio signal is broadcast, successive portions thereof are digitized and compared with the reference segment in the library. The comparison is carried out in a correlation process which produces a correlation function signal. If the reference and broadcast signal segments are not the same, a correlation function with a relatively small amplitude results. On the other hand, if the reference and broadcast signal segments are relatively the same, a large correlation function signal is produced. The amplitude of the correlation function signal is sensed to provide a recognition signal when the amplitude exceeds a predetermined threshold level.
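The threshold-based correlation test described above can be illustrated with a minimal sketch. It is not taken from Moon et al.; the function names, the use of a normalized correlation coefficient, and the threshold value of 0.9 are assumptions chosen only for illustration.

```python
import math

def normalized_correlation(reference, segment):
    """Correlation coefficient of two equal-length sample lists (range -1..1)."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_s = sum(segment) / n
    num = sum((r - mean_r) * (s - mean_s) for r, s in zip(reference, segment))
    den = math.sqrt(sum((r - mean_r) ** 2 for r in reference)
                    * sum((s - mean_s) ** 2 for s in segment))
    # A flat segment has no variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0

def recognize(reference, broadcast, threshold=0.9):
    """Slide the reference over the broadcast samples.

    Returns the first offset whose correlation exceeds the (assumed)
    threshold, or None if no offset does.
    """
    n = len(reference)
    for offset in range(len(broadcast) - n + 1):
        window = broadcast[offset:offset + n]
        if normalized_correlation(reference, window) > threshold:
            return offset
    return None

reference = [0, 1, 0, -1, 0, 2, 0, -2]
broadcast = [0.1, -0.3, 0.2] + reference + [0.4, 0.0]
print(recognize(reference, broadcast))
```

Because only one reference segment is tested, a drop-out that corrupts the matching portion of the broadcast leaves no other segment to fall back on.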
However, the single segment correlation system of Moon et al. is subject to signal drop-out which may disable the system altogether. Also, the Moon et al. system is relatively susceptible to time-axis variations in the broadcast information itself. For example, it is known that many disc-jockeys "compress" broadcast songs by speeding up the drive mechanism. It is also known that other disc-jockeys regularly "compress" and/or "stretch" broadcast information to produce certain desired effects in the audience. Moon et al. attempt to overcome such time-axis variations by reducing the bandwidth of the broadcast signal, envelope-detecting the broadcast signal to provide envelope signals having substantially low, and preferably sub-audio, frequency signal components. It has been discovered that when the envelope signal at sub-audio frequencies is used during the correlation process, the digitally sampled waveforms are less sensitive to time-axis variations. However, the improvements which can be achieved by such a solution are very limited, and the solution will only operate for broadcast signals which have been "compressed" or "stretched" by a small amount. In addition, such a solution is subject to high false alarm rates. These disadvantages make the Moon et al. system less than desirable for a rapid, accurate, and inexpensive automatic broadcast signal recognition system.
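As a rough illustration of envelope detection, the following sketch full-wave rectifies a sampled signal and smooths it with a moving average. The choice of a moving average as the lowpass stage, the function name, and the window length are assumptions made for this example and are not taken from Moon et al.

```python
import math

def envelope(samples, window):
    """Full-wave rectify, then smooth with a moving average of `window` samples.

    The moving average stands in for a lowpass filter (an assumption);
    the output has the same length as the input.
    """
    rectified = [abs(x) for x in samples]
    out = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out

# A constant-amplitude tone with a period of 8 samples, smoothed over one period.
tone = [math.sin(2 * math.pi * k / 8) for k in range(64)]
env = envelope(tone, window=8)
print(round(env[-1], 4))
```

The envelope varies far more slowly than the underlying tone, which is why a modest change in playback speed disturbs it less than it disturbs the raw waveform.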
A further advance in the automatic signal recognition field is disclosed in U.S. Pat. No. 4,450,531 to Kenyon et al. The same Mr. Kenyon is the sole inventor of the subject application, and the teachings of the '531 patent are hereby incorporated into this application by reference. The system of the '531 patent successfully addresses the reliability problems of a single segment correlation system and the time-axis variation problems experienced by prior systems. In the '531 patent, a plurality of reference signal segments are extracted from a program unit (song), digitized, Fourier transformed, and stored in a reference library in a frequency domain complex spectrum. The received broadcast signal is then prefiltered to select a frequency portion of the audio spectrum that has stable characteristics for discrimination. After further filtering and conversion to a digital signal, the broadcast signal is Fourier transformed and subjected to a complex multiplication process with reference signal segments to obtain a vector product. The results of the complex multiplication process are then subjected to an inverse Fourier transformation step to obtain a correlation function which has been transformed from the frequency to the time domain. This correlation function is then normalized and the correlation peak for each segment is selected and the peak spacing is compared with segment length. Simultaneously, the RMS power of the segment coincident with the correlation peak segment is sensed to determine the segment power point pattern. Thus, the '531 patent overcomes the disadvantages of a single segment correlation system by providing a plurality of correlation segments and measuring the distances between the correlation peaks. Where the distances match, the broadcast signal is declared as being similar to the signal segment stored in the reference library. In addition, the RMS value comparison operates to confirm the classification carried out using the signal segments.
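The frequency-domain correlation step described above (transform, complex multiplication, inverse transform) can be sketched as follows. This is a generic circular cross-correlation, not the '531 patent's implementation: the naive DFT, the use of the complex conjugate of the reference spectrum, and all function names are assumptions for illustration, and the prefiltering, normalization, peak-spacing, and RMS power steps are omitted.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n scale)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        out.append(total / n if inverse else total)
    return out

def circular_correlation(reference, received):
    """Correlate via the frequency domain: IDFT(DFT(received) * conj(DFT(reference)))."""
    ref_spec = dft(reference)
    rec_spec = dft(received)
    product = [b * a.conjugate() for a, b in zip(ref_spec, rec_spec)]
    return [value.real for value in dft(product, inverse=True)]

def peak_lag(reference, received):
    """Lag (in samples) at which the correlation function is largest."""
    corr = circular_correlation(reference, received)
    return max(range(len(corr)), key=lambda i: corr[i])

reference = [1.0, 3.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0]
received = [0.0, 0.0, 0.0, 1.0, 3.0, -2.0, 0.5, 0.0]  # reference delayed by 3 samples
print(peak_lag(reference, received))
```

A practical system would use a fast Fourier transform rather than the naive transform shown here. Even so, repeating this computation for many reference segments accounts for much of the processing load.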
To overcome the time-axis variation problem, the '531 patent utilizes an envelope detector and a bandpass filter for the broadcast information. However, the system according to the '531 patent is computationally very demanding. For example, performing the various multi-segment correlations requires a great deal of computer power. Since a multitude of segments are sampled, the system according to the '531 patent may take a good deal of time and require the use of expensive, powerful computers.
An automated approach to speech pattern recognition is disclosed in U.S. Pat. No. 4,282,403 to Sakoe. Sakoe discloses a speech recognition system in which a time sequence of pattern feature vectors is entered into a reference library. The received speech signal is then subjected to spectrum analysis, sampling, and digitization in order to be transformed into a time sequence of vectors representative of features of the speech sound at respective sampling instants. A time warping function may be used for each reference pattern by the use of feature vector components of a few channels. The time warping function for each reference pattern feature vector is used to correlate the input pattern feature vector and the reference pattern feature vector. The input pattern feature vector sequence is then compared with the reference pattern feature vector sequence, with reference to the time warping function, in order to identify the spoken word. However, the Sakoe system time warps the reference patterns rather than the input signal. A plurality of patterns must therefore be calculated for each reference pattern, which increases the memory and computational demands of the system.
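The general idea of comparing feature-vector sequences under a time warping function can be illustrated with a textbook dynamic-programming alignment. The sketch below is a generic dynamic time warping distance, not the specific procedure of the Sakoe patent; the function names, the Euclidean local distance, and the sample data are assumptions.

```python
def dtw_distance(input_seq, reference_seq):
    """Time-warped distance between two sequences of feature vectors."""
    def dist(a, b):
        # Euclidean distance between two feature vectors (an assumed metric).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(input_seq), len(reference_seq)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(input_seq[i - 1], reference_seq[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(input_seq, references):
    """Name of the reference pattern with the smallest warped distance."""
    return min(references, key=lambda name: dtw_distance(input_seq, references[name]))

references = {
    "yes": [[0.0], [1.0], [2.0], [1.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [1.0]]  # slowed-down "yes"
print(classify(stretched, references))
```

The alignment table grows with the product of the two sequence lengths and must be filled in for every reference pattern, which illustrates how time warping adds to memory and computation.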
A further signal recognition system is disclosed in U.S. Pat. No. 4,432,096 to Bunge. In Bunge, sounds or speech signals are converted into an electrical signal and broken down into several spectrum components in a filter bank. These components are then integrated over a short period of time to produce the short-time spectrum of the signal. The spectral components of the signal are applied to a number of pattern detectors which supply an output signal only if the short-time spectrum corresponds to the pattern adjusted in the relevant pattern detector. Each pattern detector has two threshold detectors which supply a signal if the applied input lies between the adjustable thresholds. Thus, the pattern detectors supply an output signal only if all threshold value detectors are activated. A pattern detector is provided for each speech sound. When a series of sounds is recognized, the series of addresses of the pattern detectors which have successfully generated an output signal is stored and subsequently applied to the computer for comparison. It can be readily appreciated that such a system requires a number of pattern detectors and a correspondingly powerful computation device. In addition, while the Bunge system uses a filter bank to provide a low frequency output signal which is relatively less sensitive to time-axis variations, the Bunge system is still subject to time distortion problems and a high false alarm rate.
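The threshold-window behavior of such pattern detectors can be sketched in a few lines. The sketch assumes one lower and one upper threshold per spectral component and uses the detector's list index as its "address"; these modeling choices and all names are illustrative assumptions, not details taken from Bunge.

```python
def make_pattern_detector(bounds):
    """Build a detector from (low, high) threshold pairs, one per spectral component."""
    def detector(spectrum):
        # Fires only if every component lies between its two thresholds.
        return all(low < value < high for value, (low, high) in zip(spectrum, bounds))
    return detector

def recognize_sequence(spectra, detectors):
    """Addresses of the first detector that fires for each short-time spectrum.

    Spectra for which no detector fires contribute no address.
    """
    addresses = []
    for spectrum in spectra:
        for address, detector in enumerate(detectors):
            if detector(spectrum):
                addresses.append(address)
                break
    return addresses

detectors = [
    make_pattern_detector([(0.0, 0.3), (0.6, 1.0)]),  # hypothetical sound 0
    make_pattern_detector([(0.5, 1.0), (0.0, 0.4)]),  # hypothetical sound 1
]
spectra = [[0.1, 0.8], [0.7, 0.2], [0.9, 0.9]]
print(recognize_sequence(spectra, detectors))
```

Each additional sound to be recognized requires another detector with its own set of thresholds, so the amount of hardware or computation grows with the vocabulary.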
A recently commercialized automatic broadcast signal recognition system is disclosed in U.S. Pat. No. 4,843,562 to Kenyon et al. Again, the same Mr. Kenyon is the sole inventor of the subject application, and the teachings of the '562 patent are incorporated herein by reference. In fact, specific teachings from the '562 patent will be incorporated in further portions of this specification. The '562 patent describes a two-stage (coarse and fine) classification system using fewer processor resources. According to the '562 patent, the broadcast signal is bandpass filtered, rectified, and lowpass filtered to provide a plurality of low bandwidth waveforms. The waveforms are sampled and the samples are used to generate a spectragram which is then compared with a plurality of reference spectragrams stored in a first stage reference library. The first stage reference spectragrams are then queued in order of their similarity to the generated spectragram. Next, a plurality of second stage reference patterns, which correspond to the queued first stage reference spectragrams, are correlated with one of the analyzed waveforms in the queueing order established previously. A correlation value is provided for each second stage reference pattern stored in the second stage reference library. When it is determined that a correlation value exceeds a threshold value, a recognition is declared and the broadcast signal is classified as similar to the second stage reference pattern whose correlation value exceeds the threshold. The analyzed waveform used in the second stage classification is time warped to account for speed fluctuations in the broadcast signal.
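The two-stage flow described above, a coarse ranking followed by fine correlation in queue order, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the '562 patent's implementation: the squared-distance similarity measure, the correlation coefficient, the library layout, and all names are hypothetical, and the time warping of the analyzed waveform is not modeled.

```python
import math

def correlation(a, b):
    """Correlation coefficient of two equal-length sample lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def queue_candidates(first_features, library):
    """First stage: order reference names by similarity to the generated features."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return sorted(library, key=lambda name: distance(first_features, library[name][0]))

def two_stage_classify(first_features, waveform, library, threshold=0.9):
    """Second stage: correlate in queue order; stop at the first value above threshold.

    `library` maps a name to (first_stage_pattern, second_stage_pattern).
    """
    for name in queue_candidates(first_features, library):
        if correlation(waveform, library[name][1]) > threshold:
            return name
    return None

library = {
    "song_a": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0]),
    "song_b": ([0.0, 1.0, 0.0], [1.0, 1.0, -1.0, -1.0]),
}
print(two_stage_classify([0.1, 0.9, 0.0], [2.0, 2.0, -2.0, -2.0], library))
```

Ordering the candidates first means the more expensive correlation is usually run on only the most likely references, which is how a two-stage design reduces processor load.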
While the system according to the '562 patent is successful, it is somewhat limited in its ability to recognize a large number of songs. For example, the system according to the '562 patent is capable of recognizing any of 600 songs on a single channel with high reliability. The system can simultaneously monitor five different channels. However, a system which could identify any one of three thousand songs on each of five simultaneously broadcast stations with high reliability would provide a very attractive and commercially successful signal recognition system. Further, the system according to the '562 patent requires approximately 64 seconds to detect and classify a broadcast song. It is desired to reduce this time to 28 seconds to allow for the identification of shorter duration recordings such as advertisements. While increasing performance, it is important to retain the desirable compact architecture of the '562 patent.
Thus, what is needed is an improved system for accurately recognizing and classifying a large number of unique broadcast signals on a plurality of broadcast channels simultaneously and with high reliability. The system must be small, inexpensive, and easy to operate.