This invention relates to an apparatus and method by which broadcast information can be recognized and classified. More particularly, this invention relates to a system and method for classifying broadcast information using a plurality of reference signal libraries in a two-stage classification process.
It is known that broadcast stations (television and radio) are regularly monitored to determine when and how often certain information is broadcast. For example, artists may be paid a royalty rate depending upon how often their particular work is broadcast. Likewise, commercial backers of broadcast programming have an interest in determining when and how often commercials are played. Further, marketing executives and the broadcasters themselves are interested in determining the popularity of certain broadcast information in order to target that information to the appropriate audience at the appropriate time. Those of ordinary skill in this field will readily understand that a wide variety of legal, economic and social concerns require the regular monitoring of broadcast information. All such requirements share a common need for certain information such as which information was broadcast and when.
Traditionally, such broadcast station monitoring was performed manually by a plurality of listeners who would physically monitor the broadcast program and manually tabulate which information was broadcast and when. However, the cost of these manual surveys has become prohibitive. Such a method is labor intensive and subject to reliability problems. For example, a manual monitor may easily miss a fifteen second commercial broadcast over radio. In addition, it is virtually impossible for a single individual to monitor a plurality of broadcast channels. Therefore, a great number of monitors have traditionally been required to fully monitor performance in a multi-media environment.
In view of the above problems with manual systems, it has been proposed to design and implement an automatic broadcast recognition system. It is believed that such automatic systems will be less expensive and more reliable than manual surveys.
In recent years, several techniques and systems have been developed which electronically monitor broadcast signals and provide information relative to the content and timing of the program monitored. Initially, these automatic systems performed signal recognition by inserting a code signal in the broadcast signal itself. Upon reception, the automatic system would recognize the code signal (matching it with a reference library) and classify the broadcast information accordingly. Although such coding techniques work for limited applications, they require allocation of portions of the broadcast signal band for identification purposes. In addition, such a system requires special processing, coding and decoding circuitry. Such circuitry is expensive to design and assemble and must be placed at each transmitting and receiving station. In addition, those of skill in this field understand that government regulatory agencies are averse to providing additional bandwidth for purposes of code signal identification.
To overcome some of the disadvantages involved with the use of the coded signal techniques, certain automatic broadcast signal identification systems have been developed which do not require special coding of the broadcast signal. Such a system is disclosed in U.S. Pat. No. 3,919,479 to Moon et al. In Moon et al, an audio signal is digitally sampled to provide a reference signal segment which is stored in a reference library. Then, when the audio signal is broadcast, successive portions thereof are digitized and compared with the reference segment in the library. The comparison is carried out in a correlation process which produces a correlation function signal. If the reference and broadcast signal segments are not the same, a correlation function with a relatively small amplitude results. On the other hand, if the reference and broadcast signal segments are substantially the same, a large correlation function signal is produced. The amplitude of the correlation function signal is sensed to provide a recognition signal when the amplitude exceeds a predetermined threshold level.
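By way of illustration only, and not as a description of the actual Moon et al circuitry, the single-segment correlation and threshold test described above may be sketched as follows. The function names, the use of a normalized correlation coefficient, and the threshold value of 0.9 are assumptions introduced for this sketch.

```python
import math

def normalized_correlation(reference, window):
    """Correlation coefficient in [-1, 1] between two equal-length sample lists."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_w = sum(window) / n
    num = sum((r - mean_r) * (w - mean_w) for r, w in zip(reference, window))
    den = math.sqrt(sum((r - mean_r) ** 2 for r in reference)
                    * sum((w - mean_w) ** 2 for w in window))
    return num / den if den > 0 else 0.0

def recognize(reference, broadcast, threshold=0.9):
    """Slide the stored reference segment over successive portions of the broadcast.

    Returns (offset, score) for the best-matching portion if its score reaches
    the threshold (hypothetical value), otherwise None.
    """
    best_offset, best_score = None, -1.0
    for offset in range(len(broadcast) - len(reference) + 1):
        window = broadcast[offset:offset + len(reference)]
        score = normalized_correlation(reference, window)
        if score > best_score:
            best_offset, best_score = offset, score
    return (best_offset, best_score) if best_score >= threshold else None
```

Because only one segment is compared, a drop-out that corrupts the matching portion of the broadcast lowers the single score on which the whole decision rests.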
While the Moon et al system may operate effectively in certain situations, it is not effective for many applications. For example, where signal drop-out is experienced, a single segment correlation system may be severely degraded or disabled altogether. Additionally, the Moon et al system is relatively sensitive to time-axis variations in the broadcast information itself. For example, it is known that many disc-jockeys "compress" broadcast songs by speeding up the drive mechanism. It is also known that other disc-jockeys regularly "compress" and/or "stretch" broadcast information to produce certain desired effects in the audience.
In an attempt to overcome such time-axis variations, Moon proposes to reduce the bandwidth of the broadcast signal by envelope-detecting the broadcast signal and providing envelope signals having substantially low, and preferably sub-audio frequency signal components. It has been found that when the envelope signal at sub-audio frequencies is used during the correlation process, the digitally sampled waveforms are less sensitive to time-axis variations. However, the improvements which can be achieved by such a solution are very limited and will only operate for broadcast signals which have been "compressed" or "stretched" by a small amount. In addition, such a solution is subject to high false alarm rates. These disadvantages make the Moon et al system less than desirable for a rapid, accurate, and inexpensive broadcast information recognition system.
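The envelope-detection step may be illustrated by the following minimal sketch, which full-wave rectifies the sampled signal and smooths it with a moving average. The function name, the choice of a moving-average smoother, and the window parameter are assumptions made for illustration; the filtering actually used by Moon et al is not reproduced here.

```python
def envelope(samples, window):
    """Full-wave rectify, then smooth with a moving average of `window` samples.

    The smoothed output varies far more slowly than the input, which is why
    it tolerates small time-axis shifts better than the raw waveform.
    """
    rectified = [abs(s) for s in samples]
    out = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out
```

A longer window yields a lower-bandwidth envelope, at the cost of discarding detail that would otherwise help distinguish one recording from another.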
Another automatic signal recognition system is disclosed in U.S. Pat. No. 4,450,531 to Kenyon et al. Mr. Kenyon is a joint inventor of the subject application and the '531 patent. The teachings of the '531 patent are hereby incorporated into this application by reference.
The Kenyon et al system successfully addresses the reliability problems of a single segment correlation system, and the time-axis variation problems experienced by prior systems. In Kenyon et al, a plurality of reference signal segments are extracted from a program unit (song), digitized, Fourier transformed and stored in a reference library in a frequency domain complex spectrum. The received broadcast signal is then prefiltered to select a frequency portion of the audio spectrum that has stable characteristics for discrimination. After further filtering and conversion to a digital signal, the broadcast signal is Fourier transformed and subjected to a complex multiplication process with reference signal segments to obtain a vector product. The results of the complex multiplication process are then subjected to an inverse Fourier transformation step to obtain a correlation function which has been transformed from the frequency to the time domain. This correlation function is then normalized and the correlation peak for each segment is selected and the peak spacing is compared with segment length. Simultaneously, the RMS power of the segment coincident with the correlation peak segment is sensed to determine the segment power point pattern. Thus, Kenyon et al overcomes the disadvantages of a single segment correlation system by providing a plurality of correlation segments and measuring the distances between correlation peaks. Where the distances match, the broadcast signal is declared as being similar to the signal segments stored in the reference library. In addition, the RMS value comparison operates to confirm the classification carried out using the signal segments.
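The frequency-domain correlation and peak-spacing test described above may be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the tolerance parameter are hypothetical, the input length is assumed to be a power of two, and the prefiltering, normalization and RMS power comparison of Kenyon et al are omitted.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unscaled)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def correlation_peak(reference, broadcast):
    """Return the lag at which the reference segment best aligns with the broadcast.

    The broadcast spectrum is multiplied by the conjugate of the reference
    spectrum, and the product is inverse transformed to give the circular
    cross-correlation in the time domain.
    """
    n = len(broadcast)
    padded = list(reference) + [0.0] * (n - len(reference))
    ref_spectrum = fft(padded)  # in a library system this would be stored in advance
    sig_spectrum = fft(list(broadcast))
    product = [s * r.conjugate() for s, r in zip(sig_spectrum, ref_spectrum)]
    corr = [v.real / n for v in fft(product, inverse=True)]
    return max(range(n), key=lambda lag: corr[lag])

def spacing_matches(peak_lags, segment_length, tolerance=2):
    """True when successive correlation peaks lie one segment length apart."""
    return all(abs((b - a) - segment_length) <= tolerance
               for a, b in zip(peak_lags, peak_lags[1:]))
```

Running `correlation_peak` once per stored segment and passing the resulting lags to `spacing_matches` mirrors the multi-segment decision: a match is declared only when the peaks fall at the expected spacing.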
To overcome the time-axis variation problem, Kenyon et al utilizes an envelope detector and band pass filtering of the broadcast information, similar to the system of Moon et al. In addition, Kenyon et al proposes the use of more than one sampling rate for the reference signal segments. A fast and slow sample may be stored for each reference signal segment so that broadcast signals from faster rate stations will correlate with the faster rate reference segments and signals from slower rate stations will correlate with the slower rate reference segments. However, the system according to Kenyon et al also suffers from a relatively high false alarm rate and is computationally very demanding. For example, performing the various multi-segment correlations requires a great deal of computer power. Since a multitude of segments are sampled, the system according to Kenyon et al may take a good deal of time and require the use of expensive, powerful computers.
A system for speech pattern recognition is disclosed in U.S. Pat. No. 4,282,403 to Sakoe. Sakoe discloses a speech recognition system in which a time sequence of pattern feature vectors is entered into a reference library. The received speech signal is then subjected to spectrum analysis, sampling and digitization in order to be transformed into a time sequence of vectors representative of features of the speech sound at respective sampling instants. A time warping function may be used for each reference pattern by the use of feature vector components of a few channels. The time warping function for each reference pattern feature vector is used to correlate the input pattern feature vector and the reference pattern feature vector. The input pattern feature vector sequence is then compared with the reference pattern feature vector sequence, with reference to the warping function, in order to identify the spoken word. However, the Sakoe system time warps the reference patterns rather than the input signal. Thus, a plurality of patterns must be calculated for each reference pattern, necessarily increasing the memory and computational requirements of the system.
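The general idea of comparing two sequences through a time warping function may be illustrated with the classic dynamic time warping recurrence shown below. This is a textbook sketch and not the specific procedure of the Sakoe patent; scalar features are assumed in place of feature vectors, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of scalar features.

    cost[i][j] holds the cheapest cumulative mismatch for aligning the first
    i elements of `a` with the first j elements of `b`.
    """
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Allow a repeat of a[i-1], a repeat of b[j-1], or a joint step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]
```

The table has one cell for every pairing of an input frame with a reference frame, so the work grows with the product of the two lengths and must be repeated for each reference pattern, which is consistent with the memory and computational burden noted above.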
A further signal recognition system is disclosed in U.S. Pat. No. 4,432,096 to Bunge. In Bunge, sounds or speech signals are converted into an electrical signal and broken down into several spectrum components in a filter bank. These components are then integrated over a short period of time to produce the short-time spectrum of a signal. The spectral components of the signal are applied to a number of pattern detectors which supply an output signal only if the short-time spectrum corresponds to the pattern adjusted in the relevant pattern detector. Each pattern detector has two threshold detectors which supply a signal if the applied input lies between the adjustable thresholds. Thus, the pattern detectors supply an output signal only if all threshold value detectors are activated. For each sound of speech, a pattern detector is provided. When a series of sounds is recognized, the series of addresses of the pattern detectors which have successfully generated an output signal are stored and subsequently applied to the computer for comparison. It can be readily appreciated that such a system requires a number of pattern detectors and a corresponding powerful computation device. In addition, while the Bunge system uses a filter bank to provide a low frequency output signal which is relatively less sensitive to time-axis variations, the Bunge system is still subject to time distortion problems and a high false alarm rate.
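The threshold-window behavior of such pattern detectors may be sketched as follows. The sketch assumes one (low, high) threshold pair per spectral channel and records only the first detector that responds to each short-time spectrum; both simplifications, and all names, are assumptions made for illustration rather than details taken from Bunge.

```python
def detector_fires(spectrum, windows):
    """True only if every channel level lies between its (low, high) thresholds."""
    return all(low <= level <= high
               for level, (low, high) in zip(spectrum, windows))

def detect_sequence(spectra, detectors):
    """Return the addresses (indices) of the detectors that respond, in order.

    `spectra` is a list of short-time spectra; `detectors` is a list of
    per-channel threshold windows, one entry per sound pattern.
    """
    addresses = []
    for spectrum in spectra:
        for address, windows in enumerate(detectors):
            if detector_fires(spectrum, windows):
                addresses.append(address)
                break  # assumption: keep only the first responding detector
    return addresses
```

Since every short-time spectrum is tested against every detector, the work per frame grows with the number of sound patterns to be recognized.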
Known automatic broadcast recognition systems have been caught in a quandary of choosing an appropriate time-bandwidth product (sampling time multiplied by frequency bandwidth). Where the broadcast signal is sampled with a large time-bandwidth product, signal recognition may be made accurately. However, a system employing a suitably large time-bandwidth product will be extremely sensitive to time-axis variations. Thus, most known systems utilize a predetermined time-bandwidth product and suffer from both recognition inaccuracies and sensitivity to time-axis variations. In addition, the computational load imposed by all known techniques severely limits the number of songs or other recordings that can be simultaneously sampled in real time.
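The trade-off may be made concrete with a short calculation. The sketch below relies on a common rule of thumb, assumed here and not stated in the references discussed above, that correlation degrades once the timing drift accumulated over the sampling time approaches the reciprocal of the bandwidth. The function names are hypothetical.

```python
def time_bandwidth_product(duration_s, bandwidth_hz):
    """Sampling time (seconds) multiplied by frequency bandwidth (hertz)."""
    return duration_s * bandwidth_hz

def max_tolerable_speed_error(duration_s, bandwidth_hz):
    """Approximate fractional speed error at which correlation begins to fail.

    Assumed rule of thumb: the drift (error * duration) must stay below
    1 / bandwidth, so the tolerable error is 1 / (duration * bandwidth).
    """
    return 1.0 / time_bandwidth_product(duration_s, bandwidth_hz)
```

Under this assumption, a 4 second segment with 2 Hz of envelope bandwidth tolerates a speed error of roughly 12.5 percent, whereas the same 4 seconds at 2000 Hz of bandwidth tolerates only about 0.0125 percent, which illustrates why a product large enough for accurate recognition is so sensitive to time-axis variations.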
Thus, what is needed is a small, inexpensive system with limited processing power which automatically monitors a plurality of broadcast channels simultaneously for a large number of sounds. Such a system should provide accurate recognition and, at the same time, remain relatively insensitive to time-axis variations.