It is often desirable to transmit low to medium speed data signals over audio channels, such as telephone, radio and television channels, carrying analog voice and/or music signals. Such data signals may be used to convey, for example, a serial number, the name of a song being played, copyright information, royalty billing codes, virtual reality cues and codes identifying particular television or radio stations for polling viewers and listeners. A popular technique for accomplishing such simultaneous transmission involves the transmission of a data signal in the underutilized portions of the frequency spectrum below and/or above the voice band available on a telephone line, such that the data signal is imperceptible to listeners. Spread spectrum whitening techniques are applied to the data signal to maintain interference at a low level.
An example of a technique that places the information in the lower frequency region of the voice band is disclosed in U.S. Pat. No. 4,425,661 to Moses et al. Another technique, described in U.S. Pat. No. 4,672,605 to Hustig et al., involves the use of a spread spectrum signal having most of its energy in the higher audio frequency region and above the voice band. Yet another technique, described in U.S. Pat. No. 4,425,642 to Moses et al., involves spread spectrum processing a data signal throughout the channel spectrum, such that the spectral energy of the data signal possesses a pseudo random noise characteristic which, when added to the voice channel, causes only an imperceptible increase in white noise.
Although systems such as those described above are typically sufficient for the particular purposes for which they were designed, they suffer certain deficiencies inherent to the use of spread spectrum processing. Specifically, the use of spread spectrum whitening techniques alone results in extremely low data throughput rates on an audio channel, due to the large spreading gain that must be achieved. In addition, although such techniques make limited use of certain "masking" characteristics of the audio signal with which the data signal is to be transmitted, they do not make full use of such characteristics, as further described below, thereby limiting the processing gain which might otherwise be achieved.
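A minimal direct-sequence sketch shows why spreading costs so much throughput. It is a generic illustration, not the method of any patent cited above. The following are all assumptions: a seeded PRNG stands in for a true PN code, the chip rate is 8 kHz (one chip per telephone-rate sample), the spreading factor is 1024, the "audio" is a single tone, and the data amplitude is fixed. Each data bit occupies 1024 chips. That lets the receiver recover the bit from under a louder tone by correlation, but the data rate falls to the chip rate divided by the spreading factor.

```python
import math
import random

def pn_sequence(n_chips, seed):
    """Hypothetical stand-in for a PN code: +/-1 chips from a seeded PRNG."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n_chips)]

def spread(bits, pn, amplitude):
    """Map each bit to +/-1 and multiply it by the whole PN sequence."""
    out = []
    for b in bits:
        symbol = 1.0 if b else -1.0
        out.extend(amplitude * symbol * c for c in pn)
    return out

def despread(received, pn):
    """Correlate each bit-length segment with the PN sequence; the sign gives the bit."""
    n = len(pn)
    bits = []
    for start in range(0, len(received) - n + 1, n):
        corr = sum(r * c for r, c in zip(received[start:start + n], pn))
        bits.append(1 if corr > 0 else 0)
    return bits

CHIP_RATE = 8000      # assumed: one chip per sample at an 8 kHz sampling rate
SPREAD_FACTOR = 1024  # assumed number of chips per data bit

pn = pn_sequence(SPREAD_FACTOR, seed=1)
data = [1, 0, 1, 1, 0, 0, 1, 0]
chips = spread(data, pn, amplitude=0.25)
# Stand-in "audio": a full-scale 440 Hz tone, about 9 dB stronger than the data.
audio = [math.sin(2 * math.pi * 440 * k / CHIP_RATE) for k in range(len(chips))]
received = [a + d for a, d in zip(audio, chips)]

recovered = despread(received, pn)
bit_rate = CHIP_RATE / SPREAD_FACTOR              # data bits per second
processing_gain_db = 10 * math.log10(SPREAD_FACTOR)
print(recovered == data, bit_rate, round(processing_gain_db, 1))
```

With these assumed values, the data rate is 8000/1024, or about 7.8 bit/s, in exchange for roughly 30 dB of processing gain. Halving the spreading factor doubles the rate but gives up about 3 dB of that gain.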
Other techniques for enabling the simultaneous transmission of audio and data signals in a single channel include (1) creating a start pulse by taking a subband to zero energy level and then using the following short period of digitized audio as the serial number, and (2) using subbands to carry a digital message by either forcing the subband energy to zero or leaving it at its actual level, thereby creating "marks" and "spaces" (i.e., "ones" and "zeros"). The primary deficiencies of the former technique include poor noise immunity and impracticality in situations in which many bytes of data must be stored and processed. The primary deficiencies of the latter technique likewise include poor noise immunity, as well as an extremely slow data throughput rate.
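The subband "marks and spaces" scheme can be sketched as follows. The frame energies, the choice of subband 2, the detection threshold and the noise value are hypothetical. Mapping a zeroed subband to a "space" (0) is also an assumption, because the text does not say which state is which. The sketch carries only one bit per frame, and it shows two ways decoding can fail. A mark is lost when the audio in the carrier subband happens to be quiet. A space is lost when channel noise refills the zeroed subband.

```python
def embed_marks_spaces(frames, band, bits):
    """Sketch of the prior-art keying: force `band` to zero energy for a space (0),
    leave it at its actual level for a mark (1). One bit per frame."""
    out = []
    for energies, bit in zip(frames, bits):
        e = list(energies)
        if bit == 0:
            e[band] = 0.0
        out.append(e)
    return out

def detect_marks_spaces(frames, band, threshold):
    """Decide mark/space by comparing the band energy with a fixed threshold."""
    return [1 if e[band] > threshold else 0 for e in frames]

CARRIER_BAND = 2   # hypothetical subband used to carry the message
THRESHOLD = 0.05   # hypothetical detection threshold (linear energy)
NOISE = 0.1        # hypothetical additive channel noise in the carrier band

# Hypothetical per-frame subband energies (linear units), four subbands per frame.
frames = [[4.0, 2.0, 1.5, 0.3],
          [3.5, 2.2, 0.9, 0.2],
          [5.0, 1.8, 0.02, 0.4],  # band 2 of this frame's audio is nearly silent
          [4.2, 2.5, 1.1, 0.1]]
bits = [1, 0, 1, 1]

tx = embed_marks_spaces(frames, CARRIER_BAND, bits)
decoded_clean = detect_marks_spaces(tx, CARRIER_BAND, THRESHOLD)
noisy = [[e + (NOISE if i == CARRIER_BAND else 0.0) for i, e in enumerate(f)] for f in tx]
decoded_noisy = detect_marks_spaces(noisy, CARRIER_BAND, THRESHOLD)
print(decoded_clean)  # [1, 0, 0, 1]: the third mark is lost in quiet audio
print(decoded_noisy)  # [1, 1, 1, 1]: noise turns the space into a mark
```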
It is known in the art that every audio signal generates a perceptual concealment function which masks audio distortions existing simultaneously with the signal. Accordingly, any distortion, or noise, introduced into the transmission channel will, if properly distributed or shaped, be masked by the audio signal itself. Such masking may be partial or complete, leading either to increased quality compared to a system without noise shaping or to near-perfect signal quality equivalent to that of a signal without noise. In either case, such "masking" occurs because the human perceptual mechanism cannot distinguish between two signal components, one belonging to the audio signal and the other belonging to the noise, in the same spectral, temporal or spatial locality. An important consequence of this limitation is that the noise may be entirely imperceptible to a listener even when the signal-to-noise ratio is at a measurable level. Ideally, the noise level at all points in the audio signal space is exactly at the level of just-noticeable distortion, a limit typically referred to as the "perceptual entropy envelope."
Hence, the main goal of noise shaping is to minimize the perceptibility of distortions by shaping them in time or frequency so that as many of their components as possible are masked by the audio signal itself. See Nikil Jayant et al., Signal Compression Based on Models of Human Perception, 81 Proc. of the IEEE 1385 (1993). A schematic representation of time-frequency domain masking is shown in FIGS. 1a-1c, in which a short sinusoidal tone 10 produces a masking threshold 12. See John G. Beerends and Jan A. Stemerdink, A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation, 40 J. Audio Engineering Soc'y 963, 966 (1992).
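A toy calculation illustrates the benefit of shaping. It uses hypothetical per-band masking thresholds for a single frame. Noise placed exactly at the threshold in each band stays inaudible. The same total noise power, spread evenly across the bands, exceeds the threshold in the quieter bands. The threshold values are assumptions chosen for illustration; a real system would derive them from a psychoacoustic model.

```python
def db_to_power(db):
    """Convert a level in dB to linear power."""
    return 10 ** (db / 10)

def audible_bands(noise, thresholds):
    """Indices of bands where the noise power exceeds the masking threshold."""
    return [i for i, (n, t) in enumerate(zip(noise, thresholds)) if n > t]

# Hypothetical per-band masking thresholds (dB) for one audio frame.
thresholds_db = [60.0, 52.0, 40.0, 30.0]
thresholds = [db_to_power(t) for t in thresholds_db]

# Shaped noise: exactly the just-noticeable amount of noise in each band.
shaped = thresholds[:]
total_power = sum(shaped)

# Unshaped noise: the same total power spread evenly across the bands.
flat = [total_power / len(thresholds)] * len(thresholds)

print(audible_bands(shaped, thresholds))  # []
print(audible_bands(flat, thresholds))    # [1, 2, 3]
```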
"Perceptual coding" techniques employing the above-discussed principles are presently used in signal compression and are based on three types of masking: frequency-domain masking, time-domain masking and noise masking. The basic principle of frequency-domain masking is that when certain strong signals are present in the audio band, other lower level signals, close in frequency to the stronger signals, are masked and not perceived by a listener. Time-domain masking is based on the fact that certain types of noise and tones are not perceptible immediately before and after a larger signal transient. Noise masking takes advantage of the fact that a relatively high broadband noise level is not perceptible if it occurs simultaneously with various types of stronger signals.
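Frequency-domain masking is often approximated with a spreading function on the critical-band (Bark) scale. The sketch below uses two common textbook approximations: the Zwicker-Terhardt Hz-to-Bark formula and the Schroeder spreading function. The fixed 10 dB offset between masker level and masking threshold is an assumption, since psychoacoustic models vary this offset with tonality and frequency. The sketch also ignores the absolute threshold of hearing.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def spreading_db(dz):
    """Schroeder spreading function (dB) at Bark distance dz = z_probe - z_masker."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

MASKER_OFFSET_DB = 10.0  # assumed gap between masker level and masking threshold

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Level (dB) below which a probe tone at probe_hz is masked by the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_db + spreading_db(dz) - MASKER_OFFSET_DB

def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

print(is_masked(1000, 70, 1100, 40))  # True: close in frequency to the masker
print(is_masked(1000, 70, 4000, 40))  # False: far from the masker
```

The spreading function is asymmetric, so a masker reaches further up in frequency than down. For example, `spreading_db(3)` is about -21 dB, while `spreading_db(-3)` is about -51 dB.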
Perceptual coding forms the basis for precision adaptive sub-band coding (PASC), as well as other coding techniques used in compressing audio signals for the mini-disc (MD) and digital compact cassette (DCC) formats. Specifically, such compression algorithms exploit the fact that certain signals in an audio channel are masked by other, stronger signals: they remove the masked signals so that the remaining signal can be compressed into a lower bit-rate channel.
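The removal of masked signals can be viewed as a bit-allocation rule. Components below their masking threshold receive no bits at all, and the remaining components receive only enough bits to keep quantization noise under the mask. The sketch relies on the rule of thumb of about 6.02 dB of signal-to-noise ratio per quantizer bit, and the (level, threshold) pairs are hypothetical. It is a simplification, not the allocation procedure of PASC or of any other specific codec.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: each quantizer bit buys about 6.02 dB of SNR

def bits_needed(level_db, mask_db):
    """Bits that keep quantization noise under the mask; 0 if the component is masked."""
    smr = level_db - mask_db  # signal-to-mask ratio in dB
    return 0 if smr <= 0 else math.ceil(smr / DB_PER_BIT)

# Hypothetical (level dB, masking threshold dB) pairs for six spectral components.
components = [(72.0, 40.0), (35.0, 48.0), (60.0, 45.0),
              (20.0, 30.0), (50.0, 20.0), (28.0, 33.0)]

alloc = [bits_needed(level, mask) for level, mask in components]
kept = [i for i, b in enumerate(alloc) if b > 0]  # masked components are removed
print(alloc, kept)  # [6, 0, 3, 0, 5, 0] [0, 2, 4]
```

In this example, three of the six components are dropped outright, and the remaining three need 14 bits in total.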
Another deficiency of the prior art techniques for simultaneously transmitting data signals with audio signals is that, if the signals are transmitted through a channel that implements a lossy compression algorithm, such as the MPEG compression algorithm, the data signal, or at least portions thereof, will likely be removed. This is because most such compression algorithms divide the audio channel into a plurality of subbands and then encode and transmit only the strongest signal within each subband. Regardless of which of the previously described techniques is used, it is highly unlikely that the data signal will ever be the strongest signal in a subband; therefore, it is unlikely that any portion of the data signal will be transmitted. Moreover, with respect to the spread spectrum techniques, even if the data signal happens to be the strongest signal in one or two subbands, the information contained in those subbands will comprise only a small portion of the total information carried by the data signal, because that information is spread throughout the signal spectrum; such a fragment is therefore likely to be useless.
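The argument can be illustrated with the simplified subband model described in the text, in which only the strongest component in each subband survives. This is a toy model, not an implementation of MPEG or of any other real coder. The subband edges and audio levels are hypothetical, and so is the representation of the spread data signal as 16 equal low-level components.

```python
def keep_strongest_per_subband(spectrum, band_edges):
    """Toy model from the text: in each subband keep only the strongest component.
    spectrum maps frequency (Hz) -> level (dB)."""
    kept = {}
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        in_band = {f: lvl for f, lvl in spectrum.items() if lo <= f < hi}
        if in_band:
            f_max = max(in_band, key=in_band.get)
            kept[f_max] = in_band[f_max]
    return kept

# Hypothetical audio: one strong component per 500 Hz subband from 0 to 4 kHz,
# except that the 3000-3500 Hz subband is nearly silent in this frame.
audio = {250: 70.0, 750: 66.0, 1250: 62.0, 1750: 60.0,
         2250: 55.0, 2750: 52.0, 3250: 10.0, 3750: 45.0}
# Hypothetical spread data: 16 low-level (20 dB) components across the band.
data = {125 + 250 * k: 20.0 for k in range(16)}
edges = [500 * k for k in range(9)]  # 0, 500, ..., 4000 Hz

kept = keep_strongest_per_subband({**audio, **data}, edges)
surviving_data = sorted(f for f in kept if f in data)
fraction = len(surviving_data) / len(data)
print(surviving_data, fraction)  # [3125] 0.0625
```

Only the data component at 3125 Hz survives, in the single subband where the audio is nearly silent. That leaves 1 of the 16 data components, which carries only a small fraction of the message.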
Accordingly, what is needed is a system for simultaneously transmitting data and audio signals that utilizes the advantages of perceptual coding techniques and which is capable of transmitting data signals through a lossy compressed channel.