1. Field of the Invention
This invention relates to embedding digital data in an audio signal, and in particular to methods and apparatuses that reduce error rates in digital data recovery and reduce the perception of noise in the audio signal.
2. Description of the Related Art
Using audible sound to transmit information is as old as human speech. A speaker conveys information to a human listener. Using audible sound to transmit digitized information came somewhat later, but it can be traced back at least to the days when people communicated using Morse code through telegraphy. In recent years, audio signals have also been used to communicate digitized data through computer networks. For example, Plain Old Telephone Service (POTS) has been used to transmit digital information using modulators and demodulators (modems). Most of the time, though, an audio channel is used either to communicate human speech, i.e. analog speech signals which are comprehensible to humans, or to communicate digitized information which is comprehensible to computers, but not both at the same time.
DSL (Digital Subscriber Line) technology may look like an exception. Using DSL technology, a single traditional copper telephone line may transmit both digital data and voice signals at the same time. But the digital data and the voice signals are transmitted in quite different frequency bands carried by the same telephone line. The digital information is carried by high frequency signals and the analog voice signals are carried by low frequency electrical signals with frequencies typically lower than about 4000 Hz. DSL technology also has a significant limitation in terms of distance. A subscriber of the DSL service has to be within a short distance, e.g. three miles, of the service provider's server.
More recently, using various techniques, analog audio speech and digital data have been transmitted simultaneously and commingled together, as illustrated in FIG. 1 and as described in more detail in a related patent application Ser. No. 10/378,709, filed on Mar. 3, 2003. For simplicity, only the signal path from left to right is discussed here. The signal path from right to left is the same. A generic system 100 shown in FIG. 1 includes an ordinary audio conference system having units 102, 143 and 142 connected through a usual Plain Old Telephone Service (POTS) network 122. In addition to the regular voice conference, this system 100 can also transmit digital data. Data source 104 (typically a computer) at the near end site can transmit data 106, which are mixed with audio signal 136 by a data and voice mixer 112 to create a data signal embedded in the voice signal. The combined audio signal 154 is transmitted through the POTS network 122 to a far end site. At the far end site, the mixed audio signal 156 can be separated by data and voice separator 132 into digital data 108 and voice signal 138. Digital data 108 goes into a data sink 144. Voice signal 138 goes to a voice sink 142, which is typically a speakerphone that can reproduce the sound of voice signal 138. At a far end that does not have a capable separator, mixed audio signal 156 can still be branched into a signal 139 and be reproduced by a voice sink 143, e.g. a speakerphone. Voice sink 143 treats mixed audio signal 139 as if it were voice signal 136, and the data signal 106 is ignored. For conference participants using voice sink 143, the embedded data effectively does not exist because the sound due to the embedded data is substantially imperceptible.
FIG. 1 shows the functional block diagram of audio system 100. For clarity, the data source (sink) and voice source (sink) are shown as separate entities and only one of each is shown. In an actual implementation, more than one of each data source, data sink, voice source or voice sink may be present in a system. In many actual implementations, these different items may be the same physical entity that has multiple functionalities. In other implementations, the different functions, or their combinations, may be performed by different physical entities with the necessary interfaces connecting them.
There are many ways to combine digital data and voice signals. For example, several methods are disclosed in a patent application entitled “System and method for communicating data during an audio conference,” by Jeffrey Rodman and David Drell, filed on Mar. 3, 2003, Ser. No. 10/378,709, assigned to the same assignee as the current application, which is incorporated by reference herein. In that patent application, digital data is received from a data source. The digital data is then modulated onto an audio carrier signal, which is an analog audio signal. That audio carrier signal is mixed with a regular speech signal to form an audio signal for transmission, recording or other further processing. The audio carrier signal can be a notched narrow-band audio tone or a spread spectrum audio carrier signal, which is a signal covering a wide audio spectrum. In some other cases, the carrier signal can be a narrow-band signal whose frequency hops throughout a wide audio spectrum (signal hopping). In such a system, data and speech are transmitted simultaneously, in the same medium and in the same frequency spectrum range. For a human listener, only the speech portion of the signal is generally perceptible. The data portion is imperceptible to the listener: it is masked by the speech portion, or perceived as background noise when the audio channel is noisy. The data can be obtained only if there is a decoder at the receiver.
FIG. 2 illustrates another way of encoding/modulating digital data into an audio signal, which is called Phase Shift Keying (PSK). An audio tone 226, shown as a sine wave, is used to encode the digital data in a bit stream 222. Bit stream 222 is fed into a modulator/encoder 224 to be combined with audio carrier signal 226. The output from the modulator/encoder is an audio signal u(t) 228. Modulator 224 changes the phase of sine wave 226 according to bit stream 222: when the bit is 1 the phase is 0°, and when the bit is 0 the phase is 180°. The phase of the encoded sine wave thus indicates whether the digital data is 1 or 0. In the time domain, the corresponding bit stream is represented by a step curve 212 and the encoded carrier wave is shown as wave 214. The same modulation is also shown in the frequency domain, where the initial bit stream is shown as a curve 202, which is centered at zero frequency (DC), while the encoded carrier signal is shown as a curve 204, which is centered at the audio carrier frequency Fc. On the receiver side, transmit signal u(t) 228 becomes a received signal x(t) 238. It is demodulated at demodulator 234 with audio carrier signal 226, which is the same as the one used in modulator 224. By passing demodulated signal 236 through a Low Pass Filter 244, a bit stream 242 can be retrieved, which is the same as the original bit stream B(n) 222.
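The PSK scheme described above can be illustrated with a minimal sketch. The sample rate, carrier frequency, bit duration and function names below are assumptions chosen for illustration and are not taken from FIG. 2. The low pass filter is approximated here by simply summing the demodulated product over each bit period.

```python
import math

FS = 8000              # sample rate in Hz (assumed for illustration)
FC = 1000              # audio carrier frequency Fc in Hz (assumed)
SAMPLES_PER_BIT = 80   # 10 ms per bit, i.e. exactly 10 carrier cycles (assumed)

def modulate(bits):
    """Build u(t): carrier phase is 0 for a 1 bit and 180 degrees for a 0 bit."""
    u = []
    for n, bit in enumerate(bits):
        phase = 0.0 if bit == 1 else math.pi
        for k in range(SAMPLES_PER_BIT):
            t = (n * SAMPLES_PER_BIT + k) / FS
            u.append(math.sin(2 * math.pi * FC * t + phase))
    return u

def demodulate(x):
    """Multiply x(t) by the same carrier, then sum over each bit period.

    The per-bit sum acts as a crude low pass filter: it is positive when
    the received phase matches the carrier (bit 1) and negative when the
    phase is inverted (bit 0).
    """
    bits = []
    for start in range(0, len(x) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        acc = 0.0
        for k in range(SAMPLES_PER_BIT):
            t = (start + k) / FS
            acc += x[start + k] * math.sin(2 * math.pi * FC * t)
        bits.append(1 if acc > 0 else 0)
    return bits

if __name__ == "__main__":
    sent = [1, 0, 1, 1, 0, 0, 1]
    received = demodulate(modulate(sent))
    print(received == sent)
```

This sketch assumes an ideal channel and a receiver whose carrier is perfectly synchronized with the transmitter; a practical receiver would also have to recover carrier phase and bit timing.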
An audio system which can transmit additional data has many different uses. As in the cited patent application, such additional data may be related to the ongoing audio conference or video conference calls. The data can be used by the conference equipment to automate many routine tasks, such as exchanging the parties' phone numbers, names, etc., or to remotely control related equipment, for example, adding a video portion to the conference call to make it a video conference call, or transmitting the web address of a website of interest.
The data embedded in the audio stream may also be used as a means for intellectual property rights management, such as digital watermarks. It may also be used as a watermark aircraft identification tag within voice communication between a pilot and a control tower, which can reduce errors in the communication between them.
In most applications where digital data is embedded in audible signals, the digital data must have small amplitudes in order to avoid interference with the primary audible signals. Most of the time, the audio carrier signals with digital data are heard by people as “noises.” The noise level has to be as low as possible. But with a low audio carrier signal amplitude, the carrier signal is more susceptible to real noise. It may not have a sufficient signal to noise ratio (SNR) to be decoded properly at the receiver. It is found that in a typical system using the data-in-speech method, the data error rate can be as high as 10% of the data transmitted. The data processing system has to use other methods to correct the transmission or decoding errors. For example, one may increase the redundancy of the data encoding, transmitting/encoding the same data over and over to ensure correct reception and decoding.
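As one illustration of the redundancy approach mentioned above, the following sketch repeats each bit several times and decodes by majority vote. It is a generic repetition code offered as an assumption for illustration, not the error correction method of any particular system, and the function names are hypothetical.

```python
def encode_repetition(bits, repeat=3):
    """Send every bit `repeat` times in a row."""
    return [b for b in bits for _ in range(repeat)]

def decode_repetition(received, repeat=3):
    """Recover each bit by majority vote over its group of repeats."""
    decoded = []
    for i in range(0, len(received) - repeat + 1, repeat):
        group = received[i:i + repeat]
        decoded.append(1 if sum(group) * 2 > repeat else 0)
    return decoded

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    sent = encode_repetition(data)
    # Corrupt one bit in each of the first two groups of three.
    corrupted = list(sent)
    corrupted[0] ^= 1
    corrupted[4] ^= 1
    print(decode_repetition(corrupted) == data)
```

If bit errors were independent with probability 10%, a three-fold majority vote would fail only when at least two of the three copies are wrong, which is 3(0.1)²(0.9) + (0.1)³ = 2.8% of the time. That improvement costs a threefold reduction in the useful data rate, which is the trade-off that makes a lower raw error rate desirable.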
It is desirable to reduce the error rate without adversely affecting the primary audible signal. It is desirable to increase the carrier signal amplitude without raising the perceived noise level in the primary speech or music signal.