Many communications systems face the problem that the demand for information transmission and recording capacity often exceeds the available capacity. As a result, there is considerable interest among those in the fields of broadcasting and recording in reducing the amount of information required to transmit or record an audio signal intended for human perception without degrading its perceived quality. There is also interest in improving the perceived quality of the output signal for a given bandwidth or storage capacity.
Traditional methods for reducing information capacity requirements involve transmitting or recording only selected portions of the input signal. The remaining portions are discarded. Techniques known as perceptual encoding typically convert an original audio signal into spectral components or frequency subband signals so that those portions of the signal that are either redundant or irrelevant can be more easily identified and discarded. A signal portion is deemed to be redundant if it can be recreated from other portions of the signal. A signal portion is deemed to be irrelevant if it is perceptually insignificant or inaudible. A perceptual decoder can recreate the missing redundant portions from an encoded signal but it cannot create any missing irrelevant information that was not also redundant. The loss of irrelevant information is acceptable, however, because its absence has no perceptible effect on the decoded signal.
A signal encoding technique is perceptually transparent if it discards only those portions of a signal that are either redundant or perceptually irrelevant. If a perceptually transparent technique cannot achieve a sufficient reduction in information capacity requirements, then a perceptually non-transparent technique is needed to discard additional signal portions that are not redundant and are perceptually relevant. The inevitable result is that the perceived fidelity of the transmitted or recorded signal is degraded. Preferably, a perceptually non-transparent technique discards only those portions of the signal deemed to have the least perceptual significance.
An encoding technique referred to as “coupling,” which is often regarded as a perceptually non-transparent technique, may be used to reduce information capacity requirements. According to this technique, the spectral components in two or more input audio signals are combined to form a coupled-channel signal with a composite representation of these spectral components. Side information is also generated that represents a spectral envelope of the spectral components in each of the input audio signals that are combined to form the composite representation. An encoded signal that includes the coupled-channel signal and the side information is transmitted or recorded for subsequent decoding by a receiver. The receiver generates decoupled signals, which are inexact replicas of the original input signals, by generating copies of the coupled-channel signal and using the side information to scale spectral components in the copied signals so that the spectral envelopes of the original input signals are substantially restored. A typical coupling technique for a two-channel stereo system combines high-frequency components of the left and right channel signals to form a single signal of composite high-frequency components and generates side information representing the spectral envelopes of the high-frequency components in the original left and right channel signals. One example of a coupling technique is described in “Digital Audio Compression (AC-3),” Advanced Television Systems Committee (ATSC) Standard document A/52, which is incorporated by reference in its entirety.
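The coupling and decoupling steps described above can be illustrated with a minimal sketch. It is not the method of the ATSC A/52 standard or of any particular coder. The function names, the use of a simple average to form the composite representation, the use of a root-mean-square level per subband as the spectral envelope, and the representation of subbands as (start, stop) index pairs are all assumptions made only for illustration.

```python
import math


def band_rms(coeffs, bands):
    """Return the RMS level of the coefficients in each (start, stop) subband."""
    return [
        math.sqrt(sum(c * c for c in coeffs[a:b]) / (b - a))
        for a, b in bands
    ]


def couple(left, right, bands):
    """Combine two channels into one composite spectrum plus envelope side information.

    Assumption: the composite is a plain average of the two channels.
    """
    coupled = [(l + r) / 2.0 for l, r in zip(left, right)]
    side_info = {
        "left": band_rms(left, bands),
        "right": band_rms(right, bands),
    }
    return coupled, side_info


def decouple(coupled, envelope, bands):
    """Copy the coupled spectrum and scale each subband to the given envelope."""
    out = list(coupled)
    coupled_env = band_rms(coupled, bands)
    for (a, b), target, current in zip(bands, envelope, coupled_env):
        # The gain restores the subband level; phase is inherited from the composite.
        gain = target / current if current > 0.0 else 0.0
        for k in range(a, b):
            out[k] = coupled[k] * gain
    return out
```

In this sketch each decoupled signal recovers the subband levels of its original channel, but every decoupled channel shares the sign and phase structure of the composite signal, which is why the decoupled signals are inexact replicas of the original input signals.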
The information capacity requirements of the side information and the coupled-channel signal should be chosen to optimize a tradeoff between two competing needs. If the information capacity requirement for the side information is set too high, the coupled-channel signal will be forced to convey its spectral components at a low level of accuracy. Lower levels of accuracy in the coupled-channel spectral components may cause audible levels of coding noise or quantizing noise to be injected into the decoupled signals. Conversely, if the information capacity requirement of the coupled-channel signal is set too high, the side information will be forced to convey the spectral envelopes with a low level of spectral detail. Lower levels of detail in the spectral envelopes may cause audible differences in the spectral level and shape of each decoupled signal.
Generally, a good tradeoff can be achieved if the side information conveys the spectral level of frequency subbands that have bandwidths commensurate with the critical bands of the human auditory system. The decoupled signals may be able to preserve the spectral levels of the original spectral components of the original input signals, but they generally do not preserve the phase of those components. This loss of phase information can be imperceptible if coupling is limited to high-frequency spectral components because the human auditory system is relatively insensitive to changes in phase, especially at high frequencies.
The side information that is generated by traditional coupling techniques has typically been a measure of spectral amplitude. As a result, the decoder in a typical system calculates scale factors based on energy measures that are derived from spectral amplitudes. These calculations generally require computing the square root of the sum of the squares of values obtained from the side information, which requires substantial computational resources.
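The calculation referred to above can be sketched as follows. The function names and the grouping of side-information values into one list per subband are hypothetical and are used only to show where the square-root operation arises.

```python
import math


def scale_factor_from_amplitudes(amplitudes):
    """Energy-based scale factor: square root of the sum of squared amplitudes."""
    return math.sqrt(sum(a * a for a in amplitudes))


def band_scale_factors(side_info_bands):
    """Compute one scale factor per subband from its amplitude values."""
    return [scale_factor_from_amplitudes(band) for band in side_info_bands]
```

For example, a subband whose side information holds the amplitudes 3.0 and 4.0 yields a scale factor of 5.0. Each subband costs one multiplication per amplitude value plus one square root, and this work is repeated for every subband of every block that is decoded.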
An encoding technique sometimes referred to as “high-frequency regeneration” (HFR) is a perceptually non-transparent technique that may be used to reduce information capacity requirements. According to this technique, a baseband signal containing only low-frequency components of an input audio signal is transmitted or stored. Side information is also provided that represents a spectral envelope of the original high-frequency components. An encoded signal that includes the baseband signal and the side information is transmitted or recorded for subsequent decoding by a receiver. The receiver regenerates the omitted high-frequency components with spectral levels based on the side information and combines the baseband signal with the regenerated high-frequency components to produce an output signal. A description of known methods for HFR can be found in Makhoul and Berouti, “High-Frequency Regeneration in Speech Coding Systems”, Proc. of the International Conf. on Acoust., Speech and Signal Proc., April 1979. An improved HFR technique that is suitable for encoding high-quality music is disclosed in U.S. patent application Ser. No. 10/113,858 entitled “Broadband Frequency Translation for High Frequency Regeneration” filed Mar. 28, 2002, which is incorporated by reference in its entirety and is referred to below as the HFR application.
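The encoding and regeneration steps described above can be illustrated with a minimal sketch. It does not reproduce the method of either cited reference. The function names, the root-mean-square level used as the spectral envelope, and the regeneration of high-frequency components by cyclically copying baseband components upward in frequency are assumptions chosen only for illustration.

```python
import math


def rms(values):
    """Root-mean-square level of a list of spectral components."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0


def hfr_encode(spectrum, cutoff, bands):
    """Keep the components below `cutoff`; describe each discarded subband by its level."""
    baseband = spectrum[:cutoff]
    envelope = [rms(spectrum[a:b]) for a, b in bands]
    return baseband, envelope


def hfr_decode(baseband, envelope, bands, length):
    """Regenerate the high-frequency subbands and combine them with the baseband."""
    out = list(baseband) + [0.0] * (length - len(baseband))
    for (a, b), target in zip(bands, envelope):
        # Assumed regeneration method: copy baseband components up in frequency.
        source = [baseband[(k - a) % len(baseband)] for k in range(a, b)]
        level = rms(source)
        gain = target / level if level > 0.0 else 0.0
        out[a:b] = [s * gain for s in source]
    return out
```

In this sketch the baseband components pass through unchanged, while each regenerated subband matches only the level conveyed by the side information. The fine structure of the original high-frequency components is not recovered.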
The information capacity requirements of the side information and the baseband signal should be chosen to optimize a tradeoff between two competing needs. If the information capacity requirement for the side information is set too high, the encoded signal will be forced to convey the spectral components in the baseband signal at a low level of accuracy. Lower levels of accuracy in the baseband signal spectral components may cause audible levels of coding noise or quantizing noise to be injected into the baseband signal and other signals that are synthesized from it. Conversely, if the information capacity requirement of the baseband signal is set too high, the side information will be forced to convey the spectral envelopes with a low level of spectral detail. Lower levels of detail in the spectral envelopes may cause audible differences in the spectral level and shape of each synthesized signal.
Generally, a good tradeoff can be achieved if the side information conveys the spectral levels of frequency subbands that have bandwidths commensurate with the critical bands of the human auditory system.
Just as for the coupling technique discussed above, the side information that is generated by traditional HFR techniques has typically been a measure of spectral amplitude. As a result, the decoder in typical systems calculates scale factors based on energy measures that are derived from spectral amplitudes. These calculations generally require computing the square root of the sum of the squares of values obtained from the side information, which requires substantial computational resources.
Traditional systems have used either coupling techniques or HFR techniques but not both. In many applications, coupling techniques may cause less signal degradation than HFR techniques, but HFR techniques can achieve greater reductions in information capacity requirements. HFR techniques can be used advantageously in both multi-channel and single-channel applications; coupling techniques, however, offer no advantage in single-channel applications.