Many communications systems face the problem that the demand for information transmission and storage capacity often exceeds the available capacity. As a result, there is considerable interest among those in the fields of broadcasting and recording in reducing the amount of information required to transmit or record an audio signal intended for human perception without degrading its subjective quality. Similarly, there is a need to improve the quality of the output signal for a given bandwidth or storage capacity.
Two principal considerations drive the design of systems intended for audio transmission and storage: the need to reduce information requirements and the need to ensure a specified level of perceptual quality in the output signal. These two considerations conflict in that reducing the quantity of information transmitted can reduce the perceived quality of the output signal. While objective constraints such as data rate are usually imposed by the communications system itself, subjective perceptual requirements are usually dictated by the application.
Traditional methods for reducing information requirements involve transmitting or recording only a selected portion of the input signal, with the remainder being discarded. Preferably, only that portion deemed to be either redundant or perceptually irrelevant is discarded. If additional reduction is required, preferably only a portion of the signal deemed to have the least perceptual significance is discarded.
Speech applications that emphasize intelligibility over fidelity, such as speech coding, may transmit or record only a portion of a signal, referred to herein as a “baseband signal”, which contains only the perceptually most relevant portions of the signal's frequency spectrum. A receiver can regenerate the omitted portion of the voice signal from information contained within that baseband signal. The regenerated signal generally is not perceptually identical to the original, but for many applications an approximate reproduction is sufficient. On the other hand, applications designed to achieve a high degree of fidelity, such as high-quality music applications, generally require a higher quality output signal. To obtain a higher quality output signal, it is generally necessary to transmit a greater amount of information or to utilize a more sophisticated method of generating the output signal.
One technique used in connection with speech signal decoding is known as high frequency regeneration (“HFR”). A baseband signal containing only low-frequency components of a signal is transmitted or stored. A receiver regenerates the omitted high-frequency components based on the contents of the received baseband signal and combines the baseband signal with the regenerated high-frequency components to produce an output signal. Although the regenerated high-frequency components are generally not identical to the high-frequency components in the original signal, this technique can produce an output signal that is more satisfactory than that produced by techniques that do not use HFR. Numerous variations of this technique have been developed in the area of speech encoding and decoding. Three common methods used for HFR are spectral folding, spectral translation, and rectification. A description of these techniques can be found in Makhoul and Berouti, “High-Frequency Regeneration in Speech Coding Systems”, ICASSP 1979 IEEE International Conf. on Acoust., Speech and Signal Proc., Apr. 2-4, 1979.
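The three classic HFR methods can be illustrated with a minimal, idealized sketch. It assumes a hypothetical representation in which the baseband spectrum of one frame is a list of `B` bin values, and the regenerated high band is appended directly above the band edge. Under that assumption, spectral translation shifts the baseband bins upward, spectral folding mirrors them about the band edge, and rectification is a time-domain nonlinearity that creates new components above the input's band. The function names (`spectral_translation`, `spectral_folding`, `full_wave_rectify`, `dft`) are hypothetical, and this is not the implementation described by Makhoul and Berouti.

```python
import cmath
import math
from typing import List, Sequence


def spectral_translation(base: Sequence[complex], num_high: int) -> List[complex]:
    """Shift baseband bins upward: high bin B+k takes the value of baseband bin k."""
    b = len(base)
    if not 0 < num_high <= b:
        raise ValueError("num_high must be in 1..len(base)")
    # Baseband and regenerated high band combined into one output spectrum.
    return list(base) + [base[k] for k in range(num_high)]


def spectral_folding(base: Sequence[complex], num_high: int) -> List[complex]:
    """Mirror baseband bins about the band edge: high bin B+k takes baseband bin B-1-k."""
    b = len(base)
    if not 0 < num_high <= b:
        raise ValueError("num_high must be in 1..len(base)")
    return list(base) + [base[b - 1 - k] for k in range(num_high)]


def full_wave_rectify(x: Sequence[float]) -> List[float]:
    """Time-domain rectification |x[n]|; the nonlinearity generates harmonics."""
    return [abs(v) for v in x]


def dft(x: Sequence[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform, adequate for a tiny demo."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    base = [1, 2, 3, 4]  # hypothetical baseband bin magnitudes
    print(spectral_translation(base, 4))  # [1, 2, 3, 4, 1, 2, 3, 4]
    print(spectral_folding(base, 4))      # [1, 2, 3, 4, 4, 3, 2, 1]

    # A pure tone in DFT bin k0: after rectification, its energy moves to
    # multiples of 2 * k0, i.e. to frequencies the input did not contain.
    n, k0 = 32, 3
    tone = [math.sin(2 * math.pi * k0 * t / n) for t in range(n)]
    spectrum = dft(full_wave_rectify(tone))
    print(abs(spectrum[k0]), abs(spectrum[2 * k0]))
```

The contrast between the first two outputs shows why folding and translation can sound different even though both reuse the same baseband content: translation preserves the ordering of components, while folding reverses it. In a complete HFR system the rectified signal would typically be restricted to the band above the baseband edge (for example with a high-pass filter) before being combined with the baseband.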
Although simple to implement, these HFR techniques are usually not suitable for high quality reproduction systems such as those used for high quality music. Spectral folding and spectral translation can produce undesirable background tones. Rectification tends to produce results that are perceived to be harsh. The inventors have noted that in many cases where these techniques have produced unsatisfactory results, the techniques were used in bandlimited speech coders where HFR was restricted to the translation of components below 5 kHz.
The inventors have also noted two other problems that can arise from the use of HFR techniques. The first problem is related to the tone and noise characteristics of signals, and the second problem is related to the temporal shape or envelope of regenerated signals. Many natural signals contain a noise component that increases in magnitude as a function of frequency. Known HFR techniques regenerate high-frequency components from a baseband signal but fail to reproduce a proper mix of tone-like and noise-like components in the regenerated signal at the higher frequencies. The regenerated signal often contains a distinct high-frequency “buzz” attributable to the substitution of tone-like components in the baseband for the original, more noise-like high-frequency components. Furthermore, known HFR techniques fail to regenerate spectral components in such a way that the temporal envelope of the regenerated signal preserves or is at least similar to the temporal envelope of the original signal.
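The tone-versus-noise problem can be made concrete with a standard measure that the text itself does not use: spectral flatness, the ratio of the geometric mean to the arithmetic mean of a band's power spectrum. It is close to 1.0 for a flat, noise-like band and close to 0.0 for a band dominated by a few tone-like peaks. The sketch below assumes hypothetical power values for a tonal baseband that is copied upward by spectral translation, and for a noise-like original high band; the numbers are illustrative only and are not measured data.

```python
import math
from typing import Sequence


def spectral_flatness(power: Sequence[float], eps: float = 1e-12) -> float:
    """Geometric mean / arithmetic mean of a power spectrum.

    Near 1.0 for a flat, noise-like band; near 0.0 for a band dominated by a
    few strong tone-like bins. eps avoids log(0) for empty bins.
    """
    if not power:
        raise ValueError("power spectrum must be non-empty")
    p = [max(v, 0.0) + eps for v in power]
    geo = math.exp(math.fsum(math.log(v) for v in p) / len(p))
    arith = math.fsum(p) / len(p)
    return geo / arith


if __name__ == "__main__":
    # Hypothetical example values, chosen only for illustration.
    tonal_baseband = [0.01, 10.0, 0.01, 0.01]        # one strong harmonic
    regenerated_high = list(tonal_baseband)           # translation copies it upward
    noise_like_original_high = [1.0, 0.9, 1.1, 1.0]   # a noise-like original high band

    print(spectral_flatness(regenerated_high))          # low: tone-like "buzz"
    print(spectral_flatness(noise_like_original_high))  # near 1: noise-like
```

A large gap between the two values corresponds to the mismatch the text describes: the regenerated band inherits the tonal structure of the baseband, whereas the original high band was predominantly noise-like.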
A number of more sophisticated HFR techniques have been developed that offer improved results; however, these techniques tend either to be speech specific, relying on characteristics of speech that are not suitable for music and other forms of audio, or to require computational resources too extensive to be implemented economically.