When speech is processed to achieve a low bit rate (or to occupy a smaller bandwidth), improved spectrum or storage efficiency can be obtained. However, such processing degrades the quality of the processed speech signal. As a result, maintaining speech quality while minimizing bit rate or occupied bandwidth is a key issue for the success of any speech coding scheme.
In the past, low bit-rate voice coders have been used to reduce the amount of information required for transmission or storage. One such voice coder is a digital subband coder, which operates on speech segments to partition a voice signal into multiple frequency subbands. Based on the signal's spectral energy distribution, digital bits are allocated among certain subbands to encode the information in those subbands for transmission or storage.
In subband-type processing, a time segment (frame) of speech is processed. The speech segment is divided into subband signals by a filter bank. Each subband signal is processed according to the frequency spectrum of the input speech. For example, typical digital subband coding (SBC) allocates available bits to encode subband information according to the computed subband energy distribution. More bits are allocated to subbands with higher energy to yield an improved reconstructed output for these more significant bands. Fewer bits (or even zero bits) are allocated to subbands with lower energy. Another example is multilevel subband coding (MSBC), which transmits speech samples for only a fixed number of subbands. The rest of the subband signals are not transmitted. For better spectrum efficiency, very few subband signals are transmitted. In the aforementioned coders, the quality of the reconstructed speech is degraded by the coarsely quantized and/or missing subband signals.
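The energy-based allocation described above can be illustrated with a minimal sketch. It is not the allocation rule of any particular coder. It assumes a simple greedy scheme in which each assigned bit lowers a subband's quantization noise energy by about 6 dB (a factor of 4). The function name `allocate_bits`, the `max_bits_per_band` cap, and the example energies are hypothetical.

```python
def allocate_bits(subband_energies, total_bits, max_bits_per_band=8):
    """Greedy sketch: give each bit to the subband whose remaining
    (unquantized) energy is currently largest.

    Assumption: one extra bit reduces quantization noise energy by
    roughly 6 dB, modeled here as dividing the residual by 4.
    """
    bits = [0] * len(subband_energies)
    residual = [float(e) for e in subband_energies]
    for _ in range(total_bits):
        # Only subbands that still carry energy and are below the cap compete.
        candidates = [
            i for i in range(len(bits))
            if bits[i] < max_bits_per_band and residual[i] > 0.0
        ]
        if not candidates:
            break  # nothing left worth encoding
        k = max(candidates, key=lambda i: residual[i])
        bits[k] += 1
        residual[k] /= 4.0
    return bits


# Hypothetical per-subband energies for one frame, 6 bits available.
print(allocate_bits([100.0, 25.0, 4.0, 0.5], total_bits=6))  # [3, 2, 1, 0]
```

In this example the lowest-energy subband receives zero bits, which is the situation that leads to missing subband signals at the receiver.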
When such a coder is applied in a radio frequency communication system, it is desirable to send only the essential spectral information to the receiver, which then reconstructs or synthesizes the voice signal by routing the essential signal information to reconstruction subband filters. In an attempt to improve the quality of the reconstructed signal, earlier subband coders have utilized random noise to excite those subband reconstruction filters for which actual signal information is unavailable. The filtered noise signals are then combined with the outputs of the subband filters excited with known information to generate a more natural-sounding voice signal. The amount of noise added in each spectral subband is usually scaled in proportion to the amount of speech energy originally present in the corresponding frequency subband. See, for example, Tor A. Ramstad, "Subband Coder with a Simple Adaptive Bit Allocation Algorithm--A Possible Candidate for Digital Mobile Telephony?", Proc. ICASSP, pp. 203-207, May 1982.
The amount of waveform/noise fill energy introduced into subbands for which samples are not sent has traditionally been a fixed fraction of the actual speech energy present in those respective subbands. The fraction is empirically determined using subjective listening tests. A trade-off exists in selecting this fraction, however: a higher fill energy better masks artifacts and eliminates hollowness, whereas excessive added waveform/noise causes coarseness and granularity.
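The fixed-fraction noise fill described above can be sketched as follows. This is an illustration under stated assumptions, not the method of any cited coder: subband energy is taken as the sum of squared samples over one frame, the noise is Gaussian, and the reconstruction filter bank is omitted, so the scaled noise stands in for the excitation of one empty subband. The function name `noise_fill` and the default `fill_fraction` of 0.5 are hypothetical. The text states only that the fraction is chosen empirically.

```python
import math
import random


def noise_fill(num_samples, subband_energy, fill_fraction=0.5, seed=None):
    """Return noise samples whose total energy (sum of squares) equals
    fill_fraction * subband_energy.

    Assumptions: Gaussian noise, frame energy defined as sum of squares,
    and a single fixed fraction applied to the subband.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    noise_energy = sum(x * x for x in noise)
    target_energy = fill_fraction * subband_energy
    if noise_energy == 0.0 or target_energy <= 0.0:
        return [0.0] * num_samples  # nothing to fill
    # Scale amplitudes so the energy ratio matches the target.
    gain = math.sqrt(target_energy / noise_energy)
    return [gain * x for x in noise]


# Fill an untransmitted subband whose original frame energy was 10.0
# with noise carrying one quarter of that energy.
fill = noise_fill(num_samples=64, subband_energy=10.0, fill_fraction=0.25, seed=1)
print(round(sum(x * x for x in fill), 6))  # 2.5
```

Raising `fill_fraction` corresponds to the higher fill energy that masks artifacts, and lowering it reduces the coarseness caused by excess noise. Because a single value is used in every case, that trade-off is fixed in advance.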