All digital telephony employs some form of speech compression (a voice encoder/decoder, herein “vocoder”). When the IS-95A standard for CDMA digital telephony was finalized, its founder developed its own variable-rate vocoder and dubbed it the QCELP (Qualcomm Codebook Excitation Linear Prediction) encoder. The first generation of this vocoder was an 8 kbps vocoder, QCELP-8. Unfortunately, the speech quality of the QCELP-8 was not very high. To address the quality issues, the manufacturer developed a high-rate version operating at 13 kbps and called it the QCELP-13 vocoder. The QCELP-13 specification requires that the first frame be encoded at full-rate. No such requirement is present in various other vocoders, including the QCELP-8.
In addition, the manufacturer of the QCELP-13 vocoder released a floating-point C-language implementation. Commercially viable silicon solutions must implement this vocoder using fixed-point arithmetic; neither the standard nor the C-code reference describes how to do this. Fixed-point arithmetic introduces unwanted quantization effects, which must be minimized in order to achieve toll-quality speech. However, without a fixed-point reference model, two different entities (e.g., two different companies) are free to build their own fixed-point implementations as they see fit. Unfortunately, when this happens, there is no assurance that one company's voice encoder output will sound good through another company's voice decoder, and vice versa.
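The quantization effects mentioned above can be illustrated with a minimal sketch. It assumes a 16-bit Q15 format (1 sign bit, 15 fractional bits), which is a common choice on fixed-point DSPs but is not specified by the source; the helper names `to_q15` and `from_q15` are hypothetical.

```python
# Illustrative only: Q15 is an assumed word format, not one named in the source.
Q15_SCALE = 1 << 15  # 32768 steps per unit


def to_q15(x):
    """Quantize a float in [-1, 1) to a Q15 integer, rounding and saturating."""
    n = int(round(x * Q15_SCALE))
    # Saturate to the signed 16-bit range instead of wrapping around.
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def from_q15(n):
    """Convert a Q15 integer back to a float."""
    return n / Q15_SCALE


if __name__ == "__main__":
    for x in (0.5, 0.1, 1e-5, 1.0):
        q = to_q15(x)
        print(x, q, from_q15(q) - x)  # value, Q15 code, quantization error
```

In this sketch a very small input such as `1e-5` rounds to the code 0, and an input of `1.0` saturates to 32767. Two implementations that round or saturate differently would produce slightly different sample values from the same floating-point reference, which is one way the interoperability concern can arise.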
In order to ensure successful interoperability of vocoder implementations from different semiconductor providers, an exhaustive procedure has been defined (Modified Methodology IS-736 Performance test) to test the subjective quality of various implementations of the same vocoder under varying operating conditions; this test is referred to as the mean opinion scoring test (herein “MOS Test”). Increasingly, as more semiconductor companies provide chipsets for code division multiple access (CDMA) voice applications, service providers are demanding proof of interoperability between multiple semiconductor providers' vocoder implementations.
As mentioned above, the vocoder specification and the corresponding distributed floating-point C-language reference code fail to sufficiently address how to process zero- or low-level input speech signals when the encoding rate is determined to be full-rate. However, it is exactly these types of speech signals that stress the vocoder most and for which it is very difficult to receive a passing score on the MOS test. Specifically, conventional vocoders fail to encode the data adequately when the encoding rate is full-rate and one or more subframes of the source material contain a zero or low-level energy signal.
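The condition described above, a frame in which one or more subframes carry zero or low-level energy, can be made concrete with a small sketch. It is not taken from the vocoder specification: the 10-sample subframe length assumes 1.25 ms subframes at an 8 kHz sampling rate, and both the `threshold` parameter and the function name `low_energy_subframes` are hypothetical.

```python
# Assumption: 1.25 ms subframes at 8 kHz sampling, i.e. 10 samples each.
SUBFRAME_LEN = 10


def low_energy_subframes(frame, threshold):
    """Return one flag per subframe: True if its energy is at or below threshold.

    `frame` is a sequence of integer PCM samples; `threshold` is a
    hypothetical tuning value, not a figure from any standard.
    """
    flags = []
    for start in range(0, len(frame), SUBFRAME_LEN):
        sub = frame[start:start + SUBFRAME_LEN]
        energy = sum(s * s for s in sub)  # sum of squared samples
        flags.append(energy <= threshold)
    return flags


if __name__ == "__main__":
    # A 160-sample (20 ms) frame: loud first half, silent second half.
    frame = [1000] * 80 + [0] * 80
    print(low_energy_subframes(frame, threshold=100))
```

For the example frame, the first eight subframes are flagged as not low-energy and the last eight as low-energy, so a single frame that a rate decision might classify as full-rate can still contain silent subframes.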
It has been observed that one or more subframes of the source material contain a zero or low-level energy signal in at least the following three situations. First and most prevalent, conventional vocoders always force the first frame to be encoded at full-rate. If the input file begins with zero or low-level input, the vocoder will produce tones at audible, harmonically-related frequencies. Second, if there is a sudden, short, quiet region between two loud regions of speech, the vocoder will produce tones at various frequencies. In this second case, conventional approaches code the first loud region at full-rate, and when the vocoder encounters the quiet region, it ideally would switch to eighth-rate encoding. However, instantaneous switching from full-rate to eighth-rate encoding is prohibited by a process referred to as “hangover processing”. Simply stated, hangover processing says “If the last frame's encoding rate was Rate 1 and the current frame is determined not to be a Rate 1 frame, then the next M (some integer) frames are encoded as Rate 1 before allowing the encoding rate to drop to Rate ½ (half-rate) and then to Rate ⅛ (eighth-rate)”. Third, due to frame offsets, a situation can occur wherein a frame is to be encoded at full-rate, but one or more subframes (1.25 ms) of the frame contain zero or low-level input while other subframes of the same frame contain high energy. Due to this fundamental flaw in some conventional vocoders, any conventional fixed-point or floating-point approach will produce audible, harmonically-related frequencies when any one of the three aforementioned scenarios occurs. The result is a failure of the MOS test.
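The hangover rule quoted above can be sketched as a small rate-smoothing function. This is a minimal illustration under stated assumptions rather than the reference implementation: the function name `apply_hangover` and the parameter `hangover_frames` (the integer M) are hypothetical, the M held frames are assumed to start with the first non-full-rate frame, and exactly one half-rate frame is assumed before eighth-rate is allowed.

```python
FULL, HALF, EIGHTH = "1", "1/2", "1/8"


def apply_hangover(desired_rates, hangover_frames):
    """Map per-frame desired rates to the rates used after hangover processing.

    Assumptions (not from any specification): the hold of `hangover_frames`
    frames begins with the first frame that is not full-rate, and the rate
    then steps through one half-rate frame before dropping further.
    """
    used = []
    prev = None
    remaining = 0  # full-rate frames still to be forced
    for want in desired_rates:
        if want == FULL:
            rate = FULL
            remaining = hangover_frames  # re-arm the hangover counter
        elif remaining > 0:
            rate = FULL  # hold full-rate during the hangover period
            remaining -= 1
        elif prev == FULL:
            rate = HALF  # step down through half-rate first
        else:
            rate = want
        used.append(rate)
        prev = rate
    return used


if __name__ == "__main__":
    desired = [FULL, EIGHTH, EIGHTH, EIGHTH, EIGHTH, EIGHTH]
    print(apply_hangover(desired, hangover_frames=2))
```

With M = 2 in this sketch, a loud frame followed by five quiet frames is coded as three full-rate frames, one half-rate frame, and then eighth-rate frames. The quiet frames held at full-rate are exactly the frames in which the second scenario arises.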
Thus, a need exists for a method for use in a vocoder system wherein the method reduces the creation of undesired, audible, harmonically-related frequencies when the encoding rate is determined to be full-rate and the source material is a zero or low-level energy signal. Still another need exists for a method for use in a vocoder system wherein the method meets the above need and further enables successful passing of subjective listening quality tests. Yet another need exists for a method for use in a vocoder system wherein the method meets both of the above needs, does not require complete revamping of existing vocoder systems, and has minimal impact on code size, computational complexity (MIPS, millions of instructions per second), and RAM (random access memory) requirements.