The present invention relates generally to vocoders and methods of operating vocoders. For purposes of the present invention, vocoders receive digitized samples of an analog voice signal and compress or encode the samples so that a resulting code characterizes the analog voice signal. The resulting code may then be applied to a channel, such as a transmission channel or a storage device. Such channels typically have a bandwidth which accommodates the resulting code, but is too low to accommodate the digitized samples. The resulting code characterizes the original analog voice signal so that it may be decoded or expanded by a vocoder to produce samples that reproduce the voice signal as perceptually accurately as possible. The present invention relates to vocoders which seek to achieve optimal voice quality in the reproduced voice signal for a given bit rate. Specifically, the present invention relates to vocoders which utilize a variable frame rate in the compression or encoding operations.
Voice represents a complicated analog signal which is not easily compressed so that an accurate reproduction will result. For example, vowel sounds require a relatively long analysis window so that a relatively high degree of spectral accuracy can be achieved. The relatively high degree of spectral accuracy is required so that a later synthesized vowel sound will appear to accurately reproduce the original analog voice signal to a listener. On the other hand, consonant sounds require a relatively short analysis window so that a relatively high degree of temporal resolution may be achieved. The high degree of temporal resolution is required so that a later synthesized consonant sound will appear as an accurate reproduction of the original voice signal to a listener.
FIG. 1 shows the relationship between spectral accuracy and temporal resolution. Generally speaking, at a given bit rate a vocoder can achieve a high spectral accuracy by sacrificing temporal resolution, or can achieve a high degree of temporal resolution by sacrificing spectral accuracy.
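The trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch, not a description of any particular vocoder: the function name `frame_tradeoff`, the 2400 bit-per-second channel rate, and the 8000 Hz sampling rate are assumptions chosen only for illustration, and the frequency resolution is approximated as the sampling rate divided by the number of samples in the analysis window.

```python
def frame_tradeoff(frame_ms, bit_rate_bps=2400, sample_rate_hz=8000):
    """Summarize what a fixed bit rate allows for one frame length.

    All default values are illustrative assumptions, not values from the text.
    """
    # Number of voice samples covered by one analysis window.
    samples = sample_rate_hz * frame_ms // 1000
    return {
        # Bits available to describe each frame at the fixed bit rate.
        "bits_per_frame": bit_rate_bps * frame_ms // 1000,
        # How often the coded description is updated (temporal resolution).
        "frames_per_second": 1000 / frame_ms,
        # Rough spacing of resolvable frequencies (spectral accuracy).
        "freq_resolution_hz": sample_rate_hz / samples,
    }


if __name__ == "__main__":
    for frame_ms in (10, 30):
        print(frame_ms, "ms:", frame_tradeoff(frame_ms))
```

Under these assumptions, a 30 millisecond frame has three times as many bits and a finer frequency spacing than a 10 millisecond frame, but is updated only one third as often.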
Many conventional vocoders which apply coded voice to a fixed rate channel do not vary frame rate. Accordingly, designs of such systems attempt to trade off temporal resolution, which is needed to achieve accurate reproduction of consonants, against spectral accuracy, which is needed to achieve accurate reproduction of vowels. Consequently, noticeably inaccurate reproductions of both vowels and consonants result. Reproduced consonants become slightly slurred and vowels do not faithfully reproduce nasal perceptions and voiced fricative perceptions.
A conventional solution to the problem of noticeably inaccurate reproductions of vowel and consonant sounds varies the analysis window, or frame, over which samples are coded so that short frames are used for analysis of consonants and long frames are used for analysis of vowels. However, a cumbersome vocoder architecture results from conventional implementations which adapt such variable frame rate vocoding methods for use with fixed rate channels. Such conventional implementations typically require elaborate buffering schemes with feedback systems to maintain a constant bit rate in spite of the variable frame rate. In some conventional systems, the buffering introduces an unacceptable delay.
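The buffering problem can be seen in a simplified simulation. This sketch assumes, purely for illustration, that every frame is coded with the same number of bits regardless of its length and that the channel drains the buffer at a constant rate. The function name `simulate_buffer` and all numeric values are hypothetical and do not describe any specific conventional system.

```python
def simulate_buffer(frame_durations_ms, bits_per_frame, channel_bps):
    """Return the buffer backlog, in bits, after each coded frame is queued.

    Simplifying assumption: every frame yields the same number of bits.
    """
    backlog = 0
    history = []
    for duration_ms in frame_durations_ms:
        # The channel removes bits at a constant rate while the frame elapses.
        drained = channel_bps * duration_ms // 1000
        # An empty buffer cannot go negative; the channel would idle instead.
        backlog = max(0, backlog - drained)
        # The newly coded frame is queued for transmission.
        backlog += bits_per_frame
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Two long (vowel-like) frames, three short (consonant-like) frames, one long.
    frames_ms = [30, 30, 10, 10, 10, 30]
    backlog = simulate_buffer(frames_ms, bits_per_frame=48, channel_bps=2400)
    print("backlog in bits:", backlog)
    # Worst-case queuing delay implied by the largest backlog.
    print("max delay in ms:", max(backlog) * 1000 / 2400)
```

In this sketch a run of short frames produces bits faster than the channel removes them, so the backlog and the associated delay grow until longer frames return. That growth is the behavior that buffering and feedback schemes must manage.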