1. Field of the Invention
The present invention relates generally to speech coding and, more particularly, to Code Excited Linear Prediction (CELP) for wideband speech coding.
2. Related Art
Generally, a speech signal can be band-limited to about 10 kHz without affecting its perception. In telecommunications, however, the speech signal bandwidth is usually limited much more severely. The telephone network is known to limit the bandwidth of the speech signal to the range from 300 Hz to 3400 Hz, known as the “narrowband”. This band-limitation gives telephone speech its characteristic sound. Both the lower limit at 300 Hz and the upper limit at 3400 Hz affect the speech quality.
In most digital speech coders, the speech signal is sampled at 8 kHz, which limits the signal bandwidth to at most 4 kHz (the Nyquist frequency). In practice, however, the signal is usually band-limited to about 3600 Hz at the high end. At the low end, the cut-off frequency is usually between 50 Hz and 200 Hz. Narrowband speech, sampled at 8 kHz, provides a speech quality referred to as toll quality. Although toll quality is sufficient for telephone communications, emerging applications such as teleconferencing, multimedia services and high-definition television require improved quality.
The communications quality for such applications can be improved by increasing the bandwidth. For example, increasing the sampling frequency to 16 kHz accommodates a wider bandwidth, ranging from 50 Hz to about 7000 Hz, which is referred to as the “wideband”. Extending the lower frequency limit down to 50 Hz increases naturalness, presence and comfort. At the other end of the spectrum, extending the upper frequency limit to 7000 Hz increases intelligibility and makes it easier to differentiate between fricative sounds.
A well-known approach to digital speech coding is Analysis-By-Synthesis (ABS), also referred to as the closed-loop or waveform-matching approach. For medium to high bit rates, it generally offers better speech coding quality than other approaches. A known ABS approach is the so-called Code Excited Linear Prediction (CELP). In CELP coding, speech is synthesized by using encoded excitation information to excite a linear predictive coding (LPC) filter. The output of the LPC filter is compared against the input speech, and the resulting error is used to adjust the parameters in a closed-loop manner until the parameters yielding the least error are found. The problem with this approach is that the waveform is difficult to match when the speech signal contains noise.
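The closed-loop search described above can be illustrated with a minimal sketch. It assumes a fixed excitation codebook only, with no adaptive (pitch) codebook, no perceptual weighting filter and zero filter memory, all of which a practical CELP coder would include. The predictor sign convention A(z) = 1 - sum a_k z^-k and the names lpc_synthesis, search_codebook, A_COEFFS and CODEBOOK are hypothetical choices for illustration. Each candidate excitation is passed through the LPC synthesis filter, the least-squares gain is computed, and the candidate with the smallest squared error against the target is kept.

```python
from typing import List, Sequence, Tuple


def lpc_synthesis(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis filter 1/A(z), assuming A(z) = 1 - sum_k a[k-1] z^-k.

    The filter starts from zero state (no memory from a previous frame).
    """
    out: List[float] = []
    for n, e in enumerate(excitation):
        y = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y += ak * out[n - k]
        out.append(y)
    return out


def search_codebook(target: Sequence[float],
                    codebook: Sequence[Sequence[float]],
                    a: Sequence[float]) -> Tuple[int, float, float]:
    """Analysis-by-synthesis search: return (index, gain, squared error)."""
    best_index, best_gain, best_err = -1, 0.0, float("inf")
    for idx, code in enumerate(codebook):
        y = lpc_synthesis(code, a)              # synthesize this candidate
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue                            # skip an all-zero candidate
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                    # least-squares optimal gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best_err:                      # keep the least-error candidate
            best_index, best_gain, best_err = idx, gain, err
    return best_index, best_gain, best_err


# Hypothetical stable 2nd-order predictor and a tiny 4-entry codebook.
A_COEFFS = [0.9, -0.2]
CODEBOOK = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, -1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, -1, 0],
    [1, -1, 1, -1, 1, -1, 1, -1],
]

# Build a target from entry 2 at gain 0.5; the search should recover both.
target = [0.5 * v for v in lpc_synthesis(CODEBOOK[2], A_COEFFS)]
index, gain, error = search_codebook(target, CODEBOOK, A_COEFFS)
print(index, round(gain, 3))  # prints: 2 0.5
```

Because the synthesis filter is linear, scaling an excitation scales its synthesized output by the same factor, which is why the gain can be solved in closed form for each candidate instead of being searched.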
Another method of speech coding is the so-called harmonic coding approach. Harmonic coding assumes that voiced speech can be approximated by a series of harmonics; when all the harmonics are added together, a quasi-periodic waveform results. Because it works on the principle that voiced speech is quasi-periodic, prior art harmonic coding matches voiced speech more easily.
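The harmonic model can be illustrated with a short sketch that sums K harmonics of a fundamental frequency f0, s[n] = sum_k A_k cos(2 pi k f0 n / fs + phi_k). The pitch, amplitudes and phases below are hypothetical values, and the function name harmonic_sum is likewise illustrative. With constant parameters the sum is strictly periodic; real voiced speech is only quasi-periodic because f0 and the harmonic amplitudes drift over time.

```python
import math
from typing import List, Sequence


def harmonic_sum(f0_hz: float, amplitudes: Sequence[float],
                 phases: Sequence[float], fs_hz: float,
                 num_samples: int) -> List[float]:
    """Sum of harmonics: s[n] = sum_k A_k * cos(k * w0 * n + phi_k), k = 1..K."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz  # fundamental in radians per sample
    return [
        sum(amp * math.cos(k * w0 * n + ph)
            for k, (amp, ph) in enumerate(zip(amplitudes, phases), start=1))
        for n in range(num_samples)
    ]


FS = 16000.0                      # wideband sampling rate
F0 = 200.0                        # hypothetical pitch
AMPS = [1.0, 0.6, 0.3, 0.15]      # hypothetical harmonic amplitudes
PHASES = [0.0, 0.5, 1.0, 1.5]     # hypothetical harmonic phases (radians)
PERIOD = int(FS / F0)             # pitch period in samples (80 here)

s = harmonic_sum(F0, AMPS, PHASES, FS, 3 * PERIOD)

# The summed waveform repeats every PERIOD samples, up to rounding error.
max_dev = max(abs(s[n + PERIOD] - s[n]) for n in range(2 * PERIOD))
```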
Waveform matching or harmonic coding is easier for periodic speech components than for non-periodic speech components, because a non-periodic speech signal is random-like and broadband and therefore does not fit the basic harmonic model. However, the harmonic approximation may be too simplistic for real voiced signals, because real voiced signals include irregular (i.e., noise) components. Thus, high-quality waveform matching becomes difficult even for voiced speech, because of significant irregular components that may exist in the voiced signal, especially in wideband speech signals. These irregular components usually occur in the high-frequency regions of wideband voice signals but may also be present throughout the voice band.
The present invention addresses the above problem with voiced speech, namely that real-world speech signals may not be periodic enough for perfect waveform matching.