Development of low bit rate (4.8 kb/s and below) speech coding methods with very high speech quality is currently a popular research subject. In order to achieve high quality speech compression, a robust voicing classification of speech signals is required.
An accurate representation of voiced or mixed speech signals is essential for synthesizing very high quality speech at low bit rates (4.8 kb/s and below). At these rates, conventional Code Excited Linear Prediction (CELP) does not provide the appropriate degree of periodicity: a small code-book size and coarse quantization of gain factors result in large spectral fluctuations between the pitch harmonics. Harmonic type techniques are alternative speech coding algorithms to CELP. However, these techniques require robust pitch and voicing algorithms to produce high quality speech.
Previously, the voicing information has been presented in a number of ways. In one approach, an entire frame of speech can be classified as either voiced or unvoiced. Although this type of voicing determination is very efficient, it results in a synthetic, unnatural speech quality.
Another voicing determination approach is based on the Multi-Band technique. In this technique, the speech spectrum is divided into a number of bands and a binary voicing decision (Voiced or Unvoiced) is made for each band. Although this type of voicing determination requires many bits to represent the voicing information, voicing errors can still occur during classification, since the voicing determination method is an imperfect model, which introduces some "buzziness" and artifacts in the synthesized speech. These errors are very noticeable, especially in the low frequency bands.
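The per-band binary decision described above can be illustrated with a minimal sketch. The per-band voicing measure, the 0.5 threshold, and the function names are assumptions chosen for illustration; the text does not specify how the decision for each band is reached.

```python
from typing import List


def multiband_voicing(band_scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Make one hard decision per band: True = voiced, False = unvoiced.

    band_scores are hypothetical per-band voicing measures in [0, 1];
    the threshold value is an assumption, not taken from the text.
    """
    return [score >= threshold for score in band_scores]


def voicing_bits_per_frame(num_bands: int) -> int:
    """One bit per band is needed to transmit the binary decisions."""
    return num_bands


if __name__ == "__main__":
    scores = [0.9, 0.55, 0.45, 0.1]  # hypothetical measures for four bands
    print(multiband_voicing(scores))
    print(voicing_bits_per_frame(len(scores)))
```

In this sketch the second and third bands have nearly equal measures yet receive opposite labels, which suggests how a small estimation error can flip an entire band between voiced and unvoiced.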
A still further voicing determination method is based on a voicing cut-off frequency. In this case, the frequency components below the cut-off frequency are considered voiced and those above the cut-off frequency are considered unvoiced. Although this technique is more efficient than the conventional multi-band voicing concept, it is not able to produce voiced speech for high frequency components.
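As a rough illustration of the cut-off approach, the sketch below labels each band from a single cut-off frequency and checks whether a given per-band voicing pattern could be expressed that way. The band center frequencies, the function names, and the treatment of a band lying exactly on the cut-off (taken here as unvoiced) are assumptions.

```python
from typing import List


def cutoff_voicing(band_centers_hz: List[float], cutoff_hz: float) -> List[bool]:
    """Label bands below the cut-off as voiced (True), the rest as unvoiced (False)."""
    return [center < cutoff_hz for center in band_centers_hz]


def representable_by_cutoff(decisions: List[bool]) -> bool:
    """Return True if the pattern is a run of voiced bands followed only by unvoiced bands.

    decisions must be ordered from the lowest to the highest frequency band.
    """
    seen_unvoiced = False
    for voiced in decisions:
        if not voiced:
            seen_unvoiced = True
        elif seen_unvoiced:
            # A voiced band above an unvoiced one cannot be expressed by one cut-off.
            return False
    return True


if __name__ == "__main__":
    centers = [500.0, 1500.0, 2500.0, 3500.0]  # hypothetical band centers in Hz
    print(cutoff_voicing(centers, cutoff_hz=2000.0))
    print(representable_by_cutoff([True, False, True, False]))
```

Only one value (the cut-off) has to be transmitted per frame, but any pattern with voiced energy above an unvoiced band falls outside what this representation can describe.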
Accordingly, it is an object of the present invention to provide a voicing method that allows each frequency band to be composed of both voiced and unvoiced energy to improve output speech quality.
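One way to picture a band composed of both voiced and unvoiced energy is to attach a voicing fraction between 0 and 1 to each band and split the band energy accordingly. This is only a sketch under that assumption: the fraction, the linear split, and the function name are hypothetical and are not taken from the text.

```python
from typing import List, Tuple


def split_band_energy(
    band_energies: List[float], voicing_fractions: List[float]
) -> Tuple[List[float], List[float]]:
    """Split each band's energy into a voiced part and an unvoiced part.

    voicing_fractions[i] is a hypothetical value in [0, 1]: 1.0 means the band
    is fully voiced, 0.0 fully unvoiced, and anything in between is mixed.
    """
    if len(band_energies) != len(voicing_fractions):
        raise ValueError("one voicing fraction is required per band")
    voiced, unvoiced = [], []
    for energy, fraction in zip(band_energies, voicing_fractions):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("voicing fraction must lie in [0, 1]")
        voiced.append(fraction * energy)
        unvoiced.append((1.0 - fraction) * energy)
    return voiced, unvoiced


if __name__ == "__main__":
    voiced, unvoiced = split_band_energy([4.0, 2.0], [0.75, 0.5])
    print(voiced, unvoiced)
```

Setting every fraction to 0 or 1 reduces this sketch to the binary per-band case, so the mixed representation can be read as a generalization of a hard per-band decision.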