1. Field of the Invention
The present invention relates to speech processing, and more particularly to voicing determination of a speech signal, having a particular, but not exclusive, application to the field of mobile telephones.
2. Description of the Prior Art
In known speech codecs the most common phonetic classification is a voicing decision, which classifies a speech frame as voiced or unvoiced. Voiced segments are typically associated with high local energy and exhibit a distinct periodicity corresponding to the fundamental frequency, or equivalently the pitch, of the speech signal, whereas unvoiced segments resemble noise. However, a speech signal also contains segments that can be classified as a mixture of voiced and unvoiced speech, in which both components are present simultaneously. This category includes voiced fricatives and breathy and creaky voices. The appropriate classification of mixed segments as either voiced or unvoiced depends on the properties of the speech codec.
In a typical known analysis-by-synthesis (A-b-S) based speech codec, the periodicity of speech is modelled with a pitch predictor filter, also referred to as a long-term prediction (LTP) filter. This filter characterizes the harmonic structure of the spectrum based on the similarity of adjacent pitch periods in a speech signal. The most common method used for pitch extraction is autocorrelation analysis, which indicates the similarity between the present and delayed speech segments. In this approach the lag value corresponding to the major peak of the autocorrelation function is interpreted as the pitch period. Typically, for voiced speech segments with a clear pitch period, the voicing determination is closely related to pitch extraction.
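By way of illustration only, the autocorrelation approach described above may be sketched in Python as follows. The sketch searches a range of lags for the peak of a normalized autocorrelation function, interprets the peak lag as the pitch period, and derives a simple voicing decision from the peak value. The function names, the 8 kHz sampling rate, the 60 to 400 Hz search range and the threshold of 0.5 are assumptions chosen for the example and are not taken from any particular codec.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Normalized correlation between the frame and the frame delayed by `lag` samples."""
    n = len(frame) - lag
    cross = sum(frame[i] * frame[i + lag] for i in range(n))
    energy_present = sum(frame[i] ** 2 for i in range(n))
    energy_delayed = sum(frame[i + lag] ** 2 for i in range(n))
    denom = math.sqrt(energy_present * energy_delayed)
    # A silent frame has no energy; report zero similarity instead of dividing by zero.
    return cross / denom if denom > 0.0 else 0.0


def estimate_pitch(frame, sample_rate=8000, f_min=60.0, f_max=400.0, threshold=0.5):
    """Return (pitch_lag, peak_correlation, voiced) for one speech frame.

    All default values are illustrative assumptions, not codec parameters.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = normalized_autocorrelation(frame, lag)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # The lag of the major peak is taken as the pitch period; a strong
    # peak is treated as evidence that the frame is voiced.
    voiced = best_corr >= threshold
    return best_lag, best_corr, voiced
```

In this sketch a frame containing a clearly periodic signal yields a peak correlation close to one at the lag equal to its period, whereas a noise-like frame yields only a weak peak and is therefore classified as unvoiced. Mixed segments fall between these cases, so their classification depends on the threshold chosen.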