1. Field of the Invention
The present invention relates to a method and apparatus for extracting voiced/unvoiced classification information, and more particularly to a method and apparatus for extracting voiced/unvoiced classification information using a harmonic component of a voice signal, so as to accurately classify the voice signal into voiced/unvoiced sounds.
2. Description of the Related Art
In general, a voice signal is classified into a periodic (or harmonic) component and a non-periodic (or random) component (i.e., a voiced sound and a sound resulting from sounds or noises other than a voice, hereinafter referred to as an “unvoiced sound”) according to its statistical characteristics in the time domain and the frequency domain, and is therefore called a “quasi-periodic” signal. The periodic component and the non-periodic component are determined to be a voiced sound and an unvoiced sound, respectively, according to whether pitch information exists; the voiced sound has a periodic property and the unvoiced sound has a non-periodic property.
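The role of pitch information in this distinction can be illustrated with a minimal sketch. It is only one common way to test for periodicity and is not taken from the methods discussed here: it searches for the peak of the normalized autocorrelation over a range of candidate pitch lags. The function name `pitch_periodicity`, the 8 kHz sampling rate, the lag range of 20 to 120 samples, and the test signals are all assumptions made for illustration.

```python
import math
import random

def pitch_periodicity(frame, min_lag, max_lag):
    """Return (best_lag, peak) of the normalized autocorrelation.

    A peak near 1.0 suggests a periodic (voiced-like) frame; a small peak
    suggests a non-periodic (unvoiced-like) frame.
    """
    best_lag, best_peak = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        head = frame[:-lag]   # samples x[n]
        tail = frame[lag:]    # samples x[n + lag]
        num = sum(x * y for x, y in zip(head, tail))
        den = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        peak = num / den if den > 0.0 else 0.0
        if peak > best_peak:
            best_lag, best_peak = lag, peak
    return best_lag, best_peak

# Illustrative signals (assumed 8 kHz sampling, 320-sample frames).
SAMPLE_RATE = 8000
theta = 2 * math.pi * 100 / SAMPLE_RATE  # 100 Hz fundamental, period of 80 samples
periodic = [math.sin(theta * n) + 0.5 * math.sin(2 * theta * n) for n in range(320)]
rng = random.Random(0)
aperiodic = [rng.gauss(0.0, 1.0) for _ in range(320)]

periodic_lag, periodic_peak = pitch_periodicity(periodic, 20, 120)
aperiodic_lag, aperiodic_peak = pitch_periodicity(aperiodic, 20, 120)
```

For the harmonic test signal the peak falls at the pitch period with a value close to 1, whereas for the noise frame no lag yields a comparably strong peak. Any decision threshold placed between the two would be a tuning choice, not a value given in the text.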
Voiced/unvoiced classification information is the most basic and critical information used for coding, recognition, synthesis, enhancement, etc., in all voice signal processing systems. Therefore, various methods have been proposed for classifying a voice signal into voiced/unvoiced sounds. For example, there is a method used in phonetic coding, whereby a voice signal is classified into six categories: an onset, a full-band steady-state voiced sound, a full-band transient voiced sound, a low-pass transient voiced sound, a low-pass steady-state voiced sound, and an unvoiced sound.
In particular, features used for voiced/unvoiced classification include a low-band speech energy, a zero-crossing count, a first reflection coefficient, a pre-emphasized energy ratio, a second reflection coefficient, causal pitch prediction gains, and non-causal pitch prediction gains, which are combined and used in a linear discriminator. However, since there is not yet a voiced/unvoiced classification method that uses only one feature, voiced/unvoiced classification performance depends greatly on how these features are combined.
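As a rough illustration of how such features might be combined, the following sketch computes two of them for a single frame and feeds them to a linear discriminant. It is a simplified example, not the discriminator of any particular coder: only the zero-crossing measure and a lag-1 normalized autocorrelation are used (the first reflection coefficient equals the latter up to a sign convention), and the weights, bias, sampling rate, and test signals are hypothetical values chosen for illustration.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def lag1_correlation(frame):
    """Normalized autocorrelation R(1)/R(0); close to 1 for smooth, low-frequency frames."""
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0

# Hypothetical weights and bias; a real system would train these on labeled frames.
WEIGHTS = (-2.0, 1.0)
BIAS = 0.0

def voicing_score(frame):
    """Linear discriminant: positive suggests voiced, negative suggests unvoiced."""
    features = (zero_crossing_rate(frame), lag1_correlation(frame))
    return sum(w * f for w, f in zip(WEIGHTS, features)) + BIAS

# Illustrative frames (assumed 8 kHz sampling, 160 samples, i.e. 20 ms).
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
noise = [rng.gauss(0.0, 1.0) for _ in range(160)]

tone_score = voicing_score(tone)
noise_score = voicing_score(noise)
```

Because the decision depends entirely on the chosen weights, changing them can shift the boundary between the two classes, which is the sensitivity to feature combination noted above.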
Meanwhile, since the vocal system (i.e., the system that produces a voice signal) outputs higher power during voicing, voiced sounds account for a large portion of the voice energy, so that distortion of a voiced portion of a voice signal greatly affects the overall sound quality of coded speech.
In such voiced speech, since interaction between glottal excitation and the vocal tract makes spectrum estimation difficult, measurement information regarding the degree of voicing is required in most voice signal processing systems. Such measurement information is also used in voice recognition and voice coding. In particular, since the measurement information is an important parameter in determining sound quality in voice synthesis, the use of wrong information or a misestimated value results in performance degradation in voice recognition and synthesis.
However, since the phenomenon being estimated inherently includes some degree of randomness, the estimation is performed over a predetermined period, and the output of a voicing measure includes a random component. Therefore, a statistical performance measurement scheme is appropriate for evaluating the voicing measure, and the average of the mixture estimated over a large number of frames is used as the primary indicator.
As described above, although a plurality of features are used to extract voiced/unvoiced classification information in the prior art, it is impossible to classify voiced/unvoiced sounds by a single feature. Therefore, voiced/unvoiced sounds have been classified by using a combination of features, none of which can provide reliable information by itself. However, the conventional methods suffer from correlation between the features and from performance degradation due to noise, so a new method capable of solving these problems has been required. Also, the conventional methods do not properly express the existence of a harmonic component and the degree of the harmonic component, which are the essential differences between a voiced sound and an unvoiced sound. Therefore, it is necessary to develop a new method capable of accurately classifying voiced/unvoiced sounds through the analysis of a harmonic component.