1. Field of the Invention
The present invention relates generally to speech signal processing, and in particular, to an apparatus and method for detecting a degree of voicing of a speech signal.
2. Description of the Related Art
Methods that separate a speech signal into voiced and unvoiced sounds for phonetic coding can, for phonetic segmentation, divide the signal into six categories: onset, full-band steady-state voiced, full-band transient voiced, low-pass transient voiced, low-pass steady-state voiced, and unvoiced. The features used for the voiced and unvoiced separation, which are combined by a linear discriminator, are low-band speech energy, zero-crossing count, first reflection coefficient, pre-emphasized energy ratio, second reflection coefficient, causal pitch prediction gain, and non-causal pitch prediction gain. Although many features are available for this separation, no single feature carries enough information to separate voiced and unvoiced sounds by itself, so several features are combined. Thus, how the features are combined significantly affects the accuracy of the voiced and unvoiced separation.
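By way of illustration only, the following sketch computes three of the listed features for a single frame and combines them with a linear discriminator. It is not taken from any cited method. The function names, the pre-emphasis coefficient of 0.9375, the definition of the pre-emphasized energy ratio as pre-emphasized energy divided by frame energy, and the use of the lag-1 normalized autocorrelation in place of the first reflection coefficient (to which it is equal up to a sign convention) are all assumptions made for the example. The weights and bias passed to the discriminator are hypothetical and would in practice be obtained by training.

```python
import math


def energy(frame):
    """Sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def first_normalized_autocorrelation(frame):
    """Lag-1 autocorrelation divided by lag-0 autocorrelation.

    Assumption: used here as a stand-in for the first reflection
    coefficient, which has the same magnitude under common conventions.
    """
    r0 = energy(frame)
    if r0 == 0:
        return 0.0
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0


def pre_emphasized_energy_ratio(frame, alpha=0.9375):
    """Energy of y[n] = x[n] - alpha * x[n-1] divided by energy of x.

    Assumption: alpha and this exact definition are illustrative only.
    """
    e = energy(frame)
    if e == 0:
        return 0.0
    pre = [frame[0]] + [b - alpha * a for a, b in zip(frame, frame[1:])]
    return energy(pre) / e


def linear_discriminant(features, weights, bias):
    """Weighted sum of the features plus a bias (hypothetical weights)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def extract_features(frame):
    """Feature vector: [zero crossings, lag-1 correlation, energy ratio]."""
    return [
        float(zero_crossing_count(frame)),
        first_normalized_autocorrelation(frame),
        pre_emphasized_energy_ratio(frame),
    ]


# Synthetic 20 ms frames at an assumed 8 kHz sampling rate.
FS = 8000
N = 160
# Voiced-like frame: a 100 Hz sinusoid with a small phase offset.
voiced_like = [math.sin(2 * math.pi * 100 * n / FS + 0.3) for n in range(N)]
# Unvoiced-like frame: a rapidly alternating signal with high-band energy.
unvoiced_like = [1.0 if n % 2 == 0 else -1.0 for n in range(N)]

voiced_features = extract_features(voiced_like)
unvoiced_features = extract_features(unvoiced_like)
```

In this sketch the voiced-like frame yields few zero crossings, a lag-1 correlation near one, and a small pre-emphasized energy ratio, whereas the unvoiced-like frame yields the opposite tendencies. Because all three features respond to the same spectral tilt, they are strongly correlated with one another.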
However, because the features are correlated with one another, the correlations must be considered when several features are combined, and the combined features suffer severe performance degradation in the presence of noise. In addition, these features cannot adequately represent the presence or absence of a harmonic component, which is the essential difference between voiced and unvoiced sounds, or differences in the degree of harmonicity. Thus, a feature extraction method that performs the voiced and unvoiced separation correctly by analyzing the harmonic component is required.
In order to correctly estimate the degree of voicing, the following must be considered: sensitivity to a voiced sound included in a speech signal, tone of pitches, smooth variation of pitches, insensitivity to randomness of the pitch period, insensitivity to the spectrum envelope, and subjective performance.