1. Field of the Invention
The present invention relates to a method and an apparatus for estimating voicing, i.e., a voiced sound, for speech recognition by using local spectral information.
2. Description of Related Art
A variety of coding methods have been proposed that compress voice signals in the time domain, the frequency domain, or a hybrid time-frequency domain by using statistical properties and human auditory characteristics.
Until now, few approaches to speech recognition have used voicing information extracted from voice signals. Detection of voiced and unvoiced sounds in an input voice signal is generally performed in the time domain or the frequency domain.
One method, executed in the time domain, uses a zero-crossing rate and/or a frame mean energy of the voice signal. Although it offers some detectability in a clean (i.e., quiet) environment, its detectability may drop markedly in a noisy environment.
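For illustration only, the two time-domain features named above can be sketched as follows. This is a minimal sketch, not the detector of any particular prior-art system: the function names, the thresholds `zcr_max` and `energy_min`, and the 8000 Hz sampling rate are all hypothetical values chosen for the example.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_mean_energy(frame):
    """Mean of the squared sample values in the frame."""
    if not frame:
        return 0.0
    return sum(x * x for x in frame) / len(frame)


def is_voiced_time_domain(frame, zcr_max=0.15, energy_min=0.01):
    """Label a frame voiced when it has few zero crossings and enough energy.

    Both thresholds are illustrative assumptions, not values from the text.
    """
    return (
        zero_crossing_rate(frame) <= zcr_max
        and frame_mean_energy(frame) >= energy_min
    )


# Synthetic 30 ms frames at an assumed 8000 Hz sampling rate.
SAMPLE_RATE = 8000
voiced_like = [
    0.5 * math.sin(2 * math.pi * 120 * n / SAMPLE_RATE) for n in range(240)
]
unvoiced_like = [0.05 * (1 if n % 2 == 0 else -1) for n in range(240)]

print(is_voiced_time_domain(voiced_like))
print(is_voiced_time_domain(unvoiced_like))
```

Because both features are computed directly on the waveform, additive noise raises the zero-crossing rate and the energy of every frame alike, which is consistent with the drop in detectability noted above for noisy environments.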
Another method, executed in the frequency domain, uses information about low/high frequency components of the voice signal or uses pitch harmonic information. This conventional method, however, estimates voicing over the entire spectrum region.
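As a rough illustration of a frequency-domain measure computed over the entire spectrum, the sketch below compares the energy below and above a split frequency. It is an assumed formulation for the example only: the function names, the naive DFT, and the `split_hz` value are hypothetical and are not taken from any specific conventional method.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive O(N^2) DFT power for bins 0 through N/2."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def low_high_energy_ratio(frame, sample_rate, split_hz=1000.0):
    """Ratio of spectral energy below split_hz to energy at or above it.

    Every bin of the spectrum contributes, including bins that carry
    no voice components. split_hz is an illustrative assumption.
    """
    if not frame:
        raise ValueError("frame must contain at least one sample")
    power = power_spectrum(frame)
    hz_per_bin = sample_rate / len(frame)
    low = sum(p for k, p in enumerate(power) if k * hz_per_bin < split_hz)
    high = sum(p for k, p in enumerate(power) if k * hz_per_bin >= split_hz)
    return low / (high + 1e-12)  # small constant avoids division by zero


# Synthetic 10 ms frames at an assumed 8000 Hz sampling rate.
SAMPLE_RATE = 8000
low_tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(80)]
high_tone = [math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE) for n in range(80)]

print(low_high_energy_ratio(low_tone, SAMPLE_RATE) > 1.0)
print(low_high_energy_ratio(high_tone, SAMPLE_RATE) > 1.0)
```

A measure of this kind sums over all bins, so noise concentrated in any band shifts the ratio regardless of whether that band carries voice components.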
FIG. 1 is an example of a graph used for estimating voicing over the whole spectrum region according to such a conventional method.
As shown in FIG. 1, a conventional method estimates voicing over the entire spectrum region and thus may have some problems. One problem is that it unnecessarily refers to certain frequencies lacking voice components. Another problem is that it often fails to determine whether colored noise is a harmonic or noise. Additionally, as FIG. 1 shows, it may be difficult in some cases to find harmonic components at 1000 Hz or above.