1. Field of the Invention
The present invention relates to a method and apparatus for vocal-cord signal recognition with a high recognition rate in a noisy environment.
2. Description of the Related Art
FIG. 1 is a block diagram of a conventional apparatus for speech recognition. Referring to FIG. 1, the conventional apparatus for speech recognition includes a feature extracting unit 100 and a speech recognizing unit 110. The feature extracting unit 100 extracts particular features appropriate for speech recognition from an audio signal input from a microphone. The performance of the apparatus for speech recognition depends highly on the extraction of the particular features. In particular, since the extraction of the particular features degrades as the noise in the environment increases, various methods are used to extract particular features that are noise-robust.
Examples of distance scale methods that are robust to additive noise include a short-time modified coherence (SMC) method, a relative spectral (RASTA) method, a perceptual linear prediction (PLP) method, a dynamic feature parameter method, and a cepstrum scale method. Examples of noise removal methods include a spectral subtraction method, a Bayesian estimation method, and a blind source separation method.
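For illustration only, the core step of a spectral subtraction method can be sketched as follows. This is a minimal sketch and not the implementation of any cited reference: it assumes per-bin power spectra have already been computed for one frame, and the function name `spectral_subtract` and the parameters `alpha` (over-subtraction factor) and `floor` (spectral floor) are hypothetical names chosen for this example.

```python
def spectral_subtract(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract an estimated noise power spectrum from a noisy power spectrum.

    noisy_power: per-bin power of one noisy frame (assumed precomputed).
    noise_power: per-bin noise power estimate, e.g. averaged over silent frames.
    alpha:       hypothetical over-subtraction factor.
    floor:       hypothetical spectral floor, as a fraction of the noisy power,
                 used so that no bin becomes negative.
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        estimate = y - alpha * n
        # Clamp to a small fraction of the noisy power instead of going negative.
        cleaned.append(max(estimate, floor * y))
    return cleaned


if __name__ == "__main__":
    print(spectral_subtract([4.0, 1.0], [1.0, 2.0]))
```

In the second bin the noise estimate exceeds the observed power, so the result is clamped to the floor rather than becoming negative.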
As prior art relating to the apparatus for speech recognition, Korean Patent Publication No. 2003-0010432 discloses an “Apparatus for speech recognition in a noisy environment” that uses a blind source separation method. Noise included in two audio signals input to two microphones is separated using a learning algorithm based on independent component analysis (ICA). As a result, the speech recognition rate is improved by the improved audio signals. However, the learning method using the ICA cannot be adopted in an apparatus for real-time speech recognition because the calculation of the learning algorithm is complex.
After pre-processing that removes noise or improves the quality of the sound, a Mel-frequency cepstral coefficient (MFCC), a linear prediction coefficient cepstrum, or a perceptual linear prediction cepstrum coefficient (PLPCC) is widely used as the feature extracted from the signal.
The speech recognizing unit 110 measures similarity between the vocal cord signal and the audio signal using the particular features extracted by the feature extracting unit 100 to calculate the result of speech recognition. For this purpose, a hidden Markov model (HMM), dynamic time warping (DTW), and a neural network are popularly used.
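As an illustration of similarity measurement by dynamic time warping, the following minimal sketch compares one-dimensional feature sequences and selects the closest stored template. It is a simplified example under stated assumptions, not the method of the conventional apparatus: real systems compare multi-dimensional feature vectors, and the names `dtw_distance`, `recognize`, and the `templates` dictionary are hypothetical.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two sequences of numbers."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the minimal cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def recognize(features, templates):
    """Return the label of the template with the smallest DTW cost.

    templates: hypothetical mapping from a label to a reference sequence.
    """
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


if __name__ == "__main__":
    templates = {"up": [1, 2, 2, 3], "down": [3, 2, 1]}
    print(recognize([1, 2, 3], templates))
```

Because DTW allows one sequence to be stretched against the other, the input `[1, 2, 3]` aligns with the template `[1, 2, 2, 3]` at zero cost even though the lengths differ.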