In general, a speech recognition apparatus operating in a real environment receives speech that has deteriorated through mixing with noise and reverberation. The speech may also deteriorate depending on the specifications of the input device. To cope with this problem, several approaches have been proposed for improving the robustness of speech recognition by using techniques such as spectral subtraction and blind source separation. One such approach, proposed by M. Cooke et al. of Sheffield University, is the missing feature theory (Martin Cooke et al., “Robust automatic speech recognition with missing and unreliable acoustic data”, SPEECH COMMUNICATION 34, p. 267-285, 2001). This approach aims to improve the robustness of speech recognition by identifying and masking missing features (that is, deteriorated features) contained in the features of the input speech. It is advantageous in that it requires less knowledge about the noise than the other approaches.
In the missing feature theory, deteriorated features are identified based on their difference from the features of non-deteriorated speech, based on the local signal-to-noise (SN) ratio of the spectrum, or based on ASA (Auditory Scene Analysis). ASA is a method of grouping components of the features by utilizing a cue that is common to sounds radiated from the same sound source. Such a cue is, for example, the harmonic structure of the spectrum, the synchronization of onsets, or the position of the source. Speech recognition based on the missing feature theory includes several methods, such as a method of recognizing speech by estimating the original features for a masked portion and a method of recognizing speech by generating a sound model corresponding to the masked features.
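By way of illustration only, the local SN ratio criterion and the use of the resulting mask during recognition may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the cited reference: the function names `local_snr_mask` and `masked_log_likelihood`, the 0 dB threshold, the availability of a per-component noise power estimate, and the diagonal Gaussian sound model are all hypothetical choices made for the example. The second function scores only the reliable components, which is one simple way of matching a sound model to masked features.

```python
import math


def local_snr_mask(noisy_power, noise_power, threshold_db=0.0):
    """Return 1 for components judged reliable and 0 for deteriorated ones.

    Assumption: a noise power estimate is available for every spectral
    component; the 0 dB default threshold is illustrative only.
    """
    mask = []
    for y, n in zip(noisy_power, noise_power):
        # Estimate the speech power by subtracting the noise estimate,
        # floored at a tiny positive value so the logarithm is defined.
        speech = max(y - n, 1e-12)
        snr_db = 10.0 * math.log10(speech / max(n, 1e-12))
        mask.append(1 if snr_db > threshold_db else 0)
    return mask


def masked_log_likelihood(features, mask, means, variances):
    """Log-likelihood of a diagonal Gaussian over reliable components only.

    Components whose mask value is 0 are skipped, so deteriorated
    features do not influence the score.
    """
    total = 0.0
    for x, m, mu, var in zip(features, mask, means, variances):
        if m:
            total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total
```

In this sketch a component whose noisy power barely exceeds the noise estimate receives a mask value of 0, and its feature value, however far it lies from the model mean, contributes nothing to the likelihood. The alternative method mentioned above, in which the original features of the masked portion are estimated, would instead replace the masked values before scoring.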