1. Field of the Invention
The present invention relates to a speech recognition technique using a DP (Dynamic Programming) matching method, an HMM (Hidden Markov Model) method or the like, and more particularly, to a speech recognition apparatus and a speech recognition method whose recognition accuracy is improved by correctly detecting a consonant at the leading position of a speech (hereinafter referred to as a lead consonant).
2. Description of the Background Art
In recent years, speech recognition apparatuses have been actively developed for information processing systems such as personal computers and word processors, in order to enable text input and the like by speech. Conventional speech recognition apparatuses commonly use speech recognition techniques such as a DP matching method, in which variations in speaking rate are effectively absorbed by pattern matching with non-linear expansion and contraction of the time axis, and an HMM method, which attains high recognition accuracy even against variations in the voice spectrum caused by individual differences between speakers.
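The DP matching idea of absorbing speaking-rate variation by non-linearly stretching and compressing the time axis can be sketched as a dynamic-programming alignment between an input sequence of feature vectors and a registered reference pattern. This is a minimal illustration under assumptions not stated in the text: the names `dp_matching_distance` and `euclidean` are hypothetical, the frame distance is Euclidean, and the simple symmetric step pattern (i-1, j), (i, j-1), (i-1, j-1) is used without slope constraints or length normalization.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dp_matching_distance(
    inp: Sequence[Vector],
    ref: Sequence[Vector],
    dist: Callable[[Vector, Vector], float] = euclidean,
) -> float:
    """Accumulated distance of the best non-linear time alignment between
    an input pattern and a reference pattern (hypothetical sketch)."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    # g[i][j]: minimum accumulated distance aligning inp[:i] with ref[:j]
    g = [[inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(inp[i - 1], ref[j - 1])
            # Each input frame may repeat, skip ahead, or advance diagonally,
            # which is what lets the time axis stretch or shrink.
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]


if __name__ == "__main__":
    ref = [[0.0], [1.0], [2.0]]                   # registered pattern
    slow = [[0.0], [0.0], [1.0], [1.0], [2.0]]    # same word, spoken slowly
    print(dp_matching_distance(slow, ref))        # 0.0
    print(dp_matching_distance(slow, [[2.0], [1.0], [0.0]]))  # positive
```

Because the slowly spoken input only repeats frames of the reference, the alignment absorbs the rate difference completely and the distance is zero, while a pattern in the reverse order cannot be aligned at zero cost.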
FIG. 1 is a block diagram showing the schematic configuration of a conventional speech recognition apparatus. The speech recognition apparatus includes: a microphone 101 converting a speaker's speech to an analog electrical signal; an A/D (Analog/Digital) converter 102 converting the analog signal outputted from the microphone 101 to sound data in digital form; a sound analyzer 103 analyzing the sound data outputted from the A/D converter 102 to convert it to a feature parameter 104; a speech detector 105 detecting a speech interval using the sound data outputted from the A/D converter 102; a matching processing unit 106 performing matching processing of the feature parameter 104 against registered data based on the detection result obtained by the speech detector 105; and a recognition judgment unit 107 performing recognition judgment based on the matching result obtained by the matching processing unit 106 to output a recognition result 108.
Feature parameters adopted here include power, Δ power, LPC (Linear Predictive Coding) cepstrum, LPC Δ cepstrum and others.
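The text does not say how the Δ parameters are derived. A common choice, assumed here, is a regression over ±K neighbouring frames applied to a per-frame track: the power track for Δ power, or each cepstral coefficient track for the LPC Δ cepstrum. The function name `delta`, the default K = 2 and the edge handling (repeating the first and last frames) are illustrative assumptions.

```python
from typing import List, Sequence


def delta(track: Sequence[float], k_max: int = 2) -> List[float]:
    """Regression-based delta of a per-frame feature track (hypothetical sketch).

    delta[t] = sum_{k=1..K} k * (c[t+k] - c[t-k]) / (2 * sum_{k=1..K} k^2),
    where frames beyond either end are replaced by the edge frame.
    """
    if k_max < 1:
        raise ValueError("k_max must be >= 1")
    n = len(track)
    denom = 2 * sum(k * k for k in range(1, k_max + 1))
    out: List[float] = []
    for t in range(n):
        acc = 0.0
        for k in range(1, k_max + 1):
            nxt = track[min(t + k, n - 1)]  # clamp at the last frame
            prv = track[max(t - k, 0)]      # clamp at the first frame
            acc += k * (nxt - prv)
        out.append(acc / denom)
    return out


if __name__ == "__main__":
    power = [0, 1, 2, 3, 4, 5, 6]  # a power track rising by 1 per frame
    print(delta(power))            # interior frames give a slope of 1.0
```

For a linearly rising track the interior deltas equal the slope, while the clamped edges give smaller values; applying the same function column by column to a matrix of cepstral coefficients would give a Δ cepstrum under the same assumption.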
The speech detector 105 calculates the sound power P of each frame from the sound data according to the following equation, and judges an interval in which the sound power exceeds a prescribed threshold value to be a speech interval:

$$P = \sum_{i=0}^{N-1} x_i^{2} \qquad (1)$$
where x_i is the amplitude value of the i-th sample in a frame and N is the number of samples in one frame.
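The power-based detection of equation (1) can be sketched as follows. The frame length, the non-overlapping framing, the threshold values and the names `frame_power` and `detect_speech_interval` are assumptions chosen for illustration. The usage lines build a toy signal with a weak lead consonant followed by a louder vowel: without noise the consonant frame is included in the detected interval, but once alternating noise is added and the threshold is raised above the noise floor, the consonant frame falls below it and is lost.

```python
from typing import Optional, Sequence, Tuple


def frame_power(frame: Sequence[float]) -> float:
    """Equation (1): sum of squared amplitudes over the samples of one frame."""
    return sum(x * x for x in frame)


def detect_speech_interval(
    samples: Sequence[float], frame_len: int, threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (first_frame, last_frame) of the frames whose power exceeds
    `threshold`, or None if no frame does (hypothetical sketch)."""
    if frame_len < 1:
        raise ValueError("frame_len must be >= 1")
    # Split into non-overlapping frames; a trailing partial frame is dropped.
    powers = [
        frame_power(samples[s:s + frame_len])
        for s in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    above = [i for i, p in enumerate(powers) if p > threshold]
    if not above:
        return None
    return above[0], above[-1]


if __name__ == "__main__":
    # silence | weak lead consonant | vowel (2 frames) | silence
    quiet = [0.0] * 4 + [0.3] * 4 + [1.0] * 8 + [0.0] * 4
    print(detect_speech_interval(quiet, 4, 0.1))   # (1, 3): consonant kept

    noise = [0.5 if i % 2 == 0 else -0.5 for i in range(len(quiet))]
    noisy = [s + n for s, n in zip(quiet, noise)]
    # The threshold must now sit above the noise-only frame power (1.0).
    print(detect_speech_interval(noisy, 4, 1.5))   # (2, 3): consonant lost
```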
With the speech interval detection method described above, when no noise is mixed into the speech as shown in FIG. 2A, the lead consonant interval of the speech can be correctly detected from the sound data, and the recognition judgment unit 107 can output a correct recognition result for the speech interval.
However, when the S/N ratio of the microphone 101 and other components is poor and noise is mixed into the speech as shown in FIG. 2B, the lead consonant interval of the speech is buried in the noise. The detected sound data then lacks information on the lead consonant component, and the recognition judgment unit 107 can output only a limited recognition result based on the range that could be detected.
Alternatively, a method such as spectral subtraction can be adopted, in which the frequency characteristics of the noise are measured in advance and averaged, the average is subtracted from each speech frame, and the lead consonant interval is then detected. This method, however, has problems: it increases the amount of computation, which prevents high-speed processing, and in a high-noise environment it can adversely affect the waveform of the speech to be analyzed, making correct speech recognition impossible.
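A minimal sketch of the spectral subtraction step described above, assuming magnitude-domain subtraction with clamping at a floor and no resynthesis of the waveform. A direct DFT is used only to keep the example dependency-free (a practical system would use an FFT); even so, the per-frame transform illustrates where the extra computation comes from. The names `magnitude_spectrum`, `average_noise_spectrum` and `spectral_subtract` and the `floor` parameter are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]| for k = 0 .. N//2 via a direct DFT (O(N^2), sketch only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def average_noise_spectrum(noise_frames: Sequence[Sequence[float]]) -> List[float]:
    """Average magnitude spectrum over frames known to contain only noise."""
    if not noise_frames:
        raise ValueError("need at least one noise frame")
    spectra = [magnitude_spectrum(f) for f in noise_frames]
    return [sum(col) / len(spectra) for col in zip(*spectra)]


def spectral_subtract(frame: Sequence[float], noise_avg: Sequence[float],
                      floor: float = 0.0) -> List[float]:
    """Subtract the average noise magnitude from one frame's magnitude
    spectrum, clamping each bin at `floor` so it never goes negative."""
    return [max(m - nm, floor)
            for m, nm in zip(magnitude_spectrum(frame), noise_avg)]


if __name__ == "__main__":
    noise = [1.0, -1.0, 1.0, -1.0]   # noise concentrated at the Nyquist bin
    speech = [2.0, 0.0, 2.0, 0.0]    # DC component plus the same noise
    avg = average_noise_spectrum([noise, noise])
    print(spectral_subtract(speech, avg))  # approximately [4.0, 0.0, 0.0]
```

The clamping step is where this kind of processing departs from the true speech spectrum: whenever the noise estimate exceeds a bin's actual magnitude, information in that bin is discarded rather than recovered.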
It is accordingly an object of the present invention to provide a speech recognition apparatus and a speech recognition method capable of reflecting information on a lead consonant component in matching processing even when the lead consonant cannot be detected due to noise.
It is another object of the present invention to provide a speech recognition apparatus and a speech recognition method capable of eliminating a deviation of the start edge position in the matching processing.
It is still another object of the present invention to provide a speech recognition apparatus and a speech recognition method in which the speech recognition speed is increased by reducing the number of times matching processing is performed.
It is a further object of the present invention to provide a speech recognition apparatus and a speech recognition method capable of outputting a highly probable recognition result even when a correct recognition result cannot be obtained.
According to an aspect of the present invention, a speech recognition apparatus includes: a sound analyzer converting sound data to a feature parameter; a voiced sound detector detecting a voiced sound component at a leading position (hereinafter referred to as a lead voiced sound) from the sound data; a lead consonant buffer storing a feature parameter preceding a lead voiced sound detected by the voiced sound detector as a feature parameter of a lead consonant therein; and a recognition processing section performing recognition processing referring to the feature parameter of the lead consonant stored in the lead consonant buffer.
Since the feature parameter preceding the lead voiced sound detected by the voiced sound detector is stored in the lead consonant buffer as the feature parameter of a lead consonant, recognition processing that reflects information on the lead consonant can be performed even when the lead consonant itself cannot be detected due to noise.
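The lead consonant buffer can be pictured as a small ring buffer of recent feature parameters that is frozen when the lead voiced sound is first detected. The text does not specify the buffer length, how the voiced sound detector decides that a frame is voiced, or exactly how the recognition processing refers to the stored parameters. In this sketch the voiced flag is supplied by the caller, `max_frames` is an arbitrary parameter, and the hypothetical helper `matching_input` simply prepends the stored lead consonant frames to the frames from the voiced onset onward; `LeadConsonantBuffer` is likewise a hypothetical name.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence

Feature = Sequence[float]


class LeadConsonantBuffer:
    """Hypothetical ring buffer holding the most recent feature parameters
    until the lead voiced sound is detected."""

    def __init__(self, max_frames: int) -> None:
        if max_frames < 1:
            raise ValueError("max_frames must be >= 1")
        # Oldest frames fall off automatically once max_frames is reached.
        self._history: Deque[Feature] = deque(maxlen=max_frames)
        self.lead_consonant: Optional[List[Feature]] = None

    def push(self, feature: Feature, is_voiced: bool) -> bool:
        """Feed one frame. Returns True on the frame where the lead voiced
        sound is first detected; the frames preceding it are then frozen
        in `lead_consonant`, regardless of their power."""
        if self.lead_consonant is None and is_voiced:
            self.lead_consonant = list(self._history)
            return True
        if self.lead_consonant is None:
            self._history.append(feature)
        return False


def matching_input(buf: LeadConsonantBuffer,
                   voiced_features: Sequence[Feature]) -> List[Feature]:
    """Assumed way of 'referring to' the buffer: prepend the stored lead
    consonant frames to the frames from the voiced onset onward."""
    return list(buf.lead_consonant or []) + list(voiced_features)


if __name__ == "__main__":
    buf = LeadConsonantBuffer(max_frames=2)
    frames = [([0.1], False), ([0.2], False), ([0.3], False),
              ([0.9], True), ([0.8], True)]
    onsets = [buf.push(f, v) for f, v in frames]
    print(onsets)               # [False, False, False, True, False]
    print(buf.lead_consonant)   # [[0.2], [0.3]]
    print(matching_input(buf, [[0.9], [0.8]]))
```

The point of the design is that the frames immediately before the voiced onset are kept even if their power never crossed a detection threshold, so the matching stage still sees whatever lead consonant information survives in them.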
According to another aspect of the present invention, a speech recognition method includes the steps of: converting sound data to a feature parameter; detecting a lead voiced sound from the sound data; storing the feature parameter preceding the detected lead voiced sound as a feature parameter of a lead consonant; and performing recognition processing referring to the stored feature parameter of the lead consonant.
Since the feature parameter preceding the detected lead voiced sound is stored as the feature parameter of a lead consonant, matching processing reflecting information on the lead consonant can be performed even when the lead consonant itself cannot be detected due to noise.
The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.