1. Field of the Invention
The present invention relates to a speech recognition device and a speech recognition performance improvement method. More particularly, the present invention relates to a speech recognition device for improving speech recognition performance in a noisy environment and a speech recognition performance improvement method therefor.
2. Description of the Related Art
Speech recognition devices, by which vehicle-mounted devices such as audio devices and navigation systems are operated using speech, have been put into practical use. FIG. 6 is a block diagram of such a speech recognition device. A microphone 1 for entering speech detects speech produced by a speaker and generates a speech signal. An A/D converter 2 converts the speech signal into digital form. An operation section 3 instructs the start of speech recognition when a switch (not shown) is operated. A speech recognition engine 4 recognizes the entered speech when the start of speech recognition is instructed.
An example of the speech recognition engine 4 is disclosed in Japanese Unexamined Patent Application Publication No. 59-61893. In this conventional technology, speech recognition is performed by comparing a feature pattern for each of a series of single syllables in speech entered as a word with a standard pattern, and the recognized result is output as a word having a meaning by referring to a word dictionary.
When noise is superimposed on speech data that is entered to a speech recognition system, the recognized result can change if the speech data is entered to the speech recognition engine with a different start position of the speech region, for example, with a portion of the non-speech region at the start of the data deleted (that is, with the length of the non-speech region changed). In other words, even for the same produced speech, the correctness of the recognized result depends on the speech-producing timing (the start position of the speech region).
This phenomenon hardly appears when the noise superimposed on the speech data, for example, noise inside a vehicle, is sufficiently small with respect to the speech (the S/N ratio is high), but it appears conspicuously when the noise inside the vehicle is large with respect to the speech (the S/N ratio is low). The phenomenon occurs because the speech recognition engine 4 measures the background noise level in a non-speech region *SIT (FIG. 7) and uses that noise level when performing the speech recognition process on the speech data of the speech region SIT. The non-speech region *SIT is the region from the time tB, at which the start of speech recognition was instructed using the switch, to the start position (speech-producing timing) tST of the speech region SIT.
Since this noise measurement is made over a short time region, the measured results vary depending on the measurement position even for noise under the same conditions. For this reason, the recognized results vary, so that the same data may be recognized correctly or incorrectly. For example, as shown in FIG. 7, assume that the noise level is taken to be the average level of the non-speech region *SIT and that speech recognition is performed on the speech data in the speech region SIT by taking this noise level into consideration. Since the noise level is high at the start point of the non-speech region *SIT in FIG. 7, the shorter the non-speech region *SIT, that is, the earlier the speech-producing timing tST, the higher the average level becomes; and the longer the non-speech region *SIT, that is, the later the speech-producing timing tST, the lower the average level becomes. In this manner, the measured noise level varies depending on the speech-producing timing tST, and as a result, the correctness of the recognized result changes.
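The timing dependence described above can be illustrated with a minimal sketch. It is not taken from the cited publication or from any particular recognition engine: the function name mean_noise_level, the helper make_background, the sample values, and the use of a mean absolute amplitude as the "average level" are all assumptions made for illustration only.

```python
def mean_noise_level(samples, t_b, t_st):
    """Average absolute level over the non-speech region [t_b, t_st).

    t_b  : index at which recognition start was instructed (hypothetical units: samples)
    t_st : index at which the speech region begins (speech-producing timing)
    """
    region = samples[t_b:t_st]
    if not region:
        raise ValueError("non-speech region is empty")
    return sum(abs(x) for x in region) / len(region)


def make_background(n):
    # Hypothetical background noise: loud for the first 10 samples,
    # then quieter, mimicking a noise level that is high at the start
    # of the non-speech region as in the FIG. 7 discussion.
    return [0.8 if i < 10 else 0.2 for i in range(n)]


noise = make_background(100)

# Same noise conditions, different speech-producing timing tST.
early = mean_noise_level(noise, 0, 20)  # speaker starts early: short non-speech region
late = mean_noise_level(noise, 0, 80)   # speaker starts late: long non-speech region

print(f"early tST estimate: {early:.3f}")
print(f"late  tST estimate: {late:.3f}")
```

In this sketch the early utterance yields the higher noise estimate, because the loud initial portion makes up a larger share of the short non-speech region. An engine that uses such an estimate would therefore process identical speech against different assumed noise levels depending only on when the speaker began.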
The above phenomenon means that, even in an environment where a certain S/N ratio is ensured, incorrect recognition can occur due to the timing of the speech production. From the user's point of view, the recognition performance is decreased, which is a problem.
In the conventional technology, including the technology of Japanese Unexamined Patent Application Publication No. 59-61893, improvement of the recognition rate is sought exclusively by increasing the recognition accuracy of the speech recognition engine, but there are limits to this approach.