The invention relates to a method of and an arrangement for speech recognition.
For the execution of speech recognition it is necessary for a user to supply respective speech utterances to a speech recognizer. A plurality of criteria influence the quality of the recognition result produced from the speech utterance. A user is often not aware of these criteria of such a speech recognition device. Only the experienced user of a speech recognition device is successful in keeping the error rate in the recognition process so low that an acceptable result is achieved. Speech recognition devices have been developed such that different speakers can also produce speech utterances, which are then recognized by the speech recognition system. Such speech recognizers are denoted as speaker-independent speech recognition systems.
WO 87/07460 describes a telephone-based speech recognition system, in which the speech recognizer informs a user that no appropriate word was found in the vocabulary. The user is requested to repeat the speech utterance. When the speech recognition system is supplied with a speech utterance that is too quiet or disturbed, the user of the speech recognition system is requested to speak into the microphone more loudly.
However, the fact that each speech recognizer is based on an acoustic model, which in turn is based on an average speech velocity, is not taken into account. A deviation of the user's speech velocity from the average speech velocity of the acoustic model considerably increases the error rate during the recognition process.
It is an object of the invention to provide a method and a device in which the error rate during the recognition process is reduced.
The object is achieved in that the speech velocity is measured and the user is informed of it.
For executing speech recognition, a user produces a speech utterance with an appropriate velocity. Acoustic speech models, which are based on an average velocity, are used for the recognition process. It is necessary to make an adaptation for a deviating speech velocity. With smaller deviations of the speech velocity from the average speech velocity of the acoustic model, an adaptation is possible, but the time distortion resulting from the adaptation leads to a degraded evaluation of hypotheses in the recognition process. With larger deviations from the average speech velocity, an adaptation is no longer possible, because the models cannot be run through at an arbitrarily high velocity. Furthermore, most users, when speaking fast, tend to swallow short words or word endings. Such errors cannot be reliably recognized by the speech recognition system. Whereas an adaptation is possible to a certain extent, pronunciation inadequacies across many possible users cannot be avoided entirely but only reduced. The missing or incompletely pronounced words can at best be inferred from the context.
Therefore, the user's speech velocity is measured and the user is informed thereof via an output means. The user is thereby prompted to keep to an optimized speech velocity, which is oriented to the average speech velocity of the speech model. Output means are, for example, LEDs, where one color (green) indicates an acceptable speech velocity and another color (red) indicates an unacceptable deviation. Another possibility is to display a numerical value, the user being informed of the range in which this value should lie. With an exclusively audio-based communication, the user is informed by means of a warning signal that the speech velocity lies outside an acceptable range.
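The LED and warning-signal outputs described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of a relative deviation, and the 20 % tolerance are hypothetical and are not specified in the description.

```python
def relative_deviation(measured_rate: float, model_average_rate: float) -> float:
    """Signed deviation of the measured speech velocity from the model average.

    Both rates must be expressed in the same unit (for example phonemes per frame).
    """
    if model_average_rate <= 0:
        raise ValueError("model_average_rate must be positive")
    return (measured_rate - model_average_rate) / model_average_rate


def led_color(measured_rate: float, model_average_rate: float,
              tolerance: float = 0.2) -> str:
    """Return 'green' for an acceptable velocity and 'red' otherwise.

    The default tolerance of 0.2 is an assumed value chosen for illustration.
    """
    deviation = relative_deviation(measured_rate, model_average_rate)
    return "green" if abs(deviation) <= tolerance else "red"


def needs_warning_signal(measured_rate: float, model_average_rate: float,
                         tolerance: float = 0.2) -> bool:
    """Audio-only variant: True when the velocity lies outside the acceptable range."""
    return led_color(measured_rate, model_average_rate, tolerance) == "red"


if __name__ == "__main__":
    # Hypothetical rates, both in the same unit.
    print(led_color(2.2, 2.0))
    print(needs_warning_signal(3.0, 2.0))
```

A numerical display could simply show the value of `relative_deviation` together with the permitted range.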
As a result, a lower error rate is advantageously achieved during the recognition process.
The measured speech velocity can advantageously also be applied to the speech recognizer as a confidence measure or for controlling the search process. The speech recognizer is then supplied with a measure by which it can decide whether the speech velocity lies within respective limits, or whether an excessively high speech velocity is to be taken into account during the recognition process. The same holds for an excessively low speech velocity.
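How the recognizer might use such a measure can be sketched as follows. The description does not state how the confidence measure is computed, so the three-way classification, the linear confidence mapping, and all names and limits here are assumptions made purely for illustration.

```python
def classify_rate(measured_rate: float, lower_limit: float, upper_limit: float) -> str:
    """Tell the recognizer whether the velocity is within limits, too low, or too high."""
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    if measured_rate < lower_limit:
        return "too_slow"
    if measured_rate > upper_limit:
        return "too_fast"
    return "within_limits"


def rate_confidence(measured_rate: float, model_average_rate: float,
                    max_relative_deviation: float = 0.5) -> float:
    """Map the deviation from the model average to a confidence in [0, 1].

    Assumed linear mapping: 1.0 at the model average, falling to 0.0 once the
    relative deviation reaches max_relative_deviation (a hypothetical value).
    """
    if model_average_rate <= 0 or max_relative_deviation <= 0:
        raise ValueError("model_average_rate and max_relative_deviation must be positive")
    deviation = abs(measured_rate - model_average_rate) / model_average_rate
    return max(0.0, 1.0 - deviation / max_relative_deviation)


if __name__ == "__main__":
    # Hypothetical values: model average 2.0, accepted range 1.5 to 2.5.
    print(classify_rate(3.0, 1.5, 2.5))
    print(rate_confidence(2.5, 2.0))
```

A search process could, for example, widen its pruning when `classify_rate` reports a deviation; that coupling is likewise only a possibility and is not prescribed by the text.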
During the recognition the speech velocity is determined by means of a suitable measure. A speech recognition system must be trained before it can perform a recognition. It is therefore important to take the speech velocity into account, and to report it to the user, during the training as well.
A measure for the speech velocity is, for example, the number of spoken and also recognized words per time unit. More accurate, however, is the measurement of the recognized phonemes per frame, where a frame is a predefined time interval.
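The two measures named above can be sketched as follows. The function names and the choice of seconds as the time unit are assumptions made for illustration; the word and phoneme counts are taken to come from the recognizer.

```python
from typing import Sequence


def words_per_second(recognized_words: Sequence[str], duration_s: float) -> float:
    """Coarse measure: recognized words per unit of time (here, seconds)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(recognized_words) / duration_s


def phonemes_per_frame(phoneme_count: int, frame_count: int) -> float:
    """Finer measure: recognized phonemes per frame (a predefined time interval)."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    return phoneme_count / frame_count


if __name__ == "__main__":
    # Hypothetical utterance: 6 recognized words in 2.4 s, 24 phonemes over 240 frames.
    print(words_per_second(["please", "call", "the", "office", "right", "now"], 2.4))
    print(phonemes_per_frame(24, 240))
```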
The announcement whether the speech velocity lies in the acceptable range may be linked to the exceeding of an experimentally determined threshold value. Consequently, the user is then informed only when the speech velocity is too high, so that he is not distracted by the information that the speech velocity lies in the acceptable range.
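The threshold-linked announcement can be sketched as follows. The function name, the message text, and the example threshold are hypothetical; in practice the threshold would be the experimentally determined value.

```python
from typing import Optional


def rate_notice(measured_rate: float, threshold: float) -> Optional[str]:
    """Return a notice only when the speech velocity exceeds the threshold.

    Returning None for acceptable velocities means the user is not distracted
    by confirmations that everything is in range.
    """
    if measured_rate > threshold:
        return "Speech velocity too high: please speak more slowly."
    return None


if __name__ == "__main__":
    # Hypothetical threshold of 0.15 phonemes per frame.
    for rate in (0.10, 0.20):
        notice = rate_notice(rate, threshold=0.15)
        if notice is not None:
            print(notice)
```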
The threshold value may also be determined during the recognition process itself, namely from respective measures being exceeded or fallen short of during the recognition process.
A particular advantage of this invention is obtained from the learning process which a user undergoes. The user makes a great effort to attain a high efficiency when using a speech recognition system. Since his speech velocity is displayed relative to the average speech velocity of the acoustic model, he learns to adapt his speech velocity and thereby achieves a low error rate.
The object of the invention is achieved by a speech recognition device in which a measuring unit determines a speech velocity and informs the user thereof by means of an output unit.