The invention relates to a method of automatic recognition of an at least partly spelled speech utterance, with a speech recognition unit based on statistical models including a linguistic speech model.
The automatic recognition of spelled speech utterances still suffers from high error rates. One problem is detecting the boundaries between the individual letters, because a user who is spelling regularly pronounces the letters without pauses, i.e. without silences between them. Furthermore, the letters are hard to model acoustically, because they are brief speech units that lack context.
In the field of navigation systems for motor vehicles it is known that an entry mode is rendered available to a user in which navigation data, for example place names, are entered by spelling out (cf. the Carin navigation system).
The entry of a place name in such a navigation system is briefly explained in the following. After the entry mode for entering place names has been activated, the letters of the respective alphabet that can be entered are shown to the user on a display screen. By turning a multifunction button, the user can switch back and forth between the individual letters. A letter is selected, and thus entered, by pressing the multifunction button. Before the first letter of the respective place name is entered, the user is offered all the letters of the respective alphabet to select from. After the user has selected a first letter, the navigation system performs a comparison with a database stored on a compact disc (CD). The result provides information about which letters can follow each other in the place names that can be processed by the system. Thus, after the user has entered a first letter, the comparison with the database has the effect that only a part of the alphabet, rather than the whole alphabet, is selectable for entering the next letter. Accordingly, only a letter belonging to this part of the alphabet can be selected as the second letter by means of the multifunction button. With each entry of a letter, the selectable part of the alphabet is reduced in most cases; in exceptional cases it may also remain unchanged after a letter has been entered. Where a certain entered letter sequence can only be followed by a certain letter or a certain letter sequence, the user no longer needs to enter these letters, because the navigation system automatically assumes the letter or letters as if they had been entered by the user. This entry mode leads to a faster entry of spelled place names, which is also more comfortable for the user.
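The letter restriction described above can be illustrated with a minimal Python sketch. It assumes that the database is a plain list of upper-case place names; the list `PLACE_NAMES` and the function names `allowed_next_letters` and `auto_complete` are hypothetical and are not taken from the Carin system.

```python
# Hypothetical stand-in for the place-name database stored on the CD.
PLACE_NAMES = ["AACHEN", "AALEN", "BERLIN", "BONN", "BREMEN"]


def allowed_next_letters(prefix, names):
    """Return the sorted letters that can follow `prefix` in at least one name."""
    prefix = prefix.upper()
    return sorted({name[len(prefix)] for name in names
                   if name.startswith(prefix) and len(name) > len(prefix)})


def auto_complete(prefix, names):
    """Extend `prefix` for as long as exactly one follower letter is possible.

    Extension stops once the prefix is itself a complete name.
    """
    prefix = prefix.upper()
    while prefix not in names:
        followers = allowed_next_letters(prefix, names)
        if len(followers) != 1:
            break
        prefix += followers[0]
    return prefix
```

With this list, `allowed_next_letters("B", PLACE_NAMES)` offers only E, O and R, and `auto_complete("BR", PLACE_NAMES)` fills in the remaining letters of BREMEN without further input.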
It is an object of the invention to improve the method defined in the opening paragraph for automatic recognition of a spelled speech utterance so that, in addition to more convenient entry, a reduced speech recognition error rate is achieved.
The object is achieved in that
after the at least partly spelled speech utterance has been entered, the speech recognition unit (2) determines a first recognition result for the speech utterance;
individually recognized letters are sent to the user for him to acknowledge or reject;
after a letter has been acknowledged, the linguistic speech model (6) is adapted, which linguistic speech model, after its adaptation, determines the number of letters that can be allowed as followers of the acknowledged letter and assumes the correctness of letters already acknowledged;
with the adapted linguistic speech model the speech recognition unit determines a further recognition result for the speech utterance, from which result the next letter to be sent to the user is determined, so that he can acknowledge it.
By means of the processing steps in which the user is requested to acknowledge or reject recognized letters, the system receives feedback on the correctness of the recognition result achieved thus far for the speech utterance to be recognized. The speech utterance to be recognized may be a single word or a word sequence, while the entry processed according to the invented method is the whole speech utterance, spelled out or partly spelled out. The successive feedback is used for the step-by-step improvement of the statistical modeling used in the speech recognizer by a reduction of the search space. As a result, with each improvement the probability that a wrong letter is sent to the user to be acknowledged diminishes, which in turn reduces the time required until the final recognition of the spelled speech utterance. The method thus enhances convenience for the user. The acoustic models used in the speech recognition unit, which were estimated on the basis of the spelled part of the speech utterance, need not be adapted in the invented recognition procedure. Only the linguistic model used at each step depends on the position in the speech utterance that is currently being processed.
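The acknowledge-or-reject loop can be sketched in Python as follows. This is a toy illustration under strong assumptions, not the recognizer of the invention: the acoustic evidence is given as one fixed score table per letter position (so letter boundaries are taken as known), the linguistic model is reduced to a word list, and the names `recognize`, `spell_dialogue` and `user_accepts` are hypothetical.

```python
def recognize(scores, names, confirmed, rejected):
    """Best-scoring name that starts with the confirmed letters and whose
    letter at the current position has not been rejected (toy recognizer)."""
    pos = len(confirmed)
    best, best_score = None, float("-inf")
    for name in names:
        if len(name) != len(scores) or not name.startswith(confirmed):
            continue
        if name[pos] in rejected:
            continue
        score = 1.0
        for letter, frame in zip(name, scores):
            score *= frame.get(letter, 0.01)  # small floor for unseen letters
        if score > best_score:
            best, best_score = name, score
    return best


def spell_dialogue(scores, names, user_accepts):
    """Propose one letter at a time; re-run recognition after every answer."""
    confirmed, rejected, proposals = "", set(), []
    while len(confirmed) < len(scores):
        hypothesis = recognize(scores, names, confirmed, rejected)
        if hypothesis is None:
            return None, proposals
        letter = hypothesis[len(confirmed)]
        proposals.append(letter)
        if user_accepts(len(confirmed), letter):
            confirmed += letter   # acknowledged letters are treated as fixed
            rejected = set()
        else:
            rejected.add(letter)  # exclude this letter at this position
    return confirmed, proposals


# Toy data: the user spelled "BONN", but the first letter sounds like "D".
SCORES = [{"B": 0.4, "D": 0.5}, {"O": 0.8, "E": 0.1},
          {"N": 0.3, "R": 0.6}, {"N": 0.9}]
NAMES = ["BONN", "BERN", "DORN"]
result, proposals = spell_dialogue(
    SCORES, NAMES, lambda pos, letter: "BONN"[pos] == letter)
```

In this toy run the first proposal "D" is rejected; after that single rejection the constrained search settles on BONN, and every further proposal is acknowledged.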
For reducing the search space during the speech recognition, linguistic speech models are normally used. On the one hand, this reduces the computational expenditure for controlling the speech recognition unit and, on the other hand, it also improves the recognition results. However, there is the problem that a long linguistic speech model leads to excessively large acoustic search spaces. The processing of such a speech model requires a very large memory capacity and, with customary signal processors used for speech recognition applications, is at present either impossible or inefficient. With the invention, by contrast, the complexity of the linguistic speech model is minimized. The speech model is successively adapted in dependence on the user's acknowledgements of letters. Already acknowledged letter sequences are then presupposed as fixed. Only for the most recently acknowledged letters does the linguistic speech model determine which letters are selectable as following letters. Such a speech model is very simple and can easily be implemented in the speech recognition procedures used, by means of customary signal processors, with little computational effort and memory capacity.
For the case where the user rejects a recognized letter, two alternatives for further processing are preferably considered. On the one hand, the speech recognition unit can perform a renewed recognition operation on the whole speech utterance after the linguistic speech model has been adapted to include this information. The probability that the user is given the correct letter as the next proposed letter is thereby increased considerably. On the other hand, the speech recognition unit can determine a list of the N best recognition alternatives as the recognition result for the speech utterance, so that, after the user has rejected a recognized letter, the user is given the respective letter of the second-best alternative. This has the advantage that, after the user has rejected a letter sent to him as a recognition proposal, the speech recognition unit need not again perform the speech recognition procedures on the (complete) spelled speech utterance, so that after a rejection the user is given a further letter alternative with a minimum time delay.
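The second alternative, stepping through an N-best list instead of re-running recognition, might look like the following Python sketch. It assumes the N-best list is simply a list of letter strings ordered best first; the function name `next_proposal` and the sample list are hypothetical.

```python
def next_proposal(n_best, confirmed, rejected):
    """Return the next letter to propose from a precomputed N-best list.

    The list is scanned best-first for a hypothesis that agrees with the
    acknowledged letters and does not repeat a rejected letter at the
    current position. No new recognition pass is needed.
    """
    pos = len(confirmed)
    for hypothesis in n_best:
        if (hypothesis.startswith(confirmed) and len(hypothesis) > pos
                and hypothesis[pos] not in rejected):
            return hypothesis[pos]
    return None  # list exhausted: a renewed recognition would be required


# Hypothetical N-best list for one utterance, best alternative first.
N_BEST = ["DORN", "BONN", "BERN"]
```

Here the first proposal is "D"; if the user rejects it, `next_proposal(N_BEST, "", {"D"})` immediately yields "B" from the second-best alternative.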
If individual position-specific probability values, depending in particular on all the previous letters, are assigned to the individual letters, which can be implemented as a specification of the linguistic speech model used, the probability is increased that even the first proposal for a letter at a specific position of the speech utterance is correct and is acknowledged by the user. This exploits the fact that certain letter combinations occur more often than others.
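One simple way to obtain such position-specific values is to count, in a word list, how often each letter follows a given prefix. The Python sketch below does this; the uniform weighting of all names, the function name `next_letter_probs` and the sample list are assumptions made for illustration only.

```python
from collections import Counter


def next_letter_probs(prefix, names):
    """Estimate P(next letter | all previous letters) by counting names.

    Every name counts equally here; a real system could weight names,
    for example by how often they are requested.
    """
    counts = Counter(name[len(prefix)] for name in names
                     if name.startswith(prefix) and len(name) > len(prefix))
    total = sum(counts.values())
    return {letter: count / total for letter, count in counts.items()}


# Hypothetical vocabulary of place names.
NAMES = ["BERLIN", "BERN", "BONN", "BREMEN"]
probs_after_b = next_letter_probs("B", NAMES)
```

For this list, "E" receives probability 0.5 after "B", while "O" and "R" receive 0.25 each, so "E" would be favored as the first proposal for the second position.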
In another embodiment of the invention, the degree of exchangeability with other letters, expressed by a probability value, is taken into account when an alternative to a letter rejected by the user is to be determined. Certain letters, such as, for example, "d" and "t", are acoustically more similar than other letters. This may be included as information in the linguistic speech model so that, in case the user rejects such a letter, it is assumed with a higher probability than for other letters that a letter defined as acoustically similar to the rejected letter is the correct one and was actually entered as part of the spelled speech utterance.
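A minimal Python sketch of this re-weighting is given below. Only the pair "d"/"t" comes from the text; the other pairs, the boost factor of 3.0 and the names `CONFUSABLE` and `rank_alternatives` are illustrative assumptions.

```python
# Acoustically similar letter pairs with a hypothetical boost factor.
# Only D/T is named in the description; B/P and M/N are assumed examples.
CONFUSABLE = {
    frozenset("DT"): 3.0,
    frozenset("BP"): 3.0,
    frozenset("MN"): 3.0,
}


def rank_alternatives(rejected, candidates):
    """Rank the remaining letters after `rejected` was refused by the user.

    `candidates` maps each allowed letter to its prior probability.
    Letters that are acoustically similar to the rejected one are boosted,
    and the result is renormalized to sum to one.
    """
    weighted = {}
    for letter, prior in candidates.items():
        if letter == rejected:
            continue
        boost = CONFUSABLE.get(frozenset((rejected, letter)), 1.0)
        weighted[letter] = prior * boost
    total = sum(weighted.values())
    if total == 0:
        return []
    return sorted(((letter, weight / total)
                   for letter, weight in weighted.items()),
                  key=lambda item: item[1], reverse=True)


ranking = rank_alternatives("D", {"D": 0.5, "T": 0.2, "A": 0.3})
```

Although "A" has the higher prior in this example, "T" is ranked first after "D" has been rejected, because it is marked as acoustically similar to "D".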
As a further embodiment it is proposed that invalid, wrongly recognized initial letter combinations are defined as invalid in the linguistic speech model and are not proposed to the user, and that in such a case the speech recognition unit determines a further recognition result for the speech utterance by means of the adapted linguistic speech model, from which result the letter to be supplied to the user for acknowledgement is determined. To keep the speech model small, invalid initial letter combinations as such are not included in the speech model until the speech recognizer has falsely hypothesized them. If it is assumed that the entered speech utterance contains only words from a limited set of words, for example when only place names are entered in a certain entry mode of a navigation system for motor vehicles, the basic linguistic speech model can be adapted accordingly by means of this variant of the invention, because the number of possible speech recognition results for the entered speech utterance can indeed be considered limited. This leads to a reduced search space for the speech recognizer and, finally, to the avoidance of proposed recognition results with initial letters or initial letter combinations that are a priori wrong and are to be disregarded.
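The lazy handling of invalid initial letter combinations can be sketched in Python as follows. The sketch assumes that the recognizer output is available as a ranked list of letter strings and that the vocabulary is a plain word list; `shortest_invalid_prefix`, `recognize_with_lazy_blocking` and the sample data are hypothetical.

```python
def shortest_invalid_prefix(hypothesis, names):
    """Return the shortest prefix of `hypothesis` that starts no name,
    or None if every prefix is valid."""
    for length in range(1, len(hypothesis) + 1):
        prefix = hypothesis[:length]
        if not any(name.startswith(prefix) for name in names):
            return prefix
    return None


def recognize_with_lazy_blocking(candidates, names, blocked):
    """Scan ranked hypotheses, blocking invalid beginnings on first sight.

    `blocked` starts empty and only grows when the recognizer actually
    hypothesizes an invalid initial combination, which keeps it small.
    """
    for hypothesis in candidates:  # best hypothesis first
        if any(hypothesis.startswith(prefix) for prefix in blocked):
            continue  # already known to be invalid, never shown to the user
        invalid = shortest_invalid_prefix(hypothesis, names)
        if invalid is None:
            return hypothesis
        blocked.add(invalid)
    return None


NAMES = ["BONN", "BERN"]
CANDIDATES = ["TONN", "TERN", "BONN"]  # hypothetical ranked recognizer output
blocked_prefixes = set()
best = recognize_with_lazy_blocking(CANDIDATES, NAMES, blocked_prefixes)
```

In this example the initial letter "T" is blocked as soon as TONN is hypothesized, TERN is then skipped without further checks, and BONN is the first result whose letters would be offered to the user.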
The invention also relates to an electrical appliance, more particularly a navigation system for motor vehicles, for implementing one of the methods described above. All electrical appliances having functional units that include a speech recognition unit, and in which entry by means of spelled speech utterances is also possible, are considered.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiment(s) described hereinafter.