1. Technical Field
The present invention relates to the field of computerized speech recognition, and more particularly to computerized speech recognition of input speech comprising a long string of digits, numbers or letters, using a context-free grammar.
2. Description of the Related Art
Nowadays, more and more voice applications use speech recognition for strings that serve as validation keys, such as in telephony applications in the banking and commercial domains. In these domains, however, users often find it difficult to have their speech recognized efficiently, and can quickly get lost in the process.
When the characters form non-semantic strings, grammars are too open to perform effectively, because context-free grammars, unlike the models used for spoken-language recognition, do not allow the use of additional contextual and statistical rules.
Conventional speech recognition methods and systems generate, for a given expected input speech, a list of results or hypotheses with a weight or confidence score for each result. Such methods and systems provide an n-best list, i.e., a list of the results or hypotheses with the “n” highest confidence scores, where “n” is some integer.
Such speech recognition methods typically include analyzing the received sounds and comparing them to memorized sounds for known phonemes and/or words. Possibly matching combinations of phonemes and/or words are then validated through analysis according to the rules of a model or “grammar”, and sorted according to a statistical method which defines a confidence score for each possible solution. The “n” results with the highest confidence scores are then stored in an “n-best list”, sorted and associated with their respective confidence scores.
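By way of illustration only, the construction of an n-best list can be sketched as follows. The function name n_best, the representation of a hypothesis as a (text, confidence score) pair and the sample scores are assumptions made for this example and are not taken from any particular recognizer.

```python
from typing import List, Tuple

# Hypothetical representation: (recognized text, confidence score).
Hypothesis = Tuple[str, float]


def n_best(hypotheses: List[Hypothesis], n: int) -> List[Hypothesis]:
    """Return the n hypotheses with the highest confidence scores, best first."""
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    return ranked[:n]


# Illustrative scores only; a real recognizer would supply these.
candidates = [("4539 1488", 0.61), ("4539 1480", 0.72), ("4539 1688", 0.35)]
top_two = n_best(candidates, 2)
```

In this sketch the two retained results are the hypotheses scored 0.72 and 0.61, in that order.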
Depending on the language to be recognized, the rules of a grammar may include several different pronunciations for the same chain of characters.
For speech input with semantic meaning, the context is used to rate the probability that given words are combined together, thus providing a basis for computing the confidence score of their combination. In such a case, the grammar or model is generally called a “statistical language model”.
In the case of a string containing successive characters or numeric figures with no semantic meaning, such statistical rules may be unavailable, and the grammar is then called “context-free”.
Moreover, such a non-semantic string may be spoken in several different ways, e.g., in French, by grouping several successive figures differently into larger numbers. The efficiency of recognition is thus particularly reduced with such a context-free grammar.
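To give a sense of how quickly the number of spoken variants grows, the following sketch enumerates the ways in which a digit string can be split into successive groups. The function name groupings and the limit of two digits per group are assumptions chosen for the example; actual speakers may use longer groups.

```python
from typing import List


def groupings(digits: str, max_group: int = 2) -> List[List[str]]:
    """Enumerate every split of `digits` into successive groups of 1..max_group digits."""
    if not digits:
        return [[]]  # exactly one way to split the empty string
    results: List[List[str]] = []
    for size in range(1, min(max_group, len(digits)) + 1):
        head, tail = digits[:size], digits[size:]
        for rest in groupings(tail, max_group):
            results.append([head] + rest)
    return results


# Each inner list is one way of reading the string aloud,
# e.g. ["12", "34"] for "twelve, thirty-four".
variants = groupings("1234")
```

Under these assumptions a four-digit string already admits five distinct groupings, and the grammar must cover each of them.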
In conventional methods or systems, the results of the n-best list are typically then pruned with a pruning algorithm using constraining rules, and the result with the highest confidence score after pruning is taken as the result of the speech recognition. One such pruning algorithm frequently used with credit card numbers is the Luhn algorithm, which identifies, among all numbers of the correct length, the 10% that are valid as credit card numbers.
Speech recognition can still be improved, especially in the case of a context-free grammar. Indeed, the methods and systems that prune an n-best list with an algorithm, such as the Luhn algorithm, still suffer from misrecognitions, which increase with the length of the input speech and with the pauses users often insert while speaking. In case of misrecognition, the user has to repeat the entire input, which leads to a poor user experience, especially for long input strings.
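The pruning step described above can be sketched as follows, as a minimal illustration rather than a definitive implementation. The Luhn check itself is the standard checksum; the function names luhn_valid and prune, the (text, confidence score) representation and the sample scores are assumptions made for this example.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (recognized digit string, confidence score).
Hypothesis = Tuple[str, float]


def luhn_valid(number: str) -> bool:
    """Return True if the digits of `number` satisfy the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def prune(n_best_list: List[Hypothesis],
          is_valid: Callable[[str], bool]) -> Optional[Hypothesis]:
    """Return the highest-confidence hypothesis that passes `is_valid`, or None."""
    survivors = [h for h in n_best_list if is_valid(h[0])]
    return max(survivors, key=lambda h: h[1]) if survivors else None


# Illustrative scores only: the top-scoring hypothesis fails the checksum.
n_best_list = [("79927398710", 0.80), ("79927398713", 0.75), ("79927398718", 0.40)]
result = prune(n_best_list, luhn_valid)
```

Here the hypothesis scored 0.80 is discarded because it fails the checksum, and the hypothesis scored 0.75 is returned. If no hypothesis survives, the sketch returns None, which corresponds to the misrecognition case in which the user must speak again.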