The invention relates to a method of operating an automatic speech recognizer for speaker-independent recognition of words from different languages, and to a corresponding automatic speech recognizer.
Phoneme-based speech recognition requires a recognition vocabulary that contains phonetic descriptions of all the words to be recognized. Each word is typically represented in the vocabulary by a sequence, or chain, of phonemes. During recognition, the recognizer searches for the best path through the phoneme sequences in the vocabulary. This search can be performed, for example, using the Viterbi algorithm. For continuous speech recognition, the probabilities of transitions between words can also be modeled and incorporated into the Viterbi search.
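The following is a minimal sketch of this search under a toy setup that is not part of the original text. Each vocabulary word is modeled as a left-to-right hidden Markov model with one state per phoneme. The acoustic front end is assumed to have already produced one discrete phoneme label per frame. The probabilities (`p_stay`, `p_match`, `n_symbols`) are hypothetical placeholder values. For each state, the Viterbi recursion keeps the log-probability of the best path ending there. The recognized word is the vocabulary entry whose best complete path scores highest. Real recognizers use continuous acoustic features and trained models instead.

```python
import math

def viterbi_word_score(phonemes, observations, p_stay=0.5, p_match=0.8, n_symbols=40):
    """Best-path log-probability through a left-to-right HMM with one state per
    phoneme; the path must start in the first state and end in the last one.
    All probability parameters are hypothetical toy values."""
    n = len(phonemes)
    # Assumed emission model: the correct label with p_match, any other label uniformly.
    p_miss = (1.0 - p_match) / (n_symbols - 1)

    def emit(state, obs):
        return math.log(p_match if phonemes[state] == obs else p_miss)

    log_stay, log_move = math.log(p_stay), math.log(1.0 - p_stay)
    neg_inf = float("-inf")
    # delta[s]: best log-probability of any path ending in state s at the current frame
    delta = [neg_inf] * n
    delta[0] = emit(0, observations[0])
    for obs in observations[1:]:
        new = [neg_inf] * n
        for s in range(n):
            stay = delta[s] + log_stay                              # self-loop
            move = delta[s - 1] + log_move if s > 0 else neg_inf    # advance one phoneme
            best = max(stay, move)
            if best > neg_inf:
                new[s] = best + emit(s, obs)
        delta = new
    return delta[-1]  # -inf if the word cannot be completed within the observations

def recognize(vocabulary, observations):
    """Return the word whose phoneme chain best explains the observations."""
    return max(vocabulary, key=lambda word: viterbi_word_score(vocabulary[word], observations))

vocabulary = {"anna": ["a", "n", "a"], "otto": ["o", "t", "o"], "bob": ["b", "o", "b"]}
print(recognize(vocabulary, ["a", "a", "n", "a"]))  # anna
```

The sketch scores isolated words only. For continuous recognition, the word-transition probabilities would enter as additional log-probability terms whenever a path leaves the last state of one word and enters the first state of the next.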
Phonetic transcriptions of the words to be recognized form the basis of phoneme-based speech recognition. The first step in setting up phoneme-based speech recognition is therefore to obtain phonetic transcriptions of these words. A phonetic transcription can generally be defined as the phonetic description of a word from the target vocabulary. Obtaining phonetic transcriptions is particularly relevant for words that are not known to the speech recognizer.
Mobile and cordless telephones that support speaker-dependent name selection are known. With such a telephone, the user must first train each entry in the telephone's electronic telephone book before names can be selected by voice. Normally, no other user can use this feature, because speaker-dependent name selection works for only one person: the person who trained it. To overcome this problem, the entries in the electronic telephone book can be converted into phonetic transcriptions, so that name selection no longer depends on a particular speaker.
Various approaches are known for determining the phonetic transcription of a written word, such as a telephone book entry. One example is a dictation system running on a PC. Such dictation systems normally store a lexicon, typically of more than 10,000 words, that maps letter sequences to phoneme sequences. Such a lexicon requires a large amount of memory, so storing it in full on mobile terminals such as mobile or cordless telephones is impractical.
Systems are also known in which a word is converted into its phonetic transcription by rules or by specially trained neural networks. As with the lexicon, these methods have the disadvantage that the language for which the phoneme sequences are to be generated must be specified in advance. Electronic telephone books in particular, however, may contain names from different languages. Implementing such conversion for several languages on a mobile device would be too costly.
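The following toy sketch is not taken from the original text. It illustrates why a rule-based converter needs to know the language in advance. It applies greedy longest-match letter-to-phoneme rules, so the same spelling yields different phoneme sequences under a hypothetical German (`"de"`) and a hypothetical English (`"en"`) rule table. The rule tables, the loosely SAMPA-style phoneme symbols, and the function name `to_phonemes` are illustrative assumptions. Practical rule sets are far larger and context-dependent.

```python
# Hypothetical, heavily simplified rule tables; letters without a rule map to themselves.
RULES = {
    "de": {"sch": ["S"], "ch": ["x"], "ei": ["aI"], "ie": ["i:"], "j": ["j"], "w": ["v"], "z": ["ts"]},
    "en": {"sh": ["S"], "ch": ["tS"], "th": ["T"], "ee": ["i:"], "j": ["dZ"], "w": ["w"], "z": ["z"]},
}

def to_phonemes(word, language):
    """Greedy longest-match grapheme-to-phoneme conversion.
    The language must be given explicitly; an unknown code raises KeyError."""
    rules = RULES[language]
    max_len = max(len(grapheme) for grapheme in rules)
    word = word.lower()
    phonemes, i = [], 0
    while i < len(word):
        # Try the longest grapheme first, then shorter ones.
        for length in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in rules:
                phonemes.extend(rules[chunk])
                i += length
                break
        else:
            phonemes.append(word[i])  # fallback: the letter stands for itself
            i += 1
    return phonemes

print(to_phonemes("Joachim", "de"))  # ['j', 'o', 'a', 'x', 'i', 'm']
print(to_phonemes("Joachim", "en"))  # ['dZ', 'o', 'a', 'tS', 'i', 'm']
```

For the name "Joachim", the German rules yield `j o a x i m` while the English rules yield `dZ o a tS i m`. Without knowing the language of the entry, the converter cannot choose between them.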
Multilingual systems for generating phoneme sequences and for speech recognition have also been developed. These systems can create phoneme sequences for words from different languages.
In still other approaches, the user speaks each word into the speech recognizer, which automatically generates the corresponding phoneme sequence. For large vocabularies, however (e.g., an electronic telephone book with 80 entries), speaking every entry in this way is no longer acceptable to the user.