Speech recognition and speech synthesis systems use vocabularies that contain words together with their pronunciation forms. The term phonetic transcription refers both to the creation of pronunciation forms and to the resulting phoneme sequences. A word together with its phonetic transcription forms a vocabulary entry.
One of the unsolved problems in current speech processing systems is the presence of "out of vocabulary" (OOV) words, that is, words that are not contained in the vocabulary (compare, for instance, U.S. Pat. No. 7,181,398 B2). OOV words can be general-purpose words or user-specific pronunciations of known words. Most prior-art speech recognition systems cannot detect these OOV words automatically; instead, they produce a recognition error. In such systems, the OOV words are normally identified by a correctionist or by the user him- or herself. Once an OOV word has been identified, the system can determine the corresponding input acoustic data.
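To make the notions of a vocabulary entry and of an OOV word concrete, the following is a minimal sketch, assuming a vocabulary modeled as a mapping from each written word to a set of phoneme sequences. The class names, method names, and ARPAbet-style phoneme symbols are hypothetical and chosen only for illustration. The sketch checks already identified (written) words against the vocabulary; it does not address the harder problem described above of detecting OOV words directly from acoustic input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyEntry:
    """A word together with its phonetic transcription (a phoneme sequence)."""
    word: str
    phonemes: tuple[str, ...]


class Vocabulary:
    """Hypothetical vocabulary: each word may have several pronunciation forms."""

    def __init__(self, entries=()):
        # Keyed by lowercase spelling; each value is a set of phoneme sequences.
        self._pronunciations: dict[str, set[tuple[str, ...]]] = {}
        for entry in entries:
            self.add(entry)

    def add(self, entry: VocabularyEntry) -> None:
        """Add a vocabulary entry, keeping any pronunciations already stored."""
        self._pronunciations.setdefault(entry.word.lower(), set()).add(entry.phonemes)

    def oov_words(self, words: list[str]) -> list[str]:
        """Return the words whose spelling is not contained in the vocabulary."""
        return [w for w in words if w.lower() not in self._pronunciations]

    def is_known_pronunciation(self, word: str, phonemes: tuple[str, ...]) -> bool:
        """False for an unknown word or for an unknown pronunciation of a known word."""
        return phonemes in self._pronunciations.get(word.lower(), set())


# Usage with illustrative (hypothetical) transcriptions.
vocab = Vocabulary([
    VocabularyEntry("speech", ("S", "P", "IY", "CH")),
    VocabularyEntry("system", ("S", "IH", "S", "T", "AH", "M")),
])
print(vocab.oov_words(["speech", "Schwarzenegger", "system"]))  # ['Schwarzenegger']
```

Storing a set of pronunciations per word lets the sketch represent both kinds of OOV words named above: a word missing entirely, and a user-specific pronunciation of a word whose spelling is already known.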
In another known embodiment, the user can add new words to the vocabulary simply by spelling them.
In all cases, the prior systems can automatically produce vocabulary entries only for standard words, that is, words that fit the morphology of the language in use. They cannot automatically produce vocabulary entries for special words whose morphology differs from that of the language in use. Such special words include, in particular, foreign words, family names of foreign origin, and abbreviations.
This makes both correcting OOV words and adding new words to a vocabulary cumbersome and time-consuming. Additionally, because vocabulary entries for special words cannot be generated automatically, it is also not possible to generate a vocabulary automatically from arbitrary publicly accessible acoustic data.