The invention relates to speaker-independent speech recognition, and more precisely to the compression of a pronunciation dictionary.
Various speech recognition applications have been developed in recent years, for instance for car user interfaces and mobile terminals, such as mobile phones, PDA devices and portable computers. Known methods for mobile terminals include methods for calling a particular person by saying his/her name aloud into the microphone of the mobile terminal, whereupon a call is set up to the number associated with the name said by the user. However, present speaker-dependent methods usually require that the speech recognition system be trained to recognize the pronunciation of each name. Speaker-independent speech recognition improves the usability of a speech-controlled user interface, because the training stage can be omitted. In speaker-independent name selection, the pronunciation of names can be stored beforehand, and the name spoken by the user can be identified with the pre-defined pronunciation, such as a phoneme sequence. Although in many languages the pronunciation of many words can be represented by rules, or even models, the pronunciation of some words still cannot be generated correctly by these rules or models. Moreover, in many languages the pronunciation cannot be represented by general pronunciation rules at all; instead, each word has a specific pronunciation. In these languages, speech recognition relies on the use of so-called pronunciation dictionaries, in which the written form of each word of the language and the phonetic representation of its pronunciation are stored in a list-like structure.
In mobile phones the memory size is often limited for reasons of cost and hardware size. This also imposes limitations on speech recognition applications. In a device capable of having multiple user interface languages, the speaker-independent speech recognition solution often uses pronunciation dictionaries. Because a pronunciation dictionary is usually large, e.g. 37 KB for two thousand names, it needs to be compressed for storage. Broadly speaking, most text compression methods fall into two classes: dictionary-based and statistics-based. There are several different implementations of dictionary-based compression, e.g. LZ77/78 and LZW (Lempel-Ziv-Welch). By combining a statistical method, e.g. arithmetic coding, with powerful modelling techniques, better performance can be achieved than with dictionary-based methods alone. However, the problem with statistics-based methods is that they require a large working memory (buffer) during the decompression process. Therefore such a solution is not suitable for use in small portable electronic devices such as mobile terminals.
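By way of illustration only, the following minimal sketch shows a pronunciation dictionary held as a list of entries and compressed with a general-purpose dictionary-based method. It assumes Python's standard `zlib` module, whose DEFLATE format combines LZ77-style matching with Huffman coding. The entries, the phoneme symbols, the tab-separated text layout and the function names are hypothetical and are not taken from any particular recognizer; the sketch is a baseline of the kind discussed above, not the method of the invention.

```python
import zlib

# Hypothetical entries: (written form, phoneme sequence). The phoneme
# symbols are illustrative only and follow no particular standard.
ENTRIES = [
    ("anna", "ae n ax"),
    ("john", "jh aa n"),
    ("maria", "m ax r iy ax"),
    ("peter", "p iy t er"),
]


def serialize(entries):
    """Store the list as text: one entry per line, tab between fields."""
    lines = [f"{word}\t{phonemes}" for word, phonemes in entries]
    return "\n".join(lines).encode("utf-8")


def deserialize(data):
    """Inverse of serialize(): rebuild the list of (word, phonemes) pairs."""
    text = data.decode("utf-8")
    return [tuple(line.split("\t", 1)) for line in text.split("\n")]


def compress_dictionary(entries):
    """Compress the serialized dictionary with DEFLATE at maximum level."""
    return zlib.compress(serialize(entries), 9)


def decompress_dictionary(blob):
    """Decompress and parse a blob produced by compress_dictionary()."""
    return deserialize(zlib.decompress(blob))


if __name__ == "__main__":
    blob = compress_dictionary(ENTRIES)
    restored = decompress_dictionary(blob)
    print(len(serialize(ENTRIES)), "bytes raw,", len(blob), "bytes compressed")
    print("round trip ok:", restored == ENTRIES)
```

On an input this small the fixed overhead of the compressed format can outweigh any savings; the sketch is meant to show the list-like structure and the lossless round trip, not a representative compression ratio.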
Although existing compression methods perform well in general, they do not compress pronunciation dictionaries efficiently enough for portable devices.