The invention relates generally to statistical language models, and more particularly to compression of such models (e.g., n-gram language models).
A common application today is the entering, editing, and manipulation of text. Application programs that perform such text operations include word processors, text editors, and even spreadsheets and presentation programs. For example, a word processor allows a user to enter text to prepare documents such as letters, reports, memos, etc. While the keyboard has historically been the standard device for inputting text into these types of application programs, it is currently being augmented and/or replaced by other types of input devices. For example, touch-sensitive pads can be "written" on with a stylus, such that a handwriting recognition program can be used to input the resulting characters into a program. As another example, voice-recognition programs, which work in conjunction with microphones attached to computers, are also becoming more popular. Especially for non-English language users, and particularly for Asian language users, these non-keyboard devices are popular for initially inputting text into programs, such that the text can then be edited by the same device or by other devices such as the keyboard. Speech and handwriting recognition have applications beyond text entry as well.
A primary part of the use of handwriting or speech recognition is the selection of a language model, which is used to determine the text into which what a user writes or speaks should be translated. In general, the more sophisticated a language model is, the more storage space it requires. This is unfortunate, especially in situations where storage space is at a premium, such as in handheld and palm-oriented computing devices. Therefore, the compression of such models is typically necessary. The performance, or measure of accuracy, of a language model is typically determined by what is known in the art as the perplexity of the model. Prior art language model compression techniques, while reducing the size of the resulting compressed model, also disadvantageously increase the perplexity, and hence reduce the accuracy, of the model. Such compression techniques that result in a reduced-size but increased-perplexity language model include merely pruning the language model, and using what is referred to as "classical" clustering, which by virtue of the clustering itself reduces the size of the model but increases its perplexity.
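Perplexity, the accuracy measure referred to above, is the exponential of the average negative log-probability that a model assigns to each word of a held-out text. The following is a minimal illustrative sketch, using hypothetical per-word probabilities rather than any particular model from the invention:

```python
import math

def perplexity(probs):
    """Perplexity of a word sequence given the probability a language
    model assigned to each word: exp of the average negative
    log-probability. Lower perplexity indicates a more accurate model."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

# Hypothetical per-word probabilities for a five-word test sentence.
probs = [0.2, 0.1, 0.25, 0.05, 0.1]
print(perplexity(probs))
```

A model that assigned every word a probability of 0.5 would have a perplexity of exactly 2, reflecting the intuition that perplexity is the effective number of equally likely choices the model faces per word.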
Therefore, there is a need within the prior art for language model compression techniques that result in smaller models with as limited an increase in perplexity as possible. For this and other reasons, there is a need for the present invention.
The invention relates to the cluster- and pruning-based compression of language models. In one embodiment, words are first clustered, such that the resulting language model after clustering has a larger size than it did before clustering. Clustering techniques amenable to the invention include but are not limited to predictive clustering and conditional clustering. The language model, as clustered, is then pruned. Pruning techniques amenable to the invention include but are not limited to entropy-based techniques, such as Stolcke pruning, as well as count-cutoff techniques and Rosenfeld pruning. In one particular embodiment, a word language model is first predictively clustered, using a novel predictive clustering technique, and then is pruned utilizing Stolcke pruning.
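The predictive clustering step mentioned above factors each word probability into a cluster probability times a word-given-cluster probability, i.e., P(w | context) = P(cluster(w) | context) * P(w | context, cluster(w)). The following is a simplified bigram sketch of that decomposition; the toy corpus, the fixed word-to-cluster assignment, and the names `cluster_of` and `predictive_prob` are illustrative assumptions, not the invention's actual clustering procedure (which learns the clusters):

```python
from collections import defaultdict

# Hypothetical toy corpus and a fixed word-to-cluster assignment;
# in practice the clusters would be learned from data.
corpus = "the cat sat the dog sat the cat ran".split()
cluster_of = {"the": "DET", "cat": "NOUN", "dog": "NOUN",
              "sat": "VERB", "ran": "VERB"}

# Count the statistics for the two factors of the predictive
# decomposition: P(cluster | prev word) and P(word | prev word, cluster).
cluster_counts = defaultdict(lambda: defaultdict(int))
word_counts = defaultdict(lambda: defaultdict(int))
for prev, word in zip(corpus, corpus[1:]):
    c = cluster_of[word]
    cluster_counts[prev][c] += 1
    word_counts[(prev, c)][word] += 1

def predictive_prob(prev, word):
    """P(word | prev) = P(cluster(word) | prev) * P(word | prev, cluster(word))."""
    c = cluster_of[word]
    p_cluster = cluster_counts[prev][c] / sum(cluster_counts[prev].values())
    p_word = word_counts[(prev, c)][word] / sum(word_counts[(prev, c)].values())
    return p_cluster * p_word

print(predictive_prob("the", "cat"))
```

Note that the clustered model stores two distributions where the unclustered model stored one, which is why the model grows after clustering; the subsequent pruning step (e.g., entropy-based Stolcke pruning) then removes parameters whose loss least degrades the model.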
Embodiments of the invention provide for advantages not found within the prior art. Unintuitively and nonobviously, embodiments of the invention initially cluster a language model such that it has a larger size than it did before being clustered. The subsequent pruning of the model then results in a compressed language model that has a smaller size for a given perplexity level as compared to prior art language model compression techniques. Embodiments of the invention also result in a compressed language model that has lower perplexity for a given model size as compared to prior art language model compression techniques.
The invention includes computer-implemented methods, machine-readable media, computerized systems, and computers of varying scopes. Other aspects, embodiments and advantages of the invention, beyond those described here, will become apparent by reading the detailed description and with reference to the drawings.