1. Field of the Invention
The present invention is related to language recognition and, more particularly, to language recognition by multiple computer systems connected together over a network.
2. Background Description
Speech recognition systems for voice dictation typically try to improve word recognition accuracy by using what is normally referred to as a “local context” for word recognition. Local context based word recognition uses n preceding and m succeeding words, normally referred to as “n-tuples,” to identify a particular spoken word. The local context recognition model selects the most likely word based on the statistical probability of the specific n-tuples. However, these prior speech recognition systems still make inappropriate recognition errors because they lack global context for the spoken words.
For example, in dictating a review of a conference paper, prior art speech recognition systems have recognized “in terms of validity” as “in terms of the lead Dave.” “Validity” has a global context because it is commonly discussed in conference paper reviews. However, “the lead Dave” is nonsensical in this context and so has no global context.
Thus, there is a need for improved methods and systems for language recognition.
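The local context selection described in this background may be illustrated by the following minimal sketch. It is not taken from any particular prior art system; the function names `train_tuple_counts` and `pick_word`, the raw-count scoring, and the toy training sentences are all assumptions made for illustration only.

```python
from collections import Counter


def train_tuple_counts(sentences, n=1, m=1):
    """Count (n preceding words, word, m succeeding words) tuples.

    Hypothetical helper: raw counts stand in for the statistical
    probability of each n-tuple.
    """
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            left = tuple(words[max(0, i - n):i])
            right = tuple(words[i + 1:i + 1 + m])
            counts[(left, word, right)] += 1
    return counts


def pick_word(candidates, left, right, counts):
    """Return the candidate seen most often in this local context."""
    return max(candidates, key=lambda w: counts[(tuple(left), w, tuple(right))])


# Toy training text (assumed, for illustration only).
counts = train_tuple_counts([
    "terms of validity the",
    "terms of validity the",
    "terms of value the",
])
best = pick_word(["validity", "value"], ["of"], ["the"], counts)
```

Because such a model only sees the immediately neighboring words, it has no way to prefer a word that is common in the document type as a whole, which is the limitation noted above.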
It is a purpose of the invention to improve language recognition by computers.
It is another purpose of the invention to expand the context base for language recognition.
The present invention is a language recognition system, method and program product for recognizing language-based input from computer users connected together over a network of computers. Each computer includes at least one user-based language model trained for a corresponding user. The language models may be used by users in automatic speech recognition, handwriting recognition, machine translation, gesture recognition or other similar actions that require interpretation of user activities. Computer users on a network of computers are clustered into classes of similar users. Users that are connected over the network are clustered according to their characteristic similarities, such as nationality, profession, sex, age, or any other characteristics that may influence user language model selection. Characteristics of users are collected by sensors and from databases and then distributed over the network during user activities. Language models with similarities among similar users on the network are identified.
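By way of illustration only, clustering users into classes of similar users may be sketched as follows. The function name `cluster_users`, the choice of exact matching on a fixed set of characteristics, and the sample user records are assumptions; the description above does not prescribe a particular clustering technique.

```python
from collections import defaultdict


def cluster_users(users, keys=("profession", "nationality")):
    """Group user ids whose selected characteristics match exactly.

    `users` maps a user id to a dict of characteristics (hypothetical
    layout). A missing characteristic is treated as None.
    """
    clusters = defaultdict(list)
    for user_id, traits in users.items():
        signature = tuple(traits.get(k) for k in keys)
        clusters[signature].append(user_id)
    return dict(clusters)


# Sample records (assumed, for illustration only).
users = {
    "a": {"profession": "reviewer", "nationality": "US", "age": 41},
    "b": {"profession": "reviewer", "nationality": "US", "age": 29},
    "c": {"profession": "physician", "nationality": "US", "age": 35},
}
classes = cluster_users(users)
```

Exact matching is the simplest possible choice here; a weighted similarity measure over the same characteristics could be substituted without changing the overall flow.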
The language models include a language model domain, with similar language models being clustered according to their domains. Language models identified as similar are modified in response to user production activities. After modification of one language model, other identified similar language models are compared and adapted. Also, user data, including information about user activities and user language model data, is transmitted over the network to other similar users. Language models are adapted only in response to similar user activities, when these activities are recorded and transmitted over the network. Language models are given a global context based on similar users that are connected together over the network.
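The adaptation of similar language models may likewise be sketched, under stated assumptions, as follows. Here each language model is reduced to a table of word counts with a domain label, similarity is taken to mean an identical domain, and the `share` weight applied to similar users' models is a hypothetical parameter; none of these choices is specified in the description above.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Toy stand-in for a user language model (assumed structure)."""
    domain: str
    counts: Counter = field(default_factory=Counter)


def adapt_models(models, user_id, text, share=0.5):
    """Update the producing user's model and every model in the same domain.

    The producing user's model receives full weight; models of similar
    users receive the reduced `share` weight. Models in other domains
    are left unchanged.
    """
    words = text.lower().split()
    source = models[user_id]
    for other_id, model in models.items():
        if model.domain != source.domain:
            continue
        weight = 1.0 if other_id == user_id else share
        for word in words:
            model.counts[word] += weight


# Three users, two of whom share a domain (assumed example data).
models = {
    "a": UserModel("reviews"),
    "b": UserModel("reviews"),
    "c": UserModel("medicine"),
}
adapt_models(models, "a", "validity validity")
```

In this sketch, text produced by user "a" raises the count of "validity" in the model of similar user "b" as well, which is one simple way a model could acquire context from other users on the network.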