In the field of this invention it is known that automatic speech recognition (ASR) can be improved by adapting the recognition engine to the specific user (speaker-dependent recognition) and to the device used by the user for audio input. It is also known that, for general-purpose applications, the preferred implementation involves non-user-specific modelling (speaker-independent recognition) and a remote server which does not negotiate or otherwise interact with the specifics of the local device.
From patent publication WO-02-103675 there is known a client-server based Distributed Speech Recognition (DSR) system which recognises speech made by a human at a client device and transmitted to a remote server over a network. The system distributes the speech recognition process between the client and the server so that a speaker-dependent language model may be utilized, yielding higher accuracy as compared to other DSR systems. Accordingly, the client device is configured to generate a phonetic word graph by performing acoustic recognition using an acoustic model that is trained by the same end-user whose speech is to be recognized; the resulting phonetic word graph is transmitted to the server, which handles the language processing and generates a recognized word sequence.
However, these approaches have disadvantages. Speaker-dependent recognition loses the general applicability of speaker-independent recognition, since it will not perform as well for speakers other than the one for whom it is trained. Likewise, speaker-independent recognition, especially in a hostile environment such as noisy telephone lines, can show decreased accuracy since it fails to capitalise on the characteristics of the specific device and speaker.
A need therefore exists for a distributed voice recognition system and method wherein the above-mentioned disadvantage(s) may be alleviated.