1. Field of the Invention
The present invention relates to a voice recognition system and method, and more particularly to a server-client voice recognition system and method. The present invention also relates to a computer-readable storage medium having a program recorded thereon for voice recognition. The present invention is applicable to a voice input interface of a device such as a cellular phone or a personal digital assistant.
2. Description of the Related Art
FIG. 1 is a schematic diagram showing an example of a conventional server-client voice recognition system that has a client terminal device and a server device. In the conventional server-client voice recognition system, the client terminal device handles voice recognition processing with a relatively light load, while the server device handles voice recognition processing with a comparatively heavy load. Specifically, as shown in FIG. 1, the conventional server-client voice recognition system has a client terminal device 310 and a server device 320, which are connected to each other via a communication network 330.
The client terminal device 310 includes a voice input unit 311 for inputting a user's voice, a voice pre-processing unit 312 for performing pre-processing, such as waveform analysis, of the input voice data, and a selector unit 313 for selecting whether subsequent content recognition of the pre-processed voice data is to be carried out in an internal process (in the client terminal device 310) or in an external process (in the server device 320). The client terminal device 310 also includes a primary voice recognition unit 314 and a primary recognition dictionary 315 for the internal content recognition process, a communication unit 316 for communicating with the server device 320, and a recognition result output unit 317 for outputting a result of voice recognition to the outside of the system.
The server device 320 includes a communication unit 321 for communicating with the client terminal device 310, a secondary voice recognition unit 322, and a secondary recognition dictionary 323. The secondary voice recognition unit 322 and the secondary recognition dictionary 323 are used for the external voice recognition process. This type of voice recognition system has been disclosed, for example, in Japanese Unexamined Patent Publications Nos. 2003-241796 and 2004-133699.
In such a conventional voice recognition system, the client terminal device 310 and the server device 320 operate as follows.
When voice data is input into the voice input unit 311 of the client terminal device 310, the voice pre-processing unit 312 carries out pre-processing of the input voice data, such as sound waveform analysis. In response to the result of the sound waveform analysis, the selector unit 313 selects whether recognition of the contents of the input voice data is to be carried out in the primary voice recognition unit 314 in the client terminal device 310 or in the secondary voice recognition unit 322 in the server device 320.
If the selector unit 313 selects the primary voice recognition unit 314, the primary voice recognition unit 314 performs voice recognition of the voice data using the primary recognition dictionary 315 and sends a recognition result to the recognition result output unit 317. If the selector unit 313 selects the secondary voice recognition unit 322, the pre-processed voice data is sent from the communication unit 316 in the client terminal device 310 to the server device 320 via the communication network 330. When the communication unit 321 in the server device 320 receives the pre-processed voice data from the client terminal device 310, the secondary voice recognition unit 322 immediately performs voice recognition of the received voice data using the secondary recognition dictionary 323. The communication unit 321 then returns a voice recognition result to the client terminal device 310 via the communication network 330. When the communication unit 316 in the client terminal device 310 receives the voice recognition result, the recognition result output unit 317 supplies the result to the user.
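The conventional flow described above can be sketched as follows. This is a minimal illustration only, not the implementation of the cited publications: all function names, the dictionary contents, and the length-based selection rule are hypothetical, because the criterion used by the selector unit 313 is not specified here. Voice data is represented by plain text strings, and the round trip over the communication network 330 is replaced by a direct function call.

```python
# Hypothetical sketch of the conventional flow; all names and data are illustrative.

# Small client-side dictionary (stand-in for primary recognition dictionary 315).
PRIMARY_DICTIONARY = {"call", "mail"}
# Larger server-side dictionary (stand-in for secondary recognition dictionary 323).
SECONDARY_DICTIONARY = {"call", "mail", "restaurant", "timetable"}


def preprocess(voice_data):
    """Stand-in for the voice pre-processing unit 312 (waveform analysis)."""
    return voice_data.strip().lower()


def select_internal(features):
    """Stand-in for the selector unit 313.

    Assumed rule for illustration: short inputs are handled on the client.
    """
    return len(features) <= 4


def recognize(features, dictionary):
    """Stand-in for a voice recognition unit: return the matched word, or None."""
    return features if features in dictionary else None


def server_recognize(features):
    """Stand-in for the server device 320 (units 321 to 323)."""
    return recognize(features, SECONDARY_DICTIONARY)


def client_recognize(voice_data):
    """Stand-in for the client terminal device 310.

    Returns the recognition result and where it was produced.
    """
    features = preprocess(voice_data)
    if select_internal(features):
        return recognize(features, PRIMARY_DICTIONARY), "client"
    # In the real system this is a round trip over the communication network 330.
    return server_recognize(features), "server"
```

Under these assumptions, `client_recognize("restaurant")` is routed to the server, whereas a short word that is missing from the small client-side dictionary, such as `"taxi"`, is routed to the client and yields no result.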
However, in the conventional voice recognition system, the primary recognition dictionary 315 in the client terminal device 310 has a small capacity in order to reduce the amount of processing required for voice recognition. Accordingly, the client terminal device 310 has a very limited vocabulary for recognition. As a result, depending on the words spoken, the conventional voice recognition system often has difficulty recognizing users' voices.
In such a case, one might consider having each user successively add unrecognized words to the primary recognition dictionary 315 in the client terminal device 310. However, adding unrecognized words to the primary recognition dictionary 315 in this way imposes a severe burden on the user and increases the amount of computation on the client side. This causes various problems, such as delays, during the voice recognition process.
Furthermore, in the conventional example, when the client terminal device 310 receives a result of voice recognition from the server device 320, the result is merely delivered to the user and is not accumulated in the client terminal device 310. Accordingly, the conventional voice recognition system is inconvenient in that it cannot satisfy a user's need to obtain frequently used words (vocabulary) or recently used words.
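The lack of accumulation can be illustrated with a short sketch. The class, its attributes, and the word lists are hypothetical, and the sketch assumes for simplicity that any word missing from the client-side dictionary is forwarded to the server, which is not necessarily how the selector unit 313 decides.

```python
# Hypothetical illustration; names and data are not taken from the source.

SERVER_DICTIONARY = {"restaurant", "timetable"}


class ConventionalClient:
    """Client that delivers server results to the user without retaining them."""

    def __init__(self):
        self.local_words = {"call", "mail"}  # fixed client-side dictionary
        self.server_round_trips = 0          # counts requests sent to the server

    def recognize(self, word):
        if word in self.local_words:
            return word
        # Assumed behavior: every local miss is forwarded to the server.
        self.server_round_trips += 1
        result = word if word in SERVER_DICTIONARY else None
        # The result is only output; it is never stored in local_words.
        return result


client = ConventionalClient()
for _ in range(3):
    client.recognize("restaurant")
```

Because nothing is stored on the client, the same frequently used word costs one server round trip on every use: after the loop, `client.server_round_trips` is 3 and `"restaurant"` is still absent from `client.local_words`.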