Speech recognition systems, i.e. systems for recognizing spoken language, are rapidly increasing in significance in many areas of data and communications technology. A speech recognition system typically comprises a computing system loaded with a speech recognition program for processing. Many speech recognition programs have a grammar, sometimes also called a dictionary, either built in or otherwise available to the program.
Speech recognition programs can be constructed for installation and use in servers, as stand-alone applications in client devices, and now in some browser applications.
Speech recognition programs designed for use in client devices are currently available from companies such as IBM, Nuance, Philips, Loquendo, and Microsoft, as well as others. Speech recognition programs for use in servers are provided by many of the same suppliers as well as by others. Some suppliers manufacture speech recognition programs for cell phone and PDA platform applications.
Speech recognition programs are currently used in many applications such as interactive voice response systems, command recognition systems giving direction to a computer or device, dictation mode systems including medical transcription, speaker identification, speech analytics, keyword processing, automotive applications, and hypertext navigation including multi-modal navigation.
In each of the applications and platforms listed above, except dictation mode systems, a grammar may be required. The grammar may take one of many different forms, such as a database, an XML file, another file type, dynamic data, or another data form accessible by a speech recognition program. Most grammars are not accessible by speech recognition programs other than those they were designed to operate with.
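As an illustration of a grammar held in an XML file, the following is a minimal sketch only. The element names (grammar, command), the attribute names (phrase, action), and the function load_grammar are hypothetical and are not taken from any particular speech recognition program.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar file contents; the schema is assumed for illustration.
GRAMMAR_XML = """
<grammar name="media">
  <command phrase="play music" action="PLAY"/>
  <command phrase="stop" action="STOP"/>
</grammar>
"""


def load_grammar(xml_text):
    """Parse the assumed XML layout into a phrase -> action mapping."""
    root = ET.fromstring(xml_text.strip())
    return {cmd.get("phrase"): cmd.get("action") for cmd in root.iter("command")}


grammar = load_grammar(GRAMMAR_XML)
```

A program built around a different layout would not be able to read this file, which is one reason a grammar is usually tied to the program it was designed for.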
Currently, grammars in speech recognition programs are not easily updated. Some manufacturers of speech recognition programs may offer grammar replacements, but none offers a solution that allows a user or an administrator to modify a grammar. Modifying a grammar, rather than replacing it, can be valuable in many applications, such as command and control applications in which a fixed grammar is made up of speech commands. Updating other components of a speech recognition program may also be valuable.
As an example, there may be a clear benefit if a user or administrator can update one or more speech commands, or the actions triggered by speech commands, without having to contact the manufacturer of the speech recognition software and request new software.
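The difference between modifying a grammar and replacing it can be sketched as follows. This is an assumed, simplified model in which a command grammar is a mapping from spoken phrases to action names; the class CommandGrammar and its methods are hypothetical and do not represent any existing product.

```python
class CommandGrammar:
    """Hypothetical command-and-control grammar: phrase -> action name."""

    def __init__(self, commands):
        # Store phrases in lower case so lookups are case-insensitive.
        self._commands = {p.lower(): a for p, a in commands.items()}

    def update(self, phrase, action):
        """Add a new phrase, or change the action of an existing phrase."""
        self._commands[phrase.lower()] = action

    def remove(self, phrase):
        """Delete a phrase; do nothing if it is not in the grammar."""
        self._commands.pop(phrase.lower(), None)

    def action_for(self, recognized_text):
        """Return the action for recognized text, or None if no match."""
        return self._commands.get(recognized_text.lower())


# An administrator changes one command; the rest of the grammar is untouched.
grammar = CommandGrammar({"Open mail": "LAUNCH_MAIL", "Lock screen": "LOCK"})
grammar.update("open mail", "LAUNCH_WEBMAIL")
grammar.update("open calendar", "LAUNCH_CALENDAR")
```

In this sketch a single entry is changed and another is added while the remaining commands stay in place, which is the kind of update that a full grammar replacement does not provide.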
U.S. Pat. No. 7,146,323 discloses a method and system for gathering information by voice input, further described as a hypertext navigation system combining the advantages of a point and click hypertext navigation system with those of prior art voice controlled hypertext navigation systems. On the server, the main components are a Web server or HTTP server; one or more Web applications, or servlets; and an application server and/or database. On the client device, the speech recognition and synthesis systems are available to signed Java applets. The main component, as illustrated in that invention, is a voice navigation component (applet) that performs the following steps: locates, selects, and initializes a speech recognition engine and a speech synthesis engine; defines, enables, and disables decoding grammars; and processes the recognition results (e.g. launches HTTP requests, initiates spoken words, and plays back prerecorded prompts). It is possible to use general grammars or language models that are available at the client side. Usually such grammars can be installed along with the general speech recognition engine. Furthermore, it is required to upload application dependent, or so called information dependent, grammars from the server to the client. These grammars specify the recognition vocabulary for navigating within related Web pages, Web pages belonging to a Web application, or related Web applications. A further component of that invention is a conventional point and click navigation component (applet) as used in existing prior art systems. The point and click component (applet PACNA) can load new Web pages responsive to user selection (pointing and clicking) of a hyperlink displayed in an HTML document. Both the voice navigation component (applet) and the point and click navigation component (applet) are originally stored on the server system.
Preferably, the loading of an initial web page from the server into the client can automatically initiate a loading of both the voice navigation component (applet) and the point and click navigation (applet). In the client device a Java Virtual Machine must be available for processing any uploaded applets including any grammar.
A disadvantage of the above mentioned system is the requirement to upload application dependent, or so called information dependent, grammar from the server to a client device as part of a voice navigation component Java applet. The information dependent grammar, as a part of an applet, specifies the recognition vocabulary for navigating within related Web pages, Web pages belonging to a Web application, or related Web applications, thereby limiting the grammar to the Web pages or Web applications loaded. The above mentioned system loads grammar but does not update grammar. Another disadvantage is that a Java Virtual Machine must be available in the client device for processing any uploaded applets. This prevents non-Java systems from utilizing the invention. Furthermore, this prior art system does not include any method of updating and modifying components of a speech recognition program, which may include grammars, DLLs, multimedia files, advertisements, or other information.
U.S. Pat. No. 7,139,715 discloses a system and method for providing remote automatic speech recognition and text-to-speech services via a packet network, further described as a client-server based system wherein a client device loads a relatively small program, named ASR Client, that communicates with a speech recognition program in a server. The client device includes hardware, such as a microphone, and software for the input and capture of audio sounds, such as speech. The ASR Client program in the client device is loaded with one or more grammars that are activated by a user. The words in the grammar are transmitted to the speech recognition program in the server, where the server recognizes the words, triggering actions in the server. In another embodiment, the ASR Client program sends to the server an identifier representing a grammar to be utilized by the server.
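The embodiment in which the client sends only a grammar identifier can be sketched as follows. This is an illustrative model under stated assumptions, not the implementation of the cited patent: the names SERVER_GRAMMARS, server_recognize, and client_request are hypothetical, the network is replaced by a direct function call, and audio decoding is replaced by plain text matching.

```python
# Hypothetical grammars held on the server, keyed by identifier.
SERVER_GRAMMARS = {
    "pizza-size": {"small", "medium", "large"},
    "yes-no": {"yes", "no"},
}


def server_recognize(grammar_id, utterance):
    """Match an utterance against the grammar selected by its identifier.

    Real recognition works on audio; text matching stands in for it here.
    """
    grammar = SERVER_GRAMMARS.get(grammar_id)
    if grammar is None:
        raise KeyError(f"unknown grammar identifier: {grammar_id}")
    word = utterance.strip().lower()
    return word if word in grammar else None


def client_request(grammar_id, utterance):
    """Stand-in for the client side: send the identifier and the input."""
    # A real client would send this over a packet network.
    return server_recognize(grammar_id, utterance)


result = client_request("yes-no", "Yes")
```

In this sketch the grammars themselves never leave the server, so the client can select a grammar but has no means of changing its contents.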
A disadvantage of the above mentioned system is that there is no method of updating grammars, DLLs, multimedia files, or other information.
U.S. patent application Ser. No. 11/557,971 describes a speech interface for search engines, but does not offer a method to update grammars or other components.
It is therefore the object of the present invention to provide a system and method of updating a speech recognition program.