This invention relates in general to the field of data communication and processing, and more particularly to a system and method for adding speech recognition capabilities to Java.
Computer users have long desired personal software applications capable of responding to verbal commands. Allowing users to interact with software applications using speech as an input medium provides a more natural interface than, for example, keyboard, mouse or touchscreen input devices. Voice input provides an advantage of facilitating hands-free operation. Besides allowing users to attend to other matters while interacting with the software application, hands-free operation provides access to physically challenged users. Voice input offers the additional advantages of avoiding spelling and syntax errors, and of eliminating the need to scroll through the large lists associated with other input methods.
One approach to providing speech recognition capabilities for a client application is the concept of HTML (HyperText Markup Language)-based smart pages. A smart page is a World-Wide-Web (Web) page that contains a link to a grammar specific to that page, and is capable of interpreting the results of that grammar. The author of the smart page defines the grammar to which the page will respond, embeds a link to that grammar within the smart page and gives visual cues to the user regarding the type of verbal input expected. When the speech engine encounters the smart page, it incorporates the grammar, enabling it to respond to speech input and return a result to the smart page. The smart page interprets the result and responds accordingly.
A disadvantage of this approach is that HTML-based Web pages are stateless; that is, when following a link on the current page to a new page, the new page knows nothing about the previous page. While it is possible to overcome this limitation by encoding state information in the URL (Uniform Resource Locator), this method provides a very inefficient solution. A further disadvantage of this approach is that it provides no solution for adding speech recognition capabilities to client applications in general. Because HTML is not a full programming language, its practical application is limited to Web pages and browsing commands.
According to the teachings of the present invention, a system for adding speech recognition capabilities to Java is provided which eliminates or substantially reduces the disadvantages and problems associated with previously developed systems.
In accordance with one embodiment of the present invention, a system for adding speech recognition capabilities to Java includes a speech recognition server coupled to a Java application through an application program interface. The Java application dynamically specifies a grammar to the application program interface, which communicates the grammar to the speech recognition server. The speech recognition server receives the grammar and a speech input. The speech recognition server performs speech recognition on the speech input, and generates a result based on the grammar. The application program interface communicates the result to the Java application, which performs an action based on the result received.
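By way of illustration only, the flow described above may be sketched in Java as follows. All names in this sketch (RecognitionServer, StubRecognitionServer, SpeechApi, LightSwitchApp and their methods) are hypothetical and are not taken from the described system. The stub server is assumed to run in-process and to treat the speech input as text, whereas an actual speech recognition server would decode audio and would typically run as a separate process.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for the speech recognition server.
interface RecognitionServer {
    void loadGrammar(Set<String> phrases);
    Optional<String> recognize(String speechInput);
}

// Assumption: the "speech input" is plain text, so recognition reduces to
// looking the utterance up in the current grammar.
class StubRecognitionServer implements RecognitionServer {
    private Set<String> phrases = new LinkedHashSet<>();

    @Override
    public void loadGrammar(Set<String> phrases) {
        this.phrases = new LinkedHashSet<>(phrases);
    }

    @Override
    public Optional<String> recognize(String speechInput) {
        String normalized = speechInput.trim().toLowerCase(Locale.ROOT);
        return phrases.stream().filter(p -> p.equals(normalized)).findFirst();
    }
}

// Hypothetical application program interface sitting between the Java
// application and the server.
class SpeechApi {
    private final RecognitionServer server;

    SpeechApi(RecognitionServer server) {
        this.server = server;
    }

    // The application specifies its grammar; the interface forwards it.
    void specifyGrammar(Set<String> phrases) {
        server.loadGrammar(phrases);
    }

    // The interface returns the server's result, if any, to the application.
    Optional<String> awaitResult(String speechInput) {
        return server.recognize(speechInput);
    }
}

// Example application: keeps its own state and acts on each result.
class LightSwitchApp {
    private final SpeechApi api;
    private boolean lightOn = false;

    LightSwitchApp(SpeechApi api) {
        this.api = api;
        api.specifyGrammar(Set.of("light on", "light off"));
    }

    void hear(String speechInput) {
        api.awaitResult(speechInput)
           .ifPresent(result -> lightOn = result.equals("light on"));
    }

    boolean isLightOn() {
        return lightOn;
    }
}
```

In this sketch the application retains its state (whether the light is on) between utterances, and input that falls outside the grammar yields no result and therefore triggers no action.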
The present invention provides important technical advantages, including the ability to easily encode state information in a Java application. Unlike HTML, which is stateless, Java is a full programming language capable of efficiently carrying the necessary state information. Moreover, because Java is a full programming language, the present invention facilitates speech enablement of any Java program application, and is not limited to Web browsing applications. A further advantage is the fact that Java is a platform-independent language. As such, the present invention allows the same program to use speech recognition on multiple platforms, provided a speech server runs in the background. This allows the client programmer to ignore platform-dependent issues such as audio recording and speech recognizer specifics. A still further advantage is the ease with which a Java application can be speech-enabled. The present invention allows inexperienced programmers to quickly speech-enable applications with a simple template, while providing more experienced programmers the flexibility to implement more complex features.
Yet a further advantage of the present invention is the client/server model upon which the application program interface is based. Because the speech recognition server handles the bulk of the processing load, a lighter load is placed on the slower interpreted Java application. Furthermore, the client/server model provides flexibility by allowing the client application to execute on a separate, perhaps less powerful, device than the server computer. When communicating with Java programs on the Web, the client-side nature of Java greatly simplifies tracking dialog context in an interaction. Furthermore, direct communication with a Java application eliminates network delays when waiting for a response.
Still another advantage of the present invention is the provision of dynamic modification of the contents of a grammar data structure. Dynamic modification is a valuable advantage where the context encountered by the speech engine is unpredictable, such as browsing World-Wide-Web sites. In such cases, dynamic modification allows the speech recognition server to augment the language of the speech engine to fit the context of the application encountered. The grammar data structure of the present invention provides an additional advantage of conciseness over conventional single regular grammars.
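As a minimal sketch of what dynamic modification of a grammar might look like from the application's side, the following Java class lets rules be added and removed while the application runs. The class DynamicGrammar and its methods are hypothetical, and the representation used here (named rules, each holding a set of literal phrases) is an assumption made for brevity. It is not the grammar data structure of the present invention.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical grammar whose contents can change at run time.
class DynamicGrammar {
    // Rule name -> phrases currently accepted under that rule.
    private final Map<String, Set<String>> rules = new LinkedHashMap<>();

    // Adds a phrase to a rule, creating the rule if it does not exist yet.
    void addPhrase(String rule, String phrase) {
        rules.computeIfAbsent(rule, k -> new LinkedHashSet<>())
             .add(phrase.trim().toLowerCase(Locale.ROOT));
    }

    // Removes a whole rule, for example when the user leaves a page.
    // Returns true if the rule was present.
    boolean removeRule(String rule) {
        return rules.remove(rule) != null;
    }

    // True if the utterance matches a phrase in any active rule.
    boolean accepts(String utterance) {
        String normalized = utterance.trim().toLowerCase(Locale.ROOT);
        return rules.values().stream()
                    .anyMatch(phrases -> phrases.contains(normalized));
    }
}
```

A browsing application might, for example, keep a permanent rule for navigation commands such as "go back", add a page-specific rule when a new Web site is encountered, and remove that rule when the user moves on, so that the accepted language tracks the current context.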