A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates in general to voice-controlled devices and, in particular, to systems and processes for voice-controlled information retrieval.
There is a continuing challenge in providing mobile workers with access to computational resources. A “mobile worker” performs job duties that require constant physical movement or manual labor, such as those performed by a traditional blue-collar worker. Mobile workers typically use their hands in performing their work and do not work at a desk in a traditional office setting.
Personal computers and terminals fail to adequately provide computer access to the mobile worker for at least two reasons. First, personal computers and terminals are stationary devices. As a result, mobile workers are forced to alter their work patterns to allow for physical access centered on the stationary personal computer or terminal. Second, personal computers and terminals typically include a display and a keyboard or other tactile input device. Thus, mobile workers must take their eyes off their work to view the display and use their hands to operate the tactile input device. These changes in work patterns are not always practical.
Enterprise resource planning (ERP) systems are one type of computer resource particularly well suited for use by mobile workers. These systems provide an integrated solution by combining traditionally stand-alone legacy systems, such as human resources, sales, marketing and other functionally separate areas, into a unified package. Two companies active in the development of ERP solutions are PeopleSoft and SAP AG.
Moreover, the use of ERP systems opens up a wide range of new uses for information stored in corporate databases. For example, previously unavailable engineering plans, such as blueprints, can be made available to assembly line workers. Similarly, a packing clerk in the shipping department can update an inventory system on the fly to reflect a change in the inventory of available goods.
Present mobile computing systems suffer from limited bandwidth with which to send and receive data. This limitation makes it difficult to provide mobile workers with access to ERP information, even though mobile workers require continuous access to corporate data. Visual-based browsers, for example, typically require more bandwidth than is available on mobile computing devices. A speech-based approach is therefore needed.
A prior-art, speech-only approach to voice-controlled information retrieval can be found in telephony interactive menu systems, or so-called “voice response systems.” These voice-activated menu systems generally provide a spoken menu of selections to a user over a telephone. The user indicates an appropriate response, generally corresponding to a number on the telephone keypad, either by speaking it or by keying it into the keypad. Such systems limit responses to a finite set of numeric choices. They are further limited in the complexity of any given menu option, which generally must be short and easily understood to be effective.
A prior-art, visual/speech approach to providing hands-free access to information retrieval is a speech-enabled Web browser, such as that described in commonly assigned U.S. patent application Ser. No. 09/272,892, entitled “Voice-Controlled Web Browser,” pending, filed Mar. 19, 1999, the disclosure of which is incorporated herein by reference. Such speech-enabled Web browsers augment a standard user interface with a microphone and speaker. Hyperlinks are presented visually to the user, who responds by voice using the hyperlink's text or a visual hint to make a selection. However, the visual nature of the information content itself inherently limits the flexibility of this approach. The voice prompts are driven by the linear arrangement of the Web content, which is designed primarily for visual display and is not formatted for access by a speech-enabled browser. Consequently, complex information is not always easily accessible through speech-enabled Web browsers.
Consequently, there is a need to provide mobile workers with voice-controlled access to computer-retrievable information without requiring them to alter their work patterns around a stationary personal computer or terminal that requires a display and manual tactile input. Such a solution would preferably be mobile in nature, that is, easily worn or held by the mobile worker and operable without a visual display. Alternatively, such a solution could be embodied on a conventional client computer or on telephony devices.
The present invention provides an approach to voice-controlled information retrieval in which information, such as dynamically generated corporate data, can be presented to a mobile worker using a low bandwidth, speech-oriented connection. The approach includes the capability to present closely related, but mostly static, visual information or other high bandwidth information to a mobile worker using a portable or stationary, but locally situated, Web server. The visual information can optionally be displayed on a Web browser running on another client.
One embodiment of the present invention is a system, process and storage medium for voice-controlled information retrieval using a voice transceiver. A voice transceiver executes a conversation template. The conversation template comprises a script of tagged instructions comprising voice prompts and expected user responses. A speech engine processes a voice command identifying information content to be retrieved. The voice transceiver sends a remote method invocation requesting the identified information content to an applet process associated with a Web browser. An applet method retrieves the identified information content on the Web browser responsive to the remote method invocation.
A further embodiment of the present invention is a system, process and storage medium for retrieving Web content onto a browser running on a remote client using a voice transceiver. A storage device stores a conversation template on the server. The conversation template comprises a script including instruction tags for voice commands and voice prompts. A voice transceiver receives the conversation template. A parser parses the instruction tags from the script to form a set of interrelated tokens and instantiates an object corresponding to each token. An interpreter interprets the set of tokens by executing the object instance corresponding to each token. A speech engine receives a voice command on the voice transceiver from a user for Web content. A remote client is interconnected to the server and the voice transceiver via a network. The voice transceiver sends a remote method invocation identifying the Web content. The remote client includes an applet associated with a browser running on the remote client and requests the Web content from the server responsive to the remote method invocation. The browser receives the Web content.
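By way of illustration only, the parse-and-interpret cycle described above can be sketched as follows. This is a minimal sketch, not the implementation of the invention: the tag names (`prompt`, `listen`, `fetch`), the `url` template syntax and the class names are hypothetical, the speech engine is replaced by a caller-supplied `recognize` function, and the "set of interrelated tokens" is simplified to an ordered list of objects, one instantiated per tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversation template; the tag names are illustrative only.
TEMPLATE = """
<template>
  <prompt>Which part number?</prompt>
  <listen var="part"/>
  <fetch url="/parts/{part}"/>
</template>
"""

class Prompt:
    def __init__(self, elem):
        self.text = elem.text.strip()

    def execute(self, ctx):
        ctx["spoken"].append(self.text)  # stands in for text-to-speech output

class Listen:
    def __init__(self, elem):
        self.var = elem.get("var")

    def execute(self, ctx):
        ctx["vars"][self.var] = ctx["recognize"]()  # stands in for the speech engine

class Fetch:
    def __init__(self, elem):
        self.url = elem.get("url")

    def execute(self, ctx):
        # Records the content request; a real transceiver would issue it.
        ctx["requests"].append(self.url.format(**ctx["vars"]))

# Hypothetical mapping from instruction tag to the class instantiated for it.
TAG_CLASSES = {"prompt": Prompt, "listen": Listen, "fetch": Fetch}

def parse(script):
    """Parse instruction tags into an ordered list of token objects."""
    root = ET.fromstring(script.strip())
    return [TAG_CLASSES[child.tag](child) for child in root]

def interpret(tokens, recognize):
    """Interpret the tokens by executing each object instance in turn."""
    ctx = {"spoken": [], "vars": {}, "requests": [], "recognize": recognize}
    for token in tokens:
        token.execute(ctx)
    return ctx
```

For example, `interpret(parse(TEMPLATE), lambda: "A-1234")` speaks the prompt, binds the recognized reply to `part`, and records a request for `/parts/A-1234`. Keeping one class per tag lets new instruction types be added without changing the interpreter loop.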
A further embodiment of the present invention is a process and language definition embodied as code stored on a computer-readable storage medium for facilitating speech driven information processing using a voice transceiver. A speech markup document for speech operations interpretable by the voice transceiver is defined. The markup document comprises a set of tags with each such tag comprising a speech instruction and at least one such tag further comprising a remote procedure call. An applet object for information processing operations interpretable by a client interconnected to the voice transceiver is defined. The applet object comprises a remote procedure call interface responsive to the remote procedure call of the speech markup document and a method defining an operation performable by the browser corresponding to the speech instruction of the at least one such tag.
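The pairing of a speech markup tag that carries a remote procedure call with an applet object exposing a matching interface might be sketched as below. This is an in-process simulation under stated assumptions, not Java RMI: the `say` tag, its `text`, `method` and `arg` attributes, the `BrowserApplet` class and its `load_url` method are all hypothetical, and "loading" a page merely records the URL.

```python
import xml.etree.ElementTree as ET

class BrowserApplet:
    """Hypothetical applet object exposing a remote-callable interface."""

    EXPORTED = {"load_url"}  # methods reachable through the RPC interface

    def __init__(self):
        self.history = []

    def load_url(self, url):
        # Stands in for directing the browser to display the content.
        self.history.append(url)
        return f"loaded {url}"

    def invoke(self, method, *args):
        # RPC entry point: only exported methods may be called remotely.
        if method not in self.EXPORTED:
            raise AttributeError(f"method {method!r} is not exported")
        return getattr(self, method)(*args)

def run_tag(tag_xml, applet, spoken):
    """Execute one speech-markup tag, including its remote procedure call, if any."""
    elem = ET.fromstring(tag_xml)
    spoken.append(elem.get("text"))  # speech instruction (stands in for TTS)
    method = elem.get("method")
    if method is None:
        return None  # plain speech tag with no remote procedure call
    return applet.invoke(method, elem.get("arg"))
```

Running `run_tag('<say text="Showing blueprint 42" method="load_url" arg="/blueprints/42.html"/>', applet, spoken)` speaks the instruction and then calls `load_url` on the applet. Restricting calls to an explicit export list mirrors the idea of a declared remote interface rather than arbitrary method access.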
A further embodiment of the present invention is an integrated system for retrieving Web content using a voice transceiver. An integrated server comprises an enterprise resource planning system and a server suite which cooperatively provide enterprise information formatted as Web content. A storage device stores a conversation template on the server. The conversation template comprises a script including instruction tags for voice commands and voice prompts. A voice transceiver receives the conversation template. The voice transceiver includes a parser, an interpreter and a speech engine. The parser parses the instruction tags from the script to form a set of interrelated tokens and instantiates an object corresponding to each token. The interpreter interprets the set of tokens by executing the object instance corresponding to each token. The speech engine receives a voice command on the voice transceiver from a user for Web content. The voice transceiver requests the Web content from the integrated server responsive to the voice command and the voice transceiver presents the Web content to the user upon the receipt thereof from the server.
A further embodiment of the present invention is an integrated server for retrieving Web content onto a browser running on a remote client using a telephone. A storage device stores a conversation template which comprises a script including instruction tags for voice commands and voice prompts. Middleware and a server suite cooperatively provide enterprise information received from a legacy system formatted as Web content. A voice transceiver receives the conversation template and includes a parser and an interpreter. The parser parses the instruction tags from the script to form a set of interrelated tokens and instantiates an object corresponding to each token. The interpreter interprets the set of tokens by executing the object instance corresponding to each token. A telephonic speech engine receives a voice command for Web content received from a user via a telephone interfacing to the integrated server. A remote client is interconnected to the integrated server via a network. The voice transceiver sends a remote method invocation identifying the Web content to the remote client. The remote client includes an applet associated with a browser running on the remote client and requesting the Web content from the server responsive to the remote method invocation. The browser receives the Web content.
A further embodiment of the present invention is a fielded voice control system for retrieving Web content onto a browser using a voice transceiver. A corporate server comprises an enterprise resource planning system and a server suite which cooperatively provide enterprise information formatted as substantially dynamic Web content. A local server is interconnected to the corporate server via a low bandwidth network and comprises a server suite providing Web content. A portable client is interconnected to the local server via a high bandwidth network having an effective data rate higher than the effective data rate of the low bandwidth network. The portable client comprises a voice transceiver and a speech engine. The voice transceiver includes a parser parsing the instruction tags from the script to form a set of interrelated tokens and instantiating an object corresponding to each token and an interpreter interpreting the set of tokens by executing the object instance corresponding to each token. A speech engine receives a voice command on the voice transceiver from a user for Web content. The voice transceiver requests the Web content from the local server responsive to the voice command and the voice transceiver presents the Web content to the user upon the receipt thereof from the server.
A further embodiment of the present invention is a fielded voice control system for retrieving Web content onto a browser using a telephone. A corporate server comprises an enterprise resource planning system, a server suite, a voice transceiver, and a telephonic speech engine. The enterprise resource planning system and the server suite cooperatively provide enterprise information formatted as substantially dynamic Web content. The voice transceiver includes a parser which parses the instruction tags from the script to form a set of interrelated tokens and instantiates an object corresponding to each token and an interpreter which interprets the set of tokens by executing the object instance corresponding to each token. The telephonic speech engine receives a voice command for Web content received from a user via a telephone interfacing to the corporate server. A local server is interconnected to the corporate server via a low bandwidth network and comprises a server suite providing Web content. A remote client is interconnected to the local server via a network. The voice transceiver sends a remote method invocation identifying the Web content requested by the voice command to the remote client. The remote client includes an applet associated with a browser running on the remote client and requesting the Web content from the local server responsive to the remote method invocation. The browser receives the Web content.
A further embodiment of the present invention is a system and process for preemptive voice-controlled information retrieval using a voice transceiver. A voice transceiver executes a conversation template which comprises a script of tagged instructions comprising voice prompts. An interrupt handler monitors receipt of further conversation templates to the voice transceiver during the execution of the conversation template. A session stack temporarily stores an activation record for the conversation template being executed by the voice transceiver upon the receipt of a further conversation template by the interrupt handler and subsequent execution of the further conversation template by the voice transceiver. A speech engine processes a voice command identifying information content to be retrieved. The voice transceiver sends a remote method invocation requesting the identified information content to an applet process associated with a Web browser. An applet method retrieves the identified information content on the Web browser responsive to the remote method invocation.
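One way to picture the preemptive behavior described above is the following sketch, which assumes a single-threaded transceiver. The interrupt handler queues any template that arrives mid-execution; before each instruction the transceiver checks that queue, pushes an activation record (template plus next-instruction index) onto the session stack, runs the new template, and then pops the stack to resume. The class names, the `deliver` method and the representation of instructions as callables are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ActivationRecord:
    name: str
    instructions: list = field(default_factory=list)
    pc: int = 0  # index of the next instruction to execute

class VoiceTransceiver:
    def __init__(self):
        self.session_stack = []  # suspended conversation templates
        self.pending = deque()   # templates received during execution
        self.spoken = []

    def deliver(self, name, instructions):
        """Interrupt handler: record a newly received conversation template."""
        self.pending.append(ActivationRecord(name, list(instructions)))

    def run(self, name, instructions):
        current = ActivationRecord(name, list(instructions))
        while current is not None:
            if self.pending:
                # Preempt: save the running template and switch to the new one.
                self.session_stack.append(current)
                current = self.pending.popleft()
                continue
            if current.pc < len(current.instructions):
                instr = current.instructions[current.pc]
                current.pc += 1
                instr(self)
            else:
                # Template finished: resume the most recently suspended one.
                current = self.session_stack.pop() if self.session_stack else None

def say(text):
    """Hypothetical voice-prompt instruction."""
    return lambda vt: vt.spoken.append(text)
```

In the following usage, a second template arrives while the first is running; it preempts the first, and the first resumes where it left off:

```python
vt = VoiceTransceiver()
alert = [say("Alert: line 3 stopped")]
main = [say("Step 1"),
        lambda v: v.deliver("alert", alert),  # simulates an arriving template
        say("Step 2")]
vt.run("main", main)
# vt.spoken == ["Step 1", "Alert: line 3 stopped", "Step 2"]
```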