The present invention relates generally to interactive voice response devices and more specifically to a method and system that use standard Internet protocols to send pages of information from remote computer systems to portable interactive voice response devices to program them to interact with and guide the actions of users.
Many workers who perform tasks that would otherwise require close supervision can be guided efficiently through the use of portable interactive voice response devices. At the same time these devices can be used to collect data and verify the work done.
Consider, for example, the task of “picking,” that is, retrieving distinct sets of different items from a warehouse to fill customer orders. First the orders are collected and organized, next an efficient route through the warehouse is devised, then a warehouse worker is told what items to retrieve from specific locations to fill particular customer orders. The worker follows the prescribed route, picks the items, indicates any that are out of stock or that he cannot otherwise locate, segregates the items by customer order, and returns with the items. On other occasions a worker may follow a similar but different procedure to put away items, may receive items at a loading dock and verify shipping invoices, may inspect incoming or outgoing items, or may count and tabulate inventories.
For each of these tasks the worker could be given instructions verbally or on paper, and could record his results using paper and pencil. However, it is more efficient and reliable to use a portable computer-based device that provides the worker with step-by-step instructions as needed, that immediately requests verification or other data from him, and that transmits data to a remote computer system for storage in a central database. Typically, portable computer-based devices utilize a display screen for providing information and a keyboard for receiving information. However, in situations such as those described here the worker's eyes are busy looking at the items to be located, put away or counted, his hands are holding picked items, tools or a cart, and he may be wearing heavy gloves. Thus, it is not convenient for him to hold a computer-based device (e.g., a bar code scanner), view a display screen, or operate a keyboard. In these situations it is desirable for the worker to wear an interactive voice response device that provides instructions and verification in voice, and that accepts information in voice.
Portable computer-based devices, like other computer systems, must be programmed for particular functionality. They must also be sent specific data, for example, the data for specific customer orders. Furthermore, for convenience and flexibility, they should be programmed to support several kinds of tasks, and their programs should be changed or activated as required. Typically such devices are programmed for use in a particular facility or by a particular firm. However, it is more economical to commercially produce devices that can be easily programmed and reprogrammed. Thus, there is a need for methods and systems for easily programming portable interactive voice response devices, and for sending programs and data to them, so that they perform different functions as needed. There is also a need for general-purpose methods and systems for managing voice-oriented dialogs on such devices.
The ubiquitous World Wide Web Internet architecture provides well-known, standard means for displaying text and images, for playing sound recordings, and for receiving information from users. Graphical browsers such as MICROSOFT INTERNET EXPLORER and NETSCAPE COMMUNICATOR, which are widely used by the general public, interpret programs written in accordance with Internet programming conventions. Many convenient program development tools support those conventions. The simplest type of program is a set of linked pages written in HTML (HyperText Markup Language). A page typically includes text style markers, text, references to files that store images or sound, and links to other pages. Means for representing and processing information are also becoming standard. In particular, XML (eXtensible Markup Language) represents information by storing markers that define the structure of the data together with the data itself. Software tools are available for creating, validating, displaying and extracting information from XML files. The World Wide Web was originally intended for access by general-purpose computers, but many special-purpose devices, including wireless devices, have been developed to access it.
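As an illustration of how XML stores structural markers together with the data, the following minimal sketch parses a small pick order with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`order`, `item`, `sku`, `location`, `quantity`) are hypothetical and are not taken from this specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical pick order: element and attribute names are illustrative only.
ORDER_XML = """
<order id="1001">
  <item sku="A-17" location="aisle 3, bin 12" quantity="2"/>
  <item sku="B-42" location="aisle 5, bin 4" quantity="1"/>
</order>
"""


def extract_items(xml_text):
    """Return (sku, location, quantity) tuples for every item in the order."""
    root = ET.fromstring(xml_text)
    return [
        (item.get("sku"), item.get("location"), int(item.get("quantity")))
        for item in root.findall("item")
    ]


if __name__ == "__main__":
    for sku, location, quantity in extract_items(ORDER_XML):
        print(f"Pick {quantity} of {sku} at {location}")
```

Because the markers name each field, the same file can be validated, displayed, or mined for data by generic tools without a format-specific parser.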
Over-the-telephone access to computer-based information is provided economically by interactive voice response systems. These systems provide information in voice, either using stored speech sequences or synthetic speech. They accept telephone key input. Some also use speech recognition to accept voice input. Interactive voice response systems are used for such applications as stock trading, order entry, and determining bank account balances and order status. Typically each manufacturer of such systems provides his own proprietary programming methodology and program development tools.
A consortium called the VoiceXML Forum, founded by several large U.S. firms, promotes using Internet-like programming conventions to standardize and facilitate the programming of interactive voice response systems. In particular, the VoiceXML Forum proposes the use of VoiceXML, an HTML-like eXtensible Markup Language that supports voice input and output with speech recognition and both stored and synthetic speech. To acquire and provide information, over-the-telephone users would use a voice browser that interprets VoiceXML programs, much as computer users use a graphical browser that interprets HTML programs. The VoiceXML Forum describes the use of VoiceXML and voice browsers only for over-the-telephone use, only with VoiceXML programs resident on a voice server, and only for acquiring and providing information. Prior to the present invention, VoiceXML and voice browsers have not been used for other devices or applications. (Reference: “Voice Extensible Markup Language VoiceXML, Version 1.0”, VoiceXML Forum, Aug. 17, 1999, www.vxmlforum.org/specsxe2x80x941.html)
The present invention uses Internet methodologies to meet the programming and reprogramming needs of portable interactive voice response devices. In particular, it integrates standard Internet protocols, the VoiceXML language and a voice browser to program and operate portable interactive voice response devices for the purpose of interacting with and guiding the actions of users. It thereby provides methods for creating voice-oriented Internet-based systems that support a multitude of applications, for example several different logistics applications within a warehouse, and that manage voice dialogs on portable devices, without requiring extensive computer programming for different facilities and applications.
An interactive voice response system includes a server and a set of mobile clients. The server and clients include RF transceivers for exchanging messages over an RF channel. Each mobile client includes a microphone, a speaker or headset, a processor and a voice browser. The voice browser interprets voice pages received from the server. Upon receiving a particular voice page from the server, the voice browser outputs via the speaker voice prompts specified by the voice page. A speech recognition engine used by the voice browser converts voice responses from a user into a text response. The voice browser then performs an action based on the text response. The action taken may be to request a new voice page from the server, or to continue to interpret the current voice page.
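The interpretation cycle described above (play the prompts, recognize the response, then either request a new voice page or continue with the current one) can be sketched as follows. This is a simplified model under stated assumptions, not the claimed implementation: voice pages are represented as plain dictionaries rather than VoiceXML, and `fetch_page`, `speak` and `listen` are hypothetical callables standing in for the RF link, the speaker and the speech recognition engine.

```python
# Hypothetical voice pages keyed by URL; a real system would receive
# VoiceXML documents from the server instead of dictionaries.
PAGES = {
    "/start": {"prompt": "Go to aisle 3.", "choices": {"ready": "/pick"}},
    "/pick": {"prompt": "Pick 2 items.", "choices": {"done": "/end"}},
    "/end": {"prompt": "Order complete."},
}


def run_voice_browser(fetch_page, speak, listen, start_url, max_turns=50):
    """Interpret voice pages until a page offers no further choices.

    Returns the list of page URLs visited, in order.
    """
    url = start_url
    page = fetch_page(url)
    visited = [url]
    for _ in range(max_turns):
        speak(page["prompt"])
        choices = page.get("choices", {})
        if not choices:
            break  # terminal page: nothing more to ask the user
        text = listen()  # recognized speech, already converted to text
        target = choices.get(text)
        if target is None:
            # Unrecognized response: keep interpreting the current page,
            # which repeats its prompt on the next turn.
            continue
        url = target
        page = fetch_page(url)  # request a new voice page from the server
        visited.append(url)
    return visited
```

Passing the three callables in as parameters keeps the control flow independent of any particular radio, audio or recognition hardware, so the loop can be exercised with scripted responses.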
The server preferably includes an HTTP server module for receiving and responding to requests for voice pages from the mobile clients in accordance with a predefined protocol.
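A minimal sketch of such an HTTP server module, using only Python's standard library, is given below. The page store `VOICE_PAGES`, the placeholder page markup, and the helper names `make_server`, `start_in_background` and `fetch_voice_page` are assumptions made for illustration; the specification does not prescribe them, and the markup shown is not valid VoiceXML.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page store; the markup is a placeholder, not real VoiceXML.
VOICE_PAGES = {
    "/start": "<page><prompt>Go to aisle 3.</prompt></page>",
}


class VoicePageHandler(BaseHTTPRequestHandler):
    """Answers GET requests for voice pages held in VOICE_PAGES."""

    def do_GET(self):
        body = VOICE_PAGES.get(self.path)
        if body is None:
            self.send_error(404, "No such voice page")
            return
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/xml; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in this sketch


def make_server(port=0):
    """Create a server on localhost; port 0 lets the OS choose a free port."""
    return HTTPServer(("127.0.0.1", port), VoicePageHandler)


def start_in_background(server):
    """Serve requests on a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def fetch_voice_page(host, port, path):
    """Client side: request one voice page, returning (status, body text)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()
```

In a deployment the mobile clients would issue the same kind of request over the RF link; the loopback address is used here only so the sketch can run on one machine.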
The mobile clients each include a text-to-speech module for converting text in a voice page into voice prompts, and a digitized speech module for playing digitized voice data representing other voice prompts. The mobile clients also include a speech recognition module for recognizing words or data strings within a user's voice responses in accordance with a user-specific voice file received from the server.
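The division of labor between the text-to-speech module and the digitized speech module can be pictured as a simple dispatch over the prompts in a voice page. The sketch below assumes each prompt is tagged as either `"text"` or `"audio"`; that tagging scheme and the callables `synthesize` and `play_recording` are hypothetical stand-ins for the two modules.

```python
def play_prompts(prompts, synthesize, play_recording):
    """Route each prompt to the module that can render it.

    prompts is a sequence of (kind, value) pairs, where kind is "text"
    for text to be synthesized or "audio" for a reference to digitized
    voice data. Returns the number of prompts played.
    """
    played = 0
    for kind, value in prompts:
        if kind == "text":
            synthesize(value)  # text-to-speech module
        elif kind == "audio":
            play_recording(value)  # digitized speech module
        else:
            raise ValueError(f"unknown prompt kind: {kind!r}")
        played += 1
    return played
```

For example, `play_prompts([("audio", "beep.wav"), ("text", "Pick 2 items.")], tts, player)` would play the recording first and then synthesize the sentence, where `tts` and `player` are whatever callables the device provides.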