The present invention generally relates to computer systems, and relates in particular to mechanisms that provide voice control of a server.
A vast amount of information is available using computer servers. Servers, mainframe computers, and other computer storage devices on networks provide a warehouse of information and services. However, accessing information and initiating processes or services on such servers is difficult using presently available mechanisms. In particular, multiple commands using a keyboard or mouse are usually required for a user to navigate through the file directory structure of a server to locate desired information. The data processing field has failed to develop systems that provide voice control of a remote server from a local point, so that a local user can command the remote server by voice to display visual information at the local point or carry out a desired process.
Past systems provide incomplete solutions to this problem. For example, interactive voice response (IVR) systems are used to deliver stored information over a telephone line to an end user. IVR systems are specialized computer systems that have a processor, a storage device such as a hard disk, hardware for interfacing the processor to the public switched telephone network (PSTN) and an IVR application program that runs on the processor.
Generally, the end user connects to the IVR system using a telephone. The end user takes the telephone handset off hook and dials a predetermined telephone number that identifies the IVR system. The telephone call is delivered over the PSTN to one of several trunk lines connected to the IVR system. The IVR system answers the call by seizing one of the trunk lines and playing a pre-recorded greeting to the caller. Typically the greeting is a voice recording stored digitally on a storage device that provides the end user with a menu of processing options that can be selected using telephone dial pad keys. Some IVR systems include voice recognition software or processors, so that an end user can select system options by speaking a short word or phrase such as a number.
Example IVR applications include automated receptionist services, various customer service or “help desk” applications, and airline reservations systems.
IVR systems can be configured to carry out a voice dialogue with the end user. The dialogue comprises a series of questions from the IVR system and answers from the end user until the desired service is provided to the end user by the IVR system. However, IVR systems are unable to display visual information, such as pre-formatted text or graphics, or dynamically created custom graphic information, in conjunction with the dialogue. The absence of visual information from present IVR systems is a major limitation, and represents the loss of a powerful medium for conveying information.
Client-server computer systems also provide an incomplete solution. In a client-server system, a client at a local point is connected by a data connection to a server at a remote point. The client can be a computer or a combination of a computer and software running on the computer. The data connection can be a cable, a local area network (LAN), a wide area network, or another type of network. The data connection can be the global network, operating according to standard protocols, known as the Internet. The server can be a file server of the LAN, or a server not affiliated with the client. For example, the server can be a server that is publicly accessible using anonymous file transfer protocol (FTP) over the Internet. Using the Internet and certain wide area network technologies, a client can connect to, “log on” to, request and use a distant server.
One popular technology enjoying wide use with the Internet is known as the World Wide Web. The World Wide Web enables a computer to locate a remote server using a server name in an agreed-upon format that is indexed at a central Domain Name Server (DNS). The local computer or client runs a browser program. Using the browser, the client locates the remote server using the DNS, and connects to the remote server. The client requests information from the server using a communication protocol called the Hypertext Transfer Protocol (HTTP), by providing a Uniform Resource Locator (URL) that uniquely identifies a page of information stored on the server. A URL is a form of network address that identifies the location of information stored in a network and represents a reference to a remote World Wide Web server, known as a website. The pages of information are files prepared in the Hypertext Markup Language (HTML). Thus, a Web client-server system can be used to request and display information stored on a remote server.
URLs generally are formatted according to the following syntax:
<protocol id>:// {<server>} <second level domain> <top level domain> {<directory>} {<file>}
The <protocol id> identifies the transmission protocol to be used. For example, in the case of the Web it is <http>, and in the case of an anonymous file transfer protocol transaction it is <ftp>. The <server> element is an optional server name such as <www.>. The <server> element may also identify a directory on a storage device of the Web server that contains HTML documents. The <second level domain> element is the name of the server domain as found in the DNS table, such as <etrade>. The <top level domain> element identifies the type of the second level domain, and must be an item selected from a finite set of globally recognized top level domains, such as “com,” “org,” “mil,” “edu,” “gov,” and others. The <directory> element is an optional name of a directory within the <server>, such as <DocumentRoot/>. The <file> element is an optional name of a file, document or image to be retrieved, such as <Index.html>. Thus, a URL serves as one type of network address to locate a document anywhere in a network.
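As an illustration only, the URL elements described above can be separated with a few lines of Python. This is a minimal sketch, not part of the described system: the function name `split_url` and the sample URL are hypothetical, the sample URL is simply assembled from the element examples given above, and the sketch assumes the host name ends in a second level domain followed by a top level domain.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into the elements named in the syntax above.

    Hypothetical helper for illustration. It assumes the host name has
    at least two dot-separated labels, the last two being the second
    level domain and the top level domain.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("URL has no host name: %r" % url)
    labels = parts.hostname.split(".")
    if len(labels) < 2:
        raise ValueError("host name has no top level domain: %r" % url)
    # Everything up to the last "/" is the directory; the rest is the file.
    directory, file_name = posixpath.split(parts.path)
    return {
        "protocol_id": parts.scheme,
        "server": ".".join(labels[:-2]),   # optional, may be empty
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directory": directory,            # optional, may be empty
        "file": file_name,                 # optional, may be empty
    }


if __name__ == "__main__":
    for name, value in split_url(
        "http://www.etrade.com/DocumentRoot/Index.html"
    ).items():
        print(name, "=", value)
```

The optional elements come back as empty strings when they are absent, which mirrors the braces in the syntax above.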
However, client-server systems and World Wide Web systems cannot respond to voice commands and cannot deliver visual or graphic information that is coordinated with a voice dialogue. These systems also do not enable a user to select or initiate computation processes in coordination with a voice dialogue. World Wide Web systems can include panels or pages that are dynamically generated by the systems, and can include internal or external computation processes rather than static documents or pages. However, there is no way to select such processes or locate them using voice interaction.
In addition, when a user wishes to obtain specific information or activate a specific process using a Web server, but the URL of the information or process is unknown, the user must follow the hypertext links of numerous irrelevant pages until the desired information is located. Navigation through this control structure is clumsy. It is especially inconvenient at the slow data transfer speeds that presently characterize most use of the Web.
One approach to these issues is to integrate speech recognition software in a computer program or computer remote from a server. The computer is connected through an interface to a microphone. The computer runs a speech recognition program that converts voice commands received by the microphone into keystrokes or commands understandable by the program. This is known as “local control” or client-side speech recognition because recognition of voice commands is carried out in a local computer. The local computer is separated from a server that stores Web pages and runs application programs that serve the client. The speech recognition controls only the program of the local computer, not the server or applications running in the server. However, such an approach has several disadvantages.
First, high-quality speech recognition is expensive both in the commercial sense and in terms of computing resources and power needed to provide acceptable results. Currently, high-quality speech recognition software is in very limited use and is not generally available at a reasonable price to the vast majority of home or business computer users.
Also, local control allows an end user to access only the information structure presented by the program currently running in the local computer. The voice commands are limited to the command set of the current program. Local control cannot provide flexible shortcuts through a Web site, and cannot enable the Web site to identify, during a voice dialogue with the end user, suitable Web pages to present to the end user.
Thus, there is a need for a system that enables a local client to rapidly retrieve information from a remote server using voice commands.
There is also a need for a system that enables a local client to carry out a voice dialogue with a remote server and receive or retrieve visual and graphic information that is coordinated with the voice dialogue.
There is also a need for an arrangement with which voice commands or a voice dialogue can be used to locate, select, activate or initiate a computing process or service that is available at the server, to locate information in a database, and to execute trades in a securities trading system.
These and other needs are fulfilled by the present invention, which comprises, in one embodiment, a method of controlling a remote server by a voice command issued from a location local to a client, comprising the steps of establishing a voice communication channel between the location local to said client and the remote server; establishing a data communication channel associated with the voice communication channel between the client and the remote server; receiving the voice command by the voice communication channel; associating the voice command with a resource identifier; selecting a server resource based on the resource identifier; and delivering the resource from the remote server to the client by the data communication channel.
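The sequence of steps in this embodiment can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class and method names (`Session`, `RemoteServer`, `open_session`, `handle_voice_command`) are hypothetical, both channels are simulated as in-memory lists, and the voice command is assumed to arrive as already-recognized text, so speech recognition itself is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Pairs one voice channel with its associated data channel."""
    session_id: str
    voice_in: list = field(default_factory=list)  # commands received by voice
    data_out: list = field(default_factory=list)  # resources sent as data


class RemoteServer:
    """Hypothetical stand-in for the remote server."""

    def __init__(self, identifiers, resources):
        self.identifiers = identifiers  # voice command -> resource identifier
        self.resources = resources      # resource identifier -> resource
        self.sessions = {}

    def open_session(self, session_id):
        # Establish the voice channel and the data channel together, so a
        # command heard on one can be answered on the other.
        session = Session(session_id)
        self.sessions[session_id] = session
        return session

    def handle_voice_command(self, session_id, command):
        session = self.sessions[session_id]
        session.voice_in.append(command)            # receive the command
        identifier = self.identifiers.get(command)  # associate with identifier
        if identifier is None:
            return None                             # unrecognized command
        resource = self.resources[identifier]       # select the resource
        session.data_out.append(resource)           # deliver on data channel
        return resource


if __name__ == "__main__":
    server = RemoteServer(
        {"show quotes": "/quotes.html"},
        {"/quotes.html": "<html>quotes</html>"},
    )
    session = server.open_session("client-1")
    server.handle_voice_command("client-1", "show quotes")
    print(session.data_out)
```

Keeping both channels in one session object is what lets the server route a spoken command to the data channel of the same client.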
One feature of the invention is establishing the voice communication channel integrated with the data communication channel. Another feature is delivering a voice message over the voice communication channel in coordination with delivery of the server resource. Yet another feature is processing the voice command using a speech recognition process to recognize a natural language phrase in the voice command.
One aspect of this embodiment is associating the natural language phrase with a network address by performing the steps of: identifying the natural language phrase in a table of the remote server that maps natural language phrases to network addresses; and looking up the network address in the table. A feature of this aspect is loading a document identified by the network address from a storage device coupled to the remote server.
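The table lookup and document load described in this aspect might look like the sketch below. Everything named here is an assumption made for illustration: the table contents, the `example.com` addresses, the helpers `normalize`, `lookup_address` and `load_document`, and the idea that the path portion of the network address maps directly onto a file under a document root on the server's storage device.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Hypothetical table mapping natural language phrases to network addresses.
PHRASE_TABLE = {
    "account balance": "http://www.example.com/accounts/balance.html",
    "stock quotes": "http://www.example.com/quotes/index.html",
}


def normalize(phrase):
    """Lower-case the phrase and collapse runs of whitespace."""
    return " ".join(phrase.lower().split())


def lookup_address(phrase, table=PHRASE_TABLE):
    """Return the network address for a recognized phrase, or None."""
    return table.get(normalize(phrase))


def load_document(address, document_root):
    """Read the document named by the address from local storage.

    Assumes the address comes from the trusted table above, so its path
    is not checked for attempts to escape the document root.
    """
    relative = urlsplit(address).path.lstrip("/")
    return (Path(document_root) / relative).read_text(encoding="utf-8")


if __name__ == "__main__":
    print(lookup_address("  Stock   QUOTES "))
```

Normalizing the phrase before the lookup is a design choice of this sketch, so that differences in case or spacing in the recognizer output do not cause a miss.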
Another aspect of the invention is establishing a data communication channel between the client and the remote server configured to communicate data between the client and the remote server; and establishing a voice communication channel between the client and the remote server configured to communicate sound information including the voice command between the client and the remote server. One feature of this aspect is establishing a telephone connection from the client to a voice recognizer that is coupled to the remote server. Another feature is establishing a telephone connection from the client to an interactive voice response (IVR) system coupled to the remote server. Still another feature is establishing an Internet telephony connection from the client to a voice recognizer that is coupled to the remote server.
Still another aspect of the invention is recognizing the natural language phrase in the voice command at the IVR system; transmitting the natural language phrase to the remote server; and transmitting a voice response from the IVR system to the client.
The invention also encompasses a computer system and a computer program product configured in accordance with the foregoing aspects and features.
Additional objects, advantages and novel features of the invention will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following or may be learned by practice of the invention. The objects and advantages of the invention may be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.