A vast amount of information is available using computer servers. Servers, mainframe computers, and other computer storage devices on networks provide a warehouse of information and services. However, accessing information and initiating processes or services on such servers is difficult using presently available mechanisms. In particular, multiple commands using a keyboard or mouse are usually required for a user to navigate through the file directory structure of a server to locate desired information. The data processing field has failed to develop systems that provide voice control of a remote server from a local point, so that a local user can command the remote server by voice to display visual information at the local point or carry out a desired process.
Past systems provide incomplete solutions to this problem. For example, interactive voice response (IVR) systems are used to deliver stored information over a telephone line to an end user. IVR systems are specialized computer systems that have a processor, a storage device such as a hard disk, hardware for interfacing the processor to the public switched telephone network (PSTN), and an IVR application program that runs on the processor. Generally, the end user connects to the IVR system using a telephone. The end user takes the telephone handset off hook and dials a pre-determined telephone number that identifies the IVR system. The telephone call is delivered over the PSTN to one of several trunk lines connected to the IVR system. The IVR system answers the call by seizing one of the trunk lines and playing a pre-recorded greeting to the caller. Typically the greeting is a voice recording, stored digitally on a storage device, that provides the end user with a menu of processing options that can be selected using telephone dial pad keys. Some IVR systems include voice recognition software or processors, so that an end user can select system options by speaking a short word or phrase such as a number.
Example IVR applications include automated receptionist services, various customer service or "help desk" applications, and airline reservations systems.
IVR systems can be configured to carry out a voice dialogue with the end user. The dialogue comprises a series of questions from the IVR system and answers from the end user until the desired service is provided to the end user by the IVR system. However, IVR systems are unable to display visual information, such as pre-formatted text or graphics, or dynamically created custom graphic information, in conjunction with the dialogue. The absence of visual information from present IVR systems is a major limitation, and represents the loss of a powerful medium for conveying information.
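The question-and-answer dialogue described above can be pictured as a small menu-driven state machine. The following Python sketch is illustrative only: the menu tree, state names, prompts, and the `run_dialogue` helper are assumptions made for this example and are not taken from any particular IVR product. Note that the only output channel is the list of spoken prompts, which reflects the absence of visual information discussed above.

```python
# Hypothetical menu tree; the state names and prompts are illustrative only.
MENU = {
    "greeting": {
        "prompt": "Press 1 for reservations or 2 for the help desk.",
        "options": {"1": "reservations", "2": "help_desk"},
    },
    "reservations": {"prompt": "Connecting you to reservations.", "options": {}},
    "help_desk": {"prompt": "Connecting you to the help desk.", "options": {}},
}


def run_dialogue(menu, keys, start="greeting"):
    """Walk the menu with a sequence of dial pad keys.

    Returns the final state and the list of prompts that were played.
    """
    state = start
    played = [menu[state]["prompt"]]
    for key in keys:
        options = menu[state]["options"]
        if not options:
            break  # terminal state: the requested service has been reached
        if key in options:
            state = options[key]
        # An unrecognized key leaves the state unchanged, so the same prompt replays.
        played.append(menu[state]["prompt"])
    return state, played
```

For instance, `run_dialogue(MENU, ["2"])` ends in the `help_desk` state after playing the greeting and one further prompt.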
Client-server computer systems also provide an incomplete solution. In a client-server system, a client at a local point is connected by a data connection to a server at a remote point. The client can be a computer or a combination of a computer and software running on the computer. The data connection can be a cable, a local area network (LAN), a wide area network, or another type of network. The data connection can also be the Internet, the global network that operates according to standard protocols. The server can be a file server of the LAN, or a server not affiliated with the client. For example, the server can be a server that is publicly accessible using anonymous file transfer protocol (FTP) over the Internet. Using the Internet and certain wide area network technologies, a client can connect to, "log on" to, request and use a distant server.
One popular technology enjoying wide use with the Internet is known as the World Wide Web. The World Wide Web enables a computer to locate a remote server using a server name in an agreed-upon format that is indexed in the Domain Name System (DNS). The local computer or client runs a browser program. Using the browser, the client locates the remote server using the DNS, and connects to the remote server. The client requests information from the server using a communication protocol called the Hypertext Transfer Protocol (HTTP), by providing a Uniform Resource Locator (URL) that uniquely identifies a page of information stored on the server. A URL is a form of network address that identifies the location of information stored in a network and represents a reference to a remote World Wide Web server, known as a website. The pages of information are files prepared in the Hypertext Markup Language (HTML). Thus, a Web client-server system can be used to request and display information stored on a remote server.
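As a rough illustration of the request step described above, the following Python sketch derives, from a URL, the host name a browser would look up in the DNS and the text of a simple HTTP GET request. The function name `build_get_request`, the default port of 80, and the example URL are assumptions made for this sketch. No network connection is opened; a real browser would also send further headers.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_text) for a simple HTTP GET of `url`."""
    parts = urlsplit(url)
    host = parts.hostname        # the name the client would resolve through the DNS
    port = parts.port or 80      # assumed default port for HTTP
    path = parts.path or "/"     # an empty path means the server's root page
    request_text = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    return host, port, request_text


# Illustrative URL only; example.com is a reserved documentation domain.
host, port, request_text = build_get_request("http://www.example.com/index.html")
```

The returned request text is what the client would write to the connection once the server's address has been resolved.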
URLs generally are formatted according to the following syntax:
<protocol id>://{<server>} <second level domain><top level domain>{<directory>} {<file>}
The <protocol id> identifies the transmission protocol to be used. For example, in the case of the Web it is <http>, and in the case of an anonymous file transfer protocol transaction it is <ftp>. The <server> element is an optional server name such as <www.>. The <server> element may also identify a directory on a storage device of the Web server that contains HTML documents. The <second level domain> element is the name of the server domain as found in the DNS table, such as <etrade>. The <top level domain> element identifies the type of the second level domain, and must be an item selected from a finite set of globally recognized top level domains, such as "com," "org," "mil," "edu," "gov," and others. The <directory> element is an optional name of a directory within the <server>, such as <DocumentRoot/>. The <file> element is an optional name of a file, document or image to be retrieved, such as <Index.html>. Thus, a URL serves as one type of a network address to locate a document anywhere in a network.
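The element-by-element breakdown above can be sketched in Python as follows. The helper name `split_url` and the dictionary keys are assumptions made for this example, and the sample URL is simply assembled from the example element values given above. The sketch assumes a single-label top level domain and a host name with at least two labels, so it is not a general URL parser.

```python
import posixpath
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into the elements named in the syntax described above."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")           # e.g. ["www", "etrade", "com"]
    directory, file_name = posixpath.split(parts.path)
    return {
        "protocol_id": parts.scheme,
        "server": ".".join(labels[:-2]),         # optional; empty string when absent
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
        "directory": directory.lstrip("/"),      # optional; empty string when absent
        "file": file_name,                       # optional; empty string when absent
    }


elements = split_url("http://www.etrade.com/DocumentRoot/Index.html")
```

Here `elements["second_level_domain"]` is `"etrade"` and `elements["file"]` is `"Index.html"`; the optional elements come back as empty strings when the URL omits them.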
However, client-server systems and World Wide Web systems cannot respond to voice commands and cannot deliver visual or graphic information that is coordinated with a voice dialogue. These systems also do not enable a user to select or initiate computation processes in coordination with a voice dialogue. World Wide Web systems can include panels or pages that are dynamically generated by the systems, and can include internal or external computation processes rather than static documents or pages. However, there is no way to select such processes or locate them using voice interaction.
In addition, when a user wishes to obtain specific information or activate a specific process using a Web server, but the URL of the information or process is unknown, the user must follow the hypertext links of numerous irrelevant pages until the desired information is located. Navigation through this control structure is clumsy. It is especially inconvenient at the slow data transfer speeds that presently characterize most use of the Web.
One approach to these issues is to integrate speech recognition software in a computer program or computer remote from a server. The computer is connected through an interface to a microphone. The computer runs a speech recognition program that converts voice commands received by the microphone into keystrokes or commands understandable by the program. This is known as "local control" or client-side speech recognition because recognition of voice commands is carried out in a local computer. The local computer is separated from a server that stores Web pages and runs application programs that serve the client. The speech recognition controls only the program of the local computer, not the server or applications running in the server. However, such an approach has several disadvantages.
First, high-quality speech recognition is expensive both in the commercial sense and in terms of computing resources and power needed to provide acceptable results. Currently, high-quality speech recognition software is in very limited use and is not generally available at a reasonable price to the vast majority of home or business computer users.
Also, local control allows an end user to access only the information structure presented by the program currently running in the local computer. The voice commands are limited to the command set of the current program. Local control cannot provide flexible shortcuts through a Web site, and cannot enable the Web site to identify, during a voice dialogue with the end user, suitable Web pages to present to the end user.
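The local control approach, and the limitation just described, can be illustrated with a short Python sketch. The command table, the key names, and the `to_keystrokes` helper are hypothetical and stand in for whatever command set the local program happens to expose. The speech recognizer itself is not modeled; the sketch starts from text that a recognizer is assumed to have already produced.

```python
# Hypothetical command set of the program currently running on the local computer.
COMMAND_KEYS = {
    "open file": "ctrl+o",
    "save": "ctrl+s",
    "page down": "pagedown",
}


def to_keystrokes(recognized_text):
    """Map recognized speech to a keystroke of the local program, or None."""
    # Normalize case and whitespace before looking the phrase up.
    phrase = " ".join(recognized_text.lower().split())
    return COMMAND_KEYS.get(phrase)
```

A phrase outside the table, such as a request directed at a remote Web site, returns `None`, because the mapping covers only the command set of the local program.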
Thus, there is a need for a system that enables a local client to rapidly retrieve information from a remote server using voice commands.
There is also a need for a system that enables a local client to carry out a voice dialogue with a remote server and receive or retrieve visual and graphic information that is coordinated with the voice dialogue.
There is also a need for an arrangement with which voice commands or a voice dialogue can be used to locate, select, activate or initiate a computing process or service that is available at the server; to locate information in a database; and to execute trades in a securities trading system.