The Internet is essentially a network of servers containing information that users can obtain using personal computers. A user generally connects to a server, a computer equipped with information and capabilities that help the user contact other servers and obtain additional information. Users typically perform these functions, also referred to as "navigating" the Internet, using a mouse and Windows-based software. The user's navigation of the Internet is thus essentially graphically based (looking at a screen), with functions activated using a mouse.
Speech recognition software and hardware for use with personal computers and in other environments, such as the Internet, are a rapidly developing technology. With speech recognition, a user's voice commands are recognized by a computer and then converted, based on the speech pattern, into an electronic signal. For example, speech recognition has been highly successful in long-distance telephone calling, where it is used to handle collect calls. Typically, with this application, a caller provides a name and a phone number to a computer when making a collect call. The computer then places the caller on hold and calls the number to be reached. The person receiving the collect call answers "yes" or "no" in response to the computer message and the collect caller's name. The voice recognition hardware and software, together also known as a speech recognition engine, signal a switch either to complete the call upon recognizing the "yes" response or to disconnect upon recognizing the "no" response.
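The collect-call decision described above can be sketched in a few lines. This is only an illustrative sketch, not the implementation of any actual telephone system: the names `SwitchSignal` and `signal_for_response` are hypothetical, and the re-prompt outcome for an unrecognized response is an assumption that the description above does not address.

```python
from enum import Enum


class SwitchSignal(Enum):
    """Signals the recognizer can send to the telephone switch (hypothetical names)."""
    COMPLETE = "complete"      # connect the collect call
    DISCONNECT = "disconnect"  # drop the collect call
    REPROMPT = "reprompt"      # assumption: ask again when the word is not recognized


def signal_for_response(recognized_word: str) -> SwitchSignal:
    """Map the word recognized from the called party to a switch signal."""
    word = recognized_word.strip().lower()
    if word == "yes":
        return SwitchSignal.COMPLETE
    if word == "no":
        return SwitchSignal.DISCONNECT
    # Anything outside the two-word vocabulary is treated as unrecognized.
    return SwitchSignal.REPROMPT
```

The two-word vocabulary illustrates why this application suits a small vocabulary, speaker-independent engine: the recognizer only has to distinguish "yes" from "no" for any caller.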
One issue with using speech recognition is selecting the appropriate speech recognition engine for a particular application. These speech recognition engines include speaker-dependent and speaker-independent dictation machines, continuous speech systems, large vocabulary systems, and small vocabulary systems. Further, these systems can be Windows-based, Macintosh-based, UNIX-based, Windows NT-based, or based on another platform, depending on the preferred operating system.
Speech recognition operating in conjunction with a computer connected to the Internet, also known as speech enabling of the Internet, appears to have promising applications. One possible application of this technology is navigation on the Internet. For example, speech recognition has been used successfully at the desktop level. Voice macros have been created for a number of Windows functions for use on the Internet. A macro is a series of functions on the computer activated by a single command. For a voice macro, the speech server's recognition of an input voice command activates a series of commands.
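A voice macro of the kind described above can be pictured as a table that maps one recognized phrase to an ordered series of actions. The following is a minimal sketch under stated assumptions: the phrase "open bookmarks", the action names, and the helpers `make_action` and `run_voice_macro` are all hypothetical, and each action merely returns its own name in place of driving a real Windows function.

```python
from typing import Callable, Dict, List

Action = Callable[[], str]


def make_action(name: str) -> Action:
    """Build a placeholder action; a real macro step would invoke a desktop function."""
    def action() -> str:
        return name
    return action


# Hypothetical macro table: one spoken command maps to a series of functions.
VOICE_MACROS: Dict[str, List[Action]] = {
    "open bookmarks": [
        make_action("focus browser"),
        make_action("open menu"),
        make_action("select bookmarks"),
    ],
}


def run_voice_macro(command: str, macros: Dict[str, List[Action]] = VOICE_MACROS) -> List[str]:
    """Run every action bound to the recognized command, in order.

    Returns the names of the actions performed; an unknown command performs none.
    """
    actions = macros.get(command.strip().lower(), [])
    return [action() for action in actions]
```

For example, `run_voice_macro("Open Bookmarks")` performs the three placeholder steps in sequence, while a phrase that is not in the table performs nothing.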
Two prior art methods for speech-enabling the Internet have been explored by various companies and research entities. In general terms, researchers have approached the problem from either the perspective of speech-enabling the Internet, or from the perspective of Internet-enabling the telephone system.
The first method is the most common approach and the one being pursued by Texas Instruments, Apple Computer, and Microsoft. In this approach, the speech recognition engine is located on the local host, along with the web browser. This approach allows activities such as those described above: voice macros for Windows functions that can be used when browsing the Internet.
Texas Instruments further refined this approach by using the text associated with hotlinks to supply the vocabularies for the recognizer. Apple has taken the approach of making both the web browser and the speech recognition engine scriptable (controllable with the AppleScript language). Microsoft has taken the approach of providing tools that allow web page developers to speech-enable their web pages. These tools provide a mechanism for supplying the recognizer with grammars and the speech synthesizer with spoken prompts.
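The idea of drawing the recognizer's vocabulary from hotlink text can be illustrated as follows. This is a sketch under assumptions, not a description of the Texas Instruments system: it uses Python's standard `html.parser` module to collect the visible text of each link on a page, and the names `LinkVocabularyParser` and `build_link_vocabulary` are hypothetical. The result maps each speakable phrase to the address the link points to.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class LinkVocabularyParser(HTMLParser):
    """Collect the text of each hotlink as a phrase the recognizer could listen for."""

    def __init__(self) -> None:
        super().__init__()
        self.vocabulary: Dict[str, str] = {}  # spoken phrase -> link target
        self._href: Optional[str] = None      # target of the link being read, if any
        self._text: List[str] = []            # text fragments inside the current link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text that appears inside a link with a target is kept.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalize whitespace and case so the phrase matches recognizer output.
            phrase = " ".join("".join(self._text).split()).lower()
            if phrase:
                self.vocabulary[phrase] = self._href
            self._href = None


def build_link_vocabulary(html: str) -> Dict[str, str]:
    """Return the phrase-to-target vocabulary for one page of HTML."""
    parser = LinkVocabularyParser()
    parser.feed(html)
    parser.close()
    return parser.vocabulary
```

Because the vocabulary is rebuilt for each page, the recognizer in such a scheme only needs to distinguish among the links currently on screen rather than handle an open-ended vocabulary.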
The advantages of the present invention over this method include: (1) telephone access serves a far greater potential audience than speech access limited to desktop operations; (2) nothing additional, such as a speech recognition engine, is required of the user's computer; (3) the system offers a migration path that starts with immediate utility and imposes no long-term limitations; and (4) direct benefits are available from telephony integration.
Internet-enabling the telephone system is primarily being investigated as a research effort. Demonstrations from MIT and the Sun SpeechActs group have shown potential for using a speech-only interface for retrieving personal information (voice e-mail) over the phone and for using the Internet as an up-to-date repository of information available over the phone. For example, ALTech, a commercial spin-off of MIT, has demonstrated the use of a speech server for obtaining information about local movies.
Advantages of the present invention over this method include: (1) an optional Graphical User Interface (GUI) makes using the system with today's World Wide Web much more practical and simple than attempting to do it with speech alone; (2) the potential user base is just as large over the long term; and (3) providing tools to other developers is expected to lead to much more rapid progress than attempting to build speech-only interfaces from the ground up.