1. Field of the Invention
The present invention relates to a method and apparatus for Internet access, and more particularly to accessing and navigating the Internet through the use of an audio interface via standard POTS (plain old telephone service).
2. Description of the Related Art
The number of Internet access methods has increased with the rapid growth of the Internet. World Wide Web (WWW) "surfing" has likewise increased in popularity. Surfing or "Internet surfing" is a term used by analogy to describe the ease with which a user can use the waves of information flowing around the Internet to find desired or useful information. The term surfing as used in this specification is intended to encompass all of the possible activities a user can participate in using the Internet. Beyond looking up a particular Internet resource or executing a search, surfing as used herein is intended to include playing video games, chatting with other users, composing web pages, reading email, applying for an online mortgage, trading stocks, paying taxes to the Internal Revenue Service, transferring funds via online banking, purchasing concert or airline tickets, etc. Various kinds of web browsers have been developed to facilitate Internet access and allow users to more easily surf the Internet. In a conventional web interface, a web browser (e.g., Netscape Navigator®, which is part of Netscape Communicator® produced by Netscape Communications Corporation of Mountain View, Calif.) visually displays the contents of web pages, and the user interacts with the browser visually via mouse clicking and keyboard commands. Thus, web surfing using conventional web browsers requires a computer or some other Internet access appliance such as a WB-2001 WebTV® Plus Receiver produced by Mitsubishi Digital Electronics America, Inc. of Irvine, Calif.
Recently, some web browsers have added a voice-based web interface in a desktop environment. In such a system, a user can verbally control the visual web browser and thus surf the Internet. The web data is read to the user by the browser. However, this method of Internet access is not completely controllable by voice commands alone. Users typically must use a mouse or a keyboard to input commands, and the browser only reads the parts of the web page selected using the mouse or the keyboard. In other words, existing browsers that do allow some degree of voice control still must rely on the user and visual displays to operate. In addition, these browsers require that the web data to be read aloud be formatted in a specific way. For example, the shareware Talker Plug-In written by Matt Pallakoff and produced by MVP Solutions Inc. of Mountain View, Calif. can be used with Netscape Commerce Server and uses files formatted in accordance with a file format identified by the extension ".talk" (see, e.g., http://www.mvpsolutions.com/PlugInSite/Talker.html, which was printed on Jun. 22, 1999 and is incorporated herein by reference).
Some commercially available products (e.g., Dragon Dictate® from Dragon Systems Inc. of Newton, Mass.) can read a web page as displayed on a conventional browser in the standard web data format; however, the particular portion of the page to be read must be selected by the user either via mouse or voice commands. A critical limitation of these systems is that they require the user to visually examine the web data and make a selection before any web data to speech conversion can be made. This limitation also exists when using these systems to surf the web. The user needs to look at the browser and visually identify the desired Uniform Resource Locator (URL) (or use a predetermined stored list of URLs) and then select the desired URL by voice commands. What is needed is a means to access and surf the Internet that does not rely upon the user being able to visually perceive web data. What is further needed is a system for "audio-only" access to the Internet that does not require the authors of web pages to provide web data in specialized formats for audio play-back.
In view of the background discussed above, it is an object of the present invention to provide an improved web browser interface that: does not require the use of a computer or other Internet appliance, thus making Internet access significantly simpler by using a ubiquitous device like POTS; can interact with the user completely through audio signals using voice recognition and web data to speech conversion (i.e., without any need to visually perceive web pages); and allows the use of a conventional visual browser component but with a more intelligent interface that permits audio-only control and feedback (i.e., looking at the browser is optional). Another object of the present invention is to bring Internet access to the masses of people who either cannot afford a computer or lack computer training but can use the ubiquitous POTS. Thus, the present invention allows Internet browsing without requiring the substantial cost of owning and operating a computer or Internet access appliance.
In addition, since the present invention allows a user to browse the Internet with voice only, the user can do so while the user's eyes and/or hands are otherwise occupied (e.g., while driving, walking, or operating machinery). Another object of the present invention is to facilitate audio-only web browsing using web data as currently formatted (i.e., the present invention does not require a change to the existing web server data format to support audio-only browsing). Another object of the present invention is to allow access to email using POTS.
Thus, the present invention provides a method of browsing the Internet comprising the steps of establishing a bi-directional voice communication link with an audio Internet service provider, speaking a web surfing voice command over the bi-directional voice communication link, and then the audio Internet service provider generating a voice response representative of a World Wide Web page corresponding to the web surfing voice command. The step of generating a voice response includes the steps of translating the spoken web surfing voice command into a conventional web browser command using a speech recognition unit, retrieving Internet data responsive to the conventional web browser command, identifying portions of the Internet data useful to create an audio representation of the Internet data, and translating the identified Internet data into a computer-generated voice signal.
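The four sub-steps of generating a voice response can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the speech recognition and voice synthesis stages are stand-ins that operate on text, retrieval reads from an in-memory dictionary rather than the Internet, and every name used (AudioOutline, COMMANDS, recognize, to_browser_command, retrieve, render_script, handle_voice_command, and the example.com URL) is hypothetical. The part shown in most detail is one plausible way to identify the portions of a page that are useful for an audio representation, here the title, the readable text, and the link labels.

```python
from html.parser import HTMLParser


class AudioOutline(HTMLParser):
    """Collects the title, readable text, and link labels of an HTML page."""

    SKIPPED = {"script", "style"}  # content that should never be spoken

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text = []     # readable text fragments, in page order
        self.links = []    # (label, href) pairs that could be offered as choices
        self._stack = []   # currently open <title>/<script>/<style> tags
        self._href = None  # href of the <a> element being read, if any
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED or tag == "title":
            self._stack.append(tag)
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()
        elif tag == "a" and self._href is not None:
            self.links.append((" ".join(self._label), self._href))
            self._href = None

    def handle_data(self, data):
        data = " ".join(data.split())  # collapse whitespace
        if not data:
            return
        if self._stack:
            if self._stack[-1] == "title":
                self.title = data
            return
        self.text.append(data)
        if self._href is not None:
            self._label.append(data)


# Hypothetical mapping from recognized phrases to conventional browser commands (URLs).
COMMANDS = {"go to weather": "http://example.com/weather"}


def recognize(spoken: str) -> str:
    """Stand-in for the speech recognition unit: the input is already a transcript."""
    return spoken.strip().lower()


def to_browser_command(transcript: str) -> str:
    """Translate the recognized phrase into a browser command (raises KeyError if unknown)."""
    return COMMANDS[transcript]


def retrieve(url: str, pages: dict) -> str:
    """Stand-in for retrieval: looks the page up in a dictionary instead of the network."""
    return pages[url]


def render_script(html: str) -> str:
    """Build the text that a text-to-speech stage would be asked to speak."""
    outline = AudioOutline()
    outline.feed(html)
    outline.close()
    parts = []
    if outline.title:
        parts.append(f"Page: {outline.title}.")
    parts.extend(outline.text)
    for number, (label, _href) in enumerate(outline.links, 1):
        parts.append(f"Link {number}: {label}.")
    return " ".join(parts)


def handle_voice_command(spoken: str, pages: dict) -> str:
    """Run the four sub-steps in order and return the script to be spoken."""
    url = to_browser_command(recognize(spoken))
    return render_script(retrieve(url, pages))
```

Numbering the links in the spoken script is a design choice of this sketch only; it gives the caller something short to say when choosing where to go next.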
The present invention further includes a system for browsing the Internet comprising a telephone and an audio Internet service provider coupled to the telephone. The audio Internet service provider includes a data Internet service provider coupled to an apparatus operable to perform a selective translation function, wherein the apparatus selectively translates between voice signals and Internet data signals. The voice signals include spoken language and the Internet data signals include World Wide Web pages. The apparatus operable to perform a selective translation function includes an intelligent agent that includes a speech recognition engine (SRE), a text-to-speech conversion engine (TTS), an understanding unit (UU) for interpreting the voice signals and processing the Internet data signals, and a transaction processing unit (TPU).
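The composition of the intelligent agent can likewise be sketched in Python, purely as an illustration of how the four named units might be wired together to translate in both directions. The interfaces and method names below (transcribe, synthesize, interpret, summarize, execute, handle_turn) are assumptions, as are the toy stand-in classes. In particular, this passage does not spell out the role of the TPU, so the sketch simply assumes it carries out the interpreted request and returns page data.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechRecognitionEngine(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class TextToSpeechEngine(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class UnderstandingUnit(Protocol):
    def interpret(self, transcript: str) -> str: ...  # voice side: transcript -> request
    def summarize(self, page: str) -> str: ...        # data side: page -> text to speak


class TransactionProcessingUnit(Protocol):
    def execute(self, request: str) -> str: ...       # assumed role: request -> page data


@dataclass
class IntelligentAgent:
    sre: SpeechRecognitionEngine
    tts: TextToSpeechEngine
    uu: UnderstandingUnit
    tpu: TransactionProcessingUnit

    def voice_to_data(self, audio: bytes) -> str:
        """Translate a voice signal into an Internet data request."""
        return self.uu.interpret(self.sre.transcribe(audio))

    def data_to_voice(self, page: str) -> bytes:
        """Translate Internet data into a voice signal."""
        return self.tts.synthesize(self.uu.summarize(page))

    def handle_turn(self, audio: bytes) -> bytes:
        """One caller turn: voice in, page retrieved, voice out."""
        return self.data_to_voice(self.tpu.execute(self.voice_to_data(audio)))


# Toy stand-ins so the wiring can be exercised without real speech or network code.
class ToySRE:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretends the audio is UTF-8 text


class ToyTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")   # pretends UTF-8 bytes are audio


class ToyUU:
    def interpret(self, transcript: str) -> str:
        return transcript.strip().lower()

    def summarize(self, page: str) -> str:
        return " ".join(page.split())


class ToyTPU:
    def __init__(self, pages: dict):
        self.pages = pages

    def execute(self, request: str) -> str:
        return self.pages.get(request, "Page not found.")


agent = IntelligentAgent(ToySRE(), ToyTTS(), ToyUU(),
                         ToyTPU({"news": "Top   story:\n markets rise."}))
reply = agent.handle_turn(b" News ")
```

Keeping each unit behind its own narrow interface means any one of them could be replaced, for example by a real recognition engine, without changing the agent itself.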