1. Field of the Invention
The present invention relates to a method for accessing the Internet, and more particularly to accessing and navigating the Internet through the use of an audio interface, e.g., via standard POTS (plain old telephone service), with vocal and aural navigation, selection and rendering of the Internet content.
2. Description of the Related Art
The number of Internet access methods has increased with the rapid growth of the Internet. World Wide Web (WWW) “surfing” has likewise increased in popularity. Surfing or “Internet surfing” is a term used by analogy to describe the ease with which a user can use the waves of information flowing around the Internet to find desired or useful information. The term surfing as used in this specification is intended to encompass all of the possible activities a user can participate in using the Internet. Beyond looking up a particular Internet resource or executing a search, surfing as used herein is intended to include playing video games, chatting with other users, composing web pages, reading email, applying for an online mortgage, trading stocks, paying taxes to the Internal Revenue Service, transferring funds via online banking, purchasing concert or airline tickets, etc. Various kinds of web browsers have been developed to facilitate Internet access and allow users to more easily surf the Internet. In a conventional web interface, a web browser (e.g., Netscape Navigator®, which is part of Netscape Communicator® produced by Netscape Communications Corporation of Mountain View, Calif.) visually displays the contents of web pages, and the user interacts with the browser visually via mouse clicking and keyboard commands. Thus, web surfing using conventional web browsers requires a computer or some other Internet access appliance such as a WB-2001 WebTV® Plus Receiver produced by Mitsubishi Digital Electronics America, Inc., of Irvine, Calif.
Recently, some web browsers have added a voice-based web interface in a desktop environment. In such a system, a user can verbally control the visual web browser and thus surf the Internet. The web data is read to the user by the browser. However, this method of Internet access is not completely controllable by voice commands alone. Users typically must use a mouse or a keyboard to input commands, and the browser reads only the parts of the web page selected using the mouse or the keyboard. In other words, existing browsers that do allow some degree of voice control still must rely on the user and visual displays to operate. In addition, these browsers require that the web data to be read aloud be formatted in a specific way (e.g., the shareware Talker Plug-In written by Matt Pallakoff and produced by MVP Solutions Inc. of Mountain View, Calif., can be used with Netscape Commerce Server and uses files formatted in accordance with a file format identified by the extension “talk”).
Some commercially available products (e.g., Dragon Dictate® from Dragon Systems Inc. of Newton, Mass.) can read a web page as displayed on a conventional browser in the standard web data format; however, the particular portion of the page to be read must be selected by the user either via mouse or voice commands. A critical limitation of these systems is that they require the user to visually examine the web data and make a selection before any conversion of web data to speech can be made. This limitation also exists when using these systems to surf the web. The user needs to look at the browser and visually identify the desired Uniform Resource Locator (URL), or use a predetermined stored list of URLs, and then select the desired URL by voice commands.
For reasons of increased mobility, it would be more desirable to be able to access and surf the Internet without being required to visually perceive the web data. Furthermore, it would be desirable to allow for “audio-only” access to the Internet such that authors of web pages need not provide web data in specialized formats for audio playback. However, the Internet is primarily a visual medium with information designed to be accessed visually, i.e., by looking at it. Accordingly, the information is displayed with visual access in mind, resulting in use of columns, tables, frames, color, bolded text, various font sizes, variable positioning of text and images, popup windows and so on. During observation, the human brain processes such information and selects the content that the user is interested in reading. When such information is accessed by voice, normally all of the associated text is extracted after filtering out graphics, banners, images, HTML and XML tags, and other unwanted elements not useful to audio playback. Listening to such content may require much time and thereby lose the interest of the user. Also, selecting part of the text or navigating within a large amount of text laid out with visual access in mind is very difficult.
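By way of illustration only, the conventional extraction step described above, in which markup and non-text elements are filtered out and all remaining text is kept, may be sketched as follows. This sketch is not taken from any of the products named herein; it assumes Python and its standard html.parser module, and the names TextExtractor and extract_text are hypothetical. The result is a single flat run of text with no indication of columns, headings or emphasis, which is the difficulty noted above for audio playback.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and discards markup."""

    # Elements whose contents are never meant to be read to a listener.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []       # text fragments in document order

    def handle_starttag(self, tag, attrs):
        # Tags themselves (including <img>) contribute no text at all.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def extract_text(html):
    """Return all visible text of an HTML string as one flat string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><style>b{color:red}</style></head><body>"
        "<h1>News</h1><img src='banner.gif'>"
        "<p>Rain <b>today</b></p><script>var x = 1;</script>"
        "</body></html>"
    )
    print(extract_text(page))
```

In this sketch the heading, the paragraph and the bolded word are all reduced to one undifferentiated string, so a listener receives every word of the page in source order with no means of skipping or selecting among them.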
What would be helpful is an appropriate way of rendering the Internet content such that a relatively small amount of text is produced, quite suitable for audio playback, for facilitating further navigation and selection of content while still accurately representing the source data, i.e., the visual web page.
Additionally, some further important issues relating to accessing the Internet by voice include inter- and intra-page navigation, finding the correct as well as relevant contents on a linked page, and assembling the right contents from a linked page.