The amount of information available over communication networks is large and growing at a fast rate. The most popular of such networks is the Internet, which is a network of linked computers around the world. Much of the popularity of the Internet may be attributed to the World Wide Web (WWW) portion of the Internet. The WWW is a portion of the Internet in which information is typically passed between server computers and client computers using the Hypertext Transfer Protocol (HTTP). A server stores information and serves (i.e. sends) the information to a client in response to a request from the client. The clients execute computer software programs, often called browsers, which aid in the requesting and displaying of information. Examples of WWW browsers are Netscape Navigator, available from Netscape Communications, Inc., and the Internet Explorer, available from Microsoft Corp.
Servers, and the information stored therein, are identified through Uniform Resource Locators (URLs). URLs are described in detail in Berners-Lee, T., et al., Uniform Resource Locators, RFC 1738, Network Working Group, 1994, which is incorporated herein by reference. For example, the URL http://www.hostname.com/document1.html identifies the document "document1.html" at host server "www.hostname.com". Thus, a request for information from a host server by a client generally includes a URL. The information passed from a server to a client is generally called a document. Such documents are generally defined in terms of a document language, such as Hypertext Markup Language (HTML). Upon request from a client, a server sends an HTML document to the client. HTML documents contain information that is interpreted by the browser so that a representation can be shown to a user on a computer display screen. An HTML document may contain information such as text, logical structure commands, hypertext links, and user input commands. If the user selects (for example, by a mouse click) a hypertext link from the display, the browser might request another document from a server, move the user to another part of the same document, display an image, activate an animation sequence, and so forth.
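As an illustration of how the example URL above decomposes into a host server and a document name, the following is a minimal sketch using Python's standard urllib.parse module. The function name split_url is hypothetical and chosen only for this example.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (scheme, host, document) for a simple URL.

    split_url is a hypothetical helper name used only for illustration.
    """
    parts = urlsplit(url)
    # The path begins with "/", so strip it to obtain the bare document name.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


scheme, host, document = split_url("http://www.hostname.com/document1.html")
print(scheme, host, document)
```

Here the scheme names the protocol (HTTP), the host names the server, and the remaining path names the document requested from that server.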
Currently, most WWW browsers are based upon textual and graphical user interfaces. Thus, documents are presented as images on a computer screen. Such images include, for example, text, graphics, hypertext links, and user input dialog boxes. Most user interaction with the WWW is through a graphical user interface. Although audio data is capable of being received and played back at a user computer (e.g. a .wav or .au file), such receipt of audio data is secondary to the graphical interface of the WWW. Thus, with most WWW browsers, audio data may be sent as a result of a user request, but there is no means for a user to interact with the WWW using an audio interface.
Recently, audio browsing systems have been developed that allow a user to access documents on a server computer using only an audio interface device (e.g. a telephone), rather than the traditional paradigm of a client computer executing a browser program. The server computer can be one of the plurality of servers that comprise the Internet.
One such audio browsing system is disclosed in the document entitled The pwWebSpeak™ Project, which can be accessed on the Internet using the following URL: http://www.prodworks.com/pwwebspk.html, and which is incorporated herein by reference.
One problem associated with existing audible web browsers, however, is the inability to provide a user interface capable of presenting the vast amounts of information found in many HTML documents in an efficient and user-friendly manner. Conceptually, the content of an HTML document is a tree structure. The document contains elements such as sections, lists, tables, buttons, and so forth. Each element, in turn, may contain other elements. For example, sections can contain lists, and a list item may itself be another list.
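The tree structure described above can be made concrete with a short sketch. The code below uses Python's standard html.parser module to build a tree of elements from a small, well-formed HTML fragment. The class names Node and TreeBuilder, the helper functions, and the sample fragment are assumptions made for illustration; the sketch does not handle unclosed or void elements such as a bare br tag.

```python
from html.parser import HTMLParser


class Node:
    """One element of the document tree (hypothetical helper class)."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects; assumes every element is closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element with a matching tag, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one per element."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Sample fragment (an assumption): a list whose first item contains a list.
SAMPLE = "<ul><li>fruit<ul><li>apple</li><li>pear</li></ul></li><li>bread</li></ul>"
print("\n".join(outline(build_tree(SAMPLE))))
```

The printed outline shows the inner list nested beneath the first item of the outer list, which is the case noted above of a list item that is itself another list.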
Many traditional voice interactive systems navigate through the tree structure of an HTML document using a menu system. This menu system, however, is less than satisfactory for a number of reasons. For example, navigation through trees purely by menus is a tedious process. The user often has to keep going up one level to pick the next menu item. In the case of a relatively large tree, a user can quickly become "lost" within the menu choices. Moreover, HTML documents frequently have no headings for some groups of elements, such as list items. In other words, the page content does not directly specify a name or description for the menu from which the element would be chosen. In addition, the HTML description of a document often does not define the tree structure explicitly. For example, even though HTML defines the beginning of a section with a heading tag, there is no tag to define the end of a section. In fact, many authors do not even use the heading tag, but rather rely upon visual demarcations, such as a bold font or a horizontal line, to organize the information for a user.
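Because a heading tag marks only the start of a section, any section tree must be inferred. The following sketch shows one common heuristic, offered here as an assumption rather than as a method described in this document: a section is taken to end when a heading of the same or a higher level is encountered. The function name nest_headings and the sample headings are hypothetical.

```python
def nest_headings(headings):
    """Infer a section tree from a flat list of (level, title) pairs.

    Heuristic (an assumption): a section ends at the next heading whose
    level is the same or higher (i.e. numerically less than or equal).
    """
    root = {"level": 0, "title": "document", "children": []}
    stack = [root]
    for level, title in headings:
        # Close every open section that this heading cannot be nested in.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "title": title, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


# Hypothetical headings as they might appear in document order.
sample = [(1, "News"), (2, "World"), (2, "Sports"), (1, "Weather")]
tree = nest_headings(sample)
for section in tree["children"]:
    print(section["title"], [c["title"] for c in section["children"]])
```

A heuristic of this kind fails in exactly the case noted above, where the author uses a bold font or a horizontal line in place of heading tags, since no heading levels are then available to drive the nesting.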
Other voice interaction systems mimic techniques traditionally used by graphics-based browsers for navigating through a document, such as text sliders for scrolling through a document, cursor movement, and instructions such as page up, page down, home, and end. These traditional navigation techniques allow the user to visually scan the various text, images, hyperlinks, and so forth, at a relatively fast rate. Similarly, some voice interaction systems provide a means for scrolling through a document by allowing the user to "tab" or manually skip through the various header levels within a document.
For example, the following table includes a list of audio browsing commands typically offered by an audio browsing system. Users may navigate through documents using audio user input, either in the form of dual-tone multi-frequency (DTMF) tones or speech, as follows.
______________________________________________________________________
DTMF      SPEECH
COMMAND   COMMAND   NAVIGATION RESPONSE
______________________________________________________________________
*8        Top       Jump to beginning of document
*3        End       Jump to end of document
*6        Next      Jump to beginning of next prompt sequence
*7        Skip      Jump to next option, link, definition or other
                    list item
*5        List      List all links within a document with a pause
                    following each link allowing user to specify a
                    selection of the link.
______________________________________________________________________
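The command table above can be viewed as a lookup from either form of user input to a single navigation command. The following is a minimal sketch of such a lookup, assuming the DTMF tones and the speech have already been recognized and delivered as text. The names COMMAND_TABLE, LOOKUP, and resolve_command are hypothetical, and the response for the List command is abbreviated.

```python
# Command table transcribed from the text: DTMF code, speech word, response.
COMMAND_TABLE = [
    ("*8", "Top", "Jump to beginning of document"),
    ("*3", "End", "Jump to end of document"),
    ("*6", "Next", "Jump to beginning of next prompt sequence"),
    ("*7", "Skip", "Jump to next option, link, definition or other list item"),
    ("*5", "List", "List all links within a document"),
]

# Map both the DTMF code and the lowercased speech word to one command name.
LOOKUP = {}
for dtmf, speech, _response in COMMAND_TABLE:
    LOOKUP[dtmf] = speech
    LOOKUP[speech.lower()] = speech


def resolve_command(user_input):
    """Return the command name for recognized input, or None if unknown."""
    return LOOKUP.get(user_input.strip().lower())


print(resolve_command("*8"), resolve_command("next"), resolve_command("hello"))
```

Every entry in the table moves the user's position within the document; none of them adjusts how much of the document is read aloud.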
As can be seen, these commands permit a user to skip through a document. These commands, however, do not provide a means for controlling the quantity of information read to the user by the voice interaction system. Thus, the user is left to essentially guess whether the document contains any desired information based on the short, and frequently inadequate, descriptions provided by the headers. This often results in the user having to scan through large amounts of irrelevant information, which is inconvenient and also increases connection charges for the user, as well as potentially delaying another user from accessing the system.
Based on the foregoing, it can be appreciated that there is a need for an audio browser system in which the quantity of information presented to the user in an audible format can be readily controlled in an efficient and user-friendly manner.