1. Related Applications
This application is a continuation-in-part of U.S. patent application Ser. No. 09/464,989, entitled “Voice Interface for Electronic Documents,” filed Dec. 16, 1999. This application also claims the benefit of U.S. Provisional Patent Application Ser. No. 60/263,003, entitled “Choosit/User Defined Mapping,” filed Jan. 19, 2001. The foregoing patent applications are incorporated herein by reference.
2. The Field of the Invention
The present invention relates to methods and systems for enabling a user to map content of an electronic document so that the content can be accessed from an audio interface. More particularly, the present invention relates to methods and systems for enabling a user to identify and map regions of electronic documents containing text and links so that an audio representation of the text and links located within the selected regions can be easily accessed over a telephone system.
3. The Prior State of the Art
In recent years, the Internet has become an indispensable source of information for millions of people in their professional and private lives. For example, the Internet is used for email, banking, investing, shopping, news, entertainment, corporate networking, and social interaction. Not too many years ago, however, sharing information over the Internet was principally the domain of academia and scientists. At the time, the Internet was difficult to navigate and essential computer hardware was prohibitively expensive. However, the advent of user-friendly hypertext navigation and the World Wide Web (“Web”), in conjunction with modestly priced and increasingly powerful personal computers, has propelled the Internet to the forefront of public attention, making it the indispensable source of information that it is today.
Like the Internet, the use of portable telephones (e.g., cellular and digital telephones) has experienced tremendous growth in recent years. Initially, however, portable telephones were not widely used because of several problems: limited coverage beyond major metropolitan areas, the expense of purchasing a portable telephone device, the expense of airtime, and, for some people, the fact that early portable telephone devices were too big to be convenient. As a result, portable telephones were used mostly for vital business concerns rather than for personal matters. Over the past few years, however, portable telephones have become increasingly sophisticated, compact, and affordable. As a result, portable telephone use has dramatically increased, and for many people, it is now a primary means of communication.
The growing dependence on increasingly intelligent portable telephones, together with the increasing dependence on the Internet as a source of information, has created the framework for the inevitable convergence of portable telephone use and the Internet. The ability to access the Internet from a portable telephone is particularly beneficial for enabling remote and mobile access to the Internet. Portable computers are one alternative for enabling a user to access the Internet from a mobile or remote location. However, they are not a practical solution for the many people who cannot afford one, and they are not nearly as portable as cellular and digital telephones. For instance, portable telephones in general are smaller, less expensive, and more plentiful than portable computers, and their battery life generally exceeds that of a portable computer. Furthermore, portable telephones, unlike portable computers, can be used hands-free while performing other tasks, such as driving an automobile. Accordingly, it is desirable to enable Internet access from portable telephone devices.
One unfortunate problem, however, is that portable telephones do not interface well with the Internet. In particular, hypertext navigation of the Web is a two-dimensional and visually oriented activity, which makes a computer with a relatively large screen an obvious choice for access. A large screen makes it possible for a user to visually inspect the layout of a document and to quickly find the information that he or she wants to read. This is not possible with a portable telephone, however, because portable telephones are compact and have only a very small display screen, if any at all. This makes it impractical, or impossible, to display a Web page on a portable telephone device in a way that is conducive to user-friendly navigation of the Web.
One area that may be particularly well served by telephone access to the Internet is the personal home page market, as it is becoming increasingly popular for content providers, such as Yahoo!, to offer personal Web home pages that enable a user to compile various desired content into one location. For example, a user may choose to have current data regarding various financial markets, weather, sports stories, headlines, technology, calendaring, contacts, entertainment, travel, reference, etc., appear on a single personal home page. By providing a single, convenient source of diverse information, these personal home pages are highly attractive because, after an initial investment of setup time, they reduce the total amount of time a user would otherwise have to spend finding desired information on a recurring basis.
A convenient way to access the Internet by telephone would be useful for anyone who does not have constant access to a networked personal computer. It would also be particularly beneficial to provide an effective audio interface to the Internet for enabling the visually impaired to access the Internet from a portable telephone device that does not have to be attached to Braille machinery.
To overcome the visual display limitations associated with portable telephone devices, techniques have been developed that enable a user to access audio representations of Internet content. This is accomplished in one of two general ways, as illustrated in FIGS. 1 and 2. Either a Web page is modified to enable direct audio access from that page, or text-to-speech software is used to dictate information from a static translation of an existing Web page.
The prior art solutions, however, are inherently incompatible with the two-dimensional format in which visual information is presented. In particular, audio is presented in a linear format, as a function of time. This makes it very difficult for a user to navigate past undesired information to find the information that is desired. Furthermore, it is difficult, if not impossible, for a user to quickly scan an entire Web page for a desired link or for desired content when the Web page is dictated to the user. The sequential format in which audio is presented makes navigation of the Internet with an audio interface a very time-consuming activity.
FIGS. 1 and 2 show the two general prior art approaches for accessing the Internet with an audio interface. The first approach, designated as 100 in FIG. 1, illustrates a source 110 of electronic content that is accessible through telephone 120. The content in source 110 is written in a markup language specifically designed for enabling audio output through an audio interface. For this approach to work with existing Web pages, however, the existing Web pages must be translated into a suitable Voice eXtensible Markup Language (VXML), such as Motorola's VoxML, which includes explicit elements or tags for enabling voice interaction. Requiring explicit voice elements presents a serious drawback: it provides no means for accessing content that does not include VXML voice elements. Thus, VoxML fails to provide access to the wealth of content already available on the Web, written mostly in HyperText Markup Language ("HTML"). In other words, to provide full Web access, the entire content of the Web would need to be rewritten to include the explicit voice tags of a VXML. The difficulty of accomplishing this task is further compounded by the fact that, because there are several existing VXMLs and potentially more to be developed, it is unclear which VXML should be used or will ultimately be adopted by the industry.
FIG. 2 shows another prior art approach, designated as 200, for enabling a user to access the Internet using an audio interface. As shown, HTML source 210 represents existing Web content that can be accessed through visual and audio interfaces. First, as is well known in the art, visual browser 220 provides a visual interface for monitor 230. Second, static translation 240 provides an audio interface for telephone 250. Static translation 240 is a copy of at least a portion of HTML source 210 that has been manually altered to include audio elements. Someone examines HTML source 210, creates a corresponding audio interface using Text To Speech (TTS) technology, and then stores the audio interface in static translation 240. A user who is interested in accessing HTML source 210 through telephone 250 interacts with the audio interface provided by static translation 240.
The solution of FIG. 2 has the advantage of providing an audio interface without burdening HTML content providers (e.g., providers of HTML source 210) with the responsibility of maintaining an audio interface. However, this approach introduces new problems that may be nearly as serious as the one it attempts to solve. As with the approach in FIG. 1, a significant amount of work must be devoted to identifying HTML content of interest and then modifying it. Once the content has been initially modified, each time HTML source 210 changes, corresponding changes must be made to static translation 240. Naturally, some delay will occur between the time HTML source 210 changes and the time the corresponding modifications are made to static translation 240. For content that changes unpredictably, frequent and constant updating is a significant burden. Moreover, because of the enormous amount of HTML content available on the Web, only a small portion could practically be modified to include an audio interface and placed in static translation 240, leaving vast amounts of Web content completely inaccessible from telephone 250.
Another problem for users of each of the foregoing techniques is that an Internet document provider can load static translation 240 or VXML source 110 with commercial advertisements that will ultimately be passed on to the user in audio form. This is a problem not only because a user may find the advertisements obnoxious, but also because they can consume precious and sometimes expensive airtime under a portable telephone service agreement. Advertisements that are displayed on a graphical browser can be quickly viewed and dismissed, perhaps even unconsciously. Audio representations of the same advertisements, however, are presented in a time-intensive sequential format that cannot be dismissed. This problem is not limited to advertisements. It occurs whenever a user accesses an Internet document that contains information the user has no interest in receiving but that is still included in the coding of VXML source 110 or in static translation 240 of the HTML code. As a matter of illustration, and not limitation, if a user wishes to access only selected portions of a Web page containing an index or table of financial data, the user may have to wait through the dictation of undesired content before the desired content is finally presented.
Accordingly, it would be desirable to enable a user to control how content from an Internet document is presented through an audio interface. Currently, there is no end in sight to the growth of the Internet, nor is it likely that the Internet's expanding importance as a source of information will diminish any time soon. Considering the corresponding growth in portable telephone use, providing users with controlled and effective audio access to the Internet would be a great benefit. It would also be beneficial to accomplish this without requiring modification of the existing sources of HTML Internet documents.