1. The Field of the Invention
The present invention relates to methods, systems, and computer program products for accessing electronic documents. More specifically, the present invention relates to methods, systems, and computer program products for providing a voice interface to electronic documents.
2. The Prior State of the Art
As computers have become ubiquitous in our day-to-day activities, the advantages of storing information electronically have steadily increased. One of the primary advantages of electronically stored information is its inherent versatility. For example, editing and exchanging electronic information is greatly simplified as compared to editing and exchanging documents stored in paper form only. Furthermore, any advantage attributable to having a physical document is retained in electronic storage because a “hard copy” of an electronic document may be readily produced from the electronic version.
Another significant advantage of electronically stored documents is that of providing enhanced access to information. Over the past few years, the improved access offered by electronic documents has become so important that many organizations expend substantial resources in scanning paper documents to store them electronically. Routine facsimile transmission further exemplifies the value of electronic access to documents. Arguably, it is access to information that fuels what many refer to as the Information Age.
Today, perhaps the most prominent example of access to electronically stored information is the Internet. Literally millions of people depend on the Internet for email, banking, investing, shopping, news, entertainment, and social interaction. Not too many years ago, sharing information over the Internet was principally the domain of academicians and scientists. For members of the general public, the cryptic nature of access tools and the essentially prohibitive computer hardware requirements meant virtual anonymity for the Internet. However, the advent of hypertext navigation and the World Wide Web (“Web”), in conjunction with modestly priced and increasingly powerful personal computers, has propelled the Internet to the forefront of public attention and has made the Internet an almost indispensable source of information.
Likewise, use of early cellular telephone technology was also limited. Initially, problems included providing coverage beyond major metropolitan areas, the expense and size of cellular telephones, and the expense of airtime. As a result, cellular telephones were used mostly for vital business concerns rather than for personal matters. Over the past few years, however, the cellular industry has solved, to one degree or another, most of the problems that inhibited cellular's general acceptance. Today, cellular telephone use has dramatically increased and, for many people, is the primary means of communicating with others.
Increasing dependence on cellular telephones as a primary means of communication together with increasing dependence on the Internet as a source of information presents an unfortunate problem: a primary means of communication, the cellular telephone, does not interface well with a vital source of information, the Internet. The problem is compounded in that the hypertext navigation of the Web is visually oriented, making a computer with a relatively large screen an obvious choice for access, yet the size of cellular telephones is much more conducive to convenient portability. Frequently cellular telephones are clipped to belts or placed in pockets or purses; portable computers require their own case and a free hand to carry. Moreover, public telephones are available to those who do not carry cellular telephones, whereas public computers have a minimal presence at best.
Although the prior art includes some attempts to solve the problem of accessing electronic documents by voice, none of the prior art teachings offer the comprehensive solution provided by the present invention. Specifically, FIGS. 1 and 2 show the prior art's approaches to accessing Internet documents, approaches that have proven to be generally inadequate in many ways. The approach designated generally at 100 illustrates a Source 110 of electronic content that is accessible through Telephone 120. The content in Source 110 is written in a markup language specifically designed for telephone access.
Using Motorola's Voice extensible Markup Language (“VoxML”), the information includes explicit elements or tags for enabling voice interaction. However, requiring explicit voice elements presents a serious drawback: it provides no means for accessing content that does not include VoxML's voice elements. Thus, VoxML provides no access to the wealth of content already available on the Web, written mostly in HyperText Markup Language (“HTML”). In other words, to provide full Web access, the entire content of the Web would need to be rewritten to include VoxML's explicit voice tags.
Moreover, VoxML's facilities for authoring voice content do not provide for using a common source to generate both audio and visual interfaces. Therefore, even if a single document contains both visual and audio elements, the elements must be maintained separately; any changes to one must be replicated in the other.
FIG. 2 shows another approach to the problem, designated as 200, that has proven to be generally inadequate. HTML Source 210, representing existing Web content, can be accessed through one of two interfaces. First, as is well known in the art, Visual Browser 220 provides a visual interface for Monitor 230. Second, Static Translation 240 provides an audio interface for Telephone 250. Static Translation 240 is a copy of at least a portion of HTML Source 210 that has been manually altered to include audio elements. Someone examines HTML Source 210, creates a corresponding audio interface, and then stores the audio interface in Static Translation 240. A user who is interested in accessing HTML Source 210 through Telephone 250 interacts with the audio interface provided by Static Translation 240.
The solution of FIG. 2 has the advantage of providing an audio interface without obligating HTML content providers (e.g., providers of HTML Source 210) to maintain an audio interface. However, this approach imposes new problems that may be nearly equal to the one it proposes to solve. As with the approach in FIG. 1, a significant amount of work must be devoted to identifying HTML content of interest and then modifying that content. Once the content has been initially modified, each time HTML Source 210 changes, corresponding changes must be made to Static Translation 240. Naturally, some delay will occur between the time HTML Source 210 changes and the time the corresponding modifications are made to Static Translation 240. For content that changes frequently, such as information regarding financial markets, constant updating is a significant burden. Moreover, because of the incredible amount of HTML content available on the Web, only a small portion could be modified to include an audio interface and placed in Static Translation 240, leaving vast Web content completely inaccessible to Telephone 250.
One area that may be particularly well-served by telephone access is the personal home page market, as it is becoming increasingly popular for content providers, such as Yahoo!, to offer personal Web home pages. These personal pages allow a user to select from a variety of content that is placed on a single Web page. For example, a user may choose to have current data regarding various financial markets, weather, sports stories, headlines, technology, calendaring, contacts, entertainment, travel, reference, etc., appear on a personal home page. By providing a single, convenient source of diverse information, these personal home pages are highly attractive.
There is no end in sight for the increasing growth of the Internet nor is it likely that the Internet's expanding importance as a source of information will diminish any time soon. Considering the corresponding growth in cellular telephone use and the cellular telephone's convenient size, providing cellular access to the Internet in particular and electronic content in general would be a great benefit. Furthermore, public telephones also could provide beneficial Internet access for those who do not carry cellular telephones. However, the prior art lacks effective methods, systems, and computer program products for providing voice or audio interfaces to electronic content.
The problems in the prior state of the art have been successfully overcome by the present invention, which is directed to methods, systems, and computer program products for providing a voice interface to electronic documents. The present invention allows for access to existing electronic content without requiring any modification to the content source. Furthermore, the present invention allows for a common content source to incorporate both a visual and audio interface, without including separate markups for each interface, making the content source more easily maintained. Although embodiments of the present invention are described as applied to Web pages in an Internet context, the invention is not limited to any particular format of electronic information or any particular network typically used for accessing electronic content.
In one preferred implementation, the present invention works with content that operates as an index to additional content, such as is typical with personal home pages. The present invention takes the content of a personal home page and creates a hierarchy of categories that are presented to a client. There is no requirement that the client is necessarily a person. For example, the client may be an intervening service needing an audio interface to electronic documents. The present invention generates an audio representation of the available categories and allows the client to select one. Navigating through the hierarchy, the client may eventually reach the bottom hierarchy level, with links pointing to content that includes text mixed with links. At this point, the present invention reports the number of links, and provides an audio representation of the text.
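The category navigation described above can be illustrated with a minimal Python sketch. It is not the invention's implementation: the `Category` class, the `prompt_for` and `select` helpers, and the prompt wording are all hypothetical names chosen for illustration, and the returned strings merely stand in for what would be passed to a speech synthesizer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    """Hypothetical node in the hierarchy built from a personal home page."""
    name: str
    children: List["Category"] = field(default_factory=list)
    links: List[str] = field(default_factory=list)  # link targets at the bottom level


def prompt_for(node: Category) -> str:
    """Return the text that would be spoken for this node (assumed wording)."""
    if node.children:
        names = ", ".join(child.name for child in node.children)
        return f"{node.name}. Choose one of: {names}."
    # Bottom of the hierarchy: report how many links are available.
    return f"{node.name}. Links available: {len(node.links)}."


def select(node: Category, spoken_choice: str) -> Optional[Category]:
    """Return the child whose name matches the recognized choice, or None."""
    wanted = spoken_choice.strip().lower()
    for child in node.children:
        if child.name.lower() == wanted:
            return child
    return None
```

A client, whether a person or an intervening service, would hear the result of `prompt_for` at each level and descend by calling `select` with the recognized choice until a node with no children is reached.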
Because creating categories requires some knowledge of the layout for personal home pages, Web content in general will not be mapped into various categories. For unmapped content, the present invention operates as described above with respect to text mixed with links, by reporting the number of links on a page and providing an audio representation of the page's text. Alternatively, a client may choose to hear an audio representation that only includes links. In response, the client may select a link of interest to follow. The present invention also provides a variety of global commands that are available to assist navigation.
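As a rough illustration of the handling of unmapped content, the following Python sketch counts the links on an HTML page and produces either the page's text or only its link labels. It uses the standard library `html.parser` module; the class name `LinkTextExtractor`, the function `describe`, and the spoken wording are assumptions made for this example, and the returned string stands in for input to a text-to-speech engine.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collects visible text and (label, href) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []        # (label, href) pairs in document order
        self.text_parts = []   # whitespace-normalized text chunks
        self._href = None      # href of the anchor currently open, if any
        self._label = []       # text chunks seen inside the open anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._label = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((" ".join(self._label), self._href))
            self._href = None

    def handle_data(self, data):
        words = data.split()
        if not words:
            return  # skip whitespace-only runs between tags
        chunk = " ".join(words)
        self.text_parts.append(chunk)
        if self._href is not None:
            self._label.append(chunk)


def describe(html, links_only=False):
    """Return the text to be spoken: the link count, then text or link labels."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    count = f"This page has {len(parser.links)} links."
    if links_only:
        body = "; ".join(label for label, _ in parser.links)
    else:
        body = " ".join(parser.text_parts)
    return f"{count} {body}"
```

Calling `describe(page)` corresponds to reporting the number of links and reading the page's text, while `describe(page, links_only=True)` corresponds to the alternative in which the client hears only the links and then selects one to follow.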
The foregoing methods, systems, and computer program products provide significant advantages over the prior art. Because the present invention provides an audio interface without requiring any modification to existing content, telephone access becomes readily available to the vast body of information stored electronically. Moreover, the present invention also provides for organizing certain content by mapping links and text to a hierarchy of categories to aid navigation.
These and other objects, features, and advantages of the present invention will become more fully apparent from the following description and appended claims, or may be learned by practicing the invention as set forth below.