People frequently listen to the radio or to music while engaged in work that occupies their eyes and hands. For example, they tend to listen while driving, while traveling by train (particularly on packed trains), or while performing household chores such as cooking, cleaning and washing.
Further, as the Internet has grown in popularity, web browsing and e-mail have become everyday pursuits, and there are times when people wish to access web pages or to check their e-mail but are prevented from doing so by work or other activities that restrict their use of computer screens and keyboards. Even though their eyes and hands are occupied, these people may still be able to use their ears. They can employ a speech synthesis browser or oral reproduction software that reads aloud from web pages, e-mail messages or other text matter, and can thereby access information simply by listening. As an example of this sort of method, the news article download software “NewsTool” includes functions that obtain a list of the articles carried by individual news company sites, and that then continuously download the articles and read their contents aloud.
It should be noted that VoiceXML, a standardized, structured speech processing language based on XML (short for Extensible Markup Language), makes speech processing easy to perform, and can be used to construct an automatic speech input/output system that is accessed, for example, by telephone.
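As an illustration of how little markup a spoken prompt requires, the following minimal sketch builds a VoiceXML 2.0 document containing a single prompt. The function name `build_prompt_document` and the sample sentence are hypothetical and do not come from any particular system; only the element names (`vxml`, `form`, `block`, `prompt`) and the namespace are taken from the VoiceXML 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C VoiceXML 2.0 specification.
VXML_NS = "http://www.w3.org/2001/vxml"


def build_prompt_document(text: str) -> str:
    """Return a minimal VoiceXML document that speaks `text` once.

    Hypothetical helper for illustration only.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")      # a dialog
    block = ET.SubElement(form, "block")    # executable content in the dialog
    prompt = ET.SubElement(block, "prompt") # text to be synthesized
    prompt.text = text
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_prompt_document("You have two new messages."))
```

A voice platform that interprets such a document would synthesize the prompt text for the caller; the platform itself is outside the scope of this sketch.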
When information content, such as an e-mail message, is composed mainly of text data, the data can be accessed quite satisfactorily using a speech synthesis browser or oral reproduction software. However, when a web page is to be read aloud, a variety of speech synthesis problems are encountered.
Web pages are generally written using a structured language, such as HTML (short for HyperText Markup Language) or XML, and their content has a complicated structure built from elements and frames. Further, a web page includes a wide assortment of content, including image data, which are inappropriate for oral reading, and generally only a small amount of text data is included in each element. Also, titles and their text content are often widely separated, being correlated only through links. Since a web page is designed so that the contents of individual elements are sorted in detail using a complicated structure and are presented visually, merely reading such a page aloud with a conventional speech browser makes it difficult to provide enough information for a person who is listening inattentively to understand the content. In addition, navigation based on the elements and the structure used is required in order to reproduce information so that it is coherent and correlated. However, in most cases the hands of people who are working are occupied, and the performance of complex navigational operations using keywords is not appropriate for workers who are not listening closely. Furthermore, when frames are used to divide pages, errors occur when frames are switched. Even when NewsTool is employed, the smooth oral reading of articles is possible only for specific sites; a critical limitation of the application is that the formatting for targeted sites, pages and articles must be incorporated in advance, and the setup cannot be changed spontaneously.
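The fragmentation described above can be made concrete with a small sketch. Assuming a hypothetical sample page (`SAMPLE_PAGE`) and a hypothetical helper (`collect_fragments`), the code below uses Python's standard `html.parser` module to list the text held by each element in document order, which is roughly what a naive reader would speak. It is an illustration of the problem, not the method of any existing speech browser.

```python
from html.parser import HTMLParser

# Hypothetical page: a title, an image, headline links, and a short note.
SAMPLE_PAGE = """
<html><body>
<h1>Daily News</h1>
<img src="banner.png" alt="banner">
<script>var x = 1;</script>
<ul><li><a href="/a1">Rates rise</a></li><li><a href="/a2">Storm warning</a></li></ul>
<p>Updated hourly.</p>
</body></html>
"""


class TextFragmentCollector(HTMLParser):
    """Collect (enclosing tag, text) pairs in document order."""

    VOID = {"img", "br", "hr", "input", "meta", "link"}  # no end tag
    SKIPPED = {"script", "style"}                        # never spoken

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed elements.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not (self.SKIPPED & set(self._stack)):
            enclosing = self._stack[-1] if self._stack else ""
            self.fragments.append((enclosing, text))


def collect_fragments(html: str):
    parser = TextFragmentCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


if __name__ == "__main__":
    for tag, text in collect_fragments(SAMPLE_PAGE):
        print(f"{tag}: {text}")
```

On the sample page, each element yields only a few words, and the headlines are reachable only as link text whose article bodies reside on other pages; the image contributes nothing that can be read.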
It is said that, with the advent of widespread Internet use, the long-standing digital divide affecting users unfamiliar with computer operations, the physically impaired, and elderly people, who tend to know little about information technology, has widened. Resolving this problem is considered to be a very important social objective. Thus, if an information service can be provided that can be accessed merely by using a telephone, a device that almost everybody can easily employ, it is expected that the restraints imposed by the digital divide will be removed. At this time, the use of the Web is still preferable when the volume and the topicality of information are taken into consideration. However, the problems encountered when a structured document is read aloud also appear when a telephone is used to access the Web to obtain information, and a technique must be provided that permits the numerical keys of a telephone to be used for navigation. If these problems can be resolved, it is anticipated that the information service provided by the current telephone response system, which handles only a small amount of information and updates that information less frequently, can be dramatically expanded.
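To indicate what navigation by the numerical keys of a telephone might look like, the following is a minimal sketch under stated assumptions. The class name `KeypadNavigator` and the key assignments ("4" for the previous item, "5" to repeat, "6" for the next item) are hypothetical choices made for illustration; the text above states only that such a technique is needed.

```python
class KeypadNavigator:
    """Step through readable items using single telephone key presses.

    Hypothetical key assignments: "4" = previous, "5" = repeat, "6" = next.
    """

    def __init__(self, items):
        if not items:
            raise ValueError("at least one item is required")
        self.items = list(items)
        self.index = 0

    def press(self, key: str) -> str:
        """Return the text to be read aloud after `key` is pressed."""
        if key == "6":
            # Advance, but stay on the last item at the end of the list.
            self.index = min(self.index + 1, len(self.items) - 1)
        elif key == "4":
            # Go back, but stay on the first item at the start of the list.
            self.index = max(self.index - 1, 0)
        elif key == "5":
            pass  # repeat the current item
        else:
            return "Unrecognized key."
        return self.items[self.index]


if __name__ == "__main__":
    nav = KeypadNavigator(["Rates rise", "Storm warning", "Updated hourly."])
    for key in "6654":
        print(key, "->", nav.press(key))
```

Restricting each operation to a single key press keeps navigation usable by a caller who has only a keypad and is not watching a screen.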