1. Field of the Invention
This invention relates generally to devices for browsing information on an information network. More specifically, this invention relates to an apparatus and system for receiving personalized information from an information network in audio format using distributed text-to-speech processing.
2. Description of the Related Art
A number of different information networks are available that allow access to information contained on their computers, with the Internet being one that is generally known to the public. The capabilities, usefulness, and amount of information available from information networks are ever-increasing. Further, users often subscribe to one or more information services that are accessible via an information network. Currently, a user must browse the information network for information that is of interest to them. Oftentimes, a user must interrupt their use of an application program, such as spreadsheets or word processing programs, to browse the information network. Even messages sent from information networks to users via e-mail or instant messaging facilities require the user to take specific action to learn the content of the messages. Additionally, while some subscription services and portal services allow a user to customize the format and, to a certain extent, the content, of the information provided, a user must still manually navigate to the various sources of information to see if there is anything of interest to them. Still further, a user often has to sift through a lot of information that is of no interest to them, thereby consuming more time than necessary. Another drawback to current capabilities is that the user typically is not informed immediately when information of interest becomes available, but rather, must enter commands to browse the information sources, and therefore may not receive information of interest as soon as it is available.
In the prior art, systems are available to provide information requested from an information network in aural format; however, these systems require interaction with the user and do not automatically provide the information in which the user has indicated an interest as that information becomes available.
It is therefore desirable to provide users with the ability to prescreen information from various, selected sources, to reduce the amount of time required to find items of interest to the user.
It is also desirable to provide users with relevant information as soon as possible after that information becomes available.
It is also desirable to provide a summary of news items of interest to the user, and to allow the user to access more in-depth information regarding a particular summary.
It is further desirable to receive the information aurally, thereby allowing the user to receive information of interest without being required to interrupt their activity to manipulate or view the information.
There are several known methods for converting information from text format to audio format for output to an audio output device such as an audio speaker system. The information is typically in conventional orthography and the output is synthetic speech. The input is provided in the form of a digital signal which represents the characters of conventional orthography. The primary output is also a digital signal, representing an acoustic waveform corresponding to the synthetic speech. Digital-to-analog conversion is a well-known technique for producing analog signals which can drive audio speakers. The signal may have any convenient implementation, e.g., electrical, magnetic, electromagnetic, or optical.
Speech converters usually include two major sub-units, namely an analyzer and a synthesizer. The analyzer divides the original input signal into small textual elements. The synthesizer converts each of these small elements into a short segment of digital waveform and joins these segments together to produce the output.
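The two sub-units described above can be illustrated with a minimal sketch. It assumes that the small textual elements are words and substitutes a placeholder tone for real synthetic speech; the function names, the sample rate, and the segment length are hypothetical and are not taken from any particular speech converter.

```python
import math
import re

SAMPLE_RATE = 8000      # samples per second (assumed value for illustration)
SEGMENT_SAMPLES = 400   # samples per element (assumed; 50 ms at 8000 Hz)


def analyze(text):
    """Analyzer: divide the input text into small textual elements (here, words)."""
    return re.findall(r"[a-z']+", text.lower())


def synthesize_element(element):
    """Convert one element into a short segment of digital waveform.

    A placeholder tone stands in for speech. Its pitch depends on the
    element's length only so that different elements give different segments.
    """
    freq = 200.0 + 20.0 * len(element)
    return [
        math.sin(2 * math.pi * freq * i / SAMPLE_RATE)
        for i in range(SEGMENT_SAMPLES)
    ]


def synthesize(elements):
    """Synthesizer: convert each element and join the segments into one output."""
    waveform = []
    for element in elements:
        waveform.extend(synthesize_element(element))
    return waveform


elements = analyze("Stock prices rose today.")
waveform = synthesize(elements)
print(elements)        # ['stock', 'prices', 'rose', 'today']
print(len(waveform))   # 1600 samples: 4 elements x 400 samples each
```

In a practical converter the elements would typically be finer-grained than words, and the joining step would smooth the boundaries between segments rather than simply concatenating them.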
It will be appreciated that the linguistic analysis of a sentence is exceedingly complicated since it involves many different linguistic tasks, and a wide variety of linguistic processors are commercially available, each of which is capable of doing at least one of the tasks. Further, different portions of the linguistic analysis can be distributed among at least two different data processors.
One category of linguistic processors is designated as “converters” in that they change the nature of the symbols utilized. For example, a “converter” alters a signal representing a word or other linguistic element in graphemes into a signal representing the same element in phonemes using a grapheme-to-phoneme dictionary. This dictionary requires a large amount of storage space, and it is therefore preferable to store and maintain one dictionary in a central location, such as a network server, so that it may be accessed by several users, instead of storing and maintaining separate copies of the dictionary on each user's workstation. The benefits of maintaining large resources on servers are both ease of maintenance and reduced client system resource requirements. Further, converting the phonemes to an audio signal generates a large amount of data, and transferring the data in audio format requires a large amount of bandwidth.
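The difference in data volume between the phoneme representation and the audio representation can be illustrated with a minimal sketch. The three-entry dictionary, the ARPAbet-style phoneme symbols, the 100 ms average phoneme duration, and the 8 kHz, 16-bit audio format are all assumptions chosen for illustration; an actual grapheme-to-phoneme dictionary is far larger.

```python
# Hypothetical, tiny grapheme-to-phoneme dictionary (ARPAbet-style symbols).
G2P_DICTIONARY = {
    "news": ["N", "UW", "Z"],
    "is": ["IH", "Z"],
    "here": ["HH", "IY", "R"],
}

SAMPLE_RATE = 8000     # samples per second (assumed)
BYTES_PER_SAMPLE = 2   # 16-bit PCM (assumed)
PHONEME_MS = 100       # assumed average phoneme duration in milliseconds


def to_phonemes(words):
    """Converter: look up each word and return the phoneme sequence."""
    phonemes = []
    for word in words:
        phonemes.extend(G2P_DICTIONARY[word.lower()])  # KeyError if unknown
    return phonemes


def phoneme_bytes(phonemes):
    """Size of the phoneme sequence when sent as space-separated ASCII text."""
    return len(" ".join(phonemes).encode("ascii"))


def audio_bytes(phonemes):
    """Size of the same utterance as uncompressed audio, per the assumptions."""
    samples_per_phoneme = SAMPLE_RATE * PHONEME_MS // 1000
    return len(phonemes) * samples_per_phoneme * BYTES_PER_SAMPLE


phonemes = to_phonemes(["news", "is", "here"])
print(phonemes)                 # ['N', 'UW', 'Z', 'IH', 'Z', 'HH', 'IY', 'R']
print(phoneme_bytes(phonemes))  # 19
print(audio_bytes(phonemes))    # 12800
```

Under these assumed figures, the audio form of the three-word utterance is several hundred times larger than its phoneme form, which is the bandwidth concern noted above.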
U.S. patent application Ser. No. 09/409,000, filed Sep. 29, 1999, entitled “System and Apparatus For Dynamically Generating Audible Notices From An Information Network”, discloses a text-to-speech (TTS) engine that resides in a client-side processor, resides in a server-side processor, or is distributed among data processors in the system. TTS processing functions are computationally intensive, and some tasks require a large amount of storage space and bandwidth for data transfer. Therefore, it is further desirable to distribute the TTS engine between at least two data processors in a manner which optimizes processing time, data transfer, and storage space efficiency.
In addition to grapheme-to-phoneme TTS converters, there are other TTS engines that use different algorithms for transforming text data to audio data. Typically, these other TTS engines also involve converting text data to an intermediate format that requires less storage than the data in audio format. Therefore, it is also desirable to distribute other types of TTS engines between at least two data processors in a manner which optimizes processing time, data transfer, and storage space efficiency.
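One possible division of a TTS engine between two data processors can be sketched as follows. This is an illustration under stated assumptions only, not the arrangement of any particular system: the stage names `server_stage` and `client_stage`, the JSON encoding of the intermediate format, the two-word lexicon, and the placeholder tones used in place of speech are all hypothetical.

```python
import json
import math

SAMPLE_RATE = 8000       # samples per second (assumed)
SAMPLES_PER_UNIT = 800   # samples rendered per intermediate unit (assumed)

# Hypothetical lexicon held only on the server side.
LEXICON = {
    "market": ["M", "AA", "R", "K", "AH", "T"],
    "update": ["AH", "P", "D", "EY", "T"],
}


def server_stage(text, lexicon):
    """Runs where the large dictionary is stored: text -> compact payload."""
    units = []
    for word in text.lower().split():
        units.extend(lexicon[word])
    return json.dumps(units).encode("utf-8")


def client_stage(payload):
    """Runs on the user's workstation: compact payload -> 16-bit PCM bytes."""
    units = json.loads(payload.decode("utf-8"))
    pcm = bytearray()
    for index, _unit in enumerate(units):
        freq = 200.0 + 50.0 * index  # placeholder tone, not real speech
        for i in range(SAMPLES_PER_UNIT):
            sample = int(32767 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
            pcm += sample.to_bytes(2, "little", signed=True)
    return bytes(pcm)


payload = server_stage("market update", LEXICON)   # sent over the network
audio = client_stage(payload)                      # generated locally
print(len(payload) < len(audio))   # True
print(len(audio))                  # 17600 bytes: 11 units x 800 samples x 2
```

In this sketch only the compact intermediate payload crosses the network, while the storage-heavy lookup stays on the server and the data-heavy audio generation stays on the client.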