1. Field of the Invention
This invention relates generally to devices for browsing information on an information network. More specifically, this invention relates to an apparatus and system for receiving personalized information from an information network in audio format using distributed text-to-speech processing.
2. Description of the Related Art
A number of different information networks are available that allow access to information contained on their computers, with the Internet being one that is generally known to the public. The capabilities, usefulness, and amount of information available from information networks are ever-increasing. Further, users often subscribe to one or more information services that are accessible via an information network. Currently, a user must browse the information network for information that is of interest to them. Oftentimes, a user must interrupt their use of an application program, such as a spreadsheet or word processing program, to browse the information network. Even messages sent from information networks to users via e-mail or instant messaging facilities require the user to take specific action to learn the content of the messages. Additionally, while some subscription services and portal services allow a user to customize the format and, to a certain extent, the content of the information provided, a user must still manually navigate to the various sources of information to see if there is anything of interest to them. Still further, a user often has to sift through a large amount of information that is of no interest to them, thereby consuming more time than necessary. Another drawback of current capabilities is that the user typically is not informed immediately when information of interest becomes available but, rather, must enter commands to browse the information sources, and therefore may not receive information of interest as soon as it is available.
In the prior art, systems are available to provide information requested from an information network in aural format; however, these systems require interaction with the user and do not automatically provide the information in which the user has indicated an interest as that information becomes available.
It is therefore desirable to provide users with the ability to prescreen information from various, selected sources, to reduce the amount of time required to find items of interest to the user.
It is also desirable to provide users with relevant information as soon as possible after it becomes available.
It is also desirable to provide a summary of news items of interest to the user, and to allow the user to access more in-depth information regarding a particular summary.
It is further desirable to receive the information aurally, thereby allowing the user to receive information of interest without being required to interrupt their activity to manipulate or view the information.
There are several known methods for converting information from text format to audio format for output to an audio output device such as an audio speaker system. The information is typically in conventional orthography and the output is synthetic speech. The input is provided in the form of a digital signal which represents the characters of conventional orthography. The primary output is also a digital signal representing an acoustic waveform corresponding to the synthetic speech. Digital-to-analog conversion is a well known technique for producing analog signals which can drive audio speakers. The signal may have any convenient implementation, e.g. electrical, magnetic, electromagnetic or optical.
Speech converters usually include two major sub-units, namely an analyzer and a synthesizer. The analyzer divides the original input signal into small textual elements. The synthesizer converts each of these small elements into a short segment of digital waveform and joins the segments together to produce the output.
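The division of labor between the analyzer and the synthesizer can be illustrated with a minimal Python sketch. It assumes that the analyzer's small textual elements are simply words, and it uses a placeholder sine tone for each element's waveform segment rather than real speech; the names `analyze`, `synthesize_element`, and `synthesize`, and the 8 kHz sample rate, are hypothetical.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)
PUNCTUATION = ".,;:!?"

def analyze(text: str) -> list[str]:
    """Analyzer: divide the input text into small textual elements (here, lowercase words)."""
    elements = []
    for token in text.split():
        word = token.strip(PUNCTUATION).lower()
        if word:
            elements.append(word)
    return elements

def synthesize_element(element: str, duration_s: float = 0.1) -> list[float]:
    """Synthesizer step: convert one element into a short segment of digital waveform.

    Placeholder only: a sine tone whose pitch depends on the element's length,
    standing in for real acoustic units.
    """
    freq_hz = 200.0 + 50.0 * len(element)
    n_samples = round(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]

def synthesize(elements: list[str]) -> list[float]:
    """Synthesizer: join the per-element segments to produce the output waveform."""
    waveform: list[float] = []
    for element in elements:
        waveform.extend(synthesize_element(element))
    return waveform

elements = analyze("Stock prices rose.")
waveform = synthesize(elements)
print(elements)        # ['stock', 'prices', 'rose']
print(len(waveform))   # 2400 (3 elements x 800 samples)
```

In a real converter the synthesizer would select or generate acoustic units for each element; only the analyze-then-concatenate structure is meant to carry over.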
It will be appreciated that the linguistic analysis of a sentence is exceedingly complicated since it involves many different linguistic tasks, and a wide variety of linguistic processors are commercially available, each of which is capable of doing at least one of the tasks. Further, different portions of the linguistic analysis can be distributed among at least two different data processors.
One category of linguistic processors is designated as "converters" in that they change the nature of the symbols utilized. For example, a "converter" alters a signal representing a word or other linguistic element in graphemes into a signal representing the same element in phonemes, using a grapheme-to-phoneme dictionary. This dictionary requires a large amount of storage space, and it is therefore preferable to store and maintain one dictionary in a central location, such as a network server, where it may be accessed by several users, instead of storing and maintaining a separate copy of the dictionary on each user's workstation. Maintaining large resources on servers both eases maintenance and reduces client system resource requirements. Further, converting the phonemes to an audio signal generates a large amount of data, and transferring the data in audio format requires a large amount of bandwidth.
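A minimal Python sketch of such a converter follows, assuming a dictionary keyed by lowercase spelling whose values are ARPAbet phoneme symbols. The three entries, the class name `GraphemeToPhonemeConverter`, and the `<UNK>` marker for words missing from the dictionary are illustrative assumptions; a practical converter would hold a full pronouncing dictionary and fall back to letter-to-sound rules for unknown words.

```python
# Hypothetical central grapheme-to-phoneme dictionary. The few ARPAbet entries
# below stand in for a full pronouncing dictionary held on a network server.
G2P_DICTIONARY: dict[str, list[str]] = {
    "market": ["M", "AA1", "R", "K", "AH0", "T"],
    "news": ["N", "UW1", "Z"],
    "today": ["T", "AH0", "D", "EY1"],
}

UNKNOWN = "<UNK>"  # placeholder for out-of-vocabulary words (assumed convention)

class GraphemeToPhonemeConverter:
    """A "converter": changes symbols from graphemes (spelling) to phonemes."""

    def __init__(self, dictionary: dict[str, list[str]]) -> None:
        # One shared, read-only dictionary serves every request the server handles.
        self._dictionary = dictionary

    def convert(self, words: list[str]) -> list[list[str]]:
        """Return one phoneme sequence per input word, preserving word boundaries."""
        return [list(self._dictionary.get(w.lower(), [UNKNOWN])) for w in words]

converter = GraphemeToPhonemeConverter(G2P_DICTIONARY)
print(converter.convert(["Market", "news", "xyzzy"]))
# [['M', 'AA1', 'R', 'K', 'AH0', 'T'], ['N', 'UW1', 'Z'], ['<UNK>']]
```

Because the converter only reads from the dictionary, a server can share a single instance among all users, which is the storage and maintenance advantage described above.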
The invention disclosed in U.S. patent application Ser. No. 09/409,000, filed Sep. 29, 1999, entitled "System and Apparatus For Dynamically Generating Audible Notices From An Information Network" discloses a text-to-speech (TTS) engine that resides in a client-side processor or in a server-side processor, or that is distributed among data processors in the system. TTS processing functions are computationally intensive, and some tasks require a large amount of storage space and bandwidth for data transfer. Therefore, it is further desirable to distribute the TTS engine between at least two data processors in a manner that optimizes processing time, data transfer, and storage space efficiency.
In addition to grapheme to phoneme TTS converters, there are other TTS engines that use different algorithms for transforming text data to audio data. Typically, these other TTS engines also involve converting text data to an intermediate format that requires less storage than the data in audio format. Therefore, it is also desirable to distribute other types of TTS engines between at least two data processors in a manner which optimizes processing time, data transfer, and storage space efficiency.
In one embodiment, the present invention provides a system for converting information from a text format to an audio format, wherein the text-to-speech conversion is distributed among two or more data processors. A first data processor executes a first set of program instructions to receive information in text format from a data source, to convert the information from the text format to an intermediate format, such as phonemes, and to transmit the information in the intermediate format to a second data processor. The second data processor executes a second set of program instructions to convert the information from the intermediate format to the audio format. In one embodiment, the first data processor, such as a network server, includes one or more databases accessible by multiple users that aid TTS synthesis, such as one or more grapheme-to-phoneme dictionaries. The second data processor is a client-side data processor, such as a client workstation.
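The split between the two data processors can be sketched in Python as two functions, one per processor, joined only by the bytes that cross the network. This is a sketch under stated assumptions: the dictionary contents, the wire format (words separated by `|`, phonemes by spaces), the function names, and the tone-based placeholder synthesizer are all hypothetical, and the actual network transport is omitted.

```python
import math

# ---- First data processor (e.g., a network server) ----
# Hypothetical shared grapheme-to-phoneme dictionary, stored only on the server.
SERVER_DICTIONARY: dict[str, list[str]] = {
    "rain": ["R", "EY1", "N"],
    "expected": ["IH0", "K", "S", "P", "EH1", "K", "T", "IH0", "D"],
}

def server_text_to_intermediate(text: str) -> bytes:
    """Convert text to a phonemic intermediate format ready for transmission."""
    words = [t.strip(".,;:!?").lower() for t in text.split()]
    phonemic_words = [" ".join(SERVER_DICTIONARY.get(w, ["<UNK>"])) for w in words if w]
    # Assumed wire format: words separated by '|', phonemes by spaces, UTF-8 bytes.
    return "|".join(phonemic_words).encode("utf-8")

# ---- Second data processor (e.g., a client workstation) ----
SAMPLE_RATE = 8000          # samples per second (assumed)
SAMPLES_PER_PHONEME = 640   # 80 ms per phoneme at 8 kHz (assumed)

def client_intermediate_to_audio(payload: bytes) -> list[float]:
    """Convert the received phonemic payload into audio samples in [-1, 1].

    Placeholder synthesis: each phoneme becomes a fixed-length sine tone whose
    pitch is derived from the phoneme symbol, standing in for a real synthesizer.
    """
    audio: list[float] = []
    for word in payload.decode("utf-8").split("|"):
        for phoneme in word.split():
            freq_hz = 150.0 + sum(phoneme.encode("utf-8")) % 400
            audio.extend(
                math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
                for i in range(SAMPLES_PER_PHONEME)
            )
    return audio

payload = server_text_to_intermediate("Rain expected.")   # runs on the server
print(payload)       # b'R EY1 N|IH0 K S P EH1 K T IH0 D'
audio = client_intermediate_to_audio(payload)             # runs on the client
print(len(audio))    # 7680 (12 phonemes x 640 samples)
```

Only the compact phonemic payload travels between the processors; the dictionary never leaves the server, and the bulky audio samples are produced only on the client.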
In another embodiment, the present invention provides a computer program product for dynamically generating audible notices from an information network using distributed text-to-speech processing. The information network includes a client processor and a remote processor, such as a network server. The computer program product includes a first set of program instructions, executed on the remote processor, that generates an intermediate representation of the information, such as a phonemic representation. The computer program product further includes a second set of program instructions, executed on the client processor, that allows a user to preselect at least one data source that is accessible from the information network, receives information from the at least one preselected data source, and converts the information from a text format to an audio format based on the intermediate representation of the information.
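The client-side preselection step can be illustrated with a short Python sketch in which the user registers data sources by identifier and the client discards information from any other source before the remaining text is passed on for conversion. The class and method names, the use of URLs as source identifiers, and the example.com addresses are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InformationItem:
    source: str    # identifier of the data source the item came from
    text: str      # item content in text format

class NoticeClient:
    """Client-side preselection of data sources (hypothetical design)."""

    def __init__(self) -> None:
        self._preselected: set[str] = set()

    def preselect(self, source: str) -> None:
        """Record a data source the user wants to hear about."""
        self._preselected.add(source)

    def receive(self, items: list[InformationItem]) -> list[str]:
        """Keep only items from preselected sources; their text is what would be
        sent on for conversion to the intermediate representation."""
        return [item.text for item in items if item.source in self._preselected]

client = NoticeClient()
client.preselect("https://news.example.com/markets")
incoming = [
    InformationItem("https://news.example.com/markets", "Markets closed higher."),
    InformationItem("https://news.example.com/sports", "Home team wins."),
]
print(client.receive(incoming))   # ['Markets closed higher.']
```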
In one embodiment, the first set of program instructions utilizes a dictionary for translating graphemes to phonemes, stored in a location that is accessible by the first set of program instructions.
In another embodiment, the present invention provides a method for dynamically generating audible notices from an information network which includes preselecting at least one data source from the information network, receiving information from the at least one preselected data source, converting the information from a text format to an intermediate format in a remote processor, converting the information from the intermediate format to an audio format in a client processor, and transmitting audio signals representative of the information in audio format. In one embodiment, the text is converted into an intermediate phonemic representation using a dictionary for translating graphemes to phonemes. The dictionary is stored in a location that is accessible by the remote processor. The phonemes are converted to audio output signals in the client processor.
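The final step, transmitting audio signals representative of the information, can take many forms. One common digital form is 16-bit pulse-code modulation (PCM) in a WAV container, which a client can hand to its sound hardware for digital-to-analog conversion. The Python sketch below uses the standard library `wave` and `struct` modules; the choice of WAV, mono output, an 8 kHz default sample rate, and the function name `to_wav_bytes` are assumptions, not details of the method itself.

```python
import io
import struct
import wave

def to_wav_bytes(samples: list[float], sample_rate: int = 8000) -> bytes:
    """Quantize samples in [-1, 1] to 16-bit PCM and wrap them in a WAV container."""
    # Clip out-of-range values, then scale to the signed 16-bit range.
    pcm_values = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    pcm = struct.pack(f"<{len(pcm_values)}h", *pcm_values)  # little-endian int16
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()

wav_data = to_wav_bytes([0.0, -1.0, 2.0])
with wave.open(io.BytesIO(wav_data), "rb") as reader:
    print(reader.getnframes())                          # 3
    print(struct.unpack("<3h", reader.readframes(3)))   # (0, -32767, 32767)
```

The out-of-range sample 2.0 is clipped to full scale rather than wrapping around, which would otherwise produce an audible click.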
Each embodiment of the present invention distributes the text-to-speech processing so that multiple users can take advantage of resources requiring a large amount of storage space on a remote, centralized processor, such as a network server. Intermediate processing of the information is performed at the remote processor to take advantage of the centralized resources, thus reducing the amount of data transferred from the remote processor to the client processor. The information, in intermediate format, is then transferred to the client processor, where it is converted to audio output signals. This feature also advantageously reduces data transfer requirements, since the audio output format typically requires a large amount of data storage compared to the intermediate format.
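The data-transfer advantage can be made concrete with a back-of-the-envelope calculation, sketched in Python. The figures are assumptions chosen only for illustration: a twelve-phoneme phrase ("Rain expected.") sent as ARPAbet text, about 80 ms of speech per phoneme, and uncompressed 16-bit mono audio at 16 kHz. Compressed audio codecs would narrow the gap considerably, so the printed ratio shows an order of scale, not a measured result.

```python
def intermediate_size_bytes(phonemic_text: str) -> int:
    """Size of a phonemic payload sent as UTF-8 text."""
    return len(phonemic_text.encode("utf-8"))

def pcm_size_bytes(duration_ms: int, sample_rate: int = 16000,
                   bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of uncompressed PCM audio covering the same utterance."""
    samples = sample_rate * duration_ms // 1000
    return samples * bytes_per_sample * channels

phonemic = "R EY1 N|IH0 K S P EH1 K T IH0 D"   # "Rain expected." (12 phonemes)
duration_ms = 12 * 80                            # assume about 80 ms per phoneme
text_bytes = intermediate_size_bytes(phonemic)
audio_bytes = pcm_size_bytes(duration_ms)
print(text_bytes, audio_bytes)                   # 31 30720
print(round(audio_bytes / text_bytes))           # 991
```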
The foregoing has outlined rather broadly the objects, features, and technical advantages of the present invention so that the detailed description of the invention that follows may be better understood.