1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with methods, systems, computer program products, and methods of doing business by adapting audio renderings of non-audio messages (for example, textual e-mail messages that are processed by a text-to-speech translator) to reflect various nuances of the non-audio information.
2. Description of the Related Art
Face-to-face communication between people involves many parallel communication paths. We derive information from body language, from words, from intonation, from facial expressions, from the distance between our bodies, and so forth. Distance communication, such as phone calls, e-mail exchange, and voice mail, on the other hand, involves only a few of these communication paths. Users may therefore have to take extra actions (which may or may not be successful) if they wish to try to overcome the limitations so imposed.
Distance communication is becoming more prevalent in our society. Voice mail systems became widely used in years past, and in more recent years electronic mail systems have become common, with the popularity and pervasiveness of e-mail continuing to grow. When communicating by e-mail, message creators often try to overcome the limitations of distance communication by using different font sizes, colors, emoticons (i.e., combinations of text symbols which bear a resemblance to facial expressions), and so forth to express non-text information. This non-text information includes emphasis, emotion, irony, etc.
Emotions may be particularly difficult to convey when using distance communication. For example, if a person is angry, it can be quite difficult to communicate that emotion in the words of an e-mail message. While a voice mail message has the advantage of conveying the speaker's (i.e., the message creator's) tone of voice, it still may not adequately represent the speaker's emotion. As another example of the difficulties of distance communication, suppose a message creator has many different topics to cover. When communicating in person, the speaker can use changes in body language to indicate a change in subject. In a voice mail message, however, it may be difficult for the listener to appreciate when one topic has ended and another has begun. In an e-mail message, the message creator may perhaps change paragraphs when the topic changes, and may use bolding and italics to give further visual cues about the number and importance of topics as well as other semantic and contextual meaning. In this case, viewing an e-mail may provide important information about the topic layout by giving the viewer a “broadside” visual overview.
A typical person using distance communications may receive a number of voice mail messages in her voice mailbox throughout the course of a day, and perhaps facsimile transmissions as well, in addition to receiving e-mail messages in an e-mail inbox. To enable people to deal with multiple sources of distance communication more effectively and efficiently, unified messaging systems have been developed. A unified messaging system provides a single interface into multiple message types, and consolidates e-mail, voice mail, and fax messages into a single mailbox so that the recipient has a common place to access her incoming messages (using either a telephone to listen to the messages, or a software application on a computer to view a textual message display or listen to an audio version of the messages). However, unified messaging systems and network convergence may exacerbate the problems of distance communications by adding the difficulties of media transformation to the communications.
One problem with existing systems is that when e-mail is transformed via an audio readout, as is done when a unified messaging system is accessed from a telephone, much of the contextual information that the message creator attempted to convey using changes in fonts and color, emoticons, and so forth, can be lost. The loss of the context of messages may result in a loss of understanding of the topic or perhaps a loss of the underlying meaning of the message (or both). The format of the e-mail message (e.g., paragraphs, lists, and so forth) also contributes to the overall understanding of the message, as stated earlier, and the inability of a listener to perceive this formatting information can lead to a loss in meaning and understanding.
In addition to the loss of context, another problem of existing systems is that message transformations such as text-to-speech translations performed on e-mail messages are sometimes inaccurate. For example, is the sentence “They read the words aloud” intended to reflect the present tense, such that the pronunciation of “read” is “reed”? Or is it meant to be past tense, such that the correct pronunciation is “red”? When the recipient listens to the translated message, she may not be aware of which parts of the translation are accurate and which are not. The recipient must therefore either trust that the translated information is 100% accurate, or assume that part or none of it is accurate. In either case, a loss in communications may occur.
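By way of illustration only, the pronunciation ambiguity described above can be sketched in a few lines of Python. The table `HETERONYMS` and the function `flag_ambiguous_words` are hypothetical names introduced for this sketch and are not part of any existing text-to-speech system; the table is deliberately tiny and the phonetic respellings are informal. The sketch merely shows that spelling alone does not determine which pronunciation a translator should select.

```python
import re

# Hypothetical, deliberately tiny table mapping a spelling to its
# candidate pronunciations (informal respellings, for illustration only).
HETERONYMS = {
    "read": ("reed", "red"),
    "lead": ("leed", "led"),
    "tear": ("teer", "tair"),
}


def flag_ambiguous_words(sentence):
    """Return (word, candidate pronunciations) for each heteronym found."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [
        (word, HETERONYMS[word.lower()])
        for word in words
        if word.lower() in HETERONYMS
    ]


# Spelling alone cannot tell the translator whether "read" is present
# tense ("reed") or past tense ("red").
print(flag_ambiguous_words("They read the words aloud."))
```

A listener hearing only one of the candidate pronunciations would have no indication that the translator had to choose between them.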
Loss of context and inaccurate translations may both result in wasted time and effort, and therefore decreased efficiency, for message recipients. For example, the recipient may have to spend additional time attempting to discern whether a translated message is accurate, and what the correct message was meant to be if the translation is inaccurate; similarly, she may need to spend time investigating the true underlying message if important contextual information is lost during a text-to-speech translation. Furthermore, when a message has been distorted because of lost context and/or inaccurate translation, it may be difficult to tell that a problem has occurred. If the message recipient relies on the message content without realizing that a distortion has occurred, adverse consequences may result.
Accordingly, what is needed is a technique that alleviates these problems in distance communications, providing a more accurate and more productive way for people to communicate using audio renderings of non-audio messages (such as the audio messages that result when textual messages are processed by text-to-speech translation systems).