Text-to-speech (TTS) synthesis benefits people with learning disabilities, people who are illiterate, people with visual impairments, and people who wish to multitask. Despite these far-reaching advantages, present TTS synthesis has a significant disadvantage: it lacks a human touch. This is because present TTS synthesis outputs a mechanical or automated voice, which is clearly distinguishable from a human voice and may be annoying to the ears. Another disadvantage of present TTS synthesis is that it does not attempt to recreate the emotions behind textual messages; for instance, the emotions conveyed by emoticons are lost in present TTS synthesis.
Present attempts at creating a natural voice synthesizer fail because they do not take into account the emotional context in which textual messages are sent. This context may be gathered from the emotions associated with previous exchanges of messages. For instance, the emotion associated with the message "Mary is gone" may be understood only after analyzing the emotion associated with the previous message "Mary is in the hospital". Hence, there is a need for an improved system and method for automatically reading out a sender's messages in the sender's natural voice with the appropriate emotion.
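The idea of disambiguating a message's emotion using prior messages can be sketched as follows. This is a minimal illustration only: the toy keyword lexicon, the `classify_emotion` function, and its labels are assumptions made for demonstration, not part of any particular TTS system described here.

```python
# Toy lexicon mapping context words to emotions; purely illustrative.
SAD_CONTEXT = {"hospital", "funeral", "accident"}
HAPPY_CONTEXT = {"party", "vacation", "promotion"}

def classify_emotion(message: str, history: list[str]) -> str:
    """Infer the emotion of a possibly ambiguous message by also
    scanning the previous messages in the conversation."""
    words = set()
    for text in history + [message]:
        # Normalize: lowercase and strip trailing punctuation.
        words.update(w.strip(".,!?").lower() for w in text.split())
    if words & SAD_CONTEXT:
        return "sad"
    if words & HAPPY_CONTEXT:
        return "happy"
    return "neutral"

# "Mary is gone" alone is ambiguous; the prior message disambiguates it.
print(classify_emotion("Mary is gone", ["Mary is in the hospital"]))  # sad
print(classify_emotion("Mary is gone", []))  # neutral
```

A real system would replace the keyword lookup with a trained emotion classifier and feed the inferred label to the synthesizer as a prosody control, but the sketch captures the point of the example above: context changes the emotion assigned to the same text.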