Text-to-speech (TTS) technology enables a computer to read text aloud using a synthesized voice. Computer-generated voices in the existing art can sound very unnatural and monotonous, even when the words of the text are properly enunciated, because natural human speech varies the inflection and even the pronunciation of words according to a large number of variables at the sentence, paragraph, and context levels. In addition, education level, regional dialect, and mood contribute intonation and expressive characteristics to spoken speech that computer-generated speech does not faithfully reproduce. Thus, computer-generated voices, while accurately conveying the written words of a digital text, often sound cold and lifeless and fail to convey much of the remaining meaning of the text.
With respect to literature, these shortcomings prevent TTS technology from being a viable option for performing literature aloud to a human audience. Unlike short mechanical phrases, literature involves varying moods, character profiles, ebb and flow, and many other nuances and contextual cues that must be conveyed to capture the tone of a plot effectively. While TTS technology has improved in recent years, the present inventors have recognized that today's TTS systems are largely agnostic to these plot elements and character attributes, limiting their capacity to portray a piece of literature accurately.
Recent advancements in TTS technology consider characteristics of the target listener to customize playback for that listener. For example, technology currently exists for synthesizing a voice in the style of the listener's voice, dialect, gender, etc., based on parameters supplied to the TTS engine. However, this does not help capture the mood or characters within a piece of literature, because the reader/listener is not one of the characters in the literature.
For these reasons, the inventors have recognized a need in the art for an improved system for computer verbalization of a digital work of literature that accurately portrays the voice intonation, accent, dialect, education level, and mood of each character, and optionally of a narrator, in the work of literature.