A text-to-audio system converts input text into an output acoustic signal that imitates natural speech. Text-to-audio systems are useful in a wide variety of applications, including automated information services, auto-attendants, computer-based instruction, computer systems for the visually impaired, and digital readers.
Some simple text-to-audio systems operate on plain text input and produce corresponding speech output with little or no processing or analysis of the received text. Other, more complex text-to-audio systems process received text to determine semantic and syntactic attributes that influence its pronunciation. Still other text-to-audio systems process annotated text inputs, in which the annotations specify pronunciation information that the system uses to produce more fluent, human-like speech.
Some text-to-audio systems convert text into high-quality, natural-sounding speech in near real time. However, producing high-quality speech requires a large inventory of potential acoustic units, together with complex rules and exceptions for combining those units. Consequently, such systems typically require large storage capacity and high computational power, and they typically consume large amounts of energy.
A text-to-audio system often receives the same text input multiple times. Nevertheless, such systems fully process each received text input to convert it into speech output, without regard to whether the same text has previously been converted and without regard to how often identical text inputs are received.
For example, in the case of digital readers, a single text-to-audio system may receive the same text input the first time a user listens to a book and again each time the user listens to that book later. Furthermore, across many users, a single book may be converted thousands of times by many different digital readers. Such redundant processing wastes energy, processing resources, and time.