Field of the Invention
The invention pertains to text-to-speech processing and, more particularly, to enhanced text-to-speech processing for improved document review.
Description of the Related Art
For various reasons, documents have been converted to speech (spoken text) using conventional text-to-speech processing. A user desiring to review a document can then listen to the resulting speech instead of having to read through the document. For users with impaired vision, listening to the resulting speech for a document is particularly important. Regardless of the reasons for listening to speech associated with a document, conventional text-to-speech processing often cannot impart to the user (listener) contextual information about the text that is being spoken. Further, in recent years, documents have become more complex and more diversified. As a result, today's documents can have many different formats and contain a variety of document elements, such as links, images, headings, tables, captions, and footnotes, which makes text-to-speech processing more challenging. Thus, there is a need for improved text-to-speech processing that can present contextual information to listeners.
Text-to-speech processing can also generate audio output for users who desire to listen to documents while on the go. However, text-to-speech processing is processor-intensive, making it impractical for many portable devices that have limited processing power. Hence, there is also a need to manage the creation, delivery, and consumption of audio outputs that provide speech associated with documents.