A text-to-speech engine converts text into synthesized audio output. The text may originate from a variety of source types, such as plain text, e-mail, HTML, XML, rich text format, portable document format, PS, GPS coordinates, RSS, SMS, MMS, video, and multimedia links. Digital media may contain text data along with other data relating to, for example, context, formatting, visual presentation, and layout. Text may contain punctuation, acronyms, abbreviations, ambiguities, short forms, informalities, different languages, symbols, grammatical errors, and formatting errors. These factors introduce challenges and complications for text-to-speech engines.
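
To make two of these challenges concrete, the following is a minimal sketch, not a description of any particular engine, of how HTML source text might be reduced to plain words before synthesis. The names `ABBREVIATIONS`, `strip_markup`, `expand_abbreviations`, and `prepare_for_speech` are hypothetical and introduced only for illustration; the sketch assumes Python and uses only the standard library.

```python
from html.parser import HTMLParser

# Hypothetical lookup table, for illustration only. A practical system
# would need a much larger, context-aware lexicon ("Dr." can also mean
# "drive", which is exactly the kind of ambiguity noted above).
ABBREVIATIONS = {"dr.": "doctor", "approx.": "approximately", "msg": "message"}


class _TextExtractor(HTMLParser):
    """Collects character data and discards the tags around it."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html_text):
    """Return the text content of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split())


def expand_abbreviations(text):
    """Replace whitespace-delimited tokens found in ABBREVIATIONS."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in text.split())


def prepare_for_speech(html_text):
    """Strip formatting data, then expand known short forms."""
    return expand_abbreviations(strip_markup(html_text))


if __name__ == "__main__":
    print(prepare_for_speech("<p>Dr. Lee sent a <b>msg</b></p>"))
```

Even this small sketch shows where the difficulty lies: the table lookup ignores context, tokens followed by punctuation (such as "msg,") are not matched, and the contents of script or style elements would be passed through as if they were readable text.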