The vast majority of text content is unlikely to be updated to contain graphics and sound, despite the increasing focus on the multimedia capabilities of mobile devices. ‘Archived’ formats such as books and newspapers, as well as message formats such as SMS and email, will remain popular in their present form for a very long time. There is currently an opening for a technology that can add the appeal of multimedia to the text format, which is not very exciting on its own.
The most obvious solution to this problem is to store and/or transmit the added multimedia content together with the original text content. However, this increases the amount of data by at least an order of magnitude, since the text format is much more compact than graphics and sound. U.S. Pat. No. 7,103,548 discloses a system for converting a text message into audio form, the text message having embedded emotion indicators and feature-type indications, the latter of which serve to determine which of multiple audio-form presentation feature types are to be used to express, in the audio form of the text message, the emotions indicated by said emotion indicators. In addition, MSN Messenger currently allows the sender to write tags in a text message, which are then translated into pictures at the receiving end. However, preparing the content in advance eliminates the possibility of a context-dependent ‘surprise effect’. Furthermore, if a certain ambient soundscape, say, rain and wind, is added to the speech and played back through a single loudspeaker in a conventional mobile device, it will sound like disturbing background noise and reduce intelligibility.
There are several formats suitable for storing and presenting multimedia content. The best known is SMIL (Synchronized Multimedia Integration Language). For material aimed at publication on the World Wide Web, ACSS (Aural Cascading Style Sheets) can be used to define some properties of the sound. In combination with SSML (Speech Synthesis Markup Language, a W3C Recommendation), it is possible to do some basic real-time rendering of sound and speech.
Accordingly, there is no markup language or corresponding software architecture suitable for performing real-time sound synthesis and rendering of sound effects, particularly stereo or 3D sound, in text-based applications.