Content may be provided to a user over a network, such as the Internet, and may be presented to the user in one or more of a variety of modalities. Modalities may include, for example, presenting content in a visual format on a computer screen or in an audio format over a pair of speakers. Further, the same content may be presented to a user in multiple modalities concurrently, for example, by displaying text on a screen while speaking the text over a pair of speakers. Such multi-modality output may be achieved, for example, by providing both a Hyper-Text Markup Language (HTML) source document and a Voice Extensible Markup Language (VXML) source document for the content and synchronizing the two documents during presentation to the user. Input may also be received from the user in one or more of multiple modalities.
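As a minimal sketch of the paired-document approach, the following Python example generates an HTML document and a VXML document from the same content. It rests on assumptions that do not come from the description above: the `CONTENT` list, the function names, and the convention of giving each HTML paragraph and each VXML form the same `id` so that a presentation layer could keep the two in step are all hypothetical. It illustrates one possible arrangement, not a definitive implementation.

```python
from html import escape as html_escape
from xml.sax.saxutils import escape as xml_escape, quoteattr

# Hypothetical content: (shared id, text) pairs. Reusing the same id in both
# documents is an assumed convention for keeping the modalities aligned.
CONTENT = [
    ("greeting", "Welcome to the store."),
    ("hours", "We are open weekdays & Saturdays."),
]


def to_html(items):
    """Build a visual-modality document: one <p> per content item."""
    body = "\n".join(
        f'  <p id="{html_escape(item_id)}">{html_escape(text)}</p>'
        for item_id, text in items
    )
    return f"<html>\n<body>\n{body}\n</body>\n</html>"


def to_vxml(items):
    """Build a voice-modality document: one <form> with a spoken prompt per item."""
    forms = "\n".join(
        f"  <form id={quoteattr(item_id)}>\n"
        f"    <block><prompt>{xml_escape(text)}</prompt></block>\n"
        f"  </form>"
        for item_id, text in items
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        f"{forms}\n"
        "</vxml>"
    )


def presentation_steps(items):
    """Pair each HTML element id with the VXML form id presented alongside it."""
    return [(item_id, item_id) for item_id, _ in items]


if __name__ == "__main__":
    print(to_html(CONTENT))
    print(to_vxml(CONTENT))
    for html_id, vxml_id in presentation_steps(CONTENT):
        print(f"show #{html_id} while speaking form #{vxml_id}")
```

Deriving both documents from a single list keeps the text identical across modalities, and the shared identifiers give a synchronizing component a simple key for matching what is displayed with what is spoken.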