1. Field of the Invention
The invention relates generally to communication over a network.
2. Background Art
Audio has long been carried in telephone calls over networks. Traditionally, circuit-switched time division multiplexing (TDM) networks, including the public switched telephone network (PSTN) and plain old telephone service (POTS), were used. These circuit-switched networks establish a circuit across the network for each call. Audio is carried in analog and/or digital form across the circuit in real-time.
The emergence of packet-switched networks, such as local area networks (LANs) and the Internet, now requires that audio and video be carried digitally in packets. Audio can include but is not limited to voice, music, or other types of audio data. Voice over the Internet systems (also called Voice over IP or VOIP systems) transport the digital audio data belonging to a telephone call in packets over packet-switched networks instead of traditional circuit-switched networks. In one example, a VOIP system forms two or more connections using Transmission Control Protocol/Internet Protocol (TCP/IP) addresses to accomplish a connected telephone call. Devices that connect to a VOIP network must follow standard TCP/IP packet protocols in order to interoperate with other devices within the VOIP network. Examples of such devices are integrated access devices, media gateways, and media servers.
A media server is often an endpoint in a VOIP telephone call. The media server is responsible for ingress and egress audio streams, that is, audio streams which enter and leave a media server, respectively. The type of audio produced by a media server is controlled by the application that corresponds to the telephone call, such as voice mail, a conference bridge, or interactive voice response (IVR). In many applications, the produced audio is not predictable and must vary based on end user responses. Words and sentences must be assembled dynamically in real time as they are played out in audio streams.
Packet-switched networks, however, can impart delay and jitter in a stream of audio carried in a telephone call. A real-time transport protocol (RTP) is often used to control delays, packet loss and latency in an audio stream played out of a media server. The audio stream can be played out using RTP over a network link to a real-time device (such as a telephone) or a non-real-time device (such as an email client in unified messaging). RTP operates on top of a protocol such as the User Datagram Protocol (UDP), which is part of the IP family. RTP packets include, among other things, a sequence number and a timestamp. The sequence number allows a destination application using RTP to detect lost packets and to ensure that packets are presented to a user in the correct order. The timestamp corresponds to the time at which the packet was assembled. The timestamp allows a destination application to ensure synchronized play-out to a destination user and to calculate delay and jitter. See D. Collins, Carrier Grade Voice over IP, McGraw-Hill: United States, Copyright 2001, pp. 52–72, which is incorporated herein by reference in its entirety.
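By way of illustration only, the use of the RTP sequence number and timestamp described above can be sketched as follows. The sketch assumes the 12-byte fixed RTP header layout of RFC 3550 (version 2, no padding, no header extension, no CSRC entries) and the interarrival jitter estimator given in that specification. The function names (pack_rtp_header, parse_rtp_header, count_lost, interarrival_jitter) are hypothetical and are not taken from any particular media server or library.

```python
import struct
from typing import Iterable, NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def pack_rtp_header(payload_type: int, sequence: int, timestamp: int, ssrc: int) -> bytes:
    """Build a 12-byte fixed RTP header (assumed: version 2, no padding/extension/CSRC)."""
    first = 2 << 6                 # version bits only
    second = payload_type & 0x7F   # marker bit left clear
    return struct.pack("!BBHII", first, second,
                       sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Extract the fields a receiver needs from the fixed header."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    first, second, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(first >> 6, second & 0x7F, seq, ts, ssrc)


def count_lost(sequences: Iterable[int]) -> int:
    """Count gaps in received sequence numbers, allowing for 16-bit wraparound.

    Duplicates and late (reordered) packets are ignored, so the result is an
    upper bound on the number of packets actually lost.
    """
    lost = 0
    highest = None
    for seq in sequences:
        if highest is None:
            highest = seq
            continue
        gap = (seq - highest) & 0xFFFF
        if gap == 0 or gap >= 0x8000:
            continue               # duplicate or older packet
        lost += gap - 1
        highest = seq
    return lost


def interarrival_jitter(samples: Iterable[Tuple[int, int]]) -> float:
    """Running jitter estimate J += (|D| - J) / 16 as in RFC 3550.

    Each sample is (arrival_time, rtp_timestamp), both in RTP timestamp units.
    """
    jitter = 0.0
    prev_transit = None
    for arrival, ts in samples:
        transit = arrival - ts
        if prev_transit is not None:
            jitter += (abs(transit - prev_transit) - jitter) / 16.0
        prev_transit = transit
    return jitter
```

For example, under these assumptions count_lost([65534, 65535, 0, 3]) returns 2, since sequence numbers 1 and 2 never arrive after the counter wraps from 65535 to 0. A destination application would typically feed such values into its play-out buffer rather than use them directly.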
Along with the development of VOIP systems, a separate development of World Wide Web technology has occurred. Web servers are used to deliver a rich variety of content including audio content (referred to herein as “web audio content”). Web servers originally provided all types of web content to computing devices such as personal computers. A personal computer must have an appropriate browser, plug-in, and media player for a user to view the web content. For example, to view (i.e., hear) web audio content such as a .wav file, a media player such as RealPlayer, QuickTime, or Windows Media Player needs to be installed on the personal computer. Telephones, which cannot run such media players, have not been able to play web audio content.
One approach to delivering web audio content is to store audio data at a media server. This audio data can then be delivered in real-time to a telephone. Such an approach is very limited because the media server is required to prestore large amounts of data from which a telephone user can select. This becomes impractical and expensive as the number of users and the quantity of web audio content they wish to hear increase.
What is needed is a system and method for allowing web content to be delivered to any telephone without requiring a media server to store large amounts of web content.