In media streaming applications, a data stream is composed of a sequence of data segments to be delivered to a receiver at a prescribed data rate (or data rate profile in case of variable-bit-rate encoded media data) so that the receiver may begin playback of the received data segments while concurrently receiving subsequent data segments. As long as data segments may be delivered to the receiver before their prescribed playback time, the receiver may sustain continuous media playback without any interruptions.
The delivery of media data is carried out through a transport protocol. Some transport protocols, such as RTSP/RTP, are specifically designed for streaming media. However, an increasing amount of media content is nowadays delivered using the standard HTTP protocol, which in turn relies on TCP to transport the data from the server to the client. This creates a new problem, as neither HTTP nor TCP was originally designed for media streaming applications.
Specifically, TCP has a built-in congestion-control mechanism that performs two tasks. First, it incrementally increases the transmission rate to probe for the bandwidth available on the path from the sender to the receiver. Second, it detects network congestion by monitoring packet losses, so that the transmission rate, governed by the congestion window, may be reduced to alleviate the congestion. In a typical TCP flow, the transmission rate increases incrementally until it exceeds the network's bandwidth limit, which results in packet losses and thus triggers the congestion-control mechanism to cut down the transmission rate.
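The probe-then-back-off behavior described above can be sketched as a simple additive-increase/multiplicative-decrease (AIMD) loop. The following is an illustrative simulation only, not an implementation of TCP itself; the starting rate, increase step, and the fixed path bandwidth are assumptions chosen for illustration.

```python
def simulate_aimd(bandwidth_kbps, increase_kbps, rounds):
    """Return the per-round transmission rate of an idealized AIMD sender.

    The sender raises its rate by a fixed increment each round (probing
    for bandwidth) until the rate exceeds the path's bandwidth limit,
    which models a packet loss; the rate is then cut in half, and the
    cycle repeats.
    """
    rate = increase_kbps  # start at a low rate, as TCP does
    rates = []
    for _ in range(rounds):
        rates.append(rate)
        if rate > bandwidth_kbps:
            # Loss detected: congestion control halves the rate.
            rate = rate / 2
        else:
            # No loss: keep probing for additional bandwidth.
            rate += increase_kbps
    return rates

rates = simulate_aimd(bandwidth_kbps=500, increase_kbps=50, rounds=40)
average_kbps = sum(rates) / len(rates)
```

Running this yields the characteristic sawtooth: the rate repeatedly overshoots the 500 kbps limit, is halved, and ramps up again, so the average throughput stays below the raw path bandwidth.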
However, this congestion-control mechanism generates unnecessary packet losses in cases where (a) the media stream has known bandwidth requirements; and (b) the network has sufficient bandwidth to satisfy those requirements. As an illustration, assume a media stream is encoded at a data rate of 200 kbps and the network has 500 kbps of bandwidth available, which is more than sufficient for the media stream. If the media stream is delivered by a server such as a web server, the server will attempt to send the media stream data to the receiver using the HTTP protocol, which in turn relies on TCP for the actual data delivery.
At the beginning of the session, the server sends at a low data rate, and the transmission rate increases incrementally as TCP grows its congestion window. Eventually the transmission rate exceeds the network's 500 kbps bandwidth limit and results in packet losses. TCP is unaware that this congestion is in fact self-induced, so its congestion-control mechanism cuts the transmission rate dramatically to cope with the perceived congestion. It takes some time before the transmission rate can ramp up again, and thus the overall throughput achieved is substantially lower than the network bandwidth (500 kbps) and, in some cases, even lower than the media stream's data rate (200 kbps). In the latter case, playback interruptions result.
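The claim that average throughput can fall below the media rate can be made concrete with a hedged back-of-the-envelope model of the 200 kbps stream on the 500 kbps path. Here a loss is followed by several idle round-trips (loosely modeling a retransmission timeout) before the sender restarts from its lowest rate; the ramp step, idle-round count, and cycle count are illustrative assumptions, not measured values.

```python
MEDIA_RATE_KBPS = 200   # encoded media data rate from the example
BANDWIDTH_KBPS = 500    # available path bandwidth from the example

def average_throughput(ramp_kbps=25, idle_rounds=10, cycles=5):
    """Average per-round throughput of a sender that ramps up, loses
    packets when it exceeds the path bandwidth, idles through a
    timeout, and restarts from the lowest rate."""
    rates = []
    for _ in range(cycles):
        rate = ramp_kbps
        while rate <= BANDWIDTH_KBPS:   # ramp until the limit is hit
            rates.append(rate)
            rate += ramp_kbps
        rates.append(0)                 # loss round: nothing delivered
        rates.extend([0] * idle_rounds) # timeout: idle round-trips
    return sum(rates) / len(rates)

avg = average_throughput()
```

With these assumed parameters the average comes out below 200 kbps even though the path can carry 500 kbps, which is exactly the condition under which playback stalls.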
FIG. 1 illustrates this problem by plotting TCP throughput versus time for streaming a 250 kbps media stream over HTTP on a 3G mobile network. Note that the transmission rate repeatedly rises to around 500 kbps as TCP continues to probe for additional bandwidth. The deep valleys in FIG. 1 are the self-induced congestion events, which occurred repeatedly over the entire streaming duration. In this experiment, although the network could sustain up to 500 kbps, double the media stream's data rate of 250 kbps, the overall average throughput achieved was in fact less than 250 kbps. As a result, playback was paused repeatedly over the entire streaming duration, leading to very poor quality of service.