1. Technical Field
The invention relates to controlling the coding bit rate of streaming media, and more particularly to a system and process for controlling the coding bit rate of streaming media data that provides fast startup, continuous playback, and maximal quality and smoothness over the entire streaming session.
2. Background Art
Perhaps the major technical problem in streaming media on demand over the Internet is the need to adapt to changing network conditions. As competing communication processes begin and end, the available bandwidth, packet loss and packet delay all fluctuate. Network outages lasting many seconds can and do occur. Resource reservation and quality of service support can help, but even they cannot guarantee that network resources will be stable. If the network path contains a wireless link, for example, its capacity may be occasionally reduced by interference. Thus it is necessary for commercial-grade streaming media systems to be robust to hostile network conditions. Moreover, such robustness cannot be achieved solely by aggressive (nonreactive) transmission. Even constant bit rate transmission with re-transmissions for every packet loss cannot achieve a throughput higher than the channel capacity. Some degree of adaptivity to the network is therefore required.
End users expect that a good streaming media system will exhibit the following behavior: content played back on demand will start with low delay; once started, it will play back continuously (without stalling) unless interrupted by the user; and it will play back with the highest possible quality given the average communication bandwidth available. To meet these expectations in the face of changing network conditions, buffering of the content at the client before decoding and playback is required.
Buffering at the client serves several distinct but simultaneous purposes. First, it allows the client to compensate for short-term variations in packet transmission delay (i.e., “jitter”). Second, it gives the client time to perform packet loss recovery if needed. Third, it allows the client to continue playing back the content during lapses in network bandwidth. And finally, it allows the content to be coded with variable bit rate, which can dramatically improve overall quality. Note that even so-called constant bit rate (CBR) coded content is actually coded with variable bit rate within the constraints of a decoding buffer of a given size. The larger the decoding buffer size, the better the quality. The required decoding buffering is part of the larger client buffer.
The size of the client buffer can be expressed as the number of seconds of content in the buffer, called the buffer duration. The buffer duration tends to increase as content enters the buffer and tends to decrease as content leaves the buffer. Content leaves the buffer when it is played out, at a rate of v seconds of content per second of real time, where v is the playback speed (typically 1 for normal playback, but possibly more than 1 for high speed playback or less than 1 for low speed playback). Content enters the buffer when it arrives at the client over the network, at a rate of ra/rc seconds of content per second of real time, where ra is the arrival rate, or average number of bits that arrive at the client per second of real time, and rc is the coding bit rate, or the average number of bits needed to encode one second of content. Thus the buffer duration can be increased by increasing ra, decreasing rc, and/or decreasing v (and vice versa for decreasing the buffer duration). Although the buffer duration can be momentarily controlled by changing ra or changing v, these quantities are generally not possible to control freely for long periods of time. The arrival rate ra on average is determined by the network capacity, while the playback speed v on average is determined by user preference. Thus if the network capacity drops dramatically for a sustained period, reducing the coding bit rate rc is the only appropriate way to prevent a rebuffering event in which playback stops (v=0) while the buffer refills.
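The bookkeeping above amounts to the rate equation db/dt = ra/rc - v, where b is the buffer duration. The following Python sketch integrates that relation over intervals in which ra, rc and v are assumed constant. The names `Interval` and `simulate_buffer` are hypothetical, and clamping the buffer at zero is a simplification that flags a rebuffering event without modeling the refill that follows it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Interval:
    """A stretch of real time with constant conditions (hypothetical type)."""
    seconds: float        # length of the interval in real time
    arrival_bps: float    # r_a: bits arriving at the client per second
    coding_bps: float     # r_c: bits needed to encode one second of content
    speed: float = 1.0    # v: seconds of content played per second


def simulate_buffer(initial_s: float,
                    intervals: List[Interval]) -> Tuple[List[float], bool]:
    """Integrate db/dt = r_a / r_c - v over piecewise-constant intervals.

    Hypothetical helper. Returns the buffer duration (seconds of content)
    at the end of each interval and whether the buffer ever ran dry.
    Clamping at zero is a simplification: the refill while playback is
    stopped is not modeled.
    """
    b = initial_s
    history: List[float] = []
    stalled = False
    for iv in intervals:
        rate = iv.arrival_bps / iv.coding_bps - iv.speed
        b += rate * iv.seconds
        if b < 0.0:
            stalled = True
            b = 0.0
        history.append(b)
    return history, stalled


# Bandwidth halves for 30 s while r_c stays at 500 kbit/s: the buffer drains.
print(simulate_buffer(10.0, [Interval(20, 500e3, 500e3),
                             Interval(30, 250e3, 500e3)]))  # -> ([10.0, 0.0], True)
# Halving r_c along with the bandwidth keeps the buffer level.
print(simulate_buffer(10.0, [Interval(20, 500e3, 500e3),
                             Interval(30, 250e3, 250e3)]))  # -> ([10.0, 10.0], False)
```

The second run illustrates the point made above: when ra drops for a sustained period, lowering rc is what keeps the buffer from emptying.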
Thus, adaptivity to changing network conditions requires not only a buffer, but also some means to adjust the coding bit rate rc of the content. This can be done by stream switching in combination with multi bit rate (MBR) coding or coarse grained or fine grained scalable coding. Today's commercial streaming media systems [1] rely on MBR coding as well as thinning, which is a form of coarse grained scalability. In MBR coding, semantically identical content is encoded into alternative bit streams at different coding bit rates and stored in the same media file at the server, allowing the content to be streamed at different levels of quality corresponding to the coding bit rates rc, possibly using bit stream switching [2]. In coarse grained scalable coding (such as MPEG-2/4 temporal or SNR scalability) the content is encoded into several sub-streams or layers, so that the coding bit rate rc can be changed in large deltas by adding or dropping (at possibly restricted times) one layer of content at a time. Thinning is a special case of coarse grained scalability in which dependent video frames (P and B frames) are dropped before independent video frames (I frames), which are in turn dropped before audio frames. Future commercial systems may support fine grained scalability (FGS) as well. Fine grained scalable coding (such as 3D SPIHT [6], MPEG-4 FGS [7], or EAC [8]) allows the coding bit rate rc to change at any time in deltas sometimes as small as one byte per presentation. FGS coding offers great flexibility in adapting to variable network conditions, and can demonstrably improve quality under such conditions.
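The thinning priority described above can be illustrated with a small Python sketch. It assumes, beyond what is stated here, that whole frame classes are discarded at once and that B frames go before P frames; real systems may drop frames more selectively. The names `DROP_ORDER` and `thin` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical drop order: dependent video frames (B, then P) go first,
# then independent I frames, and audio last.
DROP_ORDER = ("B", "P", "I", "audio")

Frame = Tuple[str, int]  # (frame kind, size in bits)


def thin(frames: List[Frame], budget_bits: int) -> List[Frame]:
    """Drop whole frame classes in DROP_ORDER until the stream fits the budget.

    Hypothetical helper. Frames that survive keep their presentation order.
    """
    dropped = set()

    def remaining_bits() -> int:
        return sum(size for kind, size in frames if kind not in dropped)

    for kind in DROP_ORDER:
        if remaining_bits() <= budget_bits:
            break
        dropped.add(kind)
    return [f for f in frames if f[0] not in dropped]


gop = [("I", 50), ("audio", 10), ("B", 20), ("P", 30), ("B", 20)]
print(thin(gop, 100))  # -> [('I', 50), ('audio', 10), ('P', 30)]
print(thin(gop, 60))   # -> [('I', 50), ('audio', 10)]
```

Because each step removes an entire class of frames, the coding bit rate falls in large, uneven steps, which is what makes thinning coarse grained compared with FGS coding.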
Examples of existing technology that adjusts the coding bit rate rc of the content in an attempt to adapt to changing network conditions include the work of de Cuetos and Ross [9], which decouples the transmission rate and the coding bit rate. They assume that the transmission rate is determined by the network transport protocol (TCP or TFRC). Based on this, they develop a heuristic real-time algorithm for adaptive coding bit rate control and compare its performance to an optimal offline coding bit rate control policy that assumes the transmission rate is known prior to streaming. The work of Rejaie, Handley and Estrin [4] proposes a scheme for transmitting layered video in the context of unicast congestion control, which comprises two mechanisms. One is a coarse-grained mechanism for adding and dropping layers (changing the overall coding bit rate and quality). The other is a fine-grained interlayer bandwidth allocation mechanism to manage the receiver buffer (not changing the overall coding bit rate or quality). A potential issue with this approach is that it changes the coding bit rate by adding or dropping one (presumably coarse) layer at a time. If the layers are fine-grained, as in the case of FGS coded media, then adding or dropping one (fine-grained) layer at a time typically cannot change the coding bit rate promptly enough. Moreover, since the adding and dropping mechanism is rather empirical, it may simply not be suitable for FGS media. The work of Q. Zhang, Zhu and Y-Q. Zhang [5] proposes a resource allocation scheme to adapt the coding bit rate to estimated network bandwidth. The novelty of their approach is that they consider minimizing the distortion (or equivalently maximizing the quality) of all applications, such as file transfers and web browsing in addition to audio/video streaming. However, their optimization does not account for the smoothness of individual streams and may lead to quality fluctuations.
However, even with buffering and the ability to adjust the coding bit rate, existing technologies for streaming media on demand over the Internet suffer from two problems:
1. Playback often stalls during network congestion. That is, during playback of high bit rate content, if the network bit rate drops below the content bit rate, the client buffer runs out of content and playback stops while the client rebuffers (known as a “rebuffering” event).
2. Start-up delay is often too long (about 5 seconds).
There are existing solutions to both of these problems, but they do not always work well. One solution to the first problem is to stream the content encoded at a coding bit rate that is low relative to the average bit rate transmitted over the network (the transmission bit rate). This allows the buffer to build up over time. With such a large reserve of unplayed content at the client, temporary network congestion will not affect playback. However, this solution has two problems. First, the coding bit rate of the content is lower than the average transmission bit rate of the network, and hence the quality is lower than it could be. Second, the buffer can grow nearly as large as the streamed file itself, which may demand too many resources on the client device.
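To see why the buffer can approach the size of the file, assume a clip of D seconds, a constant arrival rate ra greater than the coding bit rate rc, and playback that starts immediately at v = 1. The download finishes after D·rc/ra seconds, and only that much content has been played by then, so the buffer peaks at D(1 - rc/ra). The Python helper `peak_buffer` below is a hypothetical illustration of this arithmetic under those assumptions.

```python
from typing import Tuple


def peak_buffer(duration_s: float, arrival_bps: float,
                coding_bps: float) -> Tuple[float, float]:
    """Peak client buffer when content is coded below the arrival rate.

    Hypothetical helper. Assumes constant rates and playback starting at
    t = 0 at speed v = 1. Returns (seconds of content, bytes of content).
    """
    if coding_bps > arrival_bps:
        raise ValueError("buffer only grows when coding rate <= arrival rate")
    download_time_s = duration_s * coding_bps / arrival_bps
    buffered_s = duration_s - download_time_s  # content received minus played
    return buffered_s, buffered_s * coding_bps / 8


# A 10-minute clip coded at 500 kbit/s over a 1 Mbit/s path.
print(peak_buffer(600, 1e6, 500e3))  # -> (300.0, 18750000.0)
```

In this example half of the clip, about 18.75 MB, sits unplayed at the client when the download completes. Lowering rc further pushes the peak closer to the whole file, and quality drops at the same time.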
Another solution to the first problem is to try to maintain the client buffer at a constant level (typically about 10 seconds), while switching between different coding bit rates for the same content, trying to match the transmission bit rate of the network. However, rebuffering events are still commonly observed in practice, because choosing the right time to switch streams is difficult. One reason that it is difficult is that there are natural variations in the instantaneous coding bit rate of the content, even in so-called constant bit rate encodings, which can confuse the client buffer management algorithm.
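A buffer-targeting switch policy of the kind described above might look like the following Python sketch. It is a hypothetical heuristic for illustration, not the algorithm of any particular product. The function `choose_stream`, its proportional back-off rule, and the 10 second default target are assumptions.

```python
from typing import Sequence


def choose_stream(rates_bps: Sequence[float], bw_estimate_bps: float,
                  buffer_s: float, target_s: float = 10.0) -> float:
    """Pick a coding bit rate from an MBR ladder (hypothetical heuristic).

    When the buffer is below target, the allowed rate is scaled down in
    proportion to the shortfall so the buffer can refill; otherwise the
    highest rate not exceeding the bandwidth estimate is chosen. Falls back
    to the lowest rate when nothing qualifies.
    """
    ladder = sorted(rates_bps)
    fill = min(1.0, buffer_s / target_s)
    allowed = bw_estimate_bps * fill
    candidates = [r for r in ladder if r <= allowed]
    return candidates[-1] if candidates else ladder[0]


ladder = [300e3, 700e3, 1500e3]
print(choose_stream(ladder, 1e6, buffer_s=10.0))  # -> 700000.0
print(choose_stream(ladder, 1e6, buffer_s=5.0))   # -> 300000.0
```

A rule keyed to the measured buffer level is sensitive to exactly the problem noted above: a temporary spike in the instantaneous coding bit rate drains the buffer briefly, lowers `fill`, and can trigger a downswitch even though the average coding bit rate still matches the network.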
The second problem above (long start-up delay) also has multiple solutions. One solution is to fill up the client buffer quickly, with a quick initial transmission rate burst. With the client buffer full, playback can safely begin. However, this solution has several problems. First, it is only applicable when there is sufficient “headroom” in the network to increase the transmission bit rate for a few seconds. Thus it is usually not applicable for modem connections, for example. Second, it stresses the network, causing other applications in the network to back off. It has been shown that during the burst period, there can be as much as 80% packet loss, causing all TCP connections sharing the same bottleneck to back off. Third, by implication, if there is headroom in the network for bursting, then the streaming application may not be using the full bandwidth available to it during the remainder of the file, meaning that quality is lower than it should be.
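The benefit of bursting, and why it needs headroom, follows from simple arithmetic: prefilling B seconds of content coded at rc takes B·rc/rs seconds when the sender sustains rate rs. The Python sketch below assumes playback waits until the buffer is full and that rs is constant during the prefill; the function name `startup_delay` is hypothetical.

```python
def startup_delay(buffer_s: float, coding_bps: float, send_bps: float) -> float:
    """Seconds needed to prefill buffer_s seconds of content before playback.

    Hypothetical helper. Assumes playback waits until the buffer is full
    and that the sender sustains send_bps for the whole prefill.
    """
    return buffer_s * coding_bps / send_bps


# Prefilling 10 s of 500 kbit/s content.
print(startup_delay(10, 500e3, 500e3))  # no headroom -> 10.0
print(startup_delay(10, 500e3, 1e6))    # 2x burst    -> 5.0
```

Halving the delay requires the path to carry twice the coding bit rate during the prefill. On a link that is already saturated, such as a modem connection, send_bps cannot exceed the coding bit rate and the burst gives no speedup.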
Another solution to the second problem is to play back the content slower than real time, allowing playback to begin while the client buffer builds up. This is an innovative solution, but it introduces obvious temporal distortion.
A final solution to the second problem is to temporarily lower the coding bit rate of the content below the transmission bit rate of the network, allowing playback to begin while the client buffer builds up. This solution was proposed by Chou et al. in [13].
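Both of the last two approaches grow the buffer during playback at the rate ra/rc - v, one by lowering v and the other by lowering rc. The Python sketch below compares them under constant rates, starting from an empty buffer for simplicity. It is a back-of-the-envelope comparison, not the method of [13], and the function name `time_to_build` is hypothetical.

```python
import math


def time_to_build(target_s: float, arrival_bps: float, coding_bps: float,
                  speed: float = 1.0) -> float:
    """Seconds of playback needed to grow the buffer from empty to target_s.

    Hypothetical helper. Uses the constant-rate growth r_a / r_c - v and
    returns math.inf when the buffer cannot grow under those conditions.
    """
    growth = arrival_bps / coding_bps - speed
    return target_s / growth if growth > 0 else math.inf


# The path delivers 500 kbit/s and the goal is a 10 s buffer.
print(time_to_build(10, 500e3, 500e3))             # full rate, v = 1: inf
print(time_to_build(10, 500e3, 500e3, speed=0.8))  # slow playback: about 50 s
print(time_to_build(10, 500e3, 400e3))             # reduced r_c: 40.0
```

In this example, playing at 80% speed and coding at 80% of the arrival rate both start playback immediately. The first distorts time for about 50 seconds, while the second reaches the target in 40 seconds at a somewhat lower quality.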
The system and process of the present invention resolve the problems of the existing techniques and provide fast startup, continuous playback, and maximal quality and smoothness over the entire streaming session.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.