The Transmission Control Protocol (TCP) is a transport layer protocol for reliable delivery of Internet (IP) packets (datagrams). TCP uses an Additive Increase Multiplicative Decrease (AIMD) rate control mechanism to ensure fair use of shared network resources (e.g., the available bit rate). With TCP/AIMD operation, whenever all outstanding packets sent within the last round-trip time (RTT) cycle are acknowledged by the receiver, TCP additively increases the transmission rate of the sender by a constant amount. On the other hand, when TCP detects congestion (or packet loss) because not all outstanding packets have been acknowledged by the onset of the next RTT period, it multiplicatively reduces the transmission rate of the sender by a factor of ½ (i.e., it halves the rate). Such TCP/AIMD rate control operation can create significant variations in the transmission bit rate, which in turn can lead to exceedingly high latencies in packet delivery. This drawback makes TCP unsuitable for transport of interactive media packets, which are typically characterized by stringent delivery deadlines.
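The rate variations produced by the AIMD behavior described above can be illustrated with a minimal sketch. It is a simplified model rather than an actual TCP implementation: the function name `aimd_trace`, the unit-less rate, the starting rate, the additive step of 1.0, and the choice of which RTT rounds experience loss are all assumptions made only for illustration.

```python
def aimd_trace(rounds, loss_rounds, start_rate=1.0, increase=1.0, decrease=0.5):
    """Return the send rate used in each RTT round under a simplified AIMD model.

    rounds      -- number of RTT rounds to simulate (assumed parameter)
    loss_rounds -- set of round indices in which loss is detected (assumed input)
    increase    -- constant added to the rate after a fully acknowledged round
    decrease    -- multiplicative factor applied after a round with loss (1/2 here)
    """
    rate = start_rate
    trace = []
    for rtt in range(rounds):
        trace.append(rate)            # rate used during this RTT round
        if rtt in loss_rounds:
            rate = rate * decrease    # multiplicative decrease: halve the rate
        else:
            rate = rate + increase    # additive increase: add a constant amount
    return trace


if __name__ == "__main__":
    # Hypothetical scenario: loss is detected in rounds 3 and 6.
    print(aimd_trace(rounds=8, loss_rounds={3, 6}))
```

Under these assumed inputs the rate repeatedly climbs and is then cut in half, producing the sawtooth pattern that underlies the bit rate variations noted above.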
In some situations involving interactive multimedia communications, however, it is necessary to employ TCP transport in spite of its drawbacks. For example, corporate firewalls are sometimes set to block all traffic to, and from, the corporate Local Area Network (LAN) except over TCP connections. Therefore, media packets from the outside world destined for a receiver on the corporate LAN must be delivered via TCP, or otherwise face the prospect of being blocked by the firewall prior to entering the LAN.
Several studies on the use of TCP for interactive media transmission have been reported. See, e.g., Sally Floyd, Mark Handley, Jitendra Padhye, and Joerg Widmer, “Equation-Based Congestion Control for Unicast Applications,” August 2000, SIGCOMM 2000; Bing Wang, Wei Wei, Zheng Guo, and Don Towsley, “Multipath Live Streaming via TCP: Performance and Benefits,” UConn CSE Technical Report: BECAT/CSE-TR-06-7; S. Sakazawa, Y. Takishima, Y. Nakajima, M. Wada, and K. Hashimoto, “Multimedia contents management and transmission system ‘VAST-web’ and its effective transport protocol ‘SVFTP’,” ICME 2004; and T. Nguyen and S.-C. Cheung, “Multimedia Streaming Using Multiple TCP Connections,” IPCCC 2005.
The first of these studies (i.e., Equation-Based Congestion Control for Unicast Applications) describes a TCP-friendly scheme, which provides an equation-based rate control technique as an alternative to the TCP/AIMD rate control mechanism while preserving fair sharing of the available network bit rate with existing TCP flows. The equation-based rate control technique yields smoother send rate fluctuations (than TCP/AIMD) in response to network congestion, and is therefore more suitable for streaming applications. The second of the cited studies (i.e., Multipath Live Streaming via TCP: Performance and Benefits) considers employing TCP transport over multiple network paths in order to improve TCP performance for streaming applications. Similarly, the third and fourth of the cited studies (i.e., Multimedia contents management and transmission system ‘VAST-web’ and its effective transport protocol ‘SVFTP’, and Multimedia Streaming Using Multiple TCP Connections, respectively) explore transmission over multiple TCP connections on the same network path as a way to increase TCP throughput in media streaming. The third and fourth studies, however, deal only with stored (pre-encoded) media content in the context of multimedia content management systems and streaming applications, respectively. Furthermore, they treat the individual media packets uniformly, and do not take advantage of a possible scalable structure in the transmitted media. When scalable coding is used in the transmitted media, different packets have different importance in terms of how they affect the reconstruction quality of the media at the receiver.
Scalable coding is a well-known technique in multimedia data encoding, in which the encoder generates two or more “scaled” bitstreams that collectively represent a given medium in a bandwidth-efficient manner. Scalability can be provided in a number of different dimensions, namely temporal, spatial, and quality (also referred to as SNR (Signal-to-Noise Ratio) scalability) dimensions. For example, a video signal may be scalable-coded in different layers at CIF and QCIF resolutions, and at frame rates of 7.5, 15, and 30 frames per second (fps). Depending on the codec's structure, any combination of spatial resolutions and frame rates may be obtainable from the codec bitstream. The bits corresponding to the different layers can be transmitted as separate bitstreams (i.e., one stream per layer), or they can be multiplexed together in one or more bitstreams. For convenience in description herein, the coded bits corresponding to a given layer may be referred to as that layer's bitstream, even if the various layers are multiplexed and transmitted in a single bitstream. Codecs specifically designed to offer scalability features include, for example, MPEG-2 (ISO/IEC 13818-2, also known as ITU-T H.262) and the currently developed SVC (known as ITU-T H.264 Annex G or MPEG-4 Part 10 SVC). Scalable coding techniques specifically designed for video communication are described, for example, in commonly assigned International Patent Application No. PCT/US06/028365 “SYSTEM AND METHOD FOR SCALABLE AND LOW-DELAY VIDEOCONFERENCING USING SCALABLE VIDEO CODING.”
It is noted that even codecs that are not specifically designed to offer scalability features can exhibit scalability characteristics in the temporal dimension. For example, consider an MPEG-2 Main Profile codec, a non-scalable codec, which is used in DVDs and digital TV environments. Further, assume that the codec is operated at 30 fps and that a group of pictures (GOP) structure of IBBPBBPBBPBBPBB (period N=15 frames) is used. By sequential elimination of the B pictures, followed by elimination of the P pictures, it is possible to derive a total of three temporal resolutions: 30 fps (all picture types included), 10 fps (I and P only), and 2 fps (I only). The sequential elimination process results in a decodable bitstream because the MPEG-2 Main Profile codec is so designed that coding of the P pictures does not rely on the B pictures, and, similarly, coding of the I pictures does not rely on any P or B pictures. For convenience, in the following description, single-layer codecs with temporal scalability features are considered to be a special case of scalable video codecs, and understood to be included in the term “scalable video coding” unless explicitly indicated otherwise.
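The three temporal resolutions in the example above follow from counting the pictures that remain in each GOP after elimination. The following sketch reproduces that arithmetic. The function name `temporal_rates` and the dictionary keys are hypothetical, and the results are average frame rates computed over one GOP period.

```python
from fractions import Fraction


def temporal_rates(gop, full_fps):
    """Average frame rate left after sequentially dropping B, then P pictures.

    gop      -- GOP pattern as a string of picture types, e.g. "IBBPBBPBBPBBPBB"
    full_fps -- frame rate when all pictures are kept
    """
    period = len(gop)  # N, the GOP period in frames
    # Picture types retained at each step of the sequential elimination.
    keep_sets = [("I", "P", "B"), ("I", "P"), ("I",)]
    rates = {}
    for kept in keep_sets:
        remaining = sum(1 for picture in gop if picture in kept)
        # Exact rational arithmetic avoids floating-point rounding.
        rates["".join(kept)] = Fraction(full_fps * remaining, period)
    return rates


if __name__ == "__main__":
    for layers, fps in temporal_rates("IBBPBBPBBPBBPBB", 30).items():
        print(layers, fps)
```

For the GOP structure and 30 fps rate assumed in the text, 15, 5, and 1 pictures remain per period, which corresponds to 30, 10, and 2 fps, respectively.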
Scalable codecs typically have a pyramidal bitstream structure in which one of the constituent bitstreams (called the “base layer”) is essential in recovering the original medium at some basic quality. Use of one or more of the remaining bitstream(s) (called the “enhancement layer(s)”) together with the base layer increases the quality of the recovered medium. Data losses in the enhancement layers may be tolerable, but data losses in the base layer can cause significant distortions or complete loss of the recovered medium.
Simulcasting is a coding solution that is less complex than scalable coding but has some of the advantages of the latter. In simulcasting, two different versions of the source are encoded (e.g., at two different spatial resolutions) and transmitted. Each version is independent, in that its decoding does not depend on reception of the other version. In the following description, simulcasting is considered to be a special case of scalable coding (where no inter-layer prediction is performed), and is referred to simply as scalable coding unless explicitly indicated otherwise.
Consideration is now being given to improving packet-based communication systems in which multiple TCP connections are used for transmitting scalable coded data. In particular, attention is being directed to live audio and video communication scenarios where providing low latency packet delivery is essential.