The Internet, in its present form, does not support Quality of Service (QOS) guarantees. Real-time distribution of video, however, would ideally require the reservation of resources within the network in order to provide some guarantees in terms of end-to-end latency, jitter and packet loss. RSVP systems, such as disclosed in "RSVP: A New Resource ReSerVation Protocol", I.E.E.E. Network Magazine, September 1993, by L. Zhang, S. Deering, D. Estrin, S. Shenker and D. Zappala, the contents of which are incorporated herein by reference, allow applications at the end-hosts to interact with the network layer by requesting resources for a certain QOS guarantee. ATM will also provide QOS guarantees by setting up virtual circuits. However, both RSVP and ATM are far from being deployed on a wide area network and even further away from being used ubiquitously for real-time video distribution. Moreover, even when they are deployed on a large scale, end-users would have to pay significantly higher prices for a guaranteed QOS service as opposed to a best effort delivery service.
Currently, video multicast systems employing layered encoding schemes enable transmitters to deliver optimal quality video to one or more receivers having heterogeneous capabilities. Layered encoding schemes, such as MPEG, MPEG-2, JPEG, H.261, and others, essentially separate an encoded video stream into two or more layers: one base layer and one or more enhancement layers, with the base layer capable of being independently decoded to provide a "basic" level of video quality and the enhancement layers capable of being decoded only together with the base layer for video quality improvement. One video encoding scheme that combines high compression efficiency with a low overhead for the layering process is the MPEG International Standard, and particularly, MPEG-2. As is known, in MPEG-2 video coding, frames are coded in one of three modes: intraframe (I), predictive (P) or bidirectionally-predictive (B). These modes provide intrinsic layering in that an I frame can be independently decoded, while P frames require I frames, and B frames generally require I and P frames to decode. By using a multicast group for each frame type, a simple layering mechanism is obtained. It has been argued that the use of video coding algorithms of high efficiency, such as MPEG-2, which comes at the expense of lower error resilience, is undesirable for IP (best-effort) networks. This has led to the use of less efficient intraframe coding techniques for IP multicasting.
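By way of illustration only, the frame-type layering described above may be sketched as follows. The multicast group addresses and the names `LAYER_GROUPS`, `DEPENDENCIES`, `decodable_types` and `groups_for_layers` are hypothetical and are not taken from any cited system; the sketch merely assumes one multicast group per frame type and the I, P, B decoding dependencies stated above.

```python
from enum import Enum


class FrameType(Enum):
    I = "I"  # intraframe: independently decodable (base layer)
    P = "P"  # predictive: requires I frames
    B = "B"  # bidirectionally-predictive: generally requires I and P frames


# Hypothetical multicast group addresses, one per frame type (assumption).
LAYER_GROUPS = {
    FrameType.I: "239.1.1.1",
    FrameType.P: "239.1.1.2",
    FrameType.B: "239.1.1.3",
}

# Frame types that must also be received before a given type can be decoded.
DEPENDENCIES = {
    FrameType.I: set(),
    FrameType.P: {FrameType.I},
    FrameType.B: {FrameType.I, FrameType.P},
}


def decodable_types(subscribed):
    """Return the subscribed frame types whose dependencies are also subscribed."""
    subscribed = set(subscribed)
    return {t for t in subscribed if DEPENDENCIES[t] <= subscribed}


def groups_for_layers(num_layers):
    """Return the groups to join for the first num_layers layers (I, then P, then B)."""
    order = [FrameType.I, FrameType.P, FrameType.B]
    return [LAYER_GROUPS[t] for t in order[:num_layers]]
```

Under these assumptions, a receiver that joins only the I and B groups can decode only I frames, which is why layers are added in the order I, P, B.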
In present video receivers, a rate control scheme is implemented to decide what video layers a receiver should receive. There are two principal rate control approaches for multicast video: sender-initiated control and receiver-initiated control, such as described in S. McCanne and V. Jacobson, "Receiver-Driven Layered Multicast," Proceedings of A.C.M. SIGCOMM '96, October 1996. In the sender-initiated approach, the sender multicasts a single video stream whose quality is adjusted based on feedback information from receivers. The receiver-initiated approach is based on the layered coding scheme, in which the sender multicasts several layers of video (typically a base layer and several enhancement layers), each in a different multicast group, and a receiver subscribes to one or more layers based on its capabilities, i.e., each receiver decides on its own whether to drop an enhancement layer or to add one.
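The receiver-initiated decision to add or drop an enhancement layer may be sketched, under simplifying assumptions, as a threshold rule on the packet loss rate observed at the receiver. This is not the algorithm of the McCanne and Jacobson reference; the function name `next_layer_count` and both threshold values are hypothetical and chosen only for illustration.

```python
def next_layer_count(current, max_layers, loss_rate,
                     drop_threshold=0.05, add_threshold=0.01):
    """Return the number of layers a receiver should subscribe to next.

    current        -- layers currently subscribed (1 = base layer only)
    max_layers     -- total layers offered by the sender
    loss_rate      -- fraction of packets lost in the last measurement interval
    drop_threshold -- hypothetical loss rate above which a layer is dropped
    add_threshold  -- hypothetical loss rate below which a layer is added
    """
    if loss_rate > drop_threshold and current > 1:
        return current - 1  # congestion: drop the highest enhancement layer
    if loss_rate < add_threshold and current < max_layers:
        return current + 1  # spare capacity: try the next enhancement layer
    return current          # otherwise keep the current subscription
```

In this sketch the base layer is never dropped, and the gap between the two thresholds keeps a receiver from oscillating between adding and dropping the same layer on small changes in loss.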
Currently, a distributed approach is employed by which receivers in current video transmission systems decide to add or drop a layer either 1) indiscriminately, or 2) by receiving and maintaining state information about other receivers through a "shared learning" process. The shared learning process requires that each receiver maintain certain state information which it may not require. Furthermore, exchanging control information in such multicast sessions may decrease the usable bandwidth on low-speed links, leading to lower quality for receivers on these links.
For packet retransmission in real-time video systems, current handling of delay variability is implemented by adaptively setting playback points in accordance with the maximum jitter in the network. A proposed adaptation scheme that adaptively sets playback points in accordance with jitter is described in R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks", Proceedings of IEEE INFOCOM '94, March 1994, the contents and disclosure of which are incorporated by reference as if fully set forth herein. It has been suggested in the reference "A New Error Control Scheme for Packetized Voice over High-Speed Local Networks," Proc. IEEE 18th Local Computer Networks Conference, Minneapolis, Pages 91-100, September 1993, authored by B. Dempsey, J. Liebeherr, and A. C. Weaver, that the control time, which is defined as the duration between the arrival instant and playback point of the first frame, can be extended to allow more time for retransmissions. This scheme has been implemented in retransmission schemes for interactive packetized voice traffic over local networks, but not for non-interactive video packet traffic.
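The relationship between the control time and the time available for retransmission may be illustrated with the following sketch. It assumes a constant frame interval and a known round-trip time, with all times expressed in integer milliseconds; the function names and these assumptions are hypothetical and are not drawn from the cited references.

```python
def playback_time(first_arrival, control_time, frame_index, frame_interval):
    """Playback point of a frame, in milliseconds.

    The first frame (index 0) is played control_time after it arrives;
    later frames follow at a constant frame_interval (assumption).
    """
    return first_arrival + control_time + frame_index * frame_interval


def retransmission_feasible(loss_detected_at, rtt, deadline):
    """True if a retransmission requested at loss_detected_at can arrive
    by the frame's playback deadline, assuming it takes one round trip."""
    return loss_detected_at + rtt <= deadline


# Example: first frame arrives at t = 1000 ms, 40 ms per frame.
short = playback_time(1000, 200, 3, 40)     # control time of 200 ms
extended = playback_time(1000, 300, 3, 40)  # control time extended to 300 ms
```

With a round-trip time of 60 ms, a loss in frame 3 detected at t = 1290 ms cannot be repaired before the 1320 ms deadline under the 200 ms control time, but can be repaired before the 1420 ms deadline under the extended control time.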
It would thus be highly desirable to provide a layered multicast video transmission system including a transport mechanism that improves the quality of video transmission by minimizing packet loss in an unreliable packet-switched network, and furthermore, implements a smart retransmission-based error recovery algorithm.
Furthermore, it would be highly desirable to provide a layered multicast video transmission system that readily integrates any layered video coding algorithm of high efficiency.