IP (Internet Protocol) Multicast, an IETF (Internet Engineering Task Force) standard, has become an important component of the Internet. A framework for distributing data to multiple receivers through the use of multicast groups is described in S. Deering, D. R. Cheriton, Multicast Routing in Datagram Internetworks and Extended LANs, ACM Transactions on Computer Systems, vol. 8, no. 2, pp. 85-110, May 1990. More recent proposals to streamline the efficiency of the multicast delivery of data continue to use the multicast group concept to allow protocols to scale for applications that involve many receivers. See, for example, S. Deering, et al., The PIM Architecture for Wide-Area Multicast Routing, IEEE/ACM Transactions on Networking, vol. 4, no. 2, pp. 153-162, April 1996; F. Ballardie, et al., Core based trees (CBT) An architecture for Scalable Interdomain Multicast Routing, Proceedings of ACM SIGCOMM '93, pp. 85-95, ACM, September 1993.
Because the Internet is a best-effort service network, end-to-end reliability mechanisms are needed to ensure delivery of data. Receivers typically ensure they obtain enough data by requesting repair packets when a data packet fails to arrive. When multicast groups grow large, simple reliable multicast protocols suffer from a condition known as feedback implosion, which is an overload of network resources caused by many receivers trying to send repair requests for a single packet. A number of solutions exist to avoid this implosion effect, using techniques such as randomized timers, local recovery (in which intermediate receivers, rather than the original sender, send repair packets), and hierarchical recovery. See, for example, D. Towsley, et al., A Comparison of Sender-Initiated and Receiver-Initiated Reliable Multicast Protocols, IEEE Journal on Selected Areas in Communications, April 1997. While such techniques are effective in providing reliability, they can result in significant and unpredictable delays, making them unsuitable for applications that have stringent real-time constraints. See, for example, S. Pejhan, et al., Error Control Using Retransmission Schemes in Multicast Transport Protocols for Real-Time Media, IEEE/ACM Transactions on Networking, vol. 4, no. 3, pp. 413-427, June 1996.
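The trade-off between implosion and delay in the randomized-timer technique can be illustrated with a minimal sketch. It is not taken from any of the cited protocols: it assumes every receiver loses the same packet, that a multicast request reaches every other receiver after the same one-way delay, and that a receiver cancels its own request once it hears another. The function names and parameter values are hypothetical.

```python
import random

def requests_sent(timers, one_way_delay):
    """Count the repair requests multicast for one lost packet.

    Each receiver sends its request when its timer expires, unless an
    earlier request from another receiver has already reached it. The
    earliest request arrives everywhere at min(timers) + one_way_delay.
    """
    first = min(timers)
    # A receiver is suppressed only if the earliest request arrives
    # strictly before its own timer expires.
    return sum(1 for t in timers if t <= first + one_way_delay)

def draw_timers(num_receivers, max_wait, rng):
    """Draw one uniformly random timer in [0, max_wait] per receiver."""
    return [rng.uniform(0.0, max_wait) for _ in range(num_receivers)]

if __name__ == "__main__":
    rng = random.Random(1)
    for max_wait in (0.0, 0.5, 5.0):
        timers = draw_timers(1000, max_wait, rng)
        sent = requests_sent(timers, one_way_delay=0.05)
        print(f"max_wait={max_wait}s: {sent} of 1000 receivers sent a request")
```

With no randomization (`max_wait` of zero) every receiver sends a request, which is the implosion case. Spreading the timers over a longer window suppresses more requests, but the request that is finally sent leaves later, which is the added latency noted above.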
Multicast protocols that elicit feedback from receivers have used a variety of techniques to obtain the feedback without implosion. The Realtime Transport Protocol/Realtime Transport Control Protocol (RTP/RTCP) caps the rate at which a receiver can issue feedback reports, where the cap is inversely proportional to the size of the multicast group. See Internet RFC 1889, RTP: A Transport Protocol for Real-Time Applications, January 1996. The Reliable Multicast Transport Protocol (RMTP) (see, e.g., S. Paul, et al., Reliable Multicast Transport Protocol (RMTP), IEEE Journal on Selected Areas in Communications, special issue on Network Support for Multipoint Communication, 15(3): pp. 407-421, April 1997) also uses periodic feedback from receivers, while Scalable Reliable Multicast (SRM) (S. Floyd, et al., A reliable multicast framework for light-weight sessions and application level framing, IEEE/ACM Transactions on Networking, December 1997) uses randomized delays to avoid implosion. These techniques for preventing feedback implosion introduce additional latencies. Thus, using them to provide reliability decreases the likelihood of a repair reaching a receiver before a hard deadline.
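The effect of a feedback rate that is inversely proportional to group size can be seen in a short sketch. It is only loosely modeled on the idea of a shared feedback budget and is not the timing algorithm of RFC 1889 or of any other cited protocol; the function name, the fixed feedback budget, and the minimum interval are illustrative assumptions.

```python
def report_interval(group_size, avg_report_bytes, feedback_bytes_per_sec,
                    min_interval=5.0):
    """Seconds a receiver waits between feedback reports.

    The whole group shares feedback_bytes_per_sec, so each receiver's
    report rate is inversely proportional to group_size and its report
    interval grows linearly with group_size.
    """
    if group_size < 1 or avg_report_bytes <= 0 or feedback_bytes_per_sec <= 0:
        raise ValueError("group size, report size, and budget must be positive")
    interval = group_size * avg_report_bytes / feedback_bytes_per_sec
    # Assumed floor so that small groups do not report excessively often.
    return max(min_interval, interval)

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        secs = report_interval(n, avg_report_bytes=100,
                               feedback_bytes_per_sec=1_000)
        print(f"{n} receivers: one report every {secs:.0f} s per receiver")
```

Under these assumed numbers a receiver in a group of 100,000 reports only once every 10,000 seconds, so loss feedback carried in such reports would arrive far too late for a repair to meet a real-time deadline.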
To reduce repair latencies, entities in the network other than the sender can be used to provide repairs. Choosing the repairing entity can be done in a variety of ways. SRM uses an approach where nearby members are usually first to respond to a repair request. RMTP selects fixed receivers in a predetermined fashion to perform repairs, while the Lorax protocol and the Structure-Oriented Resilient Multicast (STORM) protocol build virtual trees connecting various receivers over which repairs are unicast. Lorax and STORM are respectively described in B. N. Levine, et al., The Case for Concurrent Reliable Multicasting Using Shared Ack Trees, Proceedings of ACM Multimedia 1996; and R. Xu, et al., Resilient Multicast Support for Continuous Media Applications, NOSSDAV (Workshop on Networking and Operating System Support for Digital Audio and Video) 1997. In one system, described in D. DeLucia, K. Obraczka, Multicast Feedback Suppression Using Representatives, Proceedings of IEEE Infocom '97, the sender dynamically selects a subset of receivers to provide immediate feedback of loss: its success depends upon a selection process that chooses which receivers make good representatives.
There has also been recent interest in providing resilient multicast service for real-time data, where data is retransmitted only if delivery can occur before the real-time deadline. Data is not reliably delivered, but a higher throughput (the effective rate of data transfer, considering both the data and needed repairs) can be achieved than without any retransmission. Two protocols that are designed to provide resilient multicast are STORM and Layered Video multicast with Retransmissions (LVMR) (described in X. Li et al., Layered Video multicast with Retransmissions (LVMR): Evaluation of Hierarchical Rate Control, Proceedings of IEEE INFOCOM '98). Both STORM and LVMR make use of unicast retransmissions to reduce the cost of packet retransmission, and both rely on other receivers that are nearer to the loss than the sender to provide the repair and thereby reduce delay and implosion. However, latency can increase when there is no nearby receiver to repair the loss. Rather than follow the shortest path within the network, repairs are unicast from receiver to receiver, resulting in a longer propagation path. Additional latencies also arise because end hosts are involved in forwarding a repair and do so at a rate that is much slower than that of internal routers. These factors make it difficult for such protocols to place bounds on their resilience, even if delays and loss rates are known from a receiver to its repair point.
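The rule that data is retransmitted only if it can arrive before the real-time deadline can be written as a one-line test. The sketch below is an illustration under stated assumptions, not the decision logic of STORM or LVMR: it assumes the receiver knows the round-trip time to its repair point and a fixed processing delay there, and all names are hypothetical.

```python
def should_request_repair(now, playout_deadline, rtt_to_repairer,
                          processing_delay=0.0):
    """Return True if a repair requested now could arrive in time.

    All arguments are in seconds on the receiver's clock. The repair is
    expected one round trip plus the repairer's processing delay after
    the request is sent.
    """
    expected_arrival = now + rtt_to_repairer + processing_delay
    return expected_arrival < playout_deadline

if __name__ == "__main__":
    # A packet due for playout at t = 0.200 s, with an 80 ms round trip.
    print(should_request_repair(0.000, 0.200, 0.080))  # loss detected early
    print(should_request_repair(0.150, 0.200, 0.080))  # loss detected late
```

The longer unicast paths and slower end-host forwarding described above enlarge `rtt_to_repairer`, which shrinks the window in which a request is still worth sending.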
Active networking can also improve real-time reliable multicast performance by providing mechanisms that allow the Internet to offer more than the best-effort service it currently provides. Two examples of techniques that have been shown to improve performance are repair servers in the network and the rerouting of repair requests by routers in the network. However, active networking requires routers to perform additional services, and applications that require these services will have to compete for limited router resources.
Forward error correction (FEC) is a technique that reduces the bandwidth overhead of repairing errors or losses in bit streams. See, for example, Richard E. Blahut, Theory and Practice of Error Control Codes, Addison-Wesley, 1983. The FEC approach has been compared to a local recovery approach in J. Nonnenmacher, et al., How bad is reliable multicast without local recovery?, Proceedings of IEEE INFOCOM '98, in an environment where losses occur only on links that are directly connected to receivers. However, this type of loss does not resemble what is observed in the Internet. The real-time performance of reliable multicast techniques that use FEC has previously been examined and compared unfavorably to ARQ (Automatic Repeat reQuest). See M. Lucas, et al., Distributed Error Recovery for Continuous Media Data in Wide-Area Multicast, University of Virginia Technical Report CS95-52, Jul. 18, 1995. An interesting approach to the use of FEC that can deliver data reliably without ARQ is presented in L. Rizzo, L. Vicisano, A Reliable Multicast data Distribution Protocol based on software FEC techniques, Proceedings of the Fourth IEEE HPCS '97 Workshop, Chalkidiki, Greece, June 1997. However, present join-leave latencies for multicast groups make that approach too inefficient in its use of bandwidth to support real-time applications.
Reed-Solomon codes are often suggested as the means by which data losses can be efficiently repaired. A description of the mathematics used to perform Reed-Solomon encoding is found in A. McAuley, Reliable Broadband Communication Using a Burst Erasure Correcting Code, ACM SIGCOMM '90, pp. 297-306, September 1990. It has been shown that a Reed-Solomon encoding combined with ARQ can significantly reduce bandwidth requirements of a large reliable multicast session over that which is consumed using standalone ARQ. See, J. Nonnenmacher, et al., Parity-Based Loss Recovery for Reliable Multicast Transmission, ACM SIGCOMM '97, September 1997, pp. 289-300. FEC encoding and decoding can be performed at a rate sufficient for many real-time applications. See, for example, L. Rizzo, Effective Erasure Codes for Reliable Computer Communication Protocols, Computer Communication Review, April 1997.
Systems equipped with Reed-Solomon encoders and decoders can make use of repair packets to recover from loss. The sender forms blocks, where each block consists of a subset of the data packets it wishes to deliver reliably. The number of data packets that are used to form a block is commonly referred to as the blocksize. The subset of data packets is fed into the encoder to generate repair packets, which may also be referred to as FEC packets. Each receiver contains a decoder that is used to retrieve lost data packets of a particular block by applying the decoder to a sufficient number of received data packets and FEC packets from the same block. The number of packets that must be received for a Reed-Solomon decoder to perform decoding equals the blocksize. Once this number of packets is received, any lost data packet in the block can be retrieved from the decoder. A detailed discussion of packet-level FEC techniques can be found in J. Nonnenmacher, et al., Parity-Based Loss Recovery for Reliable Multicast Transmission; implementation issues are considered in A. McAuley, Reliable Broadband Communication Using a Burst Erasure Correcting Code, and L. Rizzo, Effective Erasure Codes for Reliable Computer Communication Protocols, all of which are cited above. FEC techniques exist that can be used to generate as many repair packets as needed, and this can be done at data rates on the order of 8 Mbytes/sec on commodity personal computers, such as a 133 MHz Pentium® microprocessor-based system.
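The block encode and decode cycle described above can be illustrated with a small sketch. It is not the coder of any cited work. It assumes a Reed-Solomon-style erasure code built from polynomial interpolation over the prime field GF(257), chosen for brevity; practical coders typically work in GF(2^8) so that every symbol fits in one byte and are heavily optimized. The names `encode`, `decode`, and `_interpolate` are hypothetical.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def encode(block, num_repair):
    """Return num_repair repair packets for a block of equal-length packets.

    Data packet i supplies, per byte position, the polynomial's value at
    x = i. Repair packet r holds the values at x = blocksize + r. Repair
    symbols may equal 256, so they are kept as lists of ints.
    """
    k = len(block)
    if k + num_repair > P:
        raise ValueError("blocksize plus repairs must not exceed 257")
    columns = [list(enumerate(col)) for col in zip(*block)]
    return [[_interpolate(col, k + r) for col in columns]
            for r in range(num_repair)]

def decode(received, blocksize):
    """Recover every data packet from any `blocksize` received packets.

    `received` maps a packet index (data: 0..blocksize-1, repair:
    blocksize, blocksize+1, ...) to that packet's list of symbols.
    """
    if len(received) < blocksize:
        raise ValueError("need at least blocksize packets to decode")
    chosen = sorted(received.items())[:blocksize]
    length = len(chosen[0][1])
    data = []
    for i in range(blocksize):
        if i in received:
            data.append(list(received[i]))  # packet arrived intact
        else:
            data.append([_interpolate([(x, pkt[pos]) for x, pkt in chosen], i)
                         for pos in range(length)])
    return data

if __name__ == "__main__":
    block = [b"abcd", b"efgh", b"ijkl"]          # blocksize = 3
    repairs = encode(block, num_repair=2)
    # Data packets 1 and 2 are lost; one data and two repair packets arrive.
    received = {0: list(block[0]), 3: repairs[0], 4: repairs[1]}
    recovered = decode(received, blocksize=3)
    print([bytes(p) for p in recovered])
```

In this sketch any three of the five packets suffice to rebuild the block, matching the property that the decoder needs exactly a blocksize worth of packets regardless of which ones were lost.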