1. Technical Field
The invention is related to a system and process for correcting errors and losses occurring during a receiver-driven layered multicast of real-time media over a heterogeneous packet network such as the Internet.
2. Background Art
Real-time media, such as radio and television programs, are broadcast from a single sender to multiple, geographically distributed receivers, who have all “tuned” to that sender. Commonly, the signals are broadcast from the sender by a terrestrial antenna, but satellite and wired solutions also exist. For example, in cable TV, the signals are broadcast from a sender by propagating a voltage along a coaxial cable to receivers connected to the cable.
It is also possible to use the Internet infrastructure to broadcast audio and video information. This is typically accomplished using the Internet Protocol (IP) Multicast mechanism and its associated protocols. An Internet broadcast (or more properly, “multicast”) is provided to the set of receivers who have first “subscribed” to the information. Specifically, through an announcement mechanism, such as a web page, a broadcaster announces the IP multicast group address to which it will send a particular broadcast. The multicast group address is just a special case of an ordinary IP address. However, unlike an ordinary address, which is used to identify the “location” of a receiver where data is to be sent, a multicast group address is used by routers in the network to identify data being transmitted on the network as part of the broadcast, so that it can be routed to a subscribing receiver (who will have a completely different address). The receiver's address is not included in the broadcast information.
A receiver subscribes to the broadcast by notifying the network that it wishes to “join” the multicast group. The subscriptions cause various routers in the network to update their states, to ensure that the multicast information eventually reaches the subscribers. At some point the sender begins to send packets to the specified address. When a router receives a packet with that address, it sends copies of the packet through each outgoing interface that leads to a subscriber. This causes the packets to reach the subscribers at some point, albeit with the inevitable packet loss due to network congestion and buffer overflow.
At a later point in time a receiver may unsubscribe for reasons that will be discussed later. This also causes the routers to update their states. If a router no longer has subscribers downstream from an interface, it stops copying the multicast packets to that interface. If a router no longer has subscribers downstream from any of its interfaces, then the router itself unsubscribes from the multicast group, and hence no longer receives (from upstream routers) multicast packets addressed to that group. This process is reversible and dynamic. Receivers may subscribe and unsubscribe as many times as desired. Thus, information is propagated through the network only as necessary to reach currently subscribing receivers. The processes of subscribing and unsubscribing take only fractions of a second, so network bandwidth is not wasted.
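The subscribe, forward, and unsubscribe behavior of a single router described above can be sketched as follows. This is only an illustrative model: the `Router` class, its method names, and the interface labels are hypothetical and do not correspond to any actual router software or multicast routing protocol.

```python
class Router:
    """Toy model of one multicast router (hypothetical, for illustration only)."""

    def __init__(self):
        # Maps a multicast group address to the set of outgoing
        # interfaces that currently lead to at least one subscriber.
        self.subscribed = {}

    def join(self, group, interface):
        """Record a subscription arriving on an interface.

        Returns True if this is the first subscriber for the group, i.e. the
        router itself must now subscribe toward the upstream routers.
        """
        members = self.subscribed.setdefault(group, set())
        first = not members
        members.add(interface)
        return first

    def leave(self, group, interface):
        """Remove a subscription.

        Returns True if no interface has subscribers any more, i.e. the
        router itself must unsubscribe from the group upstream.
        """
        members = self.subscribed.get(group)
        if not members or interface not in members:
            return False
        members.remove(interface)
        if members:
            return False
        del self.subscribed[group]
        return True

    def forward(self, group, packet):
        """Copy the packet to every interface that leads to a subscriber."""
        return [(i, packet) for i in sorted(self.subscribed.get(group, ()))]
```

In this model a packet addressed to a group with no remaining subscribers produces no copies at all, which mirrors the statement that information is propagated only as necessary to reach currently subscribing receivers.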
In the Internet, the channels between the sender and each receiver vary dramatically in capacity, often by two or three orders of magnitude. These differences in capacity exist because the data transmission rates associated with the connections to a particular receiver can vary (e.g., phone line capacity, LAN and/or modem speeds). This heterogeneity in capacity can cause problems in the context of an Internet broadcast of real-time audio and video information. For example, a particular receiver may not have the bandwidth available to receive the highest quality transmission that a broadcaster is capable of providing. One early attempt to cope with this problem involved broadcasting the audio and video data at different transmission rates to different multicast group addresses, with the quality being progressively better in the data broadcast at the higher rates. The receiver then subscribed to the transmission that suited its capability. However, this solution was very bandwidth intensive as the same information (and more) had to be repeated in each channel. To overcome this problem, an Internet broadcast can be transmitted via a “layered multicast”. In a layered multicast, audio and video information is encoded in layers of importance. Each of these layers is transmitted in a separate data stream. A data stream is a sequence of packets all transmitted to the same multicast group address. The base layer is an information stream that contains the minimal amount of information, for the least acceptable quality. Subsequent layers enhance the previous layers, but do not repeat the data contained in a lower layer. Thus in order to obtain the higher quality, a receiver must subscribe to the lower layers in addition to the higher layers that provide the desired quality. For example, a video signal can be layered into packetized data streams of 8 Kbps (thousand bits per second). Each stream is sent to a different multicast group address.
A receiver can subscribe to as many streams as it wants, provided the total bandwidth of the streams is not greater than the bandwidth of the most constrained link in the network between the sender and the receiver. For example, if the receiver is connected to the Internet by a 28.8 Kbps modem, then it can feasibly subscribe to one, two, or three 8 Kbps video layers. If it subscribes to more than three layers, then congestion will certainly result and many packets will be dropped randomly, resulting in poor video quality.
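The arithmetic behind the modem example can be written out as a short calculation. The function name `max_layers` is introduced here purely for illustration, and the sketch assumes that all layers have the same rate.

```python
def max_layers(bottleneck_kbps, layer_kbps):
    """Largest number of equal-rate layers whose total rate fits within
    the most constrained link between sender and receiver."""
    return int(bottleneck_kbps // layer_kbps)


# The example from the text: a 28.8 Kbps modem and 8 Kbps video layers.
layers = max_layers(28.8, 8)
print(layers, "layers use", layers * 8, "Kbps of 28.8 Kbps")
```

Three layers consume 24 Kbps, while a fourth would require 32 Kbps and exceed the capacity of the link.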
It has been proposed that the congestion problem be addressed using a “Receiver-driven Layered Multicast” (RLM) scheme, where each receiver attempts to optimize its received quality by subscribing to as many layers as possible without incurring substantial congestion and loss. It does this by “test joins,” in which the receiver tentatively joins, or subscribes to, the multicast group containing the next layer. If performance improves, the test join is made permanent. Otherwise, the layer is dropped, i.e., the receiver unsubscribes from the multicast group. In addition, if performance degrades at any point during the multicast (due to congestion in the network), the topmost layer is dropped. However, in complex network environments such as the Internet, there is often congestion along the path between the sender and receiver that is “ambient” in the sense that it is due to cross traffic between other senders and receivers. Therefore, it is not always possible to eliminate all congestion along the path from a sender to a receiver by cutting back on the rate of transmission between the sender and receiver, i.e., by dropping layers of the multicast. This presents a significant problem because in RLM the receiver attempts to drop multicast groups until there is little or no loss. When there is ambient congestion, this results in the receiver subscribing to few, if any, layers, which in turn results in sub-optimal video and audio quality.
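The test-join behavior, and the way it breaks down under ambient congestion, can be illustrated with a small simulation. This is a simplified sketch and not the published RLM algorithm: the function `rlm_adapt`, the loss threshold used as the measure of "performance", and the fixed number of rounds are all assumptions made for illustration.

```python
def rlm_adapt(measure_loss, num_layers, loss_threshold=0.05, rounds=10):
    """Simplified receiver-driven adaptation (illustrative assumption).

    measure_loss(n) is a hypothetical callback returning the packet loss
    rate observed while subscribed to n layers.
    """
    subscribed = 0
    for _ in range(rounds):
        # Performance degraded: drop the topmost layer.
        if subscribed > 0 and measure_loss(subscribed) > loss_threshold:
            subscribed -= 1
            continue
        # Test join: tentatively add the next layer and keep it only
        # if the observed loss stays acceptable.
        if subscribed < num_layers:
            if measure_loss(subscribed + 1) <= loss_threshold:
                subscribed += 1
    return subscribed


def capacity_limited(n):
    """Loss appears only when the subscribed rate exceeds a 28.8 Kbps link."""
    return 0.0 if n * 8 <= 28.8 else 0.5


def ambient(n):
    """Cross traffic causes 10% loss regardless of how many layers are dropped."""
    return 0.10


print(rlm_adapt(capacity_limited, num_layers=6))
print(rlm_adapt(ambient, num_layers=6))
```

With the capacity-limited model the receiver settles on the number of layers the link can carry. With the ambient model no test join ever succeeds, so the receiver ends up with no layers at all, which is the failure mode described above.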
Accordingly, there is a need for a system and process that can overcome the congestion issue and its concomitant packet losses, without eliminating so many multicast layers that the quality of the received audio and video information is unacceptable.
The present invention accomplishes this task by, in essence, augmenting RLM with one or more layers of error correction information. This allows each receiver to separately optimize the quality of the received audio and video information by subscribing to at least one error correction layer. Thus, a unique receiver-driven, layered, error correction multicast system and process is created.
Ideally, each source layer in an RLM would have one or more multicasted error correction data streams (i.e., layers) associated therewith. Each of the error correction layers would contain information that can be used to replace lost packets from the associated source layer. More than one error correction layer is proposed because some of the error correction packets needed to replace the packets lost in the associated source stream may themselves be lost in transmission. These lost error correction packets can be picked up by subscribing to a second error correction layer, and so on until all (or an acceptable number) of the source packets have been replaced, or there are no more additional error correction layers available.
The decision as to how many broadcast source layers and associated error correction layers to subscribe to at one time is in essence based on the inherent packet loss rate of the network connection and the maximum bandwidth available to the receiver. The idea is to subscribe to as many of the source layers as possible or desired, while leaving enough bandwidth available to also subscribe to the number of error correction layers for each source layer that will compensate for the inherent packet loss rate of the connection and provide an acceptable audio and video quality. It should be noted that there is also an option not to subscribe to any error correction layers. For example, in networks where the inherent packet loss rate is very low, the optimum quality might be obtained by subscribing to additional source layers and no error correction layers.
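One way such a decision could be made is sketched below. The quality model is an assumption introduced only for illustration: packet losses are treated as independent, a block of k source packets is considered recoverable when at least k of its k + p source and parity packets arrive, each parity layer is taken to carry one packet per block (so its rate is 1/k of a source layer), and the receiver picks the fewest parity layers that meet a target recovery probability before fitting as many source layers as the bandwidth allows. The names `block_recovery_prob`, `choose_layers`, and all parameter defaults are hypothetical.

```python
from math import comb


def block_recovery_prob(k, p, loss):
    """Probability that at least k of the k + p packets of a block arrive,
    assuming independent losses with probability `loss`."""
    n = k + p
    return sum(
        comb(n, r) * (1 - loss) ** r * loss ** (n - r)
        for r in range(k, n + 1)
    )


def choose_layers(bandwidth_kbps, layer_kbps, loss, k=8, max_parity=8, target=0.99):
    """Return (source_layers, parity_layers_per_source_layer)."""
    parity_kbps = layer_kbps / k  # one parity packet per block of k packets
    for p in range(max_parity + 1):
        if block_recovery_prob(k, p, loss) >= target:
            per_source_layer = layer_kbps + p * parity_kbps
            return int(bandwidth_kbps // per_source_layer), p
    # No available amount of parity meets the target at this loss rate.
    return 0, 0


print(choose_layers(28.8, 8, loss=0.0))
print(choose_layers(28.8, 8, loss=0.10))
```

With no loss the sketch spends all bandwidth on source layers and subscribes to no parity layers, matching the option noted above. With a 10% loss rate it gives up source layers to make room for parity.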
The decision logic could be viewed as a one-time decision made prior to receiving the broadcast. However, the packet loss rate, or even the available bandwidth, could fluctuate during the broadcast. Accordingly, the decision logic could also be implemented dynamically, in that the number of source layers and associated error correction layers subscribed to is reevaluated on a periodic basis to ensure the optimum quality is maintained throughout the broadcast. It should be noted that this may entail unsubscribing from the topmost layer (or layers) at least temporarily to maintain a desired audio and video quality.
Any audio and/or video layering process currently used in RLM could feasibly be employed in conjunction with the above-described system and process to create the source layers. Likewise, any conventional process currently used in RLM can be adopted for encoding/decoding and packetizing/unpacketizing the audio and/or video signal in the present invention.
However, in regard to the error correction layers, while it would be possible to use any currently existing error correction technique appropriate for packetized audio and video signals, or combination thereof, it is preferred that two specific error correction methods, or that a combination of these two methods, be employed.
The first of these methods is a unique adaptation of an existing process known as Forward Error Correction (FEC). In essence, the FEC technique involves encoding the transmission data using a linear transform which adds redundant elements. The redundancy permits losses to be corrected because any of the original data elements can be derived from the encoded elements. Thus, as long as enough of the encoded data elements are received so as to equal the number of the original data elements, it is possible to derive all the original elements. Specifically, the FEC technique is adapted to the present invention by producing parity layers that go along with each source layer. To form the parity layers, the packets in a source layer are partitioned into k packets per block, forming for each block a k by m matrix of bytes, X. Then a systematic, rate-compatible forward error correction code is applied to each block, to produce an n by m matrix of bytes Y, which forms n−k parity packets (in addition to the original k source packets). Each of these n−k parity packets is assigned to a different parity stream or layer. Each parity layer is then transmitted in a separate stream to a different multicast group address. In this way, both the source and parity information are layered. For each source layer, there are n−k parity layers. By having many source layers, and many parity layers for each source layer, each receiver is able to independently subscribe to the optimal number of source layers, and the optimal number of parity layers for each source layer, so as to maximize quality for a given transmission rate.
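The block structure can be illustrated with the simplest possible systematic code, a single byte-wise XOR parity packet, which corresponds to n = k + 1. This is a deliberately reduced stand-in chosen for brevity and is not the rate-compatible code contemplated above, which would produce several parity packets per block. The function names `xor_parity` and `recover` are hypothetical.

```python
def xor_parity(block):
    """Compute one parity packet for a block of k equal-length source
    packets (the rows of the k by m byte matrix X)."""
    m = len(block[0])
    parity = bytearray(m)
    for packet in block:
        for j in range(m):
            parity[j] ^= packet[j]
    return bytes(parity)


def recover(received, parity):
    """Rebuild a block in which at most one source packet was lost.

    `received` is a list of k entries, each either the packet bytes or
    None for a lost packet. With a single parity packet, only one loss
    per block can be repaired.
    """
    missing = [i for i, packet in enumerate(received) if packet is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one loss")
    # XOR of the parity packet with all surviving packets yields the lost one.
    survivors = [packet for packet in received if packet is not None]
    repaired = xor_parity(survivors + [parity])
    result = list(received)
    result[missing[0]] = repaired
    return result


block = [b"abcd", b"efgh", b"ijkl"]           # k = 3 packets of m = 4 bytes
parity = xor_parity(block)                    # sent on a separate parity layer
print(recover([b"abcd", None, b"ijkl"], parity) == block)
print(recover([None, b"efgh", b"ijkl"], parity) == block)
```

The two calls at the end model two receivers with different single-packet losses in the same block. Both repair their loss from the same parity packet.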
The second of the aforementioned preferred error correction techniques is new, but loosely based on an existing process called the automatic repeat request (ARQ) protocol. In the ARQ protocol, the receiver is able to identify (e.g., via missing packet sequence numbers) which packets have been lost in transmission, and request retransmission of the lost packets from the sender. However, this process would typically be impractical in the context of an IP multicast on a large network such as the Internet because the broadcaster would be overwhelmed by retransmission requests from the receivers. In the aforementioned new error correction process, dubbed pseudo-ARQ, the receivers do not request retransmission of specific lost packets. Instead, the broadcaster sends not only the source packets in a primary stream, but also sends delayed versions thereof in one or more redundant streams to different multicast group addresses. In this way, the receiver can subscribe as necessary to one or more of the delayed streams to pick up those packets needed to replace lost packets in the primary stream. Specifically, in pseudo-ARQ, the sender multicasts the source packets in a primary stream, and also multicasts delayed versions of the source packets in one or more redundant streams. If a receiver loses a packet from the primary stream, then it has the opportunity to subscribe to the first of the redundant streams, in an attempt to receive the packet again in a second transmission. If that fails too, then the receiver has the opportunity to subscribe to the second redundant stream, and so forth, until either the receiver recovers the desired packet, or there are no more streams to subscribe to.
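The receiver side of pseudo-ARQ can be sketched as a loop over progressively delayed streams. The model below is an illustrative assumption: losses are supplied as a fixed set of (stream, sequence number) pairs rather than observed on a network, timing and the actual group joins are omitted, and the function name `pseudo_arq_receive` is hypothetical.

```python
def pseudo_arq_receive(num_packets, lost, num_redundant):
    """Simulate a pseudo-ARQ receiver.

    lost: set of (stream_index, seq) pairs dropped by the network, where
          stream 0 is the primary stream and streams 1..num_redundant are
          the progressively delayed copies.
    Returns a dict mapping each sequence number to the index of the stream
    that finally delivered it, or None if every copy was lost.
    """
    delivered_by = {}
    for seq in range(num_packets):
        delivered_by[seq] = None
        # Try the primary stream first, then subscribe to each delayed
        # stream in turn until the packet arrives or no streams remain.
        for stream in range(num_redundant + 1):
            if (stream, seq) not in lost:
                delivered_by[seq] = stream
                break
    return delivered_by


lost = {(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)}
print(pseudo_arq_receive(4, lost, num_redundant=2))
```

In this run packet 0 arrives on the primary stream, packets 1 and 2 are picked up from the first and second delayed streams respectively, and packet 3 is unrecoverable because all three copies were lost. No retransmission request is ever sent to the broadcaster.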
The two error correction processes can also be combined to form an advantageous third process. This hybrid FEC/pseudo-ARQ process differs from the pure FEC in that all or some of the parity streams are progressively delayed. This permits the receiver to subscribe to additional parity information on demand. In one version of the hybrid error correction process, one or more undelayed parity layers are subscribed to and the first part of the process is identical to that described earlier in connection with the pure FEC procedure. However, if the number of parity packets lost in transmission results in fewer parity packets than missing source packets in a block of source layer packets, additional parity packets can be obtained by subscribing to a delayed parity stream.
In another version of the hybrid process, all the parity streams are delayed and a receiver subscribes as needed to the streams to obtain parity packets that can be used to compute replacement source layer packets. This version is similar to the pseudo-ARQ method except parity packets are transmitted instead of copies of the source layer packets. This has advantages because rather than sending a copy of every source stream packet making up a block in an error correction stream, only one parity packet need be sent, which multiple receivers can use to recover different source packets belonging to the block associated with the parity packet. This makes for an efficient use of shared network bandwidth. For example, two receivers may have a different loss pattern (of two losses) in a particular block of source data. But both receivers can still subscribe to the same parity packet associated with that block to try to recover the losses. Because the parity streams contain parity packets rather than a copy of a source packet, these packets can be used to recover different source packets in the same block. This reduces the network bandwidth shared by all receivers, because the receivers can capture the same packet, rather than many different packets, to recover different loss patterns.
A further refinement of the hybrid error correction involves collapsing groups of parity streams into a single combined parity stream. The advantage of collapsing multiple streams into a single stream is that the receiver can then subscribe and unsubscribe to the single stream as necessary to obtain the packets it needs, rather than subscribing and unsubscribing to multiple streams. Thus, a receiver can capture multiple parity packets associated with the same block in the same collapsed parity stream to compute replacements for more than one lost source packet in that block of source data. This decreases the memory requirements in the network associated with keeping so many connections open.
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.