This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Packet switched network transmission protocols typically employed for Voice over IP (VoIP) comprise the Real-time Transport Protocol (RTP) encapsulated in the User Datagram Protocol (UDP), further encapsulated into the Internet Protocol (IP). The checksums employed in UDP and IP result in the discarding of all packets in which the receiver detects bit errors. In other words, the protocol stack in the receiver does not convey any distorted packets to the application layer. Therefore, when IP packets are transmitted over an error-prone radio link, or over any medium introducing transmission errors, the application layer is likely to face packet losses. Conversely, none of the packets reaching the application layer contain any residual bit errors. Due to this phenomenon, an error concealment algorithm is not able to utilize partially correct frames, as can be done, for example, in a circuit switched GSM telephone service. Instead, an erroneous frame needs to be completely replaced. This is likely to make the error concealment process less effective than the approach used in circuit-switched services.
Another aspect of packet-switched networks involves media level scalability. This scalability may be deployed at the transmission level, e.g., for controlling the network capacity or shaping a multicast media stream to facilitate operation with participants behind access links of different bandwidths. At the application level, the scalability can be utilized for controlling, e.g., computational complexity, encoding delay, or desired quality level.
The scalable media data comprises a core layer, which is always needed to enable reconstruction at the receiving end, and one or several enhancement layers that can be used to provide added value to the reconstructed media (e.g., improved media quality). It should be noted that, while in some scenarios the scalability can be applied at the transmitting end-point, there are also operating scenarios where it makes more sense to permit an intermediate network element to perform the scaling. The enhancement layers can be transmitted either together with the core layer data or in separate packets. While transmitting the enhancement layers in packets separate from the core layer data makes the scaling operation more straightforward, since scaling can then be performed by dropping full packets, it presents some challenges, e.g., in the session setup and control procedures. On the other hand, transmitting the enhancement layers in the same packet together with the core layer data is more efficient in terms of transmission bandwidth usage and also enables a simpler session setup.
Network congestion and limited link bandwidth are examples of reasons that may necessitate the removal of a portion of the scalable content from an IP packet. Furthermore, not all receivers may be capable of receiving or consuming full bit rate content. Hence, a network element controlling the transmission link may remove some of the higher layers of content.
When data from enhancement layers is transmitted in the same packet with core layer data, a scaling operation implies the modification of packet contents. The modification of an IP packet always requires opening and repacketizing the payload. Although the receiver and sender information do not change, the packet size and corresponding header information may be modified. For example, any error detection checksums need to be recalculated when the content of the payload is changed.
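As an illustrative sketch of this recalculation burden (the function names and payload sizes are hypothetical, and the UDP pseudo-header is omitted), the 16-bit one's-complement checksum used by UDP no longer matches once payload bytes are removed, so a scaling element must open the packet and recompute:

```python
def ones_complement_sum16(data: bytes) -> int:
    """16-bit one's-complement sum as used by the IP/UDP checksum."""
    if len(data) % 2:                    # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def udp_style_checksum(payload: bytes) -> int:
    """Complement of the one's-complement sum (pseudo-header omitted here)."""
    return (~ones_complement_sum16(payload)) & 0xFFFF

packet = bytes(range(40))    # hypothetical core + enhancement payload
scaled = packet[:20]         # enhancement layers dropped by a network element
# The original checksum no longer matches the scaled payload:
assert udp_style_checksum(packet) != udp_style_checksum(scaled)
```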
Media data that can be divided into two or more classes of importance can benefit from an approach in which error detection is performed separately for the different classes of data. In the event that there are errors only in the “less important” parts of the data, many applications are still able to make use of the error-free part of the received data. A scalable media packet carrying the enhancement layers together with the core data clearly forms a similar case by design; possible errors in the enhancement layers do not affect the core layer, and therefore the core data should be made available to the application even when some of the enhancement layers are corrupted. A further benefit for scalable media arises from the fact that separate error detection checksums for each of the layers facilitate simple scaling functionality. Furthermore, different forward error correction (FEC) arrangements can also be used for the core and enhancement layers.
Various methods have been introduced to handle packet loss conditions. Some methods, such as partial checksum methods, involve handling only portions of the payload data. In particular, when the UDP-Lite protocol is at issue, a partial checksum can be utilized for unequal error detection (UED). With scalable content, a problem arises involving the need to recalculate a partial checksum when the payload size is modified.
Typically, in circuit switched systems, the most sensitive bits of a speech codec are protected with stronger FEC arrangements compared to the least sensitive bits. A partial error detection code, such as cyclic redundancy check (CRC), can be used to classify the whole frame as lost when the most sensitive bits contain errors. A similar method can be used in packet switched networks.
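A minimal sketch of such partial error detection follows; the class A boundary is a hypothetical value, and a generic CRC-32 stands in for any codec-specific CRC. The frame is classified as lost only when the check over the most sensitive bits fails:

```python
import zlib

CLASS_A_LEN = 7   # hypothetical number of most-sensitive (class A) bytes

def frame_usable(frame: bytes, class_a_crc: int) -> bool:
    """Declare the frame lost only if the most sensitive (class A) part
    fails its CRC; errors in the less sensitive bits are tolerated."""
    return zlib.crc32(frame[:CLASS_A_LEN]) == class_a_crc

frame = b"\x10\x22\x33\x44\x55\x66\x77" + b"less-sensitive-bits"
crc_a = zlib.crc32(frame[:CLASS_A_LEN])

corrupted_tail = frame[:CLASS_A_LEN] + b"LESS-SENSITIVE-BITS"   # class B errors
corrupted_head = bytes([frame[0] ^ 0x01]) + frame[1:]           # class A error

assert frame_usable(corrupted_tail, crc_a)       # frame still usable
assert not frame_usable(corrupted_head, crc_a)   # frame classified as lost
```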
Unequal FEC protection methods for speech frames are based on ordering the parameters in descending order of priority. For example, the scalable codec bit stream can be classified into various classes according to the scalability layers and their importance. Each priority class can then be protected with a different error correction capability. Such an unequal protection arrangement requires receiver features that are not typically standardized: the protocol stack, and especially the UDP and IP protocol implementations, should be modified to pass partially incorrect packet payloads to the protocol stack layers above; alternatively, protocols that allow erroneous payloads to be passed to the application, such as UDP-Lite, can be used.
An advantage of unequal error detection is that it permits increased capacity, particularly over wireless links, when errors in the less sensitive bits do not cause the dropping of the whole packet. The application layer, in turn, must still cope with payloads containing errors in the least sensitive content.
One method for error detection involves having a single checksum covering the entire payload data. This is the approach used, for example, in the UDP protocol. This mode of operation does not enable UED at all and, in the case of scalable content, the removal of one or more enhancement layers from the payload requires re-computation of the checksum. While this is a fully working solution, the computational cost of the additional CRC computation each time a packet is scaled may render the approach infeasible in certain scenarios. On the other hand, only one checksum is needed to cover the whole payload in this approach, thereby providing efficient usage of the transmission bandwidth. Additionally, in this arrangement, the end-points only have to compute/verify a single checksum.
An enhancement to the single-checksum approach discussed above is to have a single checksum that covers only selected portions of the payload data, thereby leaving the rest of the payload data uncovered. This approach enables a simple UED, providing a checksum-covered part and an uncovered part of the payload data. Furthermore, a simple two-level scalability can be supported by having the core layer as the checksum-covered part and the enhancement layer as the uncovered part. In this case, achieving scalability by dropping the enhancement layer would not require changing the checksum, unless the change in the payload affects the value of the checksum. This kind of functionality is employed, e.g., in UDP-Lite, which permits the checksum to cover only a selected part of the data at the beginning of the payload. Making use of this UED-enabling functionality requires that the payload data be arranged in a suitable way (i.e., the most important data should appear at the beginning of the payload) and that the application be able to benefit from letting errors in the uncovered part go unnoticed. An example of such a payload is the RTP payload for the AMR and AMR-WB speech codecs, which provides an option for including a CRC bit field in the payload header. The CRC check is performed over the most sensitive speech bits, namely the class A bits. In this scenario, the goal is to discard the frame only if there are bit errors in the most sensitive bits. On the other hand, errors in the remaining portion of the frame can be tolerated. Currently, unequal error detection can be utilized in full only with the UDP-Lite and DCCP protocols. These protocols provide functionality that allows partially corrupted packets to be passed up to the application layer.
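The UDP-Lite-style behavior can be sketched as follows; `lite_checksum`, the layer contents, and the coverage boundary are illustrative, but the checksum itself follows the Internet one's-complement form that UDP-Lite employs. Dropping the uncovered enhancement part leaves the checksum valid:

```python
def inet_checksum(data: bytes) -> int:
    """Internet (one's-complement) checksum, as used by UDP-Lite."""
    if len(data) % 2:
        data += b"\x00"
    s = 0
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
        s = (s & 0xFFFF) + (s >> 16)
    return (~s) & 0xFFFF

def lite_checksum(payload: bytes, coverage: int) -> int:
    """Checksum only the first `coverage` bytes (core layer); the rest of
    the payload (enhancement layer) is left uncovered."""
    return inet_checksum(payload[:coverage])

core = b"core-layer-bits-"     # hypothetical core layer (covered)
enh = b"enhancement-bits"      # hypothetical enhancement layer (uncovered)
coverage = len(core)

sent = lite_checksum(core + enh, coverage)
# An intermediate element may drop the uncovered enhancement layer
# without touching the checksum:
assert lite_checksum(core, coverage) == sent
```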
In a variation of the above method, the UDP-Lite checksum, for example, can be used to cover part of the payload, with a separate payload-internal checksum being used to cover the parts of the payload not covered by the UDP-Lite checksum. This approach provides full checksum coverage for the payload, which also enables discarding the less important portion of the payload data in case errors are detected. This can be particularly useful in applications that cannot tolerate any (undetected) errors.
A further option for a more flexible payload design involves dividing the data into several subsections and having a separate checksum for each subsection. This system enables both flexible UED and flexible scalability. When each of the layers is covered by a separate checksum, the dropping of any of the layers (either due to a detected error or due to a scaling operation) can be performed without the need to re-compute the checksum(s). However, a drawback to this system is that the use of several checksums requires sending more data compared to previous approaches with only a single checksum. Furthermore, in this arrangement, several checksums need to be separately computed at the sender and verified at the receiving end. On the other hand, this approach enables increased robustness to errors and flexible scalability, without the need to re-compute checksum(s) in an intermediate element that performs the scaling.
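A rough sketch of the per-subsection checksum arrangement follows; the packet representation and the CRC choice are assumptions for illustration, not a standardized format. Trailing layers can be dropped without recomputing the remaining checksums:

```python
import zlib

def packetize(layers):
    """Attach a separate checksum to each scalability layer (illustrative
    in-memory format; a real payload format would define exact fields)."""
    return [(layer, zlib.crc32(layer)) for layer in layers]

def scale(packet, keep):
    """Drop trailing layers; the remaining per-layer checksums stay valid,
    so the scaling network element needs no re-computation."""
    return packet[:keep]

def verify(packet):
    return all(zlib.crc32(layer) == crc for layer, crc in packet)

pkt = packetize([b"core", b"enh-1", b"enh-2"])
assert verify(scale(pkt, 2))   # core + first enhancement still check out
```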
IP transport mechanisms provide tools for FEC packets. For example, the Internet Engineering Task Force (IETF) Request for Comments (RFC) 2733, which can be found at www.ietf.org/rfc/rfc2733.txt, provides a generic mechanism for transporting XOR-based forward error correction data within a separate RTP session. The payload header of FEC packets contains a bit mask identifying the packet payloads over which the bit-wise XOR operation is calculated and a few fields for RTP header recovery of the protected packets. One XOR FEC packet enables recovery of one lost source packet.
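The XOR-based recovery described in RFC 2733 can be sketched as follows, operating on payloads only; the bit mask and the RTP-header-recovery fields of the actual format are omitted, and the frame contents are hypothetical:

```python
def xor_fec(payloads):
    """Bit-wise XOR over the payloads, padded to the longest (per the
    RFC 2733 scheme; header-recovery fields are omitted here)."""
    n = max(len(p) for p in payloads)
    out = bytearray(n)
    for p in payloads:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)

a, b_, c = b"frame-A", b"frame-B!", b"frame-C"
fec = xor_fec([a, b_, c])      # repair payload transmitted alongside

# If exactly one protected packet (here b_) is lost, XOR-ing the FEC
# payload with the surviving packets recovers it (the original length
# would come from a length-recovery field in the real format):
recovered = xor_fec([fec, a, c])
assert recovered[:len(b_)] == b_
```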
Work is being conducted to replace IETF RFC 2733 with a similar RTP payload format for XOR-based FEC protection that also includes the capability of uneven levels of protection, herein referred to as the ULP Internet Draft (A. H. Li, “RTP payload format for generic forward error correction,” Internet Engineering Task Force Internet Draft draft-ietf-avt-ulp-23.txt, August 2007). The payloads of the protected source packets are split into consecutive byte ranges, starting from the beginning of the payload. The first byte range, starting from the beginning of the packet, corresponds to the strongest level of protection, and the protection level decreases as a function of byte range order. Hence, the media data in the protected packets can be organized in such a way that the data appears in descending order of importance within the payload and a similar number of bytes corresponds to a similar subjective impact in quality among the protected packets. The number of protected levels in FEC repair packets is selectable, and an uneven level of protection is obtained when the number of levels protecting a set of source packets is varied. For example, if there are three levels of protection, one FEC packet may protect all three levels, a second packet may protect the first two levels, and a third packet may protect only the first level. When applied to RTP payloads containing AMR-WB coded data, the ULP Internet Draft can be used to protect class A bits more robustly compared to class B bits. Details concerning unequal error protection and bit classification of AMR and AMR-WB frames can be found in section 3.6 of IETF RFC 4867, which can be found at www.ietf.org/rfc/rfc4867.txt, and the Third Generation Partnership Project (3GPP) specification 3GPP TS 26.201.
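The uneven coverage of the ULP scheme can be illustrated roughly as follows; the level boundaries and payload contents are hypothetical, and the repair-packet headers are omitted. Three repair packets of decreasing coverage give the first byte range, carrying the most important data, the strongest protection:

```python
def ulp_fec(payloads, cover_len):
    """XOR FEC over only the first `cover_len` bytes of each source
    payload (one protection level of the ULP scheme; headers omitted)."""
    out = bytearray(cover_len)
    for p in payloads:
        for i, byte in enumerate(p[:cover_len]):
            out[i] ^= byte
    return bytes(out)

src = [b"AAAAaaaa....", b"BBBBbbbb....", b"CCCCcccc...."]
level_len = [4, 8, 12]   # hypothetical cumulative level boundaries

repairs = [ulp_fec(src, level_len[2]),   # covers levels 1-3
           ulp_fec(src, level_len[1]),   # covers levels 1-2
           ulp_fec(src, level_len[0])]   # covers level 1 only
# The first byte range is protected by all three repair packets:
assert [len(r) for r in repairs] == [12, 8, 4]
```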
Another emerging trend in the field of media coding involves what are referred to as “layered codecs.” Such layered codecs include, for example, the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Embedded Variable Bit-Rate (EV-VBR) speech/audio codec and the ITU-T Scalable Video Codec (SVC). The scalable media data comprises a core layer, which is always needed to enable reconstruction at the receiving end, and one or more enhancement layers that can be used to provide added value to the reconstructed media (e.g., to provide improved media quality or an increased level of robustness against transmission errors, etc.). The scalability can be deployed at the transmission level, e.g., for controlling the network capacity or shaping a multicast media stream in order to facilitate operation with participants behind access links of different bandwidths. At the application level, the scalability can be utilized for controlling, e.g., computational complexity, encoding delay, or desired quality level. It should be noted that, while in some scenarios the scalability can be applied at the transmitting end-point, there are also operating scenarios where it makes more sense to permit an intermediate network element to perform the scaling.
From a media transport point of view, scalable codecs provide two basic options: the enhancement layers can be transmitted either together in the same packets with the core layer data, or (subsets of) the enhancement layers can be transmitted in separate packet stream(s). Transmission in separate packet streams also requires a signalling mechanism that can be used to bind together the packet data streams carrying layers of the same media source.
The approach of carrying all layers (of a media frame) in a single packet provides for low overhead and easy cross-layer synchronization. On the other hand, this approach results in more complex scaling in an intermediate network element, since the possibility of scaling requires the network element to be aware of the details of the packet structure. Furthermore, the scaling also implies modification of packets.
The approach employing separate data streams for (subsets of) layers provides for a simple scaling possibility because the scaling can be realized by discarding all packets of some data streams. This does not require in-depth knowledge about the packet structure, as long as the (signalling) information on the relationships between the data streams is available. However, this approach results in increased overhead, since each data stream introduces its own protocol overhead (e.g. IP/UDP/RTP). A further challenge associated with this approach involves cross-layer synchronization, i.e., how the receiver can reconstruct the media frames based on layers it receives distributed across multiple data streams. It should be noted that employing multiple data streams, e.g., multiple RTP sessions, is the traditional method for transmitting layered media data within the RTP framework (often referred to as scalable multicast).
The timeline of a media stream received in RTP packets can be reconstructed based on time stamp (TS) information included in the RTP header. The RTP TS provides information on the temporal difference relative to the other RTP packets transmitted in the same RTP session, which enables the placement of each received media frame at its correct position in the timeline. However, the initial value of the RTP TS of an RTP session is random, which implies that the RTP TS does not indicate an absolute time. Instead, the RTP TS only provides a timing reference within the RTP session. It should be noted that this “randomness” can be considered to be an unknown offset from the absolute time, which is different in each RTP session. Therefore, two or more RTP sessions cannot be synchronized based solely on their RTP TS values. This is also true for separate RTP sessions used to carry (subsets of) layers of a layered encoding.
A conventional mechanism within RTP for synchronizing multiple RTP sessions is based on RTCP reports transmitted within each session. In this approach, the sender includes both the timing reference (NTP) and the sending instant in the RTP TS domain in the RTCP Sender Reports (SR) that it transmits according to specified rules. This enables the receiver to compute the RTP TS offset from the timing reference (NTP) for each of the RTP sessions it receives. These offset values can then be used to match the timing of the media data received in separate RTP sessions, for example to combine layers of a media frame received in multiple RTP sessions. However, this approach requires the first RTCP SRs for each of the RTP sessions to be available before the full reconstruction of a media frame is possible. In practice, this implies that only the core layer of the layered encoding is available until the synchronization information is available.
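The offset computation can be sketched as follows, assuming a shared NTP wallclock, a hypothetical clock rate, and illustrative Sender Report values. Each session's random TS origin is cancelled by the offset derived from its SR, allowing frames from separate sessions to be matched:

```python
CLOCK_RATE = 16000   # hypothetical RTP clock rate shared by the sessions

def ts_offset(sr_ntp_seconds: float, sr_rtp_ts: int) -> int:
    """Offset of a session's (random-origin) RTP TS space from the common
    NTP wallclock, derived from one RTCP Sender Report pair."""
    return sr_rtp_ts - round(sr_ntp_seconds * CLOCK_RATE)

def to_wallclock(rtp_ts: int, offset: int) -> float:
    """Map a session-local RTP TS onto the common timeline."""
    return (rtp_ts - offset) / CLOCK_RATE

# Hypothetical SRs for a core-layer and an enhancement-layer session:
off_core = ts_offset(100.0, 1_700_000)
off_enh = ts_offset(100.0, 4_250_000)

# A core frame and an enhancement frame map to the same instant, so they
# belong to the same media frame despite differing raw TS values:
assert to_wallclock(1_716_000, off_core) == to_wallclock(4_266_000, off_enh)
```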
One alternative to the above involves pre-synchronizing the RTP TS spaces across RTP sessions in the transmitting end-point. In this approach, the “random” initial value of the RTP TS would be the same for each of the RTP sessions. While this may provide a simple cross-session synchronization mechanism without having to transmit additional data, it is not fully in line with the RTP specification, and existing RTP implementations may not support it. Furthermore, such a mechanism would provide the (pre-)synchronization at the RTP (header) level, but it would be available only for a subset of RTP payloads. Such payload type-dependent processing at the RTP level is not a desirable feature in a system handling multiple payload types.
Still another approach for synchronizing multiple RTP sessions involves attaching additional information to each transmitted layer, indicating its temporal location in the presentation timeline. Such information may comprise, for example, a cross-layer sequence number or additional timestamp information that can be used to reconstruct the presentation order of media frames and of layers within the frames. However, this approach would still introduce additional overhead for each transmitted layer. This is significant because, particularly in the case of smaller pieces of speech/audio data, e.g., on the order of 10-20 bytes, even one additional byte of overhead may have a noticeable effect on the overall system performance.