Modern media content distribution systems, such as mobile video transmission systems, are becoming increasingly popular. Bitstream scalability is a desirable feature in such systems. An encoded media bitstream is generally called scalable when parts of the bitstream can be removed so that the resulting sub-bitstream is still decodable by a target decoder. The media content of the sub-bitstream can be reconstructed at a quality that is lower than that of the original bitstream, but still high relative to the resulting savings in transmission and storage resources. Bitstreams that do not have these properties are referred to as single-layer bitstreams.
Scalable Video Coding (SVC) is one solution to the scalability needs posed by the characteristics of video transmission systems. The SVC standard as specified in Annex G of the H.264/Advanced Video Coding (AVC) specification allows the construction of bitstreams that contain scaling sub-bitstreams conforming to H.264/AVC. H.264/AVC is a video compression standard equivalent to the Moving Picture Experts Group (MPEG)-4 AVC (MPEG-4 AVC) standard.
The SVC standard encompasses different scalability concepts as described, for example, in H. Schwarz et al., “Overview of the Scalable Video Coding Extension of the H.264/AVC standard”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 17, No. 9, September 2007. For spatial and quality bitstream scalability, i.e., the generation of a sub-bitstream with lower spatial resolution or quality than the original bitstream, Network Abstraction Layer (NAL) units are removed from the bitstream when deriving the sub-bitstream. In this case, inter-layer prediction, i.e., the prediction of the higher spatial resolution or quality bitstream based on information contained in the lower spatial resolution or quality bitstream, is used for efficient encoding. For temporal bitstream scalability, i.e., the generation of a sub-bitstream with a lower temporal sampling rate than the original bitstream, complete access units are removed from the bitstream when deriving the sub-bitstream. An access unit is defined as a set of consecutive NAL units with specific properties. In the case of temporal bitstream scalability, high-level syntax and inter prediction reference pictures in the bitstream are constructed accordingly.
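The two removal strategies described above can be illustrated with a minimal sketch. It assumes a heavily simplified model in which each NAL unit carries only an access-unit index, a layer index and a temporal level; the class `NalUnit`, its fields and the function `extract_sub_bitstream` are hypothetical names chosen for illustration and do not reproduce the actual H.264/AVC Annex G syntax or extraction process.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NalUnit:
    """Simplified stand-in for a NAL unit (hypothetical fields, no payload)."""
    access_unit: int   # index of the access unit this NAL unit belongs to
    layer: int         # 0 = base layer, 1+ = spatial/quality enhancement layer
    temporal_id: int   # 0 = lowest temporal sampling rate


def extract_sub_bitstream(
    nal_units: List[NalUnit], max_layer: int, max_temporal_id: int
) -> List[NalUnit]:
    """Keep only the NAL units at or below the requested layer and temporal level."""
    return [
        n for n in nal_units
        if n.layer <= max_layer and n.temporal_id <= max_temporal_id
    ]


# Four access units, each with one base-layer and one enhancement-layer NAL unit.
# Assumption for this example: odd access units form the higher temporal level.
stream = [
    NalUnit(access_unit=au, layer=layer, temporal_id=au % 2)
    for au in range(4)
    for layer in (0, 1)
]

# Spatial/quality scalability: individual NAL units are dropped, all access units remain.
base_only = extract_sub_bitstream(stream, max_layer=0, max_temporal_id=1)

# Temporal scalability: complete access units are dropped.
half_rate = extract_sub_bitstream(stream, max_layer=1, max_temporal_id=0)
```

In this model, `base_only` still covers every access unit but contains only base-layer units, whereas `half_rate` keeps both layers but only every second access unit.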
In the SVC standard, the sub-bitstream having a lower temporal sampling rate, lower spatial resolution or lower quality is referred to as the Base Layer (BL) sub-bitstream, while the higher temporal sampling rate, higher spatial resolution or higher quality sub-bitstream is referred to as the Enhancement Layer (EL) sub-bitstream. In scenarios with multiple sub-bitstreams of, for example, different higher spatial resolutions, two or more EL sub-bitstreams may be provided in total. Each sub-bitstream can be interpreted as constituting a separate media layer.
An image of an SVC video image sequence is represented as a so-called “frame” (i.e., as an encoded representation of this image). Each SVC sub-bitstream comprises a sequence of so-called SVC “sub-frames”. Each SVC sub-frame constitutes either a full SVC frame or a fraction of an SVC frame. In other words, each SVC frame is either represented as a single data item (i.e., one BL “sub-frame” or one EL “sub-frame”) or is sub-divided into at least two separate data items, i.e., into one BL “sub-frame” containing only the BL information associated with the respective frame and (at least) one EL “sub-frame” containing the EL information associated with the respective frame.
The scalability feature introduced by the SVC standard allows for a bitstream adaptation dependent on, for example, decoder capabilities, display resolutions and available transmission bit rates. If only the BL sub-frames are decoded, the video content can be rendered, for example, at a base resolution or quality (e.g., at Quarter Video Graphics Array, or QVGA, resolution). If, on the other hand, both the BL and the EL sub-frames are decoded, then the video content can be rendered at a higher resolution or quality (e.g., at VGA resolution).
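As a rough illustration of such an adaptation, the following sketch picks the layers a recipient would decode based on its display resolution. It assumes one BL at QVGA (320x240) and one EL at VGA (640x480), matching the example resolutions above; the table `LAYERS` and the function `layers_to_decode` are hypothetical and not part of the SVC standard.

```python
from typing import List, Tuple

# (layer name, width, height), ordered from base layer upwards.
# Assumed configuration: BL renders at QVGA, BL plus EL renders at VGA.
LAYERS: List[Tuple[str, int, int]] = [
    ("BL", 320, 240),
    ("EL", 640, 480),
]


def layers_to_decode(display_width: int, display_height: int) -> List[str]:
    """Return the layers worth decoding for a display of the given size."""
    selected = [LAYERS[0][0]]  # the BL is always required
    for name, width, height in LAYERS[1:]:
        if width <= display_width and height <= display_height:
            selected.append(name)
        else:
            break  # higher layers depend on this one, so stop here
    return selected


qvga_device = layers_to_decode(320, 240)
vga_device = layers_to_decode(640, 480)
```

A real adaptation decision would typically also weigh decoder capabilities and the available transmission bit rate, which this sketch leaves out.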
G. Xylomenos et al., “Reducing the Transmission Power Requirements of the Multimedia Broadcast/Multicast Service”, in Proceedings of the IST Mobile & Wireless Communications Summit 2007, suggest distributing scalably encoded media layers in a multicasting network via the Multimedia Broadcast and Multicast Service (MBMS). MBMS was specified in Universal Mobile Telecommunications System (UMTS) Release 6 in order to support efficient delivery of identical media content from one source to multiple media recipients. With the introduction of a new Point-to-Multipoint (PTM) bearer, the unicast or Point-to-Point (PTP) solution in UMTS was extended by multicast and broadcast capabilities, thus enabling a virtually unlimited number of recipients to simultaneously receive the same media content on common radio resources.
In contrast to the PTP bearer, the PTM bearer does not support channel quality feedback from the recipients. As a result, transmit power as well as the Modulation and Coding Scheme (MCS) are both statically configured. This static approach implies that the PTM bearer leads to a waste of radio resources if there are no or only a few recipients in a content distribution area interested in the same service. Therefore, it is also possible to deploy the PTP transmission mode in MBMS to exploit the advantages of link adaptation.
In enhanced MBMS (eMBMS), it is also possible to deploy an adaptive PTM (aPTM) transmission mode which combines the advantages of simultaneous reception on common resources by multiple recipients and link adaptation. In the adaptive PTM mode, the PTM bearer is used and supports link adaptation based on channel quality feedback and Hybrid Automatic Repeat request (HARQ) status reports from multiple recipients. Since the amount of feedback increases with the number of interested users while the link adaptation gains are reduced, the adaptive PTM mode is particularly appropriate for a relatively small MBMS recipient group. The controlling node will thus select an appropriate bearer type depending on the number of recipients interested in a certain media content.
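The bearer selection performed by the controlling node can be sketched as a simple threshold rule on the recipient count. The thresholds `ptp_max` and `adaptive_ptm_max`, their default values, and the choice to select no bearer at all when nobody is interested are assumptions made for illustration; they are not taken from the MBMS or eMBMS specifications.

```python
from enum import Enum
from typing import Optional


class Bearer(Enum):
    PTP = "point-to-point"
    ADAPTIVE_PTM = "adaptive point-to-multipoint"
    PTM = "point-to-multipoint"


def select_bearer(
    num_recipients: int, ptp_max: int = 2, adaptive_ptm_max: int = 10
) -> Optional[Bearer]:
    """Pick a bearer type from the number of interested recipients.

    The threshold values are hypothetical; a real controlling node would
    derive them from radio resource considerations.
    """
    if num_recipients < 0:
        raise ValueError("num_recipients must be non-negative")
    if num_recipients == 0:
        return None  # assumption: nothing is transmitted without recipients
    if num_recipients <= ptp_max:
        return Bearer.PTP  # few recipients: per-link adaptation pays off
    if num_recipients <= adaptive_ptm_max:
        return Bearer.ADAPTIVE_PTM  # small group: shared bearer with feedback
    return Bearer.PTM  # large group: static shared bearer, no feedback
```

For instance, `select_bearer(5)` yields `Bearer.ADAPTIVE_PTM` under the assumed defaults, while `select_bearer(500)` yields `Bearer.PTM`.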
For the transmission of scalably encoded media layers, G. Xylomenos et al. propose selecting an appropriate bearer type for each media layer individually and assigning a separate MBMS group to each media layer. The bearer type selection is based on the number of recipients interested in a specific media layer and thus requires counting, for each media layer, the recipients that have requested that media layer. The resulting counting procedures can become time-consuming and additionally consume hardware resources. While not discussed by G. Xylomenos et al., resource consumption would be particularly high in cases in which the counting procedures rely on a request/response messaging scheme with each individual recipient.
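The per-layer counting that this approach requires can be sketched as follows. The request table, the recipient identifiers, the two-way PTM/PTP choice and the threshold `ptm_threshold` are all hypothetical and serve only to show that every recipient has to be visited before a bearer can be chosen for each layer; this is not the procedure of G. Xylomenos et al. in detail.

```python
from collections import Counter
from typing import Dict, Iterable


def count_recipients_per_layer(requests: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each media layer, how many recipients requested it."""
    counts: Counter = Counter()
    for layers in requests.values():  # one step per recipient
        counts.update(set(layers))    # a recipient counts once per layer
    return counts


def bearer_per_layer(
    requests: Dict[str, Iterable[str]], ptm_threshold: int = 3
) -> Dict[str, str]:
    """Choose PTM for layers with many recipients, PTP otherwise (assumed rule)."""
    counts = count_recipients_per_layer(requests)
    return {
        layer: ("PTM" if n >= ptm_threshold else "PTP")
        for layer, n in counts.items()
    }


# Hypothetical recipients: all want the BL, only two also want the EL.
requests = {
    "ue1": ["BL"],
    "ue2": ["BL", "EL"],
    "ue3": ["BL"],
    "ue4": ["BL", "EL"],
}

counts = count_recipients_per_layer(requests)
bearers = bearer_per_layer(requests)
```

The counting loop grows with the number of recipients, and with a request/response scheme each iteration would correspond to at least one message exchange, which is the overhead noted above.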