This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Multimedia applications include local playback, streaming or on-demand, conversational and broadcast/multicast services. Technologies involved in multimedia applications include, among others, media coding, storage and transmission. Different standards have been specified for different technologies.
Video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC). In addition, there are currently efforts underway with regard to the development of new video coding standards. One such standard under development is the scalable video coding (SVC) standard, which will become the scalable extension to the H.264/AVC standard (H.264/AVC). Another such effort involves the development of China video coding standards. Another such standard under development is the multi-view video coding (MVC) standard, which will become another extension to H.264/AVC.
SVC can provide scalable video bitstreams. In SVC, a video sequence can be coded in multiple layers, and each layer is one representation of the video sequence at a certain spatial resolution or temporal resolution or at a certain quality level or some combination of the three. A portion of a scalable video bitstream can be extracted and decoded at a desired spatial resolution, temporal resolution, quality level or some combination of these. A scalable video bitstream contains a non-scalable base layer and one or more enhancement layers. An enhancement layer may enhance the temporal resolution (i.e., the frame rate), the spatial resolution, or simply the quality of the video content represented by the lower layer or part thereof. In some cases, data of an enhancement layer can be truncated after a certain location, even at arbitrary positions, and each truncation position can include some additional data representing increasingly enhanced visual quality. Such scalability is referred to as fine-grained (granularity) scalability (FGS). In contrast to FGS, the scalability provided by a quality enhancement layer that does not provide fine-grained scalability is referred to as coarse-grained scalability (CGS). Base layers can be designed to be FGS scalable as well. SVC is one example of scalable coding of video. A draft of the SVC standard is described in JVT-S202, “Joint Scalable Video Model JSVM-6: Joint Draft 6 with proposed changes,” 19th JVT Meeting, Geneva, Switzerland, April 2006.
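By way of illustration only, the extraction of a portion of a layered bitstream can be sketched as a filter over coded units. The `LayerUnit` type, its field names and the `extract` function below are hypothetical and are not taken from the SVC draft. The sketch also assumes that every unit depends only on units with equal or lower layer indices, which simplifies the dependency rules of an actual scalable bitstream.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LayerUnit:
    """Hypothetical stand-in for one coded unit of a scalable bitstream."""
    frame: int      # index of the picture the unit belongs to
    spatial: int    # 0 = base spatial resolution
    temporal: int   # 0 = lowest frame rate
    quality: int    # 0 = base quality


def extract(units: List[LayerUnit], max_spatial: int,
            max_temporal: int, max_quality: int) -> List[LayerUnit]:
    """Keep only the units at or below the requested operating point."""
    return [
        u for u in units
        if u.spatial <= max_spatial
        and u.temporal <= max_temporal
        and u.quality <= max_quality
    ]


# A toy stream: two pictures, with spatial and quality enhancement units.
stream = [
    LayerUnit(frame=0, spatial=0, temporal=0, quality=0),  # base layer
    LayerUnit(frame=0, spatial=1, temporal=0, quality=0),  # spatial enhancement
    LayerUnit(frame=0, spatial=0, temporal=0, quality=1),  # quality enhancement
    LayerUnit(frame=1, spatial=0, temporal=1, quality=0),  # higher frame rate
    LayerUnit(frame=1, spatial=1, temporal=1, quality=0),
]

# Base layer only: lowest resolution, frame rate and quality.
base_only = extract(stream, max_spatial=0, max_temporal=0, max_quality=0)

# Full frame rate at the base resolution and base quality.
full_rate_base = extract(stream, max_spatial=0, max_temporal=1, max_quality=0)
```

In this sketch the base layer survives every extraction, which mirrors the role of the non-scalable base layer described above.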
In multiple description coding (MDC), an input media sequence is encoded into more than one sub-stream, each of which is referred to as a description. Each description is independently decodable and represents a certain media quality. However, based on the decoding of one or more descriptions, additional decoding of another description can result in an improved media quality. MDC is discussed in detail in Y. Wang, A. Reibman, and S. Lin, “Multiple description coding for video delivery,” Proceedings of the IEEE, vol. 93, no. 1, January 2005.
In multi-view video coding, video sequences output from different cameras, each corresponding to a view, are encoded into one bitstream. After decoding, to display a certain view, the decoded pictures belonging to that view are displayed. A draft of the MVC standard is described in JVT-T208, “Joint multiview video model (JMVM 1.0),” 20th JVT meeting, Klagenfurt, Austria, July 2006.
The H.264/AVC standard and its extensions include the support of supplemental enhancement information (SEI) signaling through SEI messages. SEI messages are not required by the decoding process to generate correct sample values in output pictures. Rather, they are helpful for other purposes, e.g., error resilience and display. H.264/AVC contains the syntax and semantics for the specified SEI messages, but no process for handling the messages in the recipient is defined. Consequently, encoders are required to follow the H.264/AVC standard when they create SEI messages, and decoders conforming to the H.264/AVC standard are not required to process SEI messages for output order conformance. One of the reasons to include the syntax and semantics of SEI messages in H.264/AVC is to allow system specifications, such as 3GPP multimedia specifications and DVB specifications, to interpret the supplemental information identically and hence interoperate. It is intended that system specifications can require the use of particular SEI messages at both the encoding end and the decoding end, and the process for handling SEI messages in the recipient may be specified for the application in a system specification.
The mechanism for providing temporal scalability in the latest SVC specification is referred to as the “hierarchical B pictures” coding structure. This feature is fully supported by H.264/AVC, and the signaling portion can be performed by using sub-sequence-related SEI messages.
The SEI messages in H.264/AVC are described without any references to the scalable extension annex. Consequently, H.264/AVC encoders generate and H.264/AVC decoders interpret the messages as described and suggested by the semantics of the messages in the H.264/AVC standard, respectively, and the messages cannot be used as such for signaling the properties of pictures above the base layer in an SVC bitstream. The access units and pictures to which H.264/AVC SEI messages pertain are specified in the semantics of each SEI message. For example, the information in a sub-sequence layer information SEI message is valid from the access unit that contains the SEI message until the next access unit containing a sub-sequence layer information SEI message, exclusive, or the end of the bitstream if no succeeding sub-sequence layer information SEI message is present. The pan-scan rectangle SEI message contains a syntax element (pan_scan_rect_repetition_period), specifying for which pictures the message is valid. The sub-sequence information SEI message contains data that is valid only for the access unit that contains it.
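As a non-limiting sketch of the persistence rules just described, the access units covered by each message can be computed from the positions at which the message occurs. The function names and the representation of a bitstream as a list of boolean flags (one per access unit) are assumptions made for illustration only. The pan-scan rectangle case is omitted because its repetition semantics are not detailed here.

```python
from typing import List, Tuple


def layer_info_scopes(has_sei: List[bool]) -> List[Tuple[int, int]]:
    """Scopes of sub-sequence layer information SEI messages.

    has_sei[i] is True when access unit i carries such a message.
    Each scope is a half-open range [start, end) of access unit indices:
    it begins at the access unit containing the message and ends just
    before the next access unit containing one, or at the end of the
    bitstream when no later message exists.
    """
    starts = [i for i, flag in enumerate(has_sei) if flag]
    ends = starts[1:] + [len(has_sei)]
    return list(zip(starts, ends))


def single_access_unit_scopes(has_sei: List[bool]) -> List[Tuple[int, int]]:
    """Scopes of messages valid only for the access unit containing them,
    as is the case for the sub-sequence information SEI message."""
    return [(i, i + 1) for i, flag in enumerate(has_sei) if flag]


# Five access units; the message appears in access units 0 and 3.
flags = [True, False, False, True, False]
persistent = layer_info_scopes(flags)          # [(0, 3), (3, 5)]
one_shot = single_access_unit_scopes(flags)    # [(0, 1), (3, 4)]
```

The half-open ranges make the word "exclusive" in the semantics explicit: access unit 3 belongs to the second scope, not the first.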
An access unit according to the H.264/AVC coding standard comprises zero or more SEI messages, one primary coded picture, zero or more redundant coded pictures, and zero or more auxiliary coded pictures. In some systems, detection of access unit boundaries can be simplified by inserting an access unit delimiter into the bitstream. An access unit according to SVC comprises at least one coded picture that is not a redundant or auxiliary coded picture. For example, an SVC access unit may comprise one primary coded picture for the base layer and multiple enhancement coded pictures. A coded picture as described herein refers to all of the network abstraction layer (NAL) units within an access unit having particular values of dependency_id and quality_level.
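For illustration, the notion of a coded picture as the set of NAL units sharing particular values of dependency_id and quality_level can be sketched as a grouping operation. The `NalUnit` type and the `coded_pictures` function are hypothetical. Only the two identifiers named above are modeled, and all other NAL unit header fields are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical, simplified view of a NAL unit in an SVC access unit."""
    dependency_id: int
    quality_level: int
    payload: bytes = b""


def coded_pictures(
    access_unit: List[NalUnit],
) -> Dict[Tuple[int, int], List[NalUnit]]:
    """Group the NAL units of one access unit into coded pictures.

    Each key is a (dependency_id, quality_level) pair; the value holds the
    NAL units of that coded picture in their original order.
    """
    pictures: Dict[Tuple[int, int], List[NalUnit]] = defaultdict(list)
    for nal in access_unit:
        pictures[(nal.dependency_id, nal.quality_level)].append(nal)
    return dict(pictures)


# One access unit: a base layer picture carried in two NAL units,
# plus two enhancement coded pictures.
access_unit = [
    NalUnit(dependency_id=0, quality_level=0, payload=b"slice-a"),
    NalUnit(dependency_id=0, quality_level=0, payload=b"slice-b"),
    NalUnit(dependency_id=1, quality_level=0),
    NalUnit(dependency_id=1, quality_level=1),
]

pictures = coded_pictures(access_unit)
```

Here the access unit yields three coded pictures, with the base layer picture identified by the pair (0, 0).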
There are a number of different possibilities for the scope of an SEI message. When an SEI message contains data that pertain to more than one access unit (for example, when the SEI message has a coded video sequence as its scope), it is contained in the first access unit to which it applies. SEI messages that contain data which pertain to a single access unit, such as scene information, stereo video, etc., apply equally to all of the pictures in the access unit. An SEI message may relate to filler data, user data, etc. and not be associated with any particular access unit.
In scalable video coding, multiple description coding, multiview video coding, and other video coding methods, an access unit may comprise multiple coded pictures, wherein each picture is one representation of the video sequence or sequences at a certain spatial resolution, temporal resolution, certain quality level, view, description or some combination thereof. In certain applications, for example, it may be desirable to apply the method of pan and scan only to the pictures with the same picture size so that they can be shown on one type of display while a different type of pan and scan may be desired at a different picture size. In this situation, it would be desirable to have a mechanism for specifying which pictures within an access unit a particular SEI message applies to. For example, it would be helpful to have a mechanism for specifying a pan-scan rectangle for each picture size present in an access unit and have as many reference picture marking repetition SEI messages as needed to represent each possibility for memory management control operations.
Lastly, the semantics of an SEI message according to the H.264/AVC standard may apply only to the AVC picture (e.g., the base layer picture in SVC) in the access unit. In this situation, it may be desirable to extend the scope and semantics of the SEI message to any other picture in the access unit. This is the case for SEI messages indicating items such as spare pictures, sub-sequence layer characteristics, sub-sequence characteristics, motion-constrained slice group sets, film grain characteristics, deblocking filter display preferences, etc. For example, the current semantics of the statistical data in sub-sequence layer characteristics and sub-sequence characteristics are for the H.264/AVC base layer only, but similar statistics could also be meaningful for pictures with particular values of dependency_id and quality_level. In another example, the indicated spare picture is sufficient for the base layer, but when the picture quality improves in enhancement layers, the corresponding picture in an enhancement layer may not be sufficient as a spare picture.