Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Over the last 25 years, various video codec standards have been adopted, including the ITU-T H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263, and H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards, the MPEG-1 (ISO/IEC 11172-2) and MPEG-4 Visual (ISO/IEC 14496-2) standards, and the SMPTE 421M (VC-1) standard. More recently, the H.265/HEVC standard (ITU-T H.265 or ISO/IEC 23008-2) has been approved. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. For example, recent video codec standards (e.g., H.264/AVC, H.265/HEVC) define various syntax structures, where a syntax structure is a set of zero or more syntax elements (elements of data) in the bitstream in a specified order. In many cases, a video codec standard also provides details about the decoding operations a decoder should perform to achieve conforming results in decoding. Aside from codec standards, various proprietary codec formats define other options for the syntax of an encoded video bitstream and corresponding decoding operations.
In recent video codec standards (e.g., H.264/AVC, H.265/HEVC), a picture is organized as one or more slices, where a slice is a set of blocks (e.g., macroblocks in the H.264/AVC standard; coding tree units in the H.265/HEVC standard). The encoded data for a slice is organized in a specific syntax structure, which is contained in a network abstraction layer (“NAL”) unit. A NAL unit is a syntax structure that contains (1) an indication of the type of data to follow and (2) a series of zero or more bytes of the data (e.g., the encoded data for a slice). The size of the NAL unit (in bytes) may be indicated outside the NAL unit or may be measured by identifying the boundaries between NAL units in a byte stream format. For example, in some cases a decoder can measure the sizes of NAL units when the decoder searches for the start codes that begin the NAL units; in other cases the size of a NAL unit might be indicated by “out-of-band” information, such as data carried in a data field according to a multimedia system multiplexing protocol, packet network protocol, or file format. An access unit is a set of one or more NAL units containing the encoded data for the slice(s) of a picture (and possibly other associated data such as metadata).
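The start-code search described above can be illustrated with a minimal sketch. It assumes a byte stream in which each NAL unit is preceded by the three-byte start code 0x000001 (optionally with leading zero bytes), and it assumes the one-byte H.264/AVC NAL unit header, whose low five bits carry the NAL unit type. The function names `split_nal_units` and `h264_nal_type` are hypothetical, chosen only for illustration, and the sketch ignores details a real parser would need to handle (such as emulation prevention bytes inside the payload).

```python
from typing import List

START_CODE = b"\x00\x00\x01"  # three-byte start code prefix


def split_nal_units(stream: bytes) -> List[bytes]:
    """Split a start-code-delimited byte stream into NAL units.

    The size of each NAL unit is not signaled; it is measured as the
    distance between one start code and the next (or the end of the stream).
    """
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes just before the next start code are padding (for
        # example, the extra zero of a four-byte start code), not payload.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
        pos = nxt
    return units


def h264_nal_type(unit: bytes) -> int:
    """Return the NAL unit type, assuming the one-byte H.264/AVC header."""
    return unit[0] & 0x1F


# Hypothetical example stream: two NAL units with made-up payload bytes.
example = b"\x00\x00\x00\x01\x67\xAA" + b"\x00\x00\x01\x65\xBB\xCC"
for nal in split_nal_units(example):
    print(h264_nal_type(nal), len(nal))
```

When sizes are instead supplied out-of-band (e.g., by a file format that stores a length field ahead of each NAL unit), the search loop is unnecessary and the decoder can read each NAL unit directly.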
For decoding according to the H.264/AVC standard or H.265/HEVC standard, a decoder may be designed to start the decoding process for a given picture as soon as the decoder has received a coded slice for the given picture, in which case the decoder can start to decode that coded slice. In many implementations, however, a decoder is designed to start the decoding process for a given picture only after the decoder has received all of the encoded data for the given picture. To determine that it has received all of the encoded data for the given picture, the decoder can wait until it receives encoded data for the next picture (in the next access unit), which includes one or more syntax elements that indicate the start of encoded data for the next picture. Alternatively, the decoder can fully parse the encoded data for the given picture. Either approach can introduce delay or extra complexity in the decoding process, which is undesirable, especially in real-time video communication scenarios in which very low latency is critical (such as video conferencing, wireless “screen casting” from a computing device to a nearby display, remote video gaming, etc.).
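The delay introduced by the first approach (waiting for the next access unit) can be seen in a small sketch. It assumes, purely for illustration, an H.264/AVC stream in which every access unit begins with an access unit delimiter (NAL unit type 9); such delimiters are optional in practice, and a real decoder would rely on other syntax elements when they are absent. The function name `group_access_units` is hypothetical. The input is the sequence of NAL unit types in arrival order, and each output pair records the arrival index at which a picture could first be handed to the decoder.

```python
from typing import Iterable, Iterator, List, Tuple

AUD = 9  # H.264/AVC nal_unit_type for an access unit delimiter


def group_access_units(
    nal_types: Iterable[int],
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (release_index, access_unit) pairs.

    An access unit is released only when the first NAL unit of the NEXT
    access unit arrives, which is the source of the added latency.
    Assumes every access unit starts with an access unit delimiter.
    """
    current: List[int] = []
    count = 0
    for index, nal_type in enumerate(nal_types):
        count = index + 1
        if nal_type == AUD and current:
            # The next picture has started, so the previous one is complete.
            yield index, current
            current = []
        current.append(nal_type)
    if current:
        # The last picture is released only when the stream ends.
        yield count, current


# Hypothetical arrival order: AUD, SPS, PPS, IDR slice, AUD, slice, AUD, slice.
arrivals = [9, 7, 8, 5, 9, 1, 9, 1]
for release_index, unit in group_access_units(arrivals):
    print(release_index, unit)
```

In this sketch the first picture is complete once its slice arrives at index 3, yet it is not released until index 4, when the delimiter of the next picture is seen. In a real-time scenario that gap corresponds to roughly one picture interval of added latency.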
In another approach, within a media playback tool or particular system environment, one component within the media playback tool (or particular system environment) can create a custom syntax structure that is used to signal the end of encoded data for a given picture, then provide that syntax structure to a decoder within the media playback tool (or particular system environment). This custom syntax structure is provided outside the elementary bitstream that includes encoded video data, e.g., in system-level information such as system multiplexing data. Thus, this approach does not carry the custom syntax structure within the elementary bitstream of a video codec standard or format. It lacks general applicability, and it excludes an encoder or other component outside the specific media playback tool (or particular system environment) from involvement.