Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Over the last 25 years, various video codec standards have been adopted, including the ITU-T H.261, H.262 (MPEG-2 or ISO/IEC 13818-2), H.263, H.264 (MPEG-4 AVC or ISO/IEC 14496-10) standards, the MPEG-1 (ISO/IEC 11172-2) and MPEG-4 Visual (ISO/IEC 14496-2) standards, and the SMPTE 421M (VC-1) standard. More recently, the H.265/HEVC standard (ITU-T H.265 or ISO/IEC 23008-2) has been approved. A video codec standard typically defines options for the syntax of an encoded video bitstream, detailing parameters in the bitstream when particular features are used in encoding and decoding. In many cases, a video codec standard also provides details about the decoding operations a video decoder should perform to achieve conforming results in decoding. Aside from codec standards, various proprietary codec formats define options for the syntax of an encoded video bitstream and corresponding decoding operations.
At a low level, a bitstream of encoded video is a series of bits (zeros and ones) that form the coded representation of the video. A bitstream is organized according to rules defined in a video codec standard or format. When parsing a bitstream, a decoder reads one or more bits at a current position in the bitstream and interprets the bit(s) according to the rules that apply at the current position in the bitstream. After updating the current position to shift out the bits that have been read and interpreted, the decoder can continue by reading and interpreting one or more bits at the current (updated) position in the bitstream. To parse a bitstream correctly, a decoder tracks the current position in the bitstream and applies the appropriate rules for bit(s) read at the current position. If encoded data in the bitstream is lost or corrupted (e.g., due to network congestion or noise), the decoder may lose synchronization between the current position in the bitstream and correct rules to apply. In this case, the decoder may incorrectly interpret bits read from the bitstream, causing decoding to fail.
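The read-interpret-advance cycle described above can be illustrated with a minimal sketch. The class name `BitReader` and its method `read_bits` are hypothetical names chosen for illustration; they are not taken from any codec standard, and a real decoder would layer syntax-specific rules on top of such a reader.

```python
class BitReader:
    """Illustrative bit reader that tracks a current position in a bitstream."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits from the start

    def read_bits(self, n: int) -> int:
        """Read n bits (most significant bit first) and advance the position."""
        if self.pos + n > len(self.data) * 8:
            raise EOFError("not enough bits left in the bitstream")
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]              # byte holding the current bit
            bit = (byte >> (7 - (self.pos & 7))) & 1     # extract the bit, MSB first
            value = (value << 1) | bit
            self.pos += 1                                # shift out the bit just read
        return value


# Example: a 1-bit flag, then a 3-bit field, then an 8-bit field.
reader = BitReader(bytes([0b10110000, 0xFF]))
flag = reader.read_bits(1)
field_a = reader.read_bits(3)
field_b = reader.read_bits(8)
```

If even one bit were lost ahead of `field_a`, every later call would read from the wrong position, which is the loss of synchronization described above.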
Some codec standards and formats use start codes to designate the boundaries of separate units of encoded data in a bitstream. In general, a start code is a sequence of bits that only appears in the encoded data when marking the start of a unit of encoded data. If a decoder starts decoding in the middle of a bitstream, or if a decoder loses synchronization when parsing a bitstream (e.g., because of loss or corruption of encoded data), the decoder can locate the next start code in the bitstream and begin parsing encoded data from that position, which is the start of some type of unit according to the codec standard or format. In the SMPTE 421M standard, for example, a start code is a four-byte value, which includes the three-byte prefix 0x000001 (in binary, 23 zeros followed by a 1) and a one-byte suffix that identifies the type of bitstream data unit at the start code. As another example, in the H.264 standard and H.265 standard, a start code begins with a three-byte prefix 0x000001. In the H.264 standard, the start code prefix is followed by the first byte of a network abstraction layer (“NAL”) unit, which includes an identifier of the type of the NAL unit. In the H.265 standard, the start code prefix is followed by a two-byte NAL unit header, which includes a type identifier for the NAL unit. During regular operation, a decoder typically scans encoded data in a bitstream to identify start codes and thereby determine lengths of units of encoded data. A decoder may also scan for the next start code if synchronization or byte alignment has been lost. (Encoded data can be scanned byte-after-byte, with start codes aligned with byte boundaries. If synchronization is lost, byte alignment may also be lost. In this case, a decoder may scan bit-after-bit for a pattern such as a zero-value byte followed by a start code, in order to recover byte alignment.)
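As a rough sketch of the byte-aligned scan described above, the following code locates each occurrence of the three-byte prefix 0x000001 and splits the data into units. The function names `find_start_codes` and `split_units` are hypothetical. The sketch assumes byte alignment has not been lost, and it does not handle details such as extra zero-value bytes before a prefix, which would be left at the end of the preceding unit.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"  # 23 zero bits followed by a one


def find_start_codes(data: bytes) -> List[int]:
    """Return the byte offset of every start code prefix in data."""
    positions = []
    i = data.find(START_CODE_PREFIX)
    while i != -1:
        positions.append(i)
        # Resume the scan just past the prefix that was found.
        i = data.find(START_CODE_PREFIX, i + len(START_CODE_PREFIX))
    return positions


def split_units(data: bytes) -> List[bytes]:
    """Return the bytes between consecutive start code prefixes."""
    positions = find_start_codes(data)
    units = []
    for k, start in enumerate(positions):
        begin = start + len(START_CODE_PREFIX)
        end = positions[k + 1] if k + 1 < len(positions) else len(data)
        units.append(data[begin:end])  # unit length follows from the next start code
    return units


# Example: two units; the first byte after each prefix carries the unit type.
stream = b"\x00\x00\x01\x65\xaa\xbb\x00\x00\x01\x41\xcc"
units = split_units(stream)
```

A decoder that has lost synchronization could call `find_start_codes` on the remaining data and resume parsing at the first offset returned.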
In a bitstream, encoded data includes values for different parameters, one value after another. This can cause a problem if, inadvertently, some combination of values matches (emulates) a start code. Some codec standards and formats address this concern by defining values such that no valid combination can possibly emulate a start code. More recently, some codec standards use start code emulation prevention (“SCEP”) processing to address this concern. For SCEP, an encoder can scan encoded data to identify any pattern of bits that inadvertently matches (emulates) a start code. The encoder then disrupts this pattern. For a bitstream defined according to the SMPTE 421M standard, H.264 standard, or H.265 standard, for example, an encoder can insert a SCEP byte of 0x03 (in binary, 00000011) whenever the encoder encounters the pattern 0x000000, 0x000001, 0x000002, or 0x000003 in encoded data, resulting in the pattern 0x00000300, 0x00000301, 0x00000302, or 0x00000303. (In each of these patterns, the third byte is the inserted SCEP byte 0x03.) In binary, whenever the encoder finds the bit pattern 00000000 00000000 000000xx (where xx represents any two-bit pattern), the encoder can replace that bit pattern with 00000000 00000000 00000011 000000xx, where 00000011 is the SCEP byte. In this way, emulation of the start code prefix, which is 23 zeros followed by a one, is disrupted, since the replacement pattern includes at most 22 zeros followed by a one. To undo SCEP, after locating start codes for the current unit (and perhaps the next unit), but before parsing encoded data for the current unit, a decoder can scan the encoded data of the current unit to find any occurrences of the bit pattern 00000000 00000000 00000011 000000xx. If such a pattern is encountered, the decoder can remove the SCEP byte, leaving 00000000 00000000 000000xx, which is the original bit pattern of encoded data.
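The insertion and removal of SCEP bytes described above can be sketched as follows. The function names `insert_scep` and `remove_scep` are hypothetical, and the code is a simplified illustration that operates on the encoded data of a single unit (without its start code), not a conforming implementation of any particular standard.

```python
SCEP_BYTE = 0x03


def insert_scep(payload: bytes) -> bytes:
    """Insert 0x03 wherever two zero bytes are followed by a byte in 0x00..0x03."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes most recently written to out
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            out.append(SCEP_BYTE)  # disrupt 00 00 0x before it reaches the output
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_scep(data: bytes) -> bytes:
    """Drop each 0x03 that follows two zero bytes and precedes a byte in 0x00..0x03."""
    out = bytearray()
    zeros = 0
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if zeros >= 2 and b == SCEP_BYTE and i + 1 < n and data[i + 1] <= 0x03:
            zeros = 0  # skip the SCEP byte; the next byte is original data
            i += 1
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
        i += 1
    return bytes(out)


# Example: a payload that would otherwise emulate the prefix 0x000001.
payload = bytes([0x00, 0x00, 0x01, 0x25, 0x00, 0x00, 0x03, 0x05])
escaped = insert_scep(payload)   # 00 00 03 01 25 00 00 03 03 05
restored = remove_scep(escaped)  # equal to payload
```

Note that the encoder also escapes an original 0x000003 pattern (producing 0x00000303), so that the decoder does not mistake an original 0x03 byte for an inserted SCEP byte.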
While SCEP bytes provide an effective way to prevent emulation of start codes within encoded data, using SCEP bytes adds processing overhead. For example, during or after encoding, an encoder scans encoded data, or otherwise tracks encoded data for output, in order to identify any pattern that should be disrupted with a SCEP byte. Before decoding a given unit of encoded data, a decoder scans the encoded data to identify any pattern from which a SCEP byte should be removed. Although the operation of inserting or removing a SCEP byte is simple, scanning encoded data on a byte-by-byte basis for occurrences of relevant bit patterns can require significant resources. Also, SCEP bytes increase the amount of data in a bitstream. For some units (e.g., units with encoded data in which the pattern 0x000000 is common), the increase in bit rate due to SCEP bytes can be significant.
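To give a rough sense of the data overhead, the following sketch counts how many SCEP bytes would be inserted into a given payload, using the same byte-by-byte scan that an encoder might perform. The function name `count_scep_bytes` and the sample payloads are hypothetical and chosen only for illustration; actual overhead depends on the content of the encoded data.

```python
def count_scep_bytes(payload: bytes) -> int:
    """Count the 0x03 bytes an encoder would insert into payload."""
    count = 0
    zeros = 0  # consecutive zero bytes in the (virtual) output so far
    for b in payload:
        if zeros >= 2 and b <= 0x03:
            count += 1   # a SCEP byte would be inserted before this byte
            zeros = 0
        zeros = zeros + 1 if b == 0 else 0
    return count


def overhead_ratio(payload: bytes) -> float:
    """Return inserted SCEP bytes as a fraction of the original payload size."""
    return count_scep_bytes(payload) / len(payload) if payload else 0.0


# Worst case for this scheme: a long run of zero-value bytes.
all_zeros = bytes(1000)
no_zeros = b"\xff" * 1000
print(count_scep_bytes(all_zeros), overhead_ratio(all_zeros))  # 499 0.499
print(count_scep_bytes(no_zeros), overhead_ratio(no_zeros))    # 0 0.0
```

In the all-zero case, one SCEP byte is inserted for every two bytes of original data after the first two, so the data grows by almost 50 percent, while data with no runs of zero-value bytes is unaffected. In either case, the scan still has to visit every byte.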