In modern communications systems a video signal may be sent from one device to another over a medium such as a wired and/or wireless network, often a packet-based network such as the Internet. Typically video content, i.e. data which represents the values (e.g. chrominance, luminance) of samples in slices of the video, is encoded by an encoder at the transmitting device in order to compress the video content for transmission over the network. Herein, “slice” means a frame of the video or region of a frame of the video i.e. a frame is comprised of one or more slices. The encoding for a given slice may comprise intra frame encoding whereby (macro) blocks are encoded relative to other blocks in the same slice. In this case a target block is encoded in terms of a difference (the residual) between that block and a neighbouring block. Alternatively the encoding for some frames or slices may comprise inter frame encoding whereby blocks in the target slice are encoded relative to corresponding portions in a preceding frame, typically based on motion prediction. In this case a target block is encoded in terms of a motion vector identifying an offset between the block and the corresponding portion from which it is to be predicted, and a difference (the residual) between the block and the corresponding portion from which it is predicted. The residual data may then be subject to transformation into frequency coefficients, which are then subject to quantization whereby ranges of frequency coefficients are compressed to single values. Finally, lossless encoding such as entropy encoding may be applied to the quantized coefficients. A corresponding decoder at the receiving device decodes the slices of the received video signal based on the appropriate type of prediction, in order to decompress them for output on a display.
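As a rough illustration of the residual and quantization steps just described, the following minimal sketch computes a residual against a prediction and quantizes it with a uniform step. The transform into frequency coefficients and the entropy coding are deliberately omitted. The function names, the sample values and the step size are hypothetical and chosen only for illustration; real encoders use considerably more elaborate schemes.

```python
import math
from typing import List


def residual(target: List[int], prediction: List[int]) -> List[int]:
    """Difference between the target block and the portion it is predicted from."""
    return [t - p for t, p in zip(target, prediction)]


def quantize(values: List[int], step: int) -> List[int]:
    """Map each range of width `step` to a single integer level (round half up)."""
    return [math.floor(v / step + 0.5) for v in values]


def dequantize(levels: List[int], step: int) -> List[int]:
    """Decoder side: recover an approximation of the original values."""
    return [level * step for level in levels]


# Hypothetical 4-sample "block" and its prediction (illustrative values only).
target = [52, 55, 61, 66]
prediction = [50, 50, 60, 60]

res = residual(target, prediction)
levels = quantize(res, step=4)
# The decoder adds the dequantized residual back onto the same prediction.
reconstructed = [p + r for p, r in zip(prediction, dequantize(levels, step=4))]
```

The reconstruction is close to, but not identical with, the target block, which reflects the lossy nature of quantization.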
Once the video content has been encoded, it is structured for transmission via the network. The coded video content may be divided into packets, each containing an encoded slice. For example, the H.264 and HEVC (High Efficiency Video Coding) standards define a Video Coding Layer (VCL) at which the (e.g. inter/intra) encoding takes place to generate the coded video content (VCL data), and a Network Abstraction Layer (NAL) at which the VCL data is encapsulated in packets, called NAL units (NALUs), for transmission. The VCL data represents values of samples in the video slices. Non-VCL data, which generally includes encoding parameters that are applicable to a relatively large number of frames or slices, is also encapsulated in NALUs at the NAL. Each NALU has a payload which contains either VCL or non-VCL data (not both) in byte (8-bit) format, and a header (one byte in H.264, two bytes in HEVC) which among other things identifies the type of the NALU.
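By way of example, a two-byte NALU header of the kind used in HEVC might be unpacked as in the sketch below. The field names and bit widths follow the HEVC header layout and are not taken from the text above; the function name is hypothetical.

```python
from typing import Dict


def parse_hevc_nalu_header(header: bytes) -> Dict[str, int]:
    """Unpack the two-byte HEVC NALU header into its bit fields.

    Layout assumed here (most significant bit first):
    1 bit forbidden_zero_bit, 6 bits nal_unit_type,
    6 bits nuh_layer_id, 3 bits nuh_temporal_id_plus1.
    """
    if len(header) < 2:
        raise ValueError("an HEVC NALU header is two bytes long")
    b0, b1 = header[0], header[1]
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        # The layer id straddles the byte boundary: 1 bit from b0, 5 from b1.
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "nuh_temporal_id_plus1": b1 & 0x07,
    }


fields = parse_hevc_nalu_header(b"\x40\x01")
```

The `nal_unit_type` field is what lets a receiver tell VCL NALUs apart from non-VCL NALUs before looking at the payload.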
The NAL representation is intended to be compatible with a variety of network transport layer formats, as well as with different types of computer-readable storage media. Some packet-orientated transport layer protocols provide a mechanism by which the VCL/non-VCL data can be divided into packets; however, other stream-orientated transport layer protocols do not. With a view to the latter, an H.264 byte stream format is defined, whereby the raw NAL data (comprising encoded VCL data, non-VCL data and NALU header data) may be represented and received at the transport layer of the network for decoding, or from local computer storage, as a stream of data elements. A “stream of data elements” (stream) means a sequence of data elements which is received, and which thus becomes available for decoding, over time so that decoding and outputting of video content in earlier parts of the stream can commence before later parts of the stream have been received. For the H.264 byte stream format, the stream is a byte stream i.e. the data elements are bytes. A similar format is defined in the HEVC standard, the successor to H.264. A similar format is also adopted in the SMPTE VC-1 standard.
Dividing markers, called start code prefixes, are included in the byte stream to mark the boundaries between NALUs so that each NALU header is preceded by a start code prefix marking the start of that NALU. Three-byte and four-byte start code prefixes are defined, namely 0x 00 00 01 and 0x 00 00 00 01. Note “0x ij kl . . . ” means each of “ij”, “kl”, . . . is a hexadecimal representation of a 1-byte value e.g. 0x 00 is equivalent to 0000 0000 in binary, 0x 0A to 0000 1010, 0x FF to 1111 1111 etc. At the receiving terminal, the sequence is parsed to identify the start code prefixes and, in turn, the NALUs. The payload data is separated out, and any encoded slices are supplied to the decoder, which decodes the video slices by effectively inverting the various encoding processes. Certain byte patterns (specifically 0x 00 00 0y where y=0, 1 or 2) are illegal within a NALU payload i.e. the device decoding the byte sequence operates on the assumption that these sequences will not occur within a NALU payload so, if they do, this is liable to cause an error in or failure of the separation process. For example, the sequence 0x 00 00 01 is illegal in a payload because that sequence is reserved for the start code prefixes; if it were to occur in an intended payload, the decoding device would mistake it for a start code prefix marking the start of the next NALU and treat it as such.
For this reason, at the same time as inserting the start code prefixes, the encoding device inserts emulation prevention markers as follows. Whenever the byte pattern 0x 00 00 0z, where z=0, 1, 2 or 3, occurs in the NALU payload data, a 1-byte emulation prevention marker (an emulation prevention byte, 0x 03) is inserted so that the pattern becomes 0x 00 00 03 0z. At the decoding device, at the same time as parsing the sequence to identify the start code prefixes, occurrences of the byte pattern 0x 00 00 03 0z are identified, and the emulation prevention byte 0x 03 is removed before the relevant part of the stream is decoded. The case z=3 is included to ensure that 0x 03 bytes which occur ‘naturally’ in the NALU payloads are not mistaken for emulation prevention bytes and mistakenly removed before decoding.
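The start code framing and emulation prevention described above can be sketched as follows. This is a minimal illustration rather than a conforming implementation: all function names are hypothetical, the whole stream is assumed to be held in memory, and the splitter assumes that no NALU ends in a 0x 00 byte, so that the extra leading zero of a four-byte prefix can simply be stripped from the end of the preceding unit. The removal routine drops any 0x 03 that follows two zero bytes, which is equivalent to matching 0x 00 00 03 0z for streams produced by the insertion routine shown here.

```python
from typing import List

START_CODE = b"\x00\x00\x01"  # 3-byte start code prefix


def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 wherever 00 00 0z (z = 0..3) would otherwise appear."""
    out = bytearray()
    zeros = 0  # consecutive zero bytes written so far
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(0x03)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(data: bytes) -> bytes:
    """Drop each 0x03 that directly follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in data:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # emulation prevention byte: not part of the payload
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def build_byte_stream(nalus: List[bytes]) -> bytes:
    """Escape each NALU, prefix it with a start code, and concatenate."""
    return b"".join(START_CODE + add_emulation_prevention(n) for n in nalus)


def split_byte_stream(stream: bytes) -> List[bytes]:
    """Locate start code prefixes and return the unescaped NALUs between them."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte prefix (00 00 00 01) leaves one zero byte at the end of
        # the previous unit; strip it (assumes NALUs never end in 0x00).
        raw = stream[begin:end].rstrip(b"\x00")
        units.append(remove_emulation_prevention(raw))
        pos = nxt
    return units


# Illustrative NALUs whose bytes would otherwise emulate a start code.
nalus = [b"\x65\x00\x00\x01\x02", b"\x41\x00\x00\x03\x00\x00\x00\x07"]
stream = build_byte_stream(nalus)
recovered = split_byte_stream(stream)
```

Because the escaped bytes never contain 0x 00 00 01, the splitter finds exactly one start code prefix per NALU, and removing the emulation prevention bytes restores the original data.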