Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has produced a new video coding standard that significantly outperforms the “H.264/MPEG-4 AVC” video coding standard. The new video coding standard has been named “high efficiency video coding (HEVC)”. Further development of high efficiency video coding (HEVC) is directed towards introducing support of different representations of chroma information present in video data, known as ‘chroma formats’, and support of higher bit-depths. The high efficiency video coding (HEVC) standard defines two profiles, known as ‘Main’ and ‘Main10’, which support a bit-depth of eight (8) bits and ten (10) bits respectively. Further development to increase the bit-depths supported by the high efficiency video coding (HEVC) standard is underway as part of the ‘Range extensions’ activity. Support for bit-depths as high as sixteen (16) bits is under study in the Joint Collaborative Team on Video Coding (JCT-VC).
Video data includes one or more colour channels. Typically three colour channels are supported and colour information is represented using a ‘colour space’. One example colour space is known as ‘YCbCr’, although other colour spaces are also possible. The ‘YCbCr’ colour space enables fixed-precision representation of colour information and thus is well suited to digital implementations. The ‘YCbCr’ colour space includes a ‘luma’ channel (Y) and two ‘chroma’ channels (Cb and Cr). Each colour channel has a particular bit-depth. The bit-depth defines the width of samples in the respective colour channel in bits. Generally, all colour channels have the same bit-depth, although they may also have different bit-depths.
One aspect of the coding efficiency achievable with a particular video coding standard is the characteristics of available prediction methods. For video coding standards intended for compressing sequences of two-dimensional video frames, there are two types of prediction: intra-prediction and inter-prediction. Intra-prediction methods allow content of one part of a video frame to be predicted from other parts of the same video frame. Intra-prediction methods typically produce a block having a directional texture, with an intra-prediction mode specifying the direction of the texture and neighbouring samples within a frame used as a basis to produce the texture. Inter-prediction methods allow the content of a block within a video frame to be predicted from blocks in previous video frames. The previous video frames may be referred to as ‘reference frames’. The first video frame within a sequence of video frames typically uses intra-prediction for all blocks within the frame, as no prior frame is available for reference. Subsequent video frames may use one or more previous video frames from which to predict blocks. To achieve the highest coding efficiency, the prediction method that produces a predicted block that is closest to captured frame data is typically used. The remaining difference between the predicted block and the captured frame data is known as the ‘residual’. This spatial domain representation of the difference is generally transformed into a frequency domain representation. Generally, the frequency domain representation compactly stores the information present in the spatial domain representation. The frequency domain representation includes a block of ‘residual coefficients’ that results from applying a transform, such as an integer discrete cosine transform (DCT).
Moreover, the residual coefficients (or ‘scaled transform coefficients’) are quantised, which introduces loss but also further reduces the amount of information required to be encoded in a bitstream. The lossy frequency domain representation of the residual, also known as ‘transform coefficients’, may be stored in the bitstream. The amount of lossiness in the residual recovered in a decoder affects both the size of the bitstream and the distortion of video data decoded from the bitstream, relative to the captured frame data.
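The chain described above (prediction, residual, transform, quantisation and reconstruction) can be illustrated with a minimal sketch. The sketch makes several simplifying assumptions that are not taken from any video coding standard: the block is a one-dimensional row of four samples, the transform is a floating-point orthonormal DCT rather than an integer DCT, and the quantiser is a plain uniform quantiser with rounding. The function names and sample values are hypothetical and chosen for illustration only.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a list of samples (floating point, illustrative)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def quantise(coeffs, step):
    """Uniform quantiser: this rounding is where information is lost."""
    return [round(c / step) for c in coeffs]


def dequantise(levels, step):
    """Scale the quantised levels back to approximate coefficients."""
    return [level * step for level in levels]


def encode_decode(captured, predicted, step):
    """Return the quantised levels and the samples a decoder would rebuild."""
    residual = [c - p for c, p in zip(captured, predicted)]
    levels = quantise(dct_1d(residual), step)
    recovered_residual = idct_1d(dequantise(levels, step))
    reconstructed = [p + r for p, r in zip(predicted, recovered_residual)]
    return levels, reconstructed


def sse(a, b):
    """Sum of squared errors, used here as a simple distortion measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


captured = [100, 104, 109, 115]    # hypothetical captured samples
predicted = [100, 100, 100, 100]   # hypothetical flat prediction

for step in (1.0, 1000.0):
    levels, reconstructed = encode_decode(captured, predicted, step)
    print(step, levels, sse(captured, reconstructed))
```

With the very large step size, every quantised level is zero, so nothing of the residual survives and the reconstruction equals the prediction. With the small step size, the levels are larger in magnitude (and so cost more to encode), but the distortion is much lower.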
The logic complexity of the transform and the quantiser logic is dependent on factors including the binary width of internal signals (or ‘busses’). Support for higher bit-depths generally requires increasing the width of internal busses. For a given bit-depth, a particular set of ‘extreme’ input data exists that must be supported by video encoders and video decoders. This condition is generally referred to as a ‘worst case’ condition. Such extreme input data, although rarely encountered in practice, is theoretically possible and for a video decoder to claim ‘conformance’ such cases must be correctly processed.
Generally, video coding standards define the required (i.e. ‘normative’) behaviour of a video decoder. From this required behaviour, the architecture of a video encoder may also be inferred. Although a video encoder may be expected to operate within certain limits, it is possible for bitstreams to exist that, while within the normative scope of the video coding standard, exhibit extremes of behaviour that may place an unreasonable burden upon implementations of a video decoder. To some extent, such burden can be alleviated by introducing additional restrictions beyond the normative scope of the video coding standard. Such restrictions are considered ‘non-normative’ and may include clipping operations. Non-normative clipping operations would not generally have any effect when decoding bitstreams produced by a video encoder. However, the non-normative clipping operations may come into effect when decoding extreme input data, sometimes referred to as ‘evil’ bitstreams.
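A clipping operation of the kind mentioned above may be sketched as follows. The sketch assumes, purely for illustration, that an internal bus holds signed two's-complement values of sixteen (16) bits; the function name `clip_coefficient` and the chosen width are hypothetical and are not taken from any video coding standard.

```python
def clip_coefficient(value, bits=16):
    """Clamp value to the range of a signed two's-complement integer of `bits` bits."""
    lowest = -(1 << (bits - 1))        # -32768 when bits is 16
    highest = (1 << (bits - 1)) - 1    # 32767 when bits is 16
    return max(lowest, min(highest, value))


# A value inside the range passes through unchanged; extreme values are limited.
print(clip_coefficient(123))
print(clip_coefficient(40000))
print(clip_coefficient(-40000))
```

Values that already fit within the assumed width pass through unchanged, which is consistent with the clipping having no effect on typical bitstreams. Only out-of-range values, such as those that extreme input data might produce, are altered.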
A quantiser is said to have a ‘step size’ that is controlled via a ‘quantisation parameter’ (or ‘QP’). The step size defines the ratio between the values output by the transform and the values encoded in a bitstream. At higher quantisation parameter values, the step size is larger, resulting in higher compression. The quantisation parameter may be fixed, or may be adaptively updated based on some quality or bit-rate criteria. Extreme cases of residual coefficient magnitude, resulting from a transform and quantisation parameter, define a ‘worst case’ for residual coefficients to be encoded and decoded from a bitstream. When encoding frame data at high bit-depths (e.g., 16 bits), it is desirable for a video decoder to reproduce a very close approximation of the frame data. For example, if a large quantiser step size were used for 16-bit data, performance would be similar to using a lower bit-depth in the video encoder and the video decoder. Consequently, high peak signal to noise ratio (PSNR) values are desirable. As such, very low or even negative values for the quantisation parameter may be expected when the bit-depth is 16 bits. Modules within the video encoder and the video decoder separate the quantisation parameter into two portions, a ‘period’ (or ‘QP_per’) and a ‘remainder’ (or ‘QP_rem’). The remainder is the quantisation parameter modulo six and the period is the result of an integer division of the quantisation parameter by six, rounded towards negative infinity, so that the remainder always lies in the range zero (0) to five (5). The behaviour of these operations, including at negative quantisation parameters, is exemplified in Table 1, below:
TABLE 1
QP      ...  −8  −7  −6  −5  −4  −3  −2  −1   0   1   2   3   4   5   6   7  ...
QP_per  ...  −2  −2  −1  −1  −1  −1  −1  −1   0   0   0   0   0   0   1   1  ...
QP_rem  ...   4   5   0   1   2   3   4   5   0   1   2   3   4   5   0   1  ...
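The separation of the quantisation parameter into a period and a remainder, as tabulated in Table 1, can be reproduced with a minimal sketch. The function name `split_qp` is hypothetical. The sketch relies on the floor-division and modulo operators of Python, which round towards negative infinity and so match the values of Table 1 at negative quantisation parameters.

```python
def split_qp(qp):
    """Split a quantisation parameter into (QP_per, QP_rem)."""
    qp_per = qp // 6   # floor division: rounds towards negative infinity
    qp_rem = qp % 6    # takes the sign of the divisor, so always 0..5
    return qp_per, qp_rem


# Print the same range of quantisation parameters as Table 1.
for qp in range(-8, 8):
    qp_per, qp_rem = split_qp(qp)
    # The two portions always recombine to the original parameter.
    assert qp == 6 * qp_per + qp_rem
    print(qp, qp_per, qp_rem)
```

In languages where integer division truncates towards zero, such as C, dividing −7 by six gives −1 with a remainder of −1, which does not match Table 1. An implementation in such a language would need an explicit adjustment for negative quantisation parameters.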