The technology described herein relates to a method of and apparatus for encoding data arrays, and in particular for encoding streams of data arrays, such as in the case of encoding frames of video, e.g., for display.
It is common to encode a stream of arrays of data elements, such as arrays of image data values (e.g. frames of video for display), so as to compress the data in order to reduce bandwidth and memory consumption. This is particularly desirable in data processing apparatus, e.g. of portable devices, where processing resources and power may be limited.
In order to encode a stream of arrays of data elements, each array of data elements is often divided into smaller “source” blocks of data elements and encoded on a block by block basis based on the difference between the source block and a “reference” block of data elements (which may be a predicted block of data elements) that is derived from one or more arrays of the stream of arrays.
The particular encoding options to use when encoding an array typically vary from region to region of the array. For example, the particular size of the source block(s), the particular way in which the reference block(s) are derived, etc., may be different for different regions of the array.
The particular encoding options to use when encoding an array are often selected by calculating cost values for various different sets of encoding options for a region and then selecting one or more particular sets of encoding options to use when encoding that region of the array that have an acceptably low cost value (e.g. using a so-called “Rate Distortion Optimisation” (RDO) process).
FIG. 1 illustrates this process 100 of selecting an encoding option to use for a source block of data elements of a data array, encoding the source block using the selected encoding option, and, subsequently, decoding the encoded block for use.
As shown in FIG. 1, for a given source, e.g. video, frame to be encoded, a set of encoding options to consider for a region of the source frame is selected (step 102).
A cost value is then determined for that set of encoding options (step 104). This is repeated for each set of encoding options to be considered (steps 106 and 108).
Once all of the sets of encoding options have been considered, one or more of the sets of encoding options is selected for use for the source blocks of the region (step 110), and each of the one or more source blocks for the region is then encoded in accordance with the selected set of encoding options for that source block (step 112).
The encoded source blocks may then, e.g., be written out to memory (step 114) and subsequently read and decoded for output (e.g. display) (steps 116, 118 and 120).
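By way of illustration only, the selection loop of steps 102 to 110 can be sketched as follows. The function name `select_encoding_options`, the candidate tuples and the cost figures are hypothetical and are not taken from any particular encoder. The sketch also assumes that the single lowest-cost set is chosen, whereas an encoder may accept any set whose cost is acceptably low.

```python
from typing import Callable, Hashable, Iterable, Tuple


def select_encoding_options(
    candidates: Iterable[Hashable],
    cost_of: Callable[[Hashable], float],
) -> Tuple[Hashable, float]:
    """Return the candidate set of encoding options with the lowest cost."""
    best = None
    for options in candidates:           # steps 102, 106 and 108: next candidate
        cost = cost_of(options)          # step 104: cost value for this candidate
        if best is None or cost < best[1]:
            best = (options, cost)
    if best is None:
        raise ValueError("no candidate sets of encoding options were supplied")
    return best                          # step 110: the selected set


# Hypothetical candidates: (source block size, prediction mode) -> made-up cost.
example_costs = {
    (16, "inter"): 120.0,
    (8, "inter"): 95.5,
    (8, "intra"): 140.0,
}
chosen, chosen_cost = select_encoding_options(example_costs, example_costs.get)
```

Here `chosen` is the candidate whose made-up cost is smallest. In practice the cost function would run a cost evaluation such as the RDO process described below.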
As discussed above, one element of the above encoding process is to assess and compare the cost of different encoding possibilities for blocks of a source frame to be encoded. This is typically done using an RDO process.
An example RDO process 500 for calculating a cost value for one particular set of encoding options under consideration is shown in FIG. 2.
The RDO process 500 of FIG. 2 initially comprises subtracting (−) the data elements of a particular reference (predicted) block (Pred) from the data elements of a particular source block (Src) for a region to generate a set of difference values (a set of “residuals”).
The reference (predicted) block (Pred) may be another block within the source frame itself (i.e. using “intra” mode encoding), or it may be a block formed from one or more other frames of the sequence of frames, i.e. using “inter” mode encoding. In the latter case a motion estimation process may be used to search within one or more reference frames for one or more suitable candidate reference blocks to consider using when encoding a source block. Such a candidate reference block can be derived by using a motion vector that describes how a particular block of a particular frame is mapped (e.g. by translation, rotation and/or scaling) to that candidate reference block. The appropriate reference (predicted) block to use may be selected using any suitable metric that indicates the similarity or difference between the source block and the potential reference block in question, such as a sum of absolute differences (SAD) value.
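As a minimal sketch of SAD-based candidate selection, the following performs an exhaustive search over whole-sample translations only. Rotation, scaling and sub-sample motion are not modelled. The helper names (`sad`, `crop`, `best_motion_vector`) and the toy frame are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Block = Sequence[Sequence[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def crop(frame: Block, top: int, left: int, size: int) -> List[List[int]]:
    """Copy a size x size block whose top-left corner is at (top, left)."""
    return [list(row[left:left + size]) for row in frame[top:top + size]]


def best_motion_vector(
    src_block: Block, ref_frame: Block, top: int, left: int, search_range: int
) -> Tuple[Tuple[int, int], int]:
    """Return ((dy, dx), SAD) for the best candidate within the search range."""
    size = len(src_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that would fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            score = sad(src_block, crop(ref_frame, y, x, size))
            if best is None or score < best[1]:
                best = ((dy, dx), score)
    if best is None:
        raise ValueError("no candidate block lies inside the reference frame")
    return best


# Toy 6x6 reference frame in which every sample value is unique.
ref_frame = [[10 * y + x for x in range(6)] for y in range(6)]
# The source block is the content found at (2, 3) in the reference frame,
# but it sits at position (1, 2) in the frame being encoded.
src_block = crop(ref_frame, 2, 3, 2)
mv, score = best_motion_vector(src_block, ref_frame, top=1, left=2, search_range=1)
```

The search returns the displacement whose candidate block differs least from the source block. A practical encoder would normally use a faster search pattern than this exhaustive scan.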
A forward discrete cosine transformation process (F-DCT) is then applied to the set of difference values (residuals) to generate a set of frequency domain coefficients. A quantisation process (Q) is then applied to the set of frequency domain coefficients to generate a set of quantised frequency domain coefficients.
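The transform and quantisation stages can be sketched as below, assuming an orthonormal floating-point DCT-II and a single uniform quantisation step. Both are simplifying assumptions: practical codecs typically use integer transform approximations and more elaborate quantisation rules. The names `forward_dct_2d` and `quantise` are hypothetical.

```python
import math
from typing import List, Sequence


def forward_dct_2d(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2D DCT-II of a square block of residuals."""
    n = len(block)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (
                        block[y][x]
                        * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantise(coeffs: Sequence[Sequence[float]], step: float) -> List[List[int]]:
    """Uniform quantisation: divide by the step and round to an integer level."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat 4x4 block of residuals has energy only in the DC coefficient.
residuals = [[8] * 4 for _ in range(4)]
coeffs = forward_dct_2d(residuals)
levels = quantise(coeffs, step=10)
```

For this flat block the DC coefficient is 32 (up to floating-point error) and quantises to level 3, while every other level is zero. That illustrates why smooth residuals are cheap to encode.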
These steps essentially replicate the steps that would be applied prior to encoding the frequency domain coefficients for the source block, for example using entropy encoding. Thus, at this point, a bit count process (Bitcount) can be applied to the set of quantised frequency domain coefficients to determine a bit count cost that would be incurred when encoding the source block in accordance with the particular set of encoding options under consideration.
The bit count process may also take account of other bit costs, such as a measure of the bits needed to specify the prediction type being used, if desired.
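A bit count can be estimated without running a full entropy coder. The sketch below assumes, purely for illustration, that each quantised level is written as a signed Exp-Golomb code, and it adds an optional allowance for side information such as the prediction type. The names `signed_exp_golomb_bits`, `bit_count` and `side_info_bits` are hypothetical, and a context-adaptive entropy coder would produce different figures.

```python
from typing import Sequence


def signed_exp_golomb_bits(value: int) -> int:
    """Length in bits of the signed Exp-Golomb code for one integer."""
    # Map the signed value to a non-negative code number:
    # 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, -2 -> 4, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    # An unsigned Exp-Golomb code uses 2 * floor(log2(code_num + 1)) + 1 bits.
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def bit_count(levels: Sequence[Sequence[int]], side_info_bits: int = 0) -> int:
    """Estimated bits for a block of quantised levels plus side information."""
    coefficient_bits = sum(signed_exp_golomb_bits(v) for row in levels for v in row)
    return coefficient_bits + side_info_bits


levels = [[3, 0], [-1, 0]]
bits_coefficients_only = bit_count(levels)
bits_with_side_info = bit_count(levels, side_info_bits=2)
```

Zero levels cost one bit each under this assumed code, so the estimate falls quickly as quantisation removes small coefficients.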
A de-quantisation process (DQ) is then applied to the set of quantised frequency domain coefficients to generate a set of de-quantised frequency domain coefficients. An inverse discrete cosine transformation process (I-DCT) is then applied to the set of de-quantised frequency domain coefficients to generate a set of reconstructed difference values. The set of reconstructed difference values is then added (+) to the reference (predicted) block to generate a reconstructed source block.
These steps essentially replicate the steps that would be applied so as to reconstruct the source block subsequent to decoding the encoded frequency domain coefficients for the source block. Thus, at this point, the data elements of the reconstructed source block are subtracted (−) from the data elements of the original source block to generate a set of error values and a sum-square distortion measuring process (Σx²) is then applied to the set of error values to determine the total amount of distortion that would be introduced when encoding and then decoding the source block in accordance with the particular set of encoding options under consideration.
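The reconstruction path and the sum-square distortion measure can be sketched as follows, assuming a uniform quantisation step and an orthonormal floating-point inverse DCT. Sample clipping and the integer transforms of practical codecs are omitted. The function names and the example values are hypothetical.

```python
import math
from typing import List, Sequence


def dequantise(levels: Sequence[Sequence[int]], step: float) -> List[List[float]]:
    """Scale quantised levels back to approximate coefficient values."""
    return [[lv * step for lv in row] for row in levels]


def inverse_dct_2d(coeffs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2D inverse DCT (DCT-III) of a square coefficient block."""
    n = len(coeffs)

    def scale(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (
                        scale(u) * scale(v) * coeffs[u][v]
                        * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * x + 1) * v * math.pi / (2 * n))
                    )
            out[y][x] = total
    return out


def sum_squared_error(src, recon) -> float:
    """Sum over the block of (source - reconstruction) squared."""
    return sum((s - r) ** 2 for rs, rr in zip(src, recon) for s, r in zip(rs, rr))


# Hypothetical 4x4 example: only a DC level survived quantisation.
levels = [[3, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
pred = [[100] * 4 for _ in range(4)]
src = [[108] * 4 for _ in range(4)]

recon_residuals = inverse_dct_2d(dequantise(levels, step=10))
recon = [[p + r for p, r in zip(rp, rr)] for rp, rr in zip(pred, recon_residuals)]
distortion = sum_squared_error(src, recon)
```

In this example each reconstructed residual is 7.5 rather than the true residual of 8, so every sample is off by 0.5 and the total distortion over the sixteen samples is 4.0.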
A cost value process (Cost) is then performed to determine an overall cost value for the particular set of encoding options from the bit count value and the distortion value.
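The text above does not prescribe how the bit count and distortion values are combined. One widely used form, assumed here purely for illustration, is a Lagrangian cost equal to the distortion plus a multiplier times the bit count. The name `rd_cost`, the multiplier values and the example figures are hypothetical.

```python
def rd_cost(distortion: float, bits: float, lagrange_multiplier: float) -> float:
    """Assumed Lagrangian rate-distortion cost: distortion + multiplier * bits."""
    return distortion + lagrange_multiplier * bits


# Two hypothetical sets of encoding options for the same region.
option_a = {"distortion": 100.0, "bits": 50.0}   # fewer bits, more distortion
option_b = {"distortion": 40.0, "bits": 90.0}    # more bits, less distortion


def cheaper_option(lagrange_multiplier: float) -> str:
    cost_a = rd_cost(option_a["distortion"], option_a["bits"], lagrange_multiplier)
    cost_b = rd_cost(option_b["distortion"], option_b["bits"], lagrange_multiplier)
    return "a" if cost_a <= cost_b else "b"


choice_when_bits_are_cheap = cheaper_option(1.0)   # costs: a = 150, b = 130
choice_when_bits_are_dear = cheaper_option(2.0)    # costs: a = 200, b = 220
```

A larger multiplier weights bits more heavily, so the selection shifts towards the option that spends fewer bits.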
As shown in FIG. 1, the RDO process of FIG. 2 is repeated in accordance with various different sets of encoding options for the region (e.g. different source block sizes, differently derived reference blocks, etc.) to produce a cost value for each of those different sets of encoding options. One or more sets of encoding options to use when encoding the region of the array are then selected based on the cost values for the different sets of encoding options.
(The RDO process is repeated across the array of data elements to select the sets of encoding options to use when encoding the remaining regions of the array of data elements.)
As discussed above, one aspect of the encoding of an array of, e.g. video, data according to many encoding standards is the derivation and encoding of a set of frequency domain coefficients representing, in the frequency domain, the differences (the residuals) between a block being encoded and a reference (e.g. predicted) block.
Many, e.g. video, encoding arrangements and standards support the option of not encoding any (quantized) frequency domain coefficients at all for a source block being encoded, such that the decoder will simply use the reference (e.g. predicted) block to represent that part of the array (frame) in the decoded frame.
The decision to omit the encoding of (quantized) frequency domain coefficients for a block being encoded may be based on, for example, a comparison of the increase in image distortion if the encoding of the (quantized) frequency domain coefficients for the block is omitted with the decrease in bits that will be provided by not encoding the frequency domain coefficients (with the encoding of the (quantized) frequency domain coefficients for the block being omitted if the bit saving is determined to outweigh the increased distortion). A block for which the encoding of the (quantized) frequency domain coefficients has been omitted is usually indicated as such in the encoded bit stream representing the data array (e.g. video frame), such that the decoder can identify the presence of such a block and reuse the reference (e.g. predicted) block for that block (without adding any differences or residuals to it) accordingly.
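A minimal sketch of such a comparison is given below. It assumes that the distortion increase and the bit saving are traded off through a single multiplier, which is one possible rule and not necessarily the one any given encoder uses. The function name, its parameters and the example figures are hypothetical.

```python
def should_omit_coefficients(
    distortion_coded: float,
    bits_coded: float,
    distortion_omitted: float,
    bits_omitted: float,
    lagrange_multiplier: float,
) -> bool:
    """True if the bit saving outweighs the extra distortion of omitting."""
    extra_distortion = distortion_omitted - distortion_coded
    bits_saved = bits_coded - bits_omitted
    return extra_distortion <= lagrange_multiplier * bits_saved


# Hypothetical block: coding the coefficients costs 20 bits and leaves a
# distortion of 4; omitting them costs 1 bit (to signal the omission) and
# leaves a distortion of 16.
omit_when_bits_matter = should_omit_coefficients(4.0, 20.0, 16.0, 1.0, 1.0)
omit_when_quality_matters = should_omit_coefficients(4.0, 20.0, 16.0, 1.0, 0.5)
```

With a multiplier of 1.0 the 19 bits saved outweigh the 12 units of extra distortion, so the coefficients are omitted. With a multiplier of 0.5 they are not.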
Video data (video frames) is often provided as data in the “YUV” colour space. YUV colours are described by a “luma” (Y) value representing the luminance or brightness of the colour, and two “chroma” values (U and V) representing the chrominance information of the colour.
In the case of “YUV” video data, for example, for respective data elements within the data array representing, e.g., the video frame, appropriate luma (Y) and chroma (U and V) values will be stored. In such arrangements, the luminance part (“luma”) is typically encoded separately from the two chrominance parts (“chroma”). In particular, the differences between the reference (e.g. predicted) block and the source block (the “residuals”) are typically encoded separately for the chroma and luma parts of the video data.
When determining the encoding options to use for encoding a frame of YUV video data, it is possible to consider only the luma data elements, since the luma data elements alone can be used, for example, to establish any motion that is to be used for a motion estimation process, and, equally, the results of the motion estimation (the prediction) determined from the luma data elements can usually be satisfactorily used for the chroma parts of the frame as well. Thus it is, for example, possible to perform the motion estimation process using the luma data for the video frame only, and to apply the results of that luma processing to the chroma parts of the video frame, without actually testing the chroma parts of the video frame themselves. This can then allow, for example, motion estimation processing of the chroma data elements to be omitted, thereby saving both execution time and power.
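As an illustration of reusing a luma result for chroma, the sketch below converts a motion vector found on the luma plane into the corresponding displacement on a chroma plane. It assumes that the chroma planes are subsampled by a factor of two in each direction (as in the common 4:2:0 layout), which the text above does not specify. The function name and parameters are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def chroma_displacement(
    luma_mv: Tuple[int, int], sub_y: int = 2, sub_x: int = 2
) -> Tuple[Fraction, Fraction]:
    """Scale a luma motion vector (dy, dx) to chroma sample units.

    sub_y and sub_x are the assumed vertical and horizontal chroma
    subsampling factors (2 and 2 for 4:2:0, 1 and 1 for 4:4:4).
    """
    dy, dx = luma_mv
    # Exact fractions keep half-sample chroma positions explicit.
    return Fraction(dy, sub_y), Fraction(dx, sub_x)


whole_sample = chroma_displacement((4, -6))   # even luma vector
half_sample = chroma_displacement((3, 1))     # odd luma vector
```

An odd luma displacement lands on a half-sample chroma position, so a practical encoder would need to interpolate the chroma reference samples in that case. No separate chroma motion search is performed.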
In the case where the motion estimation process only considers the luma part of a YUV data array (e.g. video frame), then that motion estimation will accordingly identify blocks for which the encoding of the (quantized) frequency domain coefficients for the luma part can be omitted (i.e. for which there will be no luma quantized frequency domain coefficients for the differences (residuals) between the reference block and the source block encoded).
While it would be possible in such arrangements, whenever the encoding of the (quantized) frequency domain coefficients for the luma block has been omitted, also to omit the encoding of the quantized frequency domain coefficients of the differences (residuals) for the chroma parts of the block in question, the Applicants have recognised that doing so can result in unwanted colour artefacts in the reconstructed image. Thus, omitting the encoding of the (quantized) frequency domain coefficients for blocks for the chroma parts of a YUV data array being encoded based only on a motion estimation analysis of the luma parts of the data array can lead to undesirable distortions and artefacts in the reconstructed data array.
On the other hand, the Applicants have recognised that to perform a full analysis to determine when the encoding of the (quantized) frequency domain coefficients for the chroma parts of a block of a YUV data array to be encoded can be omitted would increase the processing burden on the encoding process, and in particular, potentially negate any advantage to be gained by considering only the luma part of the data array when selecting the encoding option to use for blocks of a YUV data array to be encoded.
The Applicants accordingly believe that there remains scope for improved techniques for encoding YUV data arrays, such as frames of video data that are encoded in a YUV format.