The technology described herein relates to a method of and apparatus for encoding data arrays, and in particular for encoding streams of data arrays, such as in the case of encoding frames of video data, e.g., for display.
It is common to encode a stream of arrays of data elements, such as arrays of image data values (e.g. frames of video for display), so as to compress the data in order to reduce bandwidth and memory consumption. This is particularly desirable in data processing apparatus, e.g. portable devices, where processing resources and power may be limited.
Many, e.g. video, encoding processes use a two-step process when encoding arrays of data elements, such as input images: a first step to process the input images to generate a set of intermediate data representing the input images; and then a second step of encoding (e.g. entropy encoding) the intermediate data to provide an output set of data (bitstream) representing the input data arrays in an encoded form.
FIG. 1 illustrates this process, and shows input data arrays (e.g. images (pictures)) 1 to be encoded being subject first to a data element (e.g. pixel) processing step 2, to generate “intermediate data” 3 representing the input images, followed by an “entropy encoding” step 4 which encodes the intermediate data 3 to provide an output bitstream 5 that represents the input data arrays (e.g. pictures) in an encoded (e.g. compressed) form.
The first, data element (e.g. pixel) processing step 2 of such encoding methods typically divides each array of data elements (e.g. image) to be encoded into plural smaller “source” blocks of data elements, and processes the array on a block-by-block basis. For each source block of data elements being considered, a corresponding “reference” block of data elements (which may be a predicted block of data elements) that is derived from one or more arrays of the stream of data arrays is typically determined, and then a set of difference values representative of the difference between the source block and the determined “reference” block of data elements is determined.
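By way of illustration only, the block-by-block division described above might be sketched as follows. The function name `split_into_blocks`, the use of nested Python lists for a data array, and the square block size are all assumptions made for this sketch and are not taken from any particular encoder.

```python
def split_into_blocks(array, block_size):
    """Divide a 2-D array (list of rows) into square "source" blocks.

    Returns a dict mapping each block's (top, left) position to the block.
    Blocks at the right or bottom edge may be smaller than block_size when
    the array dimensions are not a multiple of it; a practical encoder
    would typically pad the array instead (an assumption, not shown here).
    """
    blocks = {}
    for top in range(0, len(array), block_size):
        for left in range(0, len(array[0]), block_size):
            blocks[(top, left)] = [
                row[left:left + block_size]
                for row in array[top:top + block_size]
            ]
    return blocks


# Usage: a 4x4 array of data elements divided into four 2x2 source blocks.
image = [[4 * y + x for x in range(4)] for y in range(4)]
source_blocks = split_into_blocks(image, 2)
```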
The reference (predicted) block may be another block within the same data array (i.e. using “intra” mode encoding), or it may be a block formed from one or more other data arrays of the sequence of data arrays (i.e. using “inter” mode encoding). A “motion estimation” process may be used to search within one or more reference frames for one or more suitable candidate reference blocks to consider when encoding a source block. The reference block used for a “source” block being encoded is typically indicated to the decoder by using a motion vector that describes how the source block is mapped (e.g. by translation, rotation and/or scaling) to the reference block.
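A minimal sketch of one possible motion estimation search is given below, for illustration only. It assumes a purely translational motion vector, an exhaustive ("full") search over a small window, and the sum of absolute differences (SAD) as the matching measure; the names `sad`, `get_block` and `full_search` are hypothetical, and practical encoders commonly use faster search strategies.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def get_block(frame, top, left, size):
    """Extract the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(source_block, ref_frame, top, left, search_range):
    """Search ref_frame around (top, left) for the best matching block.

    Returns ((dy, dx), cost): the translational motion vector with the
    lowest SAD, and that SAD. Candidates falling outside the reference
    frame are skipped.
    """
    size = len(source_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(source_block, get_block(ref_frame, y, x, size))
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost


# Usage: the source block at (1, 1) matches the reference frame at (2, 3).
reference_frame = [[10 * y + x for x in range(6)] for y in range(6)]
source = get_block(reference_frame, 2, 3, 2)
motion_vector, cost = full_search(source, reference_frame, 1, 1, 2)
```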
As well as determining a reference (predicted) block to use when encoding a block of a data array (and the corresponding motion vector(s) defining that reference block), the data element processing step 2 will also then determine the differences between the (source) block of the data array being encoded and its respective determined reference (predicted) block, so as to provide a set of difference values (residuals) representative of the difference between the source block being encoded and its corresponding “reference” block of data elements.
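The difference value (residual) determination can be sketched as an element-wise subtraction, as below. This is an illustrative sketch only; the function names are hypothetical, and the reconstruction helper simply shows that adding the residuals back to the reference block recovers the source block when no quantization is applied.

```python
def residual_block(source, reference):
    """Element-wise difference: source block minus reference block."""
    return [
        [s - r for s, r in zip(source_row, reference_row)]
        for source_row, reference_row in zip(source, reference)
    ]


def reconstruct_block(reference, residuals):
    """Inverse operation: reference block plus residuals."""
    return [
        [r + d for r, d in zip(reference_row, residual_row)]
        for reference_row, residual_row in zip(reference, residuals)
    ]


# Usage with two small 2x2 blocks of data element (pixel) values.
source_block = [[10, 12], [14, 16]]
reference_block = [[9, 12], [15, 13]]
residuals = residual_block(source_block, reference_block)
```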
A transformation, such as a forward discrete cosine transformation process, is then typically applied to the set of difference values (residuals) to generate a set of frequency domain coefficients representing the differences (residuals) in the frequency domain. A quantization process is typically then applied to the set of frequency domain coefficients to generate a set of quantized frequency domain coefficients.
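For illustration, a floating-point sketch of a forward two-dimensional discrete cosine transformation (orthonormal DCT-II) followed by uniform quantization is given below. It is a sketch under stated assumptions: practical codecs generally use integer approximations of the transform, and the single uniform `step` used here is a hypothetical stand-in for whatever codec-specific mapping from quantization parameter to step size is in use.

```python
import math


def dct_1d(values):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            values[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ))
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform each row, then each column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def quantize(coefficients, step):
    """Uniform quantization: divide by the step size and round."""
    return [[int(round(c / step)) for c in row] for row in coefficients]


# Usage: a flat 4x4 residual block has all its energy in the DC coefficient.
flat_residuals = [[8] * 4 for _ in range(4)]
coefficients = dct_2d(flat_residuals)
quantized = quantize(coefficients, step=4)
```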
Thus, the data element (e.g. pixel) processing step 2 will typically generate, for a block of a data array being encoded, motion vector information defining a reference block for the block being encoded, and difference value (residuals) information, typically in the form of a set of quantized frequency domain coefficients. This information is then provided as the intermediate data 3 that is then subjected to the encoding (e.g. entropy encoding) 4 when generating the output (encoded) bitstream 5 representing the input data array.
This is (usually) done for each block that a data array being encoded is being divided into, so there will be a respective set of intermediate data comprising, e.g., and in an embodiment, motion vector information, and difference value (residuals) information (e.g., and in an embodiment, in the form of a set of quantized frequency domain coefficients), for each block that a given data array has been divided into for encoding purposes.
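One way of picturing the per-block intermediate data is as a simple record, as sketched below. The class name `BlockIntermediateData` and its field names are hypothetical and chosen only for this illustration; an actual encoder would use whatever layout suits its entropy encoding stage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BlockIntermediateData:
    """Intermediate data for one block (illustrative layout only)."""
    block_position: Tuple[int, int]          # (top, left) of the source block
    motion_vector: Tuple[int, int]           # (dy, dx) to the reference block
    quantized_coefficients: List[List[int]]  # quantized residual coefficients


# Usage: a data array divided into two blocks yields two records.
frame_intermediate_data = [
    BlockIntermediateData((0, 0), (0, 0), [[8, 0], [0, 0]]),
    BlockIntermediateData((0, 2), (1, -1), [[3, -1], [0, 0]]),
]
```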
(The particular encoding options to use when encoding an array may be calculated by calculating cost values for various different sets of encoding options for a region of the array, and then selecting one or more particular sets of encoding options to use when encoding that region of the array that have an acceptably low cost value, e.g. using a so-called “rate distortion optimisation” (RDO) process. Such an RDO process may be performed as part of the data element (pixel) processing 2, with the intermediate data 3 that is generated by that step then comprising the intermediate data generated for the particular encoding option that is selected for the regions and blocks of the data array.)
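A commonly used form of cost value in rate distortion optimisation is the Lagrangian cost J = D + λR, where D is a distortion measure, R is an estimated number of bits and λ is a weighting factor. The sketch below assumes that form and, for simplicity, selects the single lowest-cost option rather than any option with an acceptably low cost; the function names and the candidate figures are hypothetical.

```python
def rd_cost(distortion, bits, lagrange_multiplier):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lagrange_multiplier * bits


def choose_option(candidates, lagrange_multiplier):
    """Return the candidate set of encoding options with the lowest cost.

    Each candidate is a dict with "name", "distortion" and "bits" keys.
    """
    return min(
        candidates,
        key=lambda c: rd_cost(c["distortion"], c["bits"], lagrange_multiplier),
    )


# Usage: a larger multiplier favours the option that needs fewer bits.
options = [
    {"name": "intra", "distortion": 100, "bits": 40},
    {"name": "inter", "distortion": 120, "bits": 10},
]
selected_when_bits_are_costly = choose_option(options, 1.0)["name"]
selected_when_bits_are_cheap = choose_option(options, 0.1)["name"]
```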
As shown in FIG. 1, the intermediate data that is generated for the blocks of a data array (and for data arrays of a stream of data arrays) is subjected to an encoding process 4, which is typically an entropy encoding process, to provide an output bitstream 5 representing the input data arrays in an encoded (and, e.g., compressed) form. The output bitstream 5 may, e.g., be written out to memory and/or transmitted, e.g. for subsequent decoding and output, e.g. for display.
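The particular entropy encoding scheme is not specified above. Purely as an illustration of how intermediate data values can be turned into a variable-length bitstream, the sketch below uses order-0 Exponential-Golomb codes, in which small magnitudes receive short codewords. The choice of this code and the function names are assumptions of the sketch; practical video encoders generally use more elaborate context-adaptive schemes.

```python
def unsigned_exp_golomb(value):
    """Order-0 Exp-Golomb codeword (as a bit string) for value >= 0."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]
    # Prefix of (length - 1) zeros, followed by the binary form of value + 1.
    return "0" * (len(binary) - 1) + binary


def signed_exp_golomb(value):
    """Map a signed value to an unsigned one, then encode it.

    Positive k maps to 2k - 1; zero and negative k map to -2k.
    """
    mapped = 2 * value - 1 if value > 0 else -2 * value
    return unsigned_exp_golomb(mapped)


def encode_levels(levels):
    """Concatenate the codewords for a sequence of quantized values."""
    return "".join(signed_exp_golomb(v) for v in levels)


# Usage: four quantized coefficient values become a ten-bit string.
bitstream = encode_levels([3, -1, 0, 0])
```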
One important aspect of a data array (e.g. video) encoding process of the type illustrated in FIG. 1, particularly in the case of real-time (e.g. video) encoding and of lower powered and/or portable devices, is often to provide the output bitstream 5 at a particular target output bit rate (e.g. number of bits per second). Typically, the encoder will try to maximise visual quality while keeping to a target number of bits per second. This is commonly referred to as “rate control”.
In order to achieve this, the (e.g. video) encoder will typically vary an aspect or aspects of the intermediate data generation by the data element (pixel) processing 2 so as to affect the number of output bits that will be generated for the output bitstream 5. Typically this is done by varying some form of “quality” parameter of the data element (pixel) processing 2, such as a quantization parameter that is applied to the set of frequency domain coefficients when generating the set of quantized frequency domain coefficients representing the difference values (residuals) for a given block of a data array being encoded.
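The effect of the quantization parameter on the amount of output data can be illustrated with the following sketch, which counts the non-zero quantized coefficients as a rough proxy for the number of output bits. The truncating quantizer, the step sizes and the coefficient values are hypothetical and chosen only to show the trend.

```python
def quantize_truncating(coefficients, step):
    """Divide each coefficient by the step size, truncating toward zero."""
    return [int(c / step) for c in coefficients]


def nonzero_count(levels):
    """Number of non-zero quantized values (a rough proxy for bit cost)."""
    return sum(1 for v in levels if v != 0)


# Usage: coarser quantization leaves fewer non-zero values to encode.
example_coefficients = [52.0, -18.0, 9.0, -4.0, 2.0, 1.0]
counts_by_step = {
    step: nonzero_count(quantize_truncating(example_coefficients, step))
    for step in (2, 8, 16)
}
```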
In the case where a sequence of data arrays (e.g. video frames) is being encoded “offline”, such rate control could be achieved simply by, if necessary, re-processing each data array (e.g. video frame) a number of times, until the desired number of bits per frame is achieved.
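Such an “offline” re-processing approach might be sketched as the loop below. The callable `encode_frame(frame, qp)`, assumed to return the number of bits produced, the quantization parameter range of 0 to 51, and the toy bit-count model are all hypothetical; the sketch simply increases the quantization parameter one step per pass until the frame fits its bit budget.

```python
def encode_until_within_budget(frame, encode_frame, target_bits,
                               qp_min=0, qp_max=51):
    """Re-encode a frame with increasing qp until it fits target_bits.

    Returns (qp, bits, passes). Stops at qp_max even if the budget is
    still exceeded.
    """
    qp = qp_min
    passes = 0
    while True:
        bits = encode_frame(frame, qp)
        passes += 1
        if bits <= target_bits or qp == qp_max:
            return qp, bits, passes
        qp += 1


def toy_encode(frame, qp):
    """Hypothetical stand-in for an encoder: bits fall as qp rises."""
    return len(frame) * 1000 // (qp + 1)


# Usage: a ten-element "frame" needs nine passes to fit a 1200-bit budget.
chosen_qp, produced_bits, pass_count = encode_until_within_budget(
    [0] * 10, toy_encode, target_bits=1200
)
```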
However, in the case of “real-time” encoding, for example, it may not be possible to process a data array multiple times in order to facilitate good rate control. In this case, the rate control operation could be based, e.g., on the known number of bits that have been produced when encoding earlier data arrays (e.g. frames) in the sequence of data arrays being encoded, but that information may not always be readily or immediately available (or completely up-to-date), e.g. particularly in the case where the encoding process is distributed across a number of parallel processors (which may be desirable for speed, throughput and/or power consumption purposes).
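A single-pass rate control operation based on the bits produced for earlier data arrays might, in a very simple form, look like the sketch below. It is illustrative only: the function names, the one-step adjustment of the quantization parameter, the 10% tolerance band and the 0 to 51 range are assumptions. The `frames_reported` argument reflects the point made above that, at the time a decision is taken, bit counts may be available for only some of the earlier frames.

```python
def frame_budget(target_bits_per_second, frames_per_second):
    """Average number of bits available per frame at the target bit rate."""
    return target_bits_per_second / frames_per_second


def next_qp(current_qp, bits_used_so_far, frames_reported, budget_per_frame,
            qp_min=0, qp_max=51, tolerance=0.1):
    """Adjust qp by one step based on the bits reported for earlier frames.

    Only the frames whose bit counts are already known contribute; with
    no reports at all the qp is left unchanged.
    """
    if frames_reported == 0:
        return current_qp
    expected = budget_per_frame * frames_reported
    if bits_used_so_far > expected * (1 + tolerance):
        current_qp += 1   # over budget: quantize more coarsely
    elif bits_used_so_far < expected * (1 - tolerance):
        current_qp -= 1   # under budget: quantize more finely
    return max(qp_min, min(qp_max, current_qp))


# Usage: 3 Mbit/s at 30 frames per second, two earlier frames reported.
budget = frame_budget(3_000_000, 30)
qp_after_overshoot = next_qp(30, 250_000, 2, budget)
qp_after_undershoot = next_qp(30, 150_000, 2, budget)
```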
The Applicants believe therefore that there remains scope for improvements to the output bit rate control operation when encoding sequences of data arrays, particularly in the case of real-time and/or distributed encoding of sequences of data arrays.