The technology described herein relates to methods of and apparatus for encoding data arrays, and in particular to methods of and apparatus for performing motion estimation when performing video encoding.
Video image data is typically generated as a sequence of frames, each frame comprising an array of pixels (data positions). In “differential” video coding standards such as MPEG and VP9, frames in a sequence of frames to be encoded (which can be thought of as “source” frames to be encoded) are usually encoded with reference to another, e.g. previous, frame or frames of the sequence of frames to be encoded (which other frame or frames can accordingly be thought of as a “reference” frame(s) for the (source) frame being encoded). This is usually done by dividing a source frame into a plurality of blocks of pixels and encoding (and subsequently decoding) each block with respect to a reference block (or blocks) from a reference frame (or frames).
Each pixel block for a source frame that has been differentially encoded is usually defined by a vector value (the so-called “motion vector”) pointing to the reference frame block in the reference frame, together with data (the “residual”) describing the differences between the data in the source (i.e. current) frame block and the reference frame block. (This allows the video data for the block of the source frame to be reconstructed from the reference frame video data pointed to by the motion vector and the difference data describing the differences between that reference frame block and the block of the source frame.)
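By way of illustration only, the reconstruction of a block from a motion vector and a residual can be sketched as follows. This is a minimal sketch assuming integer-pixel motion vectors and frames held as lists of rows; the function name `reconstruct_block` and its parameters are hypothetical and are not taken from any particular video coding standard.

```python
def reconstruct_block(reference, motion_vector, residual, block_x, block_y):
    """Rebuild a source block from a reference frame, a motion vector and a residual.

    Assumption for illustration: the motion vector is a whole-pixel (dx, dy)
    offset from the block's own position (block_x, block_y).
    """
    dx, dy = motion_vector
    block = []
    for r, residual_row in enumerate(residual):
        row = []
        for c, difference in enumerate(residual_row):
            # Predicted value comes from the reference block the vector points to.
            predicted = reference[block_y + dy + r][block_x + dx + c]
            # Adding the residual recovers the source value.
            row.append(predicted + difference)
        block.append(row)
    return block


reference = [
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
    [40, 41, 42, 43],
]
residual = [[1, 0], [0, -1]]
decoded = reconstruct_block(reference, (1, 1), residual, block_x=0, block_y=0)
# decoded == [[22, 22], [31, 31]]
```

The closer the reference block is to the source block, the smaller the residual values are, and so the less data there is to encode.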
An important aspect of such differential encoding of video data is identifying which area of the video frame being used as a reference frame for the encoding process is most (or at least sufficiently) similar to the source frame block being encoded (such that there is then a reduced or minimum number of differences to be encoded). This is complicated by the fact that, typically, the area of the reference frame that most closely matches a given block in a current frame being encoded will not be in the same position within the reference frame as it is in the current frame, e.g. because objects and/or the camera position move between frames.
Differential encoding of video data therefore typically involves first identifying the position, in the intended reference video frame, of the block that most closely matches the block of the video frame currently being encoded.
The process of identifying the block position in a reference frame to use when differentially encoding a block in a video frame being encoded is usually referred to as “motion estimation”. This process is usually carried out by comparing the video data values (usually luminance values) for the block being encoded with a plurality of corresponding-sized blocks each having a different position in the reference video frame until the closest (or a sufficiently close) match in terms of the relevant video data values is found. The relative match between the blocks being compared is assessed using a suitable difference (error) measure, such as a sum of absolute differences (SAD) function. The vector pointing to the so-identified block position in the reference frame is then used as the “motion vector” for the block being encoded.
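The block comparison described above can be sketched, purely as an example, as an exhaustive search using a sum of absolute differences (SAD) measure. The sketch assumes whole-pixel candidate positions and a square search window; the names `sad`, `full_search` and `search_range` are hypothetical, and practical encoders generally use faster search strategies than this.

```python
def sad(source_block, reference, ref_x, ref_y):
    """Sum of absolute differences between a block and a reference frame area."""
    total = 0
    for r, row in enumerate(source_block):
        for c, value in enumerate(row):
            total += abs(value - reference[ref_y + r][ref_x + c])
    return total


def full_search(source_block, reference, block_x, block_y, search_range):
    """Test every whole-pixel offset within +/- search_range; return (vector, cost)."""
    rows, cols = len(source_block), len(source_block[0])
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = block_x + dx, block_y + dy
            # Skip candidates that fall partly outside the reference frame.
            if x < 0 or y < 0 or x + cols > width or y + rows > height:
                continue
            cost = sad(source_block, reference, x, y)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dx, dy), cost
    return best_vector, best_cost


# Synthetic reference frame; the source block is a copy of the 2x2 area at x=3, y=2.
reference = [[x * x + 3 * y ** 3 for x in range(6)] for y in range(6)]
source_block = [row[3:5] for row in reference[2:4]]
vector, cost = full_search(source_block, reference, block_x=2, block_y=2, search_range=2)
# vector == (1, 0), cost == 0
```

The number of SAD evaluations grows with the square of the search range, which is one reason why this operation is costly.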
Because this motion estimation operation to find the motion vector to be used for encoding a block of a video frame is done, in effect, by calculating a large number of correlations between the block to be encoded and different offsets into the reference frame, it is one of the most computationally heavy operations performed when encoding video data. This may be exacerbated in arrangements where motion vectors can be tested for and determined at resolutions smaller than a whole pixel, such as quarter pixel or one eighth pixel resolution. In that case, in order to determine the difference measure when considering block positions in the reference frame that are not aligned with integer pixel positions, it is necessary to determine data values for reference frame positions that do not align with pixel positions in the reference frame (e.g. that lie between the reference frame pixels). This is usually done by interpolation from the (known) pixel data values, which can itself be a very computationally intensive operation.
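As an illustration of determining data values at positions between reference frame pixels, the following sketch uses simple bilinear interpolation. This is an assumption made for brevity: video coding standards generally specify their own interpolation filters, and the function name `sample_bilinear` is hypothetical.

```python
import math


def sample_bilinear(frame, x, y):
    """Interpolate a value at a (possibly fractional) position inside the frame."""
    height, width = len(frame), len(frame[0])
    x0, y0 = math.floor(x), math.floor(y)
    # Clamp the right/bottom neighbours so edge positions stay inside the frame.
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    # Blend horizontally on the two neighbouring rows, then blend vertically.
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


frame = [[0, 8], [16, 24]]
half = sample_bilinear(frame, 0.5, 0.5)      # 12.0, midway between all four pixels
quarter = sample_bilinear(frame, 0.25, 0.0)  # 2.0, a quarter of the way from 0 to 8
```

Every sub-pixel candidate position requires such an interpolation for each pixel of the block being compared, which adds to the cost of each difference measure evaluation.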
To reduce the amount of computation that is required for this process, it is therefore known to initially assess potential candidate blocks in the reference frame for use when encoding a block of a video frame at a coarser resolution (e.g. at integer pixel positions), and to use the results of that assessment to identify a smaller region of the reference frame to then assess at a finer resolution (e.g. at a sub-pixel resolution), and so on, if desired, so as to progressively narrow the search at finer and finer resolutions until a motion vector at the desired finest resolution (e.g. at a quarter pixel resolution) has been determined.
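The coarse-to-fine approach described above can be sketched as follows, by way of example only. The sketch assumes three stages (integer, half and quarter pixel), a 3x3 neighbourhood at each refinement stage, bilinear interpolation and a SAD measure; the names `sample`, `sad_at` and `coarse_to_fine` are hypothetical. Motion vectors are held in quarter-pixel units so that they remain integers.

```python
import math


def sample(frame, x, y):
    """Bilinear sample; positions outside the frame are clamped to its edge."""
    height, width = len(frame), len(frame[0])
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0
    top = frame[y0][x0] * (1 - fx) + frame[y0][x1] * fx
    bottom = frame[y1][x0] * (1 - fx) + frame[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sad_at(block, frame, x, y):
    """SAD between a block and the frame area whose top-left corner is (x, y)."""
    return sum(abs(value - sample(frame, x + c, y + r))
               for r, row in enumerate(block) for c, value in enumerate(row))


def coarse_to_fine(block, frame, block_x, block_y, search_range):
    """Return (motion vector in quarter-pixel units, SAD at that vector)."""
    def cost(mv):
        return sad_at(block, frame, block_x + mv[0] / 4, block_y + mv[1] / 4)

    # Stage 1: whole-pixel candidates only (4 quarter-pixel units per pixel).
    offsets = range(-search_range, search_range + 1)
    best = min(((4 * dx, 4 * dy) for dy in offsets for dx in offsets), key=cost)
    # Stages 2 and 3: half-pixel, then quarter-pixel, around the current best.
    for step in (2, 1):
        neighbours = [(best[0] + sx * step, best[1] + sy * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        best = min(neighbours, key=cost)
    return best, cost(best)


# Synthetic frame; the block is the frame content displaced by (1.25, 0.75) pixels.
frame = [[x * x + y * y for x in range(8)] for y in range(8)]
block = [[sample(frame, 3.25 + c, 2.75 + r) for c in range(2)] for r in range(2)]
best_mv, best_cost = coarse_to_fine(block, frame, block_x=2, block_y=2, search_range=2)
# best_mv == (5, 3) in quarter-pixel units, i.e. (1.25, 0.75) pixels; best_cost == 0.0
```

In this sketch the sub-pixel stages each test only nine candidates, rather than every quarter-pixel position in the search window. A staged search of this kind can settle on a local rather than a global minimum, since each stage only refines the result of the previous one.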
The Applicants believe that there remains scope for improved techniques for performing the motion estimation operation used for video encoding.
Like reference numerals are used for like features throughout the drawings, where appropriate.