Many techniques are known in the art for compressing and decompressing multidimensional signals or signals that evolve over time. Examples include audio signals, video signals and other multidimensional signals such as the volumetric signals used in scientific and medical fields.
In order to achieve high compression ratios, those techniques exploit the spatial and time correlation inside the signal. Conventional methods identify a reference and try to determine the difference of the signal between a current location and the given reference. This is done both in the spatial domain, where the reference is a portion of an already received and decoded spatial plane, and in the time domain, where a single instance in time of the signal (e.g., a video frame in a sequence of frames) is taken as a reference for a certain duration. This is the case, for example, of MPEG-family compression algorithms, where previously decoded macro blocks are taken as reference in the spatial domain, and I-frames and P-frames are used as reference for subsequent P-frames in the time domain.
Known techniques exploit spatial correlation and time correlation in many ways, adopting several different approaches in order to identify, simplify, encode and transmit differences. In accordance with conventional methods, in order to leverage spatial correlation, a domain transformation is performed (for example, into a frequency domain), and then information is discarded in a lossy manner and quantized. In the time domain, instead, conventional methods transmit the quantized difference between the current sample and a motion-compensated reference sample.
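By way of non-limiting illustration, the transform-then-quantize step for spatial correlation can be sketched as follows. The choice of a one-dimensional, unnormalized DCT-II, the uniform quantizer, the step size of 4 and all function names are assumptions made for this example only; they are not taken from any particular codec.

```python
import math

def dct_ii(samples):
    """Unnormalized 1-D DCT-II: maps samples to frequency coefficients."""
    n = len(samples)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

def quantize(coefficients, step):
    """Uniform quantization; rounding is where information is lost."""
    return [round(c / step) for c in coefficients]

def dequantize(levels, step):
    """Decoder-side approximation of the original coefficients."""
    return [level * step for level in levels]

# A perfectly flat row of 8 samples (hypothetical data): all of its
# energy ends up in the first (DC) coefficient after the transform.
flat_row = [10] * 8
levels = quantize(dct_ii(flat_row), step=4)
print(levels)
```

For highly correlated input such as this flat row, only one quantized coefficient is non-zero, which is what makes the transformed representation cheaper to encode than the raw samples.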
In order to maximize the similarity between samples, encoders try to estimate the modifications that have occurred over time relative to the reference signal. In conventional encoding methods (e.g., MPEG family technologies, VP8, etc.), this is called motion estimation and compensation.
Motion information is transmitted to the decoder in order to enable reconstruction of the current sample by leveraging information already available at the decoder for the reference sample: in state-of-the-art methods this is done using motion vectors on a macro-block basis. In other words, a motion vector can indicate motion at the level of a block that includes multiple display elements.
Traditionally, motion information has been represented by means of offset motion vectors, i.e., vectors indicating the position of a similar portion of a picture (e.g., a block of plane elements, or “pels”, often called picture elements or “pixels” for the case of 2D images) in a reference picture. For example, as discussed above, using block motion compensation (BMC), the images of a video sequence can be partitioned into blocks of pixels. Each block B in a current image can be predicted based on a block B0 of equal size in a reference frame. The position of the block B0 in the reference image with respect to the position of B in the current image can be encoded as an offset motion vector. In such cases, the motion vector indicates the opposite of the estimated x and y movement of the block of pixels (in particular, it indicates the opposite of the movement since it points from B to B0, while the movement is from B0 to B).
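By way of non-limiting illustration, block motion compensation with an offset motion vector can be sketched as follows. The sketch assumes full-pel motion vectors, images stored as lists of rows, and a hypothetical 6x6 test image; the function names are introduced here for the example only.

```python
def predict_block(reference, top, left, size, mv):
    """Fetch block B0 from the reference image for block B at (top, left).

    mv = (dx, dy) is the offset motion vector pointing from B (current
    image) to B0 (reference image), in full-pel units.
    """
    dx, dy = mv
    return [
        [reference[top + dy + r][left + dx + c] for c in range(size)]
        for r in range(size)
    ]

def block_residual(current, prediction, top, left):
    """Differences between block B and its prediction B0."""
    size = len(prediction)
    return [
        [current[top + r][left + c] - prediction[r][c] for c in range(size)]
        for r in range(size)
    ]

# Hypothetical data: the current image is the reference moved one pixel
# to the right (the leftmost column is simply repeated).
reference = [[10 * r + c for c in range(6)] for r in range(6)]
current = [[reference[r][max(c - 1, 0)] for c in range(6)] for r in range(6)]

# The content moved by +1 in x, so the motion vector is the opposite: -1.
mv = (-1, 0)
prediction = predict_block(reference, top=2, left=2, size=2, mv=mv)
residual = block_residual(current, prediction, top=2, left=2)
```

In this example the prediction matches block B exactly, so the residual is all zeros; in practice the residual carries whatever the translated block fails to capture.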
A motion vector is typically encoded with sub-pixel precision (i.e., it can also specify movements of a fraction of a pixel) because the encoder needs to be able to capture subtle movements of less than a full pixel. According to MPEG family codecs, the blocks are not transformed other than being shifted to the position of the predicted block, and additional information must be encoded through residual data indicating differences between block B0 and block B.
Motion estimation typically refers to the process of determining motion vectors that suitably describe the transformation from one picture to another, usually between adjacent frames in a video sequence. Motion estimation is typically based on the assumption that image values (brightness, color, etc., expressed in a suitable color space) remain constant over time, though their position in the image may change. The underlying assumption of motion estimation through motion vectors is that the possible movements of the portion of the image identified by the motion vector (e.g., a macro-block) are limited to translational movements.
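By way of non-limiting illustration, one simple way to determine such a motion vector is an exhaustive search that minimizes the sum of absolute differences (SAD) between block B and candidate blocks in the reference image. The SAD cost, the exhaustive search, the search range and the test image below are assumptions chosen for this sketch; practical encoders generally use faster search strategies.

```python
def sad(current, reference, top, left, size, dx, dy):
    """Sum of absolute differences between B and the candidate block B0."""
    return sum(
        abs(current[top + r][left + c] - reference[top + dy + r][left + dx + c])
        for r in range(size)
        for c in range(size)
    )

def estimate_motion(current, reference, top, left, size, search_range):
    """Exhaustive full-pel search for the best offset motion vector."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference image.
            if top + dy < 0 or top + dy + size > height:
                continue
            if left + dx < 0 or left + dx + size > width:
                continue
            cost = sad(current, reference, top, left, size, dx, dy)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost

# Hypothetical data: the content moves 2 pixels right and 1 pixel down
# between the reference image and the current image.
reference = [[10 * r + c for c in range(8)] for r in range(8)]
current = [
    [reference[max(r - 1, 0)][max(c - 2, 0)] for c in range(8)]
    for r in range(8)
]

mv, cost = estimate_motion(current, reference, top=3, left=3, size=2, search_range=2)
```

Here the search returns the offset (-2, -1), the opposite of the (+2, +1) movement, with a cost of zero. The search only ever considers shifted copies of a block, which reflects the translational assumption noted above.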
In state-of-the-art technologies, coordinates of motion vectors associated with either a pel or a group of pels are expressed in a discrete coordinate system (i.e., with a finite set of symbols), whose step width either matches the resolution of the current image (“pel resolution”, i.e., current image and reference image have the same resolution) or is a fraction of a pel (“sub-pel resolution”, e.g., by way of non-limiting examples, ¼ of a pel, ⅛ of a pel, etc.). In the latter case, the reference image has a higher resolution than the current image, in order to allow a motion vector to point to a given position with sub-pixel resolution (with respect to the resolution of the current image); essentially, the reference image is supersampled by a given scale factor, and the coordinates of motion vectors are expressed as integer numbers in the coordinate system of the supersampled reference image. In other words, even though a display is not able to show such a high resolution, a supersampled (high-resolution) rendition of a given reference image is produced solely to support motion compensation operations. Motion vectors can be used to identify which portion of that rendition is to be used to reconstruct a display signal.
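By way of non-limiting illustration, the relationship between a sub-pel displacement and its integer coordinate in the supersampled reference can be sketched in one dimension as follows. Linear interpolation is assumed here purely for brevity (real codecs use more elaborate interpolation filters), and the function names and sample values are hypothetical.

```python
from fractions import Fraction

def to_integer_mv(displacement_in_pels, scale):
    """Express a displacement as an integer on the supersampled grid."""
    units = Fraction(displacement_in_pels) * scale
    if units.denominator != 1:
        raise ValueError("displacement is finer than the supersampled grid")
    return int(units)

def supersample_row(row, scale):
    """Supersample one row by `scale` using linear interpolation (assumed)."""
    out = []
    for i in range(len(row) - 1):
        for k in range(scale):
            out.append(row[i] + (row[i + 1] - row[i]) * Fraction(k, scale))
    out.append(Fraction(row[-1]))
    return out

def fetch(supersampled_row, position, mv_units, scale):
    """Read the reference value for a pel at `position` in the current row."""
    return supersampled_row[position * scale + mv_units]

# Hypothetical reference row at pel resolution, supersampled to 1/4 pel.
scale = 4
reference_row = [0, 4, 8, 20]
supersampled = supersample_row(reference_row, scale)

# A displacement of 3/4 of a pel becomes the integer 3 on the 1/4-pel grid.
mv_units = to_integer_mv(Fraction(3, 4), scale)
value = fetch(supersampled, position=1, mv_units=mv_units, scale=scale)
```

In this example a row of 4 pels becomes 13 stored samples, and a displacement of ⅛ of a pel cannot be represented at all on the ¼-pel grid, so `to_integer_mv` rejects it.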
Leveraging motion vectors with sub-pel resolution allows for better precision in motion estimation and in motion compensation, but it also has the significant disadvantage of requiring more memory at the decoder side, since the buffer that stores the “super high resolution” rendition of the reference image needs to store a much higher number of pels than the number that needs to be displayed on a respective display screen.
Known encoding techniques based on block motion compensation and on offset motion vectors using integer coordinates (i.e., coordinates with fixed precision, such as ⅛ of a pixel) have several important drawbacks, suitably addressed by novel methods described herein. Most notably, the use of offset coordinates with a given sub-pixel precision typically requires buffering an upsampled rendition of the reference image at the given sub-pixel resolution; as a consequence, capturing very subtle movements (e.g., 1/128 of a pixel, important for instance in the case of high frame-rate video signals or in the case of complex movements such as a 1% zoom with 2-degree rotation) is not feasible due to memory limitations and to the large amount of computation that would be necessary to calculate the supersampled reference image. Generation and processing of a super high-resolution reference image is undesirable for a number of reasons.
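By way of non-limiting illustration, the growth of the reference buffer with sub-pixel precision can be estimated as follows. The sketch assumes that the reference image is supersampled by the same factor in both dimensions and stored in full, and it uses a hypothetical 1920x1080 image; the function name is introduced for this example only.

```python
def reference_buffer_pels(width, height, subpel_denominator):
    """Pels stored when the reference is supersampled in both dimensions.

    subpel_denominator is 1 for pel resolution, 4 for 1/4 pel, and so on.
    """
    return (width * subpel_denominator) * (height * subpel_denominator)

# Hypothetical 1920x1080 image at increasingly fine motion vector precision.
for denominator in (1, 4, 8, 128):
    pels = reference_buffer_pels(1920, 1080, denominator)
    print(f"1/{denominator} pel: {pels:,} pels")
```

Under these assumptions the buffer grows with the square of the precision: 2,073,600 pels at pel resolution become 33,177,600 pels at ¼ pel and 33,973,862,400 pels at 1/128 of a pel.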