The present invention concerns the generation of image data, particularly, though not exclusively, for motion estimation in the context of video coders employing inter-frame differential coding.
FIG. 1 shows a known form of video coder. Video signals (commonly in digital form) are received by an input buffer 1. A subtractor 2 forms the difference between the input and a predicted signal from a frame store 3; this difference is then further coded in box 4. The coding performed here is not material to the present invention, but may include, for example, thresholding (to suppress transmission of zero or minor differences), quantisation or transform coding. The input to the frame store is the sum, formed in an adder 5, of the prediction and the coded difference signal decoded in a local decoder 6 (so that loss of information in the coding and decoding process is included in the predictor loop).
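The predictor loop of FIG. 1 can be sketched as follows, without motion compensation. This is only an illustration: a frame is assumed to be a flat list of pixel values, and the function names and the uniform quantiser standing in for the coder 4 and local decoder 6 are hypothetical, not taken from the coder described.

```python
def quantise(diff, step):
    """Uniform quantiser standing in for coder 4 plus local decoder 6 (an assumption)."""
    magnitude = (abs(diff) + step // 2) // step * step
    return magnitude if diff >= 0 else -magnitude


def code_frame(frame, store, step=4):
    """Code one frame against the frame store; return (coded differences, new store)."""
    coded, new_store = [], []
    for pixel, prediction in zip(frame, store):
        diff = pixel - prediction            # subtractor 2
        decoded = quantise(diff, step)       # coder 4 followed by local decoder 6
        coded.append(decoded)
        new_store.append(prediction + decoded)  # adder 5 feeds frame store 3
    return coded, new_store
```

Because the frame store is updated with the decoded difference rather than the exact difference, a remote decoder that adds the same coded values to the same prediction holds an identical frame store.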
The differential coding is essentially inter-frame, and the prediction could simply consist of a one-frame delay provided by the frame store 3; as shown however a motion estimator 7 is also included. This compares the frame of the picture being coded (also referred to herein as the reference frame) with the previous frame (also referred to herein as the target frame) being supplied to the predictor. For each block of the current frame, into which the picture is regarded as divided, the motion estimator identifies that region of the previous frame which the block most closely resembles. The vector difference in position between the identified region and the block in question is termed a motion vector (since it usually represents motion of an object within the scene depicted by the television picture) and is applied to a motion compensation unit 8 which serves to shift the identified region of the previous frame into the position of the relevant block in the current frame, thereby producing a better prediction. This results in the differences formed by the subtractor 2 being, on average, smaller and permits the coder 4 to encode the picture using a lower bit rate than would otherwise be the case.
The motion estimator must typically compare each block with the corresponding block of the previous frame and regions positionally shifted from that block position; in practical systems this search is limited to a defined search area rather than being conducted over the entire frame, but even so it involves a considerable amount of processing and often necessitates many accesses to stored versions of both frames. Note that this requires that the input buffer 1 introduces sufficient delay that the motion estimator 7 has access to the current block and its search area to complete its motion estimation for that block before it arrives at the subtractor 2.
Usually the motion estimator regards a “current” frame of a television picture which is being coded as being divided into 8×8 blocks, that is, eight picture elements (pixels) horizontally by eight lines vertically. Although the principles are equally applicable to interlaced systems, for simplicity of description a non-interlaced picture is assumed. The motion estimator is designed to generate for each block a motion vector which indicates the position of the 8×8 region, lying within a defined search area of the (or a) previous frame of the picture, which is most similar to the block in question (alternatively, a motion vector may be associated with a 16×16 macro block). FIG. 2 illustrates a field with an 8×8 block N (shaded) and a typical associated 23×23 search area indicated by a rectangle SN. If the pixels horizontally and lines vertically are identified by co-ordinates x, y, with an origin at the top left-hand corner, then the search area for a block whose upper left-hand corner pixel has co-ordinates xN,yN is the area extending horizontally from (xN−8) to (xN+14) and vertically from (yN−8) to (yN+14).
In order to obtain the motion vector it is normal to conduct a search in which the block is compared with each of the 256 possible 8×8 regions of the previous frame lying within the search area, i.e. those whose upper left pixel has co-ordinates xN+u, yN+v where u and v are in the range −8 to +7. The motion vector is the pair of values u, v for which the comparison indicates the greatest similarity. The test for similarity can be any conventionally used, e.g. the sum of the absolute values of the differences between each of the pixels in the “current” block and the relevant region of the previous frame.
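The integer search just described can be sketched as below. Frames are assumed to be lists of rows of integer pixel values; the names sad and full_search are illustrative, and the skipping of candidate regions that fall outside the frame is an assumption, since border handling is not specified above.

```python
def sad(block, frame, x, y):
    """Sum of absolute differences between block and the region of frame at (x, y)."""
    return sum(
        abs(block[j][i] - frame[y + j][x + i])
        for j in range(len(block))
        for i in range(len(block[0]))
    )


def full_search(block, prev_frame, xn, yn, lo=-8, hi=7):
    """Return (u, v, cost) with the smallest SAD for u, v in the range lo..hi."""
    height, width = len(prev_frame), len(prev_frame[0])
    bh, bw = len(block), len(block[0])
    best = None
    for v in range(lo, hi + 1):
        for u in range(lo, hi + 1):
            x, y = xn + u, yn + v
            # Assumption: candidates extending beyond the frame are skipped.
            if x < 0 or y < 0 or x + bw > width or y + bh > height:
                continue
            cost = sad(block, prev_frame, x, y)
            if best is None or cost < best[2]:
                best = (u, v, cost)
    return best
```

With the default range this evaluates up to 16 × 16 = 256 candidate regions per block, which is the source of the processing load noted earlier.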
Many video compression algorithms support motion vectors to fractional pixel accuracy. For example, MPEG-1 (IS 11172-2), MPEG-2 (IS 13818-2), H.263 and MPEG-4 (IS 14496-2) allow motion vectors to half pixel accuracy. H.264 (also known as IS 14496-10 and Advanced Video Coding (AVC)) allows motion vectors to quarter pixel accuracy.
Fractional pixel motion vectors are usually computed in a two step process. In the first step, integer motion vectors only are considered, and the best motion vector is computed. In the second step, fractional pixel motion vectors around the best integer motion vector are considered and the best is computed.
In standard compression algorithms that support motion vectors to half pixel accuracy, the values of the pixels at half unit offsets are calculated by simple averaging of neighbouring pixel values. Fast half pixel motion searching techniques are known where the values of pixels at half unit offsets around the best integer motion vector are calculated “on the fly” and are used when calculating SOADs (Sum Of Absolute Differences) for the eight half pixel offset positions, (0.5, 0), (0.5, −0.5), (0, −0.5), (−0.5, −0.5), (−0.5, 0), (−0.5, 0.5), (0, 0.5), (0.5, 0.5).
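A minimal sketch of computing half pixel values "on the fly" during a SOAD calculation is given below. Positions are expressed in half-pixel units, so an offset of 0.5 corresponds to 1. The rounding (adding 1 or 2 before the integer division) is an assumption, as the text says only "simple averaging"; the function names are illustrative, and the caller is assumed to keep all positions inside the frame.

```python
def half_pel(frame, x2, y2):
    """Pixel value at (x2 / 2, y2 / 2); co-ordinates are in half-pixel units."""
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2
    if not fx and not fy:
        return frame[y][x]                                   # integer position
    if fx and not fy:
        return (frame[y][x] + frame[y][x + 1] + 1) // 2      # horizontal half
    if fy and not fx:
        return (frame[y][x] + frame[y + 1][x] + 1) // 2      # vertical half
    return (frame[y][x] + frame[y][x + 1]
            + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4  # diagonal half


def sad_half_pel(block, frame, x2, y2):
    """SOAD of block against the region whose top-left is at (x2 / 2, y2 / 2)."""
    return sum(
        abs(block[j][i] - half_pel(frame, x2 + 2 * i, y2 + 2 * j))
        for j in range(len(block))
        for i in range(len(block[0]))
    )


# The eight half pixel offsets around the best integer vector, in half-pixel units.
HALF_OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)]
```

For a best integer position (x, y), the refinement step would evaluate sad_half_pel at (2x + dx, 2y + dy) for each (dx, dy) in HALF_OFFSETS and keep the smallest.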
In H.264, which allows motion vectors to quarter pixel accuracy, the values of the pixels at half unit offsets are calculated by applying a six tap filter to neighbouring pixel values, and the values of the pixels at quarter unit offsets are calculated by simple averaging of neighbouring pixel and half-pixel values. In this case, fast fractional pixel motion searching cannot easily calculate the values of pixels at fractional pixel offset positions on the fly because of the number of pixel values that need to be calculated. It is more efficient in terms of processing power to calculate the values of pixels at fractional pixel offset positions once, preferably, and to store them for later motion searching. That is, the values for the fractional pixels are calculated and stored in a first step, and in a subsequent step, the similarity tests are performed.
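The interpolation along a single line can be sketched as follows, using the H.264 luma filter taps (1, −5, 20, 20, −5, 1) with a divisor of 32 and clipping to the 8-bit range. The function names are illustrative, and replicating the edge pixels beyond the ends of the line is an assumption made to keep the sketch self-contained.

```python
def clip255(value):
    """Clip to the 8-bit pixel range."""
    return max(0, min(255, value))


def half_sample_row(row):
    """Half unit offset values between row[i] and row[i + 1] via the six tap filter."""
    n = len(row)

    def px(i):
        # Assumption: pixels beyond either end of the line repeat the edge pixel.
        return row[max(0, min(n - 1, i))]

    return [
        clip255((px(i - 2) - 5 * px(i - 1) + 20 * px(i)
                 + 20 * px(i + 1) - 5 * px(i + 2) + px(i + 3) + 16) >> 5)
        for i in range(n - 1)
    ]


def quarter_sample_row(row, half):
    """Quarter unit offset values: rounded average of row[i] and half[i]."""
    return [(row[i] + half[i] + 1) >> 1 for i in range(len(half))]
```

Each half unit value needs six multiply-accumulate inputs, and the quarter unit values depend on the half unit values, which is why computing them once and storing them is attractive compared with recomputing them for every candidate vector.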
The storage of the values of pixels at fractional pixel offset positions becomes even more beneficial as more previous pictures are considered for motion compensation. H.264 allows up to 16 previous pictures to be used. H.263 Annex U allows multiple previous pictures, and so despite allowing only motion vectors to half pixel accuracy, there is benefit in storing the values of pixels at fractional pixel offset positions rather than calculating them “on the fly”.
The stored pixels (integer and fractional) will each have a respective address which can be used to selectively retrieve a pixel from the memory. The addresses, each of which is normally formed by a binary word, can be viewed as a one-dimensional ordered sequence (the position of an address along the one-dimensional sequence need not be related to the physical position of the corresponding memory location).
The fractional (interpolated) pixels are normally stored together with the integer (sampled) pixels so as to form a single up-sampled image, the up-sampled image having a resolution corresponding to the separation between the fractional pixels. In terms of the addressing arrangement, the addresses of the pixels are incremented in a raster-like fashion, with neighbouring pixels (integer and fractional) along a line having consecutive addresses. This may be considered desirable in many situations, since reading the addresses in order will produce a raster signal of the image at the up-sampled resolution. However, it has been appreciated by the present inventors that for some applications, such as motion searching, such prior art addressing arrangements can be inconvenient.
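One way to picture the prior art arrangement is as an address function over the up-sampled image. The sketch below assumes quarter pixel up-sampling (a scale of 4), integer co-ordinates x, y, and fractional offsets fx, fy counted in quarter units; the function name and parameters are illustrative only.

```python
def raster_address(x, y, fx, fy, width, scale=4):
    """Address of a pixel in an up-sampled image stored in raster order.

    x, y   : integer (sampled) pixel co-ordinates
    fx, fy : fractional offsets in units of 1/scale, each in 0..scale-1
    width  : number of sampled pixels per line
    """
    return (scale * y + fy) * (scale * width) + scale * x + fx
```

Under these assumptions, successive pixels along a line that share the same offset lie four addresses apart (for example 2, 6, 10 for fx = 2), so comparing a block against one candidate position requires strided rather than contiguous reads.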
According to the present invention there is provided a method of generating image data using a set of sampled pixels arranged along a plurality of lines, the method including the steps of: (i) at intermediate positions between sampled pixels, interpolating the sampled pixels so as to provide at least one set of interpolated pixels, the or each set of interpolated pixels having a respective offset relative to the set of sampled pixels, and, (ii) storing the or each set of interpolated pixels with a respective address in a memory, the addresses forming an ordered sequence, wherein along a given line, consecutive interpolated pixels having the same offset are stored with respective addresses that are consecutive to one another in the ordered sequence.
Because at least along a given line, pixels with the same offset are stored with consecutive addresses in the ordered sequence, the retrieval and manipulation of such pixels will be easier than it would be in situations where the sampled and interpolated pixels are simply stored in the order in which they are positioned along the line.
Preferably, the pixels are interpolated at least at a half unit offset and a quarter unit offset from sampled pixels (the unit being determined by the spacing of the sampled pixels). By providing interpolations at half unit and quarter unit offsets, the image will effectively be up-sampled, allowing the processing of the image, for example through the calculation of motion vectors, to be carried out more accurately than with the sampled pixels alone. However, the interpolation may be carried out to provide yet further accuracy, for example with offsets of ⅛ of a unit or smaller.
The respective addresses of interpolated pixels originating from a plurality of lines but having a common offset may be grouped together in the ordered sequence. For a given offset position, the pixels from each of the lines may be grouped in this way, with the result that the ordered sequence includes a plurality of portions which follow on from one another, each portion being representative of a raster-scan of the image notionally sampled (or actually sampled in the case of sampled pixels) at a respective offset position.
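Grouping by offset in this way can be sketched as an address function in which each offset position occupies its own raster-scanned portion of the sequence. Quarter pixel offsets (a scale of 4) are assumed, and the order in which the offset portions follow one another (fy major, fx minor) is an arbitrary choice made for illustration; the function name is hypothetical.

```python
def planar_address(x, y, fx, fy, width, height, scale=4):
    """Address when all pixels sharing an offset form one raster-scanned portion.

    x, y   : integer (sampled) pixel co-ordinates
    fx, fy : fractional offsets in units of 1/scale, each in 0..scale-1
    """
    portion = fy * scale + fx                 # which offset portion (assumed order)
    return portion * (width * height) + y * width + x
```

Within a portion, neighbouring pixels along a line differ in address by one and neighbouring lines by the line width, so a block at a given fractional offset can be read exactly as it would be from an ordinary image.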
However, pixels (interpolated and sampled) need not be grouped solely according to their offset position. The respective addresses of interpolated pixels on a given line may be grouped together in the ordered sequence. That is, for a given line, pixels from that line may be grouped, pixels from a line-group being further divided into sub-groups, each sub-group corresponding to a given offset position.
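The line-by-line grouping can be sketched in the same style. Here each line of sampled pixels is assumed to own the interpolated pixels at all offsets measured from it, quarter pixel offsets (a scale of 4) are assumed, and the order of the sub-groups within a line-group is an illustrative choice; the function name is hypothetical.

```python
def line_major_address(x, y, fx, fy, width, scale=4):
    """Address when pixels are grouped by line, then by offset within the line.

    x, y   : integer (sampled) pixel co-ordinates
    fx, fy : fractional offsets in units of 1/scale, each in 0..scale-1
    """
    line_group = y * (scale * scale * width)  # all pixels belonging to line y
    sub_group = (fy * scale + fx) * width     # pixels of that line sharing one offset
    return line_group + sub_group + x
```

Consecutive pixels with the same offset on a line still have consecutive addresses, while all data relating to one line stays within a single span of the sequence.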
The pixels may be grouped according to both offset position and the line in which they occur. Thus, pixels with at least one first common offset position (or the sampled pixels) at different lines may be grouped together, and the remaining pixels may be grouped on a line-by-line basis (for each line, pixels with the same offsets being themselves grouped).
In a first embodiment, pixels with a given offset on each line are stored together; that is, for a given offset, each line of the image is incremented when generating the ordered sequence until each pixel with the given offset is stored, following which the lines are again incremented for the next offset position. In a second embodiment, for a given line, the offset position is incremented: once all the offset positions of a given line have been stored, pixel values on the next line are stored. In a third embodiment, the sampled pixels are stored together as in the first embodiment, the remaining (interpolated) pixels being stored as in the second embodiment. In a fourth embodiment, first the sampled pixels are stored, followed by pixels interpolated at a ½ unit offset, followed by pixels interpolated at a ¼ unit offset (in each of these embodiments, the order may be reversed).
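As an illustration of the third embodiment, the sketch below places all sampled pixels first, in raster order, and then stores the interpolated pixels line by line, with one sub-group per offset on each line. Quarter pixel offsets (a scale of 4, giving 15 interpolated offsets per sampled pixel) and the ordering of the offsets are assumptions; the function name is hypothetical.

```python
def hybrid_address(x, y, fx, fy, width, height, scale=4):
    """Sampled pixels first (raster order), then interpolated pixels line by line.

    x, y   : integer (sampled) pixel co-ordinates
    fx, fy : fractional offsets in units of 1/scale, each in 0..scale-1
    """
    offset_index = fy * scale + fx
    if offset_index == 0:
        return y * width + x                          # sampled pixel
    per_line = (scale * scale - 1) * width            # interpolated pixels per line
    return (width * height                            # skip the sampled pixels
            + y * per_line                            # line-group
            + (offset_index - 1) * width              # offset sub-group in the line
            + x)
```

The sampled image then remains directly usable for the integer search, while the interpolated values needed to refine a vector around a given line are held close together.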
The stored data will preferably be used to generate a motion vector in respect of a reference frame relative to a target frame, the target frame being formed at least in part by image data stored as described above. Preferably, the generation of the motion vector will include the further steps of: selecting an image portion in the reference frame; and, comparing the selected portion with one or more portions of corresponding size in the target frame.
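The comparison step can be sketched against a flat memory in which the pixels of one offset position form a raster-scanned portion starting at a known base address. The memory is modelled as a Python list, and the names sad_against_portion, base and width are illustrative assumptions rather than part of the method as claimed.

```python
def sad_against_portion(block, memory, base, width, x, y):
    """SAD of block against the region at (x, y) of the portion starting at base.

    memory : flat sequence of stored pixel values
    base   : address of the first pixel of the portion for the chosen offset
    width  : number of pixels per line within the portion
    """
    total = 0
    for j, row in enumerate(block):
        start = base + (y + j) * width + x
        candidate = memory[start:start + len(row)]   # one contiguous read per row
        total += sum(abs(a - b) for a, b in zip(row, candidate))
    return total
```

Because pixels with the same offset have consecutive addresses along a line, each row of the selected portion is compared with a single contiguous run of stored values, whichever offset is being tested.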