Image or video processing systems can place heavy demands on memory systems. An image processing device is typically connected to a memory using a bus with a finite bandwidth. In many cases there will also be memory local to the image processing device, operating as a buffer or cache, which aims to reduce the amount of data being read from or written to the memory via the bus. Reading data from memory via a bus is relatively slow, and consumes more power, compared to reading the same data from local memory.
Some memory bandwidth is inevitable. A typical processing system must read each input image frame from memory via a bus, and write each output image frame to memory via a bus. For High Definition (HD) video, a frame may be 1920 pixels wide and 1080 pixels high. With the typical 4:2:0 chrominance subsampling mode and 8 bits per sample, the size of the frame is approximately 3.1 megabytes (MB). For an input frame rate of 24 frames per second (fps), the input data bandwidth alone is approximately 74 MB/sec.
The output video size and frame rate may differ from the input video size and frame rate, depending on the processing being applied. One possible processing algorithm is a high definition motion compensated frame rate converter, which may read input frames at 24 fps, and write output frames at, for example, 120 fps. In this case the output data bandwidth is in excess of 300 MB/sec, in addition to the input data bandwidth of 74 MB/sec described previously. This much data transfer is largely unavoidable.
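The bandwidth figures above follow from simple arithmetic. The sketch below reproduces them, assuming 8-bit samples, 4:2:0 subsampling (two chroma planes, each a quarter of the luma area) and 1 MB = 10^6 bytes; the helper names `frame_bytes` and `bandwidth_mb_per_s` are hypothetical.

```python
def frame_bytes(width, height, bits_per_sample=8, chroma="4:2:0"):
    """Bytes in one frame: a luma plane plus two chroma planes."""
    # Area of each chroma plane as a fraction of the luma area.
    chroma_fraction = {"4:4:4": 1.0, "4:2:2": 0.5, "4:2:0": 0.25}[chroma]
    samples = width * height * (1 + 2 * chroma_fraction)
    return samples * bits_per_sample / 8

def bandwidth_mb_per_s(width, height, fps, **kwargs):
    """Data rate in MB/s (1 MB = 10**6 bytes) for frames of the given size and rate."""
    return frame_bytes(width, height, **kwargs) * fps / 1e6

hd = frame_bytes(1920, 1080)
print(f"HD 4:2:0 frame:  {hd / 1e6:.2f} MB")                               # 3.11 MB
print(f"Input  @24 fps:  {bandwidth_mb_per_s(1920, 1080, 24):.1f} MB/s")   # 74.6 MB/s
print(f"Output @120 fps: {bandwidth_mb_per_s(1920, 1080, 120):.1f} MB/s")  # 373.2 MB/s
```

Passing `chroma="4:2:2"` or a larger `bits_per_sample` shows how quickly the unavoidable traffic grows for richer formats.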
An example frame rate conversion algorithm operates in two main phases. First, motion estimation computes a vector field representing the motion between a pair of consecutive input frames. Second, a picture building or interpolation phase constructs a number of output frames using pixels taken from the input frames and placed in the output frames at positions determined by the motion vectors.
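To make the picture building phase concrete, it helps to know which pair of input frames each output frame falls between, and how far along that interval it lies. The sketch below computes this for the 24 fps to 120 fps example. It assumes evenly spaced output frames aligned with the first input frame; the function name `output_schedule` and the notion of a fractional "phase" are illustrative assumptions, not taken from the text.

```python
import math
from fractions import Fraction

def output_schedule(in_fps, out_fps, n_out):
    """For each output frame, return (index of the earlier input frame, phase in [0, 1))."""
    schedule = []
    for k in range(n_out):
        # Output frame k occurs at time k / out_fps; express that in input-frame units.
        t = Fraction(k * in_fps, out_fps)
        prev = math.floor(t)   # earlier input frame of the pair
        phase = t - prev       # fraction of the way towards the next input frame
        schedule.append((prev, phase))
    return schedule

for prev, phase in output_schedule(24, 120, 6):
    print(prev, phase)
```

With these rates, five output frames fall between input frames 0 and 1 at phases 0, 1/5, 2/5, 3/5 and 4/5, and the sixth coincides with input frame 1. Exact fractions are used so that phases do not accumulate rounding error over a long sequence.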
There are various approaches to motion estimation. A common one is to divide a frame into small rectangular blocks, and for each block, to search for a matching area of pixel data in an adjacent input frame. The search process typically requires the evaluation of each of a number of motion vector candidates, and the quality of the match is determined using a metric such as the sum of absolute differences (SAD) between the pixels in the block and the pixels from the adjacent frame. The positional offset between matching areas of pixels determines the motion vector for that block.
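A minimal sketch of the block matching described above, using an exhaustive search over every candidate vector within a square search range of ±r pixels. Exhaustive search is only one possible candidate strategy, chosen here for simplicity; the frame representation (lists of pixel rows), the bounds handling and the names `sad` and `full_search` are illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the bs x bs block of `cur` at (bx, by)
    and the block of `ref` displaced by the candidate vector (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, bs, r):
    """Evaluate every candidate within +/- r pixels; return (best_vector, best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates whose displaced block would fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + bs <= w and 0 <= by + dy and by + dy + bs <= h):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

# Synthetic 16x16 frames: `cur` is `ref` shifted so the true motion is (2, 1).
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[(7 * (x + 2) + 13 * (y + 1)) % 251 for x in range(16)] for y in range(16)]
print(full_search(cur, ref, 6, 6, 4, 4))   # ((2, 1), 0)
```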
Motion vector candidates may be any vector within a search range. The search range surrounds the block, and its size determines the range of motions that can be detected and tracked. There are often several candidates with similar values, meaning that pixels from the adjacent frame must be read more than once. Therefore, it is appropriate to store the adjacent frame pixel data corresponding to the search range in a local buffer. Once the buffer is filled, many vector candidates can be tested without consuming any additional memory bandwidth; however, the memory bandwidth consumed in filling the buffer must be considered.
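The benefit of the local buffer can be shown by counting reads from a stand-in for external memory. In this sketch, which uses hypothetical names and assumes the search window lies wholly inside the frame, the window of (bs + 2r)² pixels is fetched once, and all (2r + 1)² candidates are then evaluated from the buffer alone.

```python
class CountingFrame:
    """Wraps a 2-D frame and counts pixel reads; stands in for memory behind a bus."""
    def __init__(self, pixels):
        self.pixels = pixels
        self.reads = 0

    def read(self, x, y):
        self.reads += 1
        return self.pixels[y][x]

def load_search_window(mem, bx, by, bs, r):
    """Copy the reference pixels covered by every candidate (the block grown by r on
    each side) into a local buffer. Assumes the window lies inside the frame."""
    x0, y0, size = bx - r, by - r, bs + 2 * r
    return [[mem.read(x0 + x, y0 + y) for x in range(size)] for y in range(size)]

def search_in_buffer(cur, buf, bx, by, bs, r):
    """Evaluate every candidate using only the local buffer; no further memory reads."""
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Buffer origin is (bx - r, by - r), so reference (bx+dx+x, by+dy+y)
            # sits at buffer position (r+dx+x, r+dy+y).
            cost = sum(abs(cur[by + y][bx + x] - buf[r + dy + y][r + dx + x])
                       for y in range(bs) for x in range(bs))
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best

ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[(7 * (x + 2) + 13 * (y + 1)) % 251 for x in range(16)] for y in range(16)]
mem = CountingFrame(ref)
buf = load_search_window(mem, 6, 6, 4, 4)
best = search_in_buffer(cur, buf, 6, 6, 4, 4)
print(best, mem.reads)   # ((2, 1), 0) 144
```

Here filling the buffer costs 144 memory reads, whereas reading each candidate's pixels directly from memory would cost 81 × 16 = 1296.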
In the picture building phase of the algorithm, pixel data corresponding to the selected motion vector is projected into its position in the output frame. Several alternative motion estimation results may be projected, with the pixel values in the output frame formed from a composite of each of them. The pixel data used in picture building is also read from the same pixel data buffer that supplies the pixels for motion estimation.
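A minimal sketch of projection with compositing. It assumes that the output frame has a fractional temporal phase, that each block lands a `phase` fraction of the way along its vector, that pixels from the two input frames are blended linearly, and that overlapping projections are averaged. These rules, and the name `build_output`, are illustrative assumptions rather than the method described here.

```python
def build_output(prev, nxt, candidates, bs, phase, width, height):
    """Project bs x bs blocks of `prev` into an output frame at temporal `phase` (0..1).

    `candidates` is a list of ((bx, by), (vx, vy)) pairs; the same block may appear
    more than once with alternative vectors. Overlapping projections are averaged,
    and output pixels that no projection reaches are left as None.
    """
    acc = [[0.0] * width for _ in range(height)]
    cnt = [[0] * width for _ in range(height)]
    for (bx, by), (vx, vy) in candidates:
        # Assumed rule: the block lands a fraction `phase` of the way along its vector.
        ox, oy = bx + round(phase * vx), by + round(phase * vy)
        for y in range(bs):
            for x in range(bs):
                tx, ty = ox + x, oy + y
                if 0 <= tx < width and 0 <= ty < height:
                    a = prev[by + y][bx + x]            # pixel in the earlier frame
                    b = nxt[by + vy + y][bx + vx + x]   # matching pixel in the later frame
                    acc[ty][tx] += (1 - phase) * a + phase * b
                    cnt[ty][tx] += 1
    return [[acc[y][x] / cnt[y][x] if cnt[y][x] else None for x in range(width)]
            for y in range(height)]

W, H = 8, 2
prev = [[0] * W for _ in range(H)]
nxt = [[100 if x >= 4 else 0 for x in range(W)] for _ in range(H)]
# Two alternative vectors for the same 2x2 block at (2, 0).
out = build_output(prev, nxt, [((2, 0), (0, 0)), ((2, 0), (2, 0))], 2, 0.5, W, H)
print(out[0])   # [None, None, 0.0, 25.0, 50.0, None, None, None]
```

Column 3 receives both projections and so holds their average. Positions that stay `None` would, in a complete frame, be covered by neighbouring blocks.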
FIG. 1 illustrates one approach to the design of a pixel data buffer. The buffer 100 is shown as the same size as the motion estimator's search range 105, and is centred on the block 110 for which motion estimation is taking place. In practice the buffer may be extended slightly to supply data to other processes, such as the frame rate converter's picture building phase. When motion estimation processing moves to the next block 115, the pixel data buffer discards data corresponding to a column of blocks 120 at one edge of the search range, and reads data corresponding to a column of blocks 125 at the opposite edge of the search range. If the height of the search range is V blocks, then V blocks of pixel data must be read for each new block that is processed in the motion estimator. Some saving is made as the search range begins to overlap the edge of the screen, but this is roughly offset by the need to read a long row of blocks 130 as the processing position steps down to the next row. A boustrophedon processing order 135 is preferred over a raster scanning order 140, because jumping back to the opposite end of the next row would require a costly replacement of the entire buffer contents 145.
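The bandwidth behaviour of this design can be checked with a small simulation that counts block reads for a buffer holding exactly the current search window. The frame size of 60 × 34 blocks (HD with hypothetical 32-pixel blocks) and the ±3-block search range (V = 7) are illustrative assumptions, as are the function names.

```python
def window(cx, cy, rx, ry, w, h):
    """Blocks covered by the search range centred on block (cx, cy), clipped to the frame."""
    return {(x, y)
            for y in range(max(0, cy - ry), min(h, cy + ry + 1))
            for x in range(max(0, cx - rx), min(w, cx + rx + 1))}

def scan_order(w, h, boustrophedon):
    """Yield block positions row by row; boustrophedon order reverses every other row."""
    for y in range(h):
        xs = range(w - 1, -1, -1) if (boustrophedon and y % 2 == 1) else range(w)
        for x in xs:
            yield x, y

def blocks_read(w, h, rx, ry, boustrophedon):
    """Count block reads for a buffer that holds exactly the current search window."""
    buffer, reads = set(), 0
    for cx, cy in scan_order(w, h, boustrophedon):
        needed = window(cx, cy, rx, ry, w, h)
        reads += len(needed - buffer)   # fetch only blocks not already held
        buffer = needed                 # discard blocks that have left the window
    return reads

w, h = 60, 34
print(w * h)                            # 2040 blocks in the frame
print(blocks_read(w, h, 3, 3, True))    # 12792 (boustrophedon)
print(blocks_read(w, h, 3, 3, False))   # 13560 (raster)
```

Both orders read far more than the 2040 blocks in the frame: about 6.3 times as many for boustrophedon order, a little below V = 7 because of savings at the frame edges, and about 6.6 times for raster order, which must refill the whole window at the start of each row.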
It is reasonable to expect that, for motion estimation of a frame, every block of pixel data in the adjacent frame must be examined at least once. Most motion is approximately translational, and the vector field is relatively uniform. While there are exceptions where significant parts of the adjacent frame are never visited by the motion estimator, this cannot be relied upon. It can reasonably be said that a motion estimator is optimal in terms of memory bandwidth if each block of pixel data from the adjacent frame is read once, and once only.
The design of FIG. 1 has the advantage of minimising the amount of storage required for the pixel data buffer, at the expense of approximately V times the memory bandwidth consumption of an optimal design. For a vertical search range large enough to track typical motion, V is not small, so this penalty is significant.
FIG. 2 illustrates a pixel data buffer 200 that is the full width of the frame. The motion estimator's search range 205 corresponds to the block being processed 210. When motion estimation advances to the next block 215, one additional block 220 must be read, and one block 225 may be discarded. The position of the block being read wraps to the beginning of the next row as the processing position approaches the end of the current row, which makes the distinction between boustrophedon and raster scanning processing orders less significant. This design achieves optimal memory bandwidth because each block of the frame need be read only once. Where the frame is wide, there may be a longer delay before the buffer is sufficiently full for the first block to be processed. The most significant disadvantage of this design is the considerably larger amount of local memory required for the pixel data buffer.
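The full-width design can be modelled as a first-in, first-out queue of blocks in raster order: before each block is processed, blocks are read up to the bottom-right corner of its search range, and blocks that no later search range will need are discarded. The sketch below uses a hypothetical frame of 60 × 34 blocks and a ±3-block search range; the function name and the exact discard rule are assumptions chosen for illustration.

```python
from collections import deque

def full_width_buffer(w, h, rx, ry):
    """Model a full-width pixel data buffer as a FIFO of blocks in raster order.

    The frame is w x h blocks and the search range is +/-rx by +/-ry blocks.
    Returns (blocks_read, peak_blocks_stored).
    """
    fifo = deque()   # raster indices (y * w + x) of blocks currently held, oldest first
    next_read = 0    # raster index of the next block to fetch from memory
    reads = peak = 0
    for cy in range(h):
        for cx in range(w):
            # Read ahead up to the bottom-right corner of this search range.
            last = min(h - 1, cy + ry) * w + min(w - 1, cx + rx)
            while next_read <= last:
                fifo.append(next_read)
                next_read += 1
                reads += 1
            # Discard blocks before the top-left corner. The row is not clipped at 0,
            # so blocks near the top of the frame stay until no later range needs them.
            first = (cy - ry) * w + max(0, cx - rx)
            while fifo and fifo[0] < first:
                fifo.popleft()
            peak = max(peak, len(fifo))
    return reads, peak

print(full_width_buffer(60, 34, 3, 3))   # (2040, 367)
```

Every block is read exactly once (2040 reads), but the buffer peaks at 367 blocks, just over six full block rows of the frame, compared with 49 blocks (7 × 7) for a buffer that holds only the search window.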
A memory cache is not normally a suitable alternative to the pixel data buffer in this type of application, due to the time taken to retrieve data from memory via a bus in the event of a cache miss. In contrast, the pixel data buffer designs guarantee that data is available immediately. This is significant in a computationally intensive real-time application such as video processing.
As video frame sizes increase, the amount of storage required for the pixel data buffer increases proportionally. A high-definition (HD) image is 1920 by 1080 pixels. While there is no limit on the size of the motion in a video sequence, there are often practical constraints on the size of a motion estimator's search range. The vertical size of the search range, V, is of particular importance. For a motion estimator to track the majority of movements in HD video, the buffer may need to be several hundred pixels tall. If the pixel data buffer is the width of the screen and the vertical search range, V, is 200 pixels, then storage for approximately 1920 by 200 pixels is required. (FIG. 2 shows that the buffer may be one block shorter over part of its width.) Motion estimation often operates between different frames, or over different intervals simultaneously, requiring pixel data for perhaps two, three, or four frames at the same time, so the storage requirement must be scaled up accordingly. The total amount of storage required is large, and pixel data buffering represents a significant proportion of the silicon area of a motion estimation device.
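The storage arithmetic can be sketched as follows. It assumes 8-bit 4:2:0 pixel data, which averages 1.5 bytes per pixel; the text does not specify a pixel format, and the function name is hypothetical.

```python
def buffer_bytes(width_px, height_px, frames, bytes_per_pixel=1.5):
    """Storage for a full-width pixel data buffer kept for `frames` frames at once.

    bytes_per_pixel=1.5 assumes 8-bit 4:2:0 data (an assumption, not from the text).
    """
    return width_px * height_px * bytes_per_pixel * frames

for frames in (1, 2, 3, 4):
    print(frames, f"{buffer_bytes(1920, 200, frames) / 1e6:.3f} MB")
# 1 0.576 MB
# 2 1.152 MB
# 3 1.728 MB
# 4 2.304 MB
```

Even under these modest assumptions, buffering four frames needs over 2 MB of on-chip memory for HD alone.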
Ultra High Definition Television (UHDTV) standards define new video formats with higher resolution than HD. 4K UHDTV is a format with frames twice the size of HD in each dimension, i.e. 3840 by 2160 pixels. The need to read the frames remains unavoidable, and so the memory bandwidth increases at least in proportion to the area of the frames, i.e. four times. Consequently, it is desirable to design the pixel data buffering to be as close to optimal as possible, adding no unnecessary memory bandwidth. This suggests the use of a pixel data buffer the full width of the screen. Typical motion vectors also scale in proportion to the frame size, so the search range dimensions will double. The full width pixel data buffer will therefore now be approximately 3840 by 400 pixels, four times the size, and therefore roughly four times the silicon area of the HD solution.
8K UHDTV frames are twice the size of 4K UHDTV in each dimension. This increases the bandwidth requirement to sixteen times that of HD video, and requires sixteen times the amount of pixel data buffer storage.
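A sketch of the scaling argument. It assumes the same frame rate for each format and a vertical search range that scales with the frame (200 pixels for HD, as in the example above); the constant and function names are hypothetical.

```python
HD_W, HD_H = 1920, 1080   # HD frame size in pixels
HD_SEARCH_V = 200         # vertical search range used in the HD example

def scaled_requirements(scale):
    """Frame size, bandwidth ratio, full-width buffer size and storage ratio for a
    format `scale` times HD in each dimension (same frame rate assumed)."""
    w, h = HD_W * scale, HD_H * scale
    buf = (w, HD_SEARCH_V * scale)   # full width; search height scales with the frame
    bandwidth_ratio = (w * h) // (HD_W * HD_H)
    storage_ratio = (buf[0] * buf[1]) // (HD_W * HD_SEARCH_V)
    return (w, h), bandwidth_ratio, buf, storage_ratio

for name, scale in (("HD", 1), ("4K UHDTV", 2), ("8K UHDTV", 4)):
    print(name, *scaled_requirements(scale))
# HD (1920, 1080) 1 (1920, 200) 1
# 4K UHDTV (3840, 2160) 4 (3840, 400) 4
# 8K UHDTV (7680, 4320) 16 (7680, 800) 16
```

Because both the frame area and the buffer area grow with the square of the linear scale, bandwidth and full-width buffer storage rise together: four times for 4K and sixteen times for 8K.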