Digital transmission and storage systems generally use block-based compression, as used in the well-known JPEG and MPEG formats, to achieve acceptable image quality within the available transmission bandwidth and storage capacity. JPEG is a still-image compression system based upon performing a Discrete Cosine Transform (DCT) on groups, or blocks, of pixel data. MPEG is a motion video compression system based upon the same principles, but with additional features to support motion between image frames. To achieve substantial data compression, the DCT coefficients representing each block of pixels are subjected to adaptive quantisation and Variable Length Encoding (VLE). Blocks are also grouped together in fours, to form “Macroblocks”, and chrominance (colour) components are represented with half the spatial resolution provided for the luminance (brightness) component. These techniques are applied both to still images (JPEG) and to motion video (MPEG). For moving pictures, temporal redundancy between image frames is identified and significantly reduced using motion-compensated inter-frame predictive encoding.
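The block-transform stage described above can be illustrated with a short sketch. The following Python code (illustrative only; the function names and the uniform quantisation step are this sketch's own assumptions, not part of the JPEG or MPEG standards, which use per-coefficient quantisation matrices) applies a naive 8x8 DCT to a pixel block and quantises the coefficients, showing how a flat block collapses to a single non-zero DC coefficient:

```python
import math

def dct_2d(block):
    """Naive 8x8 two-dimensional DCT-II (illustrative, not optimised)."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
            cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = cu * cv * s
    return out

def quantise(coeffs, step=16):
    """Uniform quantisation (a simplification of the standards'
    per-coefficient matrices): small coefficients become zero."""
    return [[round(c / step) for c in row] for row in coeffs]

# A flat 8x8 block of value 128: after the DCT and quantisation only
# the DC coefficient survives, so the block compresses very well.
flat = [[128] * 8 for _ in range(8)]
q = quantise(dct_2d(flat))
```

This is where the compression gain comes from: most real-image blocks concentrate their energy in a few low-frequency coefficients, and the runs of zeros that quantisation produces are then cheap to represent with VLE.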
The terminology used for describing MPEG sequences includes ‘frames’, which contain a complete image, and ‘fields’, which each contain half an image, comprising every other line. The unit of decoding, however, is a picture, which can be field- or frame-structured. An image buffer is used to store frames and/or fields, depending upon the storage format employed.
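The frame/field relationship can be sketched minimally as follows (the function names are this sketch's own, chosen for illustration):

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into top and bottom fields,
    each holding every other line of the image."""
    return frame[0::2], frame[1::2]

def merge_fields(top, bottom):
    """Interleave two fields back into a frame-structured picture."""
    frame = []
    for t, b in zip(top, bottom):
        frame += [t, b]
    return frame

lines = [f"line{n}" for n in range(6)]
top, bottom = split_fields(lines)
```

Here `top` holds lines 0, 2, 4 and `bottom` holds lines 1, 3, 5; merging the two fields reconstructs the original frame.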
Three picture types, known as ‘I’, ‘B’ and ‘P’ pictures, are used to construct an image sequence for transmitting data across a restricted channel between an encoder and a decoder. A “channel” includes bandwidth-limited communication links and storage of imagery on mass storage media such as hard drives, Compact Disks or video tape (where it is desirable to maximise storage efficiency). ‘I’ picture frames are “intra” frames, which are similar in construction to a single JPEG frame and contain a complete, moderately compressed, frame of image data. ‘P’ picture frames are “predictive” frames and are encoded with reference to a previous ‘I’ or ‘P’ frame (known as ‘key frames’) within a video stream. ‘B’ picture frames are “bi-directionally interpolated” frames which require both earlier and later reference frames in order to be encoded.
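Because a ‘B’ picture needs both its earlier and its later reference picture before it can be decoded, pictures are transmitted in a different order from the order in which they are displayed: each reference is sent ahead of the ‘B’ pictures that depend on it. A minimal sketch of this reordering (the function name and picture labels are illustrative assumptions, not taken from the MPEG specification):

```python
def coded_order(display):
    """Reorder display-order pictures so that each 'B' picture's later
    ('I' or 'P') reference is transmitted before it."""
    out, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)   # hold back until the later reference
        else:                       # 'I' or 'P': emit it, then held 'B's
            out.append(pic)
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

# Display order of a short sequence (labels give type and display index).
display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
transmitted = coded_order(display)
# transmitted: ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder therefore receives `P3` before `B1` and `B2`, so both references for the ‘B’ pictures are available when they arrive.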
To support production of an MPEG image sequence, data has to be stored in image memory. When ‘B’ or ‘P’ frames are being produced, one or two previous ‘I’ or ‘P’ images must have been stored, and these are referenced to provide motion prediction data.
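This reference storage can be sketched as a small buffer holding at most the two most recent ‘I’/‘P’ pictures (a simplified illustration; the class and method names are this sketch's own):

```python
class ReferenceStore:
    """Holds up to two previously decoded 'I' or 'P' pictures for use as
    motion-prediction references: 'P' pictures use the most recent one,
    'B' pictures use both."""

    def __init__(self):
        self._refs = []              # oldest first, at most two entries

    def add(self, picture):
        self._refs.append(picture)
        if len(self._refs) > 2:      # discard the reference no longer needed
            self._refs.pop(0)

    def forward_ref(self):           # the earlier reference
        return self._refs[0] if self._refs else None

    def backward_ref(self):          # the later reference (for 'B' pictures)
        return self._refs[-1] if len(self._refs) > 1 else None
```

Each newly decoded key picture displaces the oldest stored reference, mirroring the statement above that one or two previous ‘I’ or ‘P’ images must be held in image memory.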
A typical arrangement for an image processor performing MPEG processing comprises modules (in hardware or software) for inputting data, converting the image between analogue and digital domains (in either direction, depending on whether the processor is an encoder or a decoder), storing the data, performing compression or decompression on the data, compensating for motion, and for outputting the data for subsequent use, such as display.
In order to provide an implementation that is reliable, efficient and capable of producing high-quality output, the system must be designed to accommodate worst-case conditions which, as a result of processor overload, would otherwise result in frame skipping or other image artefacts that degrade image quality. Furthermore, memory bandwidth is a crucial issue for both hardware and software implementations.
In order to maximise throughput (and therefore reduce the opportunity for processor overload), processors often employ a cache memory with a very high access speed, allowing the processor to obtain data very rapidly. The cache is arranged so that as much data as possible is sourced from the cache, rather than from slower “external” memory. For software implementations in particular, cache size and traffic are often crucial performance-determining factors. Worst-case performance can be significantly affected by cache activity, as prediction data may be sourced from widely separated parts of input images, resulting in significant “cache thrashing” (substantial and largely unnecessary filling of the cache and discarding of its content).
Image memory is often provided as “paged” memory, whereby image data for a picture is stored over many pages, each of which the processor reads by page access. Page access is a very rapid way of accessing data from a paged Random Access Memory (RAM), requiring only the provision by the processor of a base address, after which the data is clocked out and passed to the processor without further addressing of the memory. However, page crossings are inefficient, as they require termination of the current paged memory transfer and generation of new addressing for the next data transfer.
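The cost of page crossings can be quantified by counting how many distinct pages a single block fetch touches. The sketch below (the image width, page size, block size and one-byte-per-pixel assumption are illustrative choices, not taken from any particular system) counts the pages touched when reading a block from a raster-ordered image:

```python
def pages_touched_linear(width, page_size, bx, by, bw=8, bh=8):
    """Count the distinct memory pages touched when fetching a bw x bh
    pixel block from a raster-ordered image of `width` pixels per line,
    assuming one byte per pixel."""
    pages = set()
    for y in range(by, by + bh):
        for x in range(bx, bx + bw):
            pages.add((y * width + x) // page_size)
    return len(pages)

# With a 720-pixel-wide image and 1024-byte pages, consecutive rows of a
# block lie 720 bytes apart, so even a small 8x8 fetch spans several pages.
n = pages_touched_linear(720, 1024, 0, 0)
```

Each of those distinct pages beyond the first implies a terminated transfer and fresh addressing, which is the inefficiency described above.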
It is understood in the field of MPEG image processing that, when paged memory is used, different storage formats provide different advantages, depending upon the content of the images being processed. Storing image data on a linear basis (for example, following a raster scan) minimises the complexity required to subsequently display the image. However, when linear storage is used for a reference frame of image data for use in motion compensation/reconstruction, retrieval of reference data may require a large number of memory page crossings. Alternatively, storing reference image data in “tiled format” (where two-dimensional blocks of pixels, as used in the coding process, are stored in sequence in memory) reduces the number of page crossings by taking advantage of the fact that image pixels naturally have a two-dimensional spatial relationship. With this in mind, U.S. Pat. No. 5,912,676 describes a re-configurable image memory interface for storing or retrieving image data to/from the memory according to different image storage formats, such as scan line (raster), tiled or “skewed-tiled” formats. However, the present inventors have recognised that, since the content of picture sequences varies greatly in the amount and nature of motion between the frames or fields that make up each picture, it is not possible to select a configuration for the reference data memory that will be optimal for all sequences. Accordingly, the US patent provides a range of configurations that may be advantageous if properly selected, but does not define which configuration is the proper one; nor is it possible to define a single configuration that will be optimal for all sequences. The optimal configuration may further depend on what steps are to be performed on the image data after decoding. The need for conversion to another format for a subsequent processing stage may, for example, negate any saving achieved by adopting a preferred format for a motion estimation stage.
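The contrast between linear and tiled storage can be demonstrated by comparing address mappings directly. In the sketch below (the 16x16 tile size, 720-pixel image width, 1024-byte page size and one-byte-per-pixel assumption are all illustrative choices of this sketch, not taken from the cited patent), the same 8x8 reference block touches far fewer pages under a tiled mapping than under a linear one:

```python
def pages_touched(addr_of, page_size, bx, by, bw=8, bh=8):
    """Count distinct pages touched when fetching a bw x bh pixel block,
    given an address mapping function addr_of(x, y)."""
    return len({addr_of(x, y) // page_size
                for y in range(by, by + bh)
                for x in range(bx, bx + bw)})

def linear_addr(width):
    """Raster-scan (scan line) storage: image lines laid end to end."""
    return lambda x, y: y * width + x

def tiled_addr(width, tile=16):
    """Tiled storage: each 16x16 tile of pixels stored contiguously."""
    tiles_per_row = width // tile
    def addr(x, y):
        base = (y // tile * tiles_per_row + x // tile) * tile * tile
        return base + (y % tile) * tile + (x % tile)
    return addr

# An 8x8 block at (12, 12) straddles four tiles, yet in a 720-wide image
# with 1024-byte pages the tiled mapping still touches only 2 pages,
# against 6 for the linear mapping.
linear_pages = pages_touched(linear_addr(720), 1024, 12, 12)
tiled_pages = pages_touched(tiled_addr(720), 1024, 12, 12)
```

The tiled mapping wins here because a tile's 256 bytes are contiguous, so a block fetch crosses a page boundary only where it crosses between distant tiles; under the linear mapping every block row begins a full image line (720 bytes) away from the previous one. As the passage notes, however, which mapping is preferable overall still depends on the motion content of the sequence and on any subsequent processing stages.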