Many processing devices process digital representations such as images in the form of pictures, movie frames, etc. The digital representations are usually in the form of cells arranged in a regular grid. For example, images are conventionally represented as equally sized cells, referred to as picture elements or pixels, arranged in a 2-dimensional square grid. Each cell has a cell value associated with it, e.g. representing the colour, intensity or greyscale value of the cell. The position of each cell is conveniently identified in terms of its coordinates in a suitable coordinate system. For example, the pixels of a 2-dimensional image may be identified by their respective coordinates relative to a 2-dimensional Cartesian coordinate system.
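The paragraph above describes cells on a regular 2-D grid, each identified by Cartesian coordinates. The sketch below shows one common way such a grid is laid out in memory and how a cell's coordinates map to a linear address. It assumes row-major storage, 8-bit cell values and an optional per-row padding ("stride"). The `grid_t` type and the helper functions are hypothetical illustrations, not part of the described hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a 2-D grid of 8-bit cells stored row by row. */
typedef struct {
    size_t width;   /* cells per row, valid x: 0 .. width-1            */
    size_t height;  /* number of rows, valid y: 0 .. height-1          */
    size_t stride;  /* cells between the starts of adjacent rows (>= width) */
    uint8_t *data;  /* storage; data[0] holds the cell at (0, 0)       */
} grid_t;

/* Map Cartesian cell coordinates (x, y) to a linear offset (row-major). */
static size_t cell_offset(const grid_t *g, size_t x, size_t y)
{
    assert(x < g->width && y < g->height);
    return y * g->stride + x;
}

/* Read the cell value (e.g. a greyscale intensity) at (x, y). */
static uint8_t get_cell(const grid_t *g, size_t x, size_t y)
{
    return g->data[cell_offset(g, x, y)];
}

/* Write the cell value at (x, y). */
static void set_cell(grid_t *g, size_t x, size_t y, uint8_t value)
{
    g->data[cell_offset(g, x, y)] = value;
}
```

With a 3 × 3 grid padded to a stride of 4, the cell at (2, 1) lands at offset 1 × 4 + 2 = 6. A stride larger than the width is a common way to keep rows aligned to memory words.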
There is an increasing demand on the image processing capabilities of processing devices such as portable or hand-held devices, e.g. mobile terminals. For example, camera images may be encoded using video encoding techniques, and there is a growing need for pre-processing of camera images prior to video encoding, combined with stricter quality and throughput requirements on displayed images. These and other requirements place ever higher demands on the hardware (HW) used for image processing.
In particular, it is generally desirable to reduce the impact of data latency between the image processing hardware and the external memory in which image data may be stored. External memory bandwidth utilisation is typically a bottleneck in the performance of a multimedia system. Issuing single, isolated accesses to external memory makes a HW accelerator highly sensitive to data latency.
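Why single accesses make an accelerator latency-sensitive can be illustrated with a simple cost model. In this model, every request pays a fixed latency before data arrives, and each word then takes a fixed transfer time. The model, the parameter names and the cycle figures are illustrative assumptions rather than properties of any particular memory system. It compares paying the latency once per word with paying it once per burst of consecutive words.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative cost model (an assumption, not a measured figure):
 *   latency  - cycles from issuing a request until the first word arrives
 *   per_word - cycles to transfer each word once data is flowing
 */

/* Every word is fetched with its own request, so the latency is paid n times. */
static uint64_t cycles_single(uint64_t n, uint64_t latency, uint64_t per_word)
{
    return n * (latency + per_word);
}

/* Words are fetched in bursts of burst_len, so the latency is paid once per burst. */
static uint64_t cycles_burst(uint64_t n, uint64_t burst_len,
                             uint64_t latency, uint64_t per_word)
{
    assert(burst_len > 0);
    uint64_t bursts = (n + burst_len - 1) / burst_len; /* ceiling division */
    return bursts * latency + n * per_word;
}
```

Take the hypothetical figures latency = 20 cycles and per_word = 1 cycle. Fetching 64 words then takes 64 × 21 = 1344 cycles with single accesses, but only 4 × 20 + 64 = 144 cycles with 16-word bursts. In the single-access case, the latency term dominates.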
It is further generally desirable to access memory efficiently when extracting the necessary pixel information. During image processing, the same memory word is commonly read more than once. Depending on the colour format, several pixels may be stored in a single memory word, or the information of a single pixel may be spread over more than one word.
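The two packing cases just mentioned, several pixels per word and one pixel spread over two words, can be made concrete with a small sketch. It makes the following assumptions:

- Memory words are 32 bits wide.
- Pixels are packed contiguously with no padding.
- Pixel bits are filled starting from the least significant bit of each word.

`WORD_BITS`, `pixel_loc_t`, `locate_pixel` and `extract_pixel` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS 32u /* assumed memory word width in bits */

/* Where a pixel's bits live in the packed word array. */
typedef struct {
    size_t first_word;   /* index of the word holding the pixel's lowest bit */
    unsigned bit_offset; /* position of that bit within first_word */
    unsigned num_words;  /* 1 if the pixel fits in one word, 2 if it straddles two */
} pixel_loc_t;

/* Locate pixel i for a format with bpp bits per pixel (1..32). */
static pixel_loc_t locate_pixel(size_t i, unsigned bpp)
{
    assert(bpp > 0 && bpp <= WORD_BITS);
    size_t first_bit = i * bpp;
    size_t last_bit = first_bit + bpp - 1;
    pixel_loc_t loc;
    loc.first_word = first_bit / WORD_BITS;
    loc.bit_offset = (unsigned)(first_bit % WORD_BITS);
    loc.num_words = (unsigned)(last_bit / WORD_BITS - loc.first_word + 1);
    return loc;
}

/* Read pixel i. Neighbouring pixels in the same word cause that word to be
 * read again, which is the repeated-read pattern described above. */
static uint32_t extract_pixel(const uint32_t *words, size_t i, unsigned bpp)
{
    pixel_loc_t loc = locate_pixel(i, bpp);
    uint64_t window = words[loc.first_word];
    if (loc.num_words == 2) /* only touch the next word when the pixel straddles it */
        window |= (uint64_t)words[loc.first_word + 1] << WORD_BITS;
    uint64_t mask = ((uint64_t)1 << bpp) - 1;
    return (uint32_t)((window >> loc.bit_offset) & mask);
}
```

The two formats behave differently:

- **8-bit greyscale:** four pixels share each word, so reading pixels 0 to 3 one by one reads word 0 four times.
- **24-bit RGB:** pixel 1 occupies bits 24 to 47 and therefore needs two word reads.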
It is further generally desirable to provide image processing hardware that can be tuned to the bandwidth available to and from external memory, so as to provide efficient processing over a range of bandwidths.
The design of HW accelerators that fulfil some or all of the above requirements has proven to be a complex task, and the properties of the resulting designs are often hard to verify. Moreover, it is desirable to provide a hardware architecture whose performance and functionality can be adapted to changing requirements through small, incremental Register Transfer Level (RTL) changes, i.e. without major RTL changes or an increase in area.
U.S. 2002/0004860 discloses a method of increasing image processing performance by copying image data between a memory and an I/O RAM. In particular, the copying is accomplished by calling a memory copy function (ANSI memcpy). The image data may be copied in a single call of the memory copy function, or a subset of the image data may be copied one line at a time by repeated calls of the memory copy function. Although this prior-art method reduces the data latency between the image processing hardware and external memory, it remains a problem to provide a flexible architecture that utilises the available bandwidth efficiently.
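The two copying modes described for this prior art, a single memcpy call versus one call per line, can be sketched in standard C. This is an illustrative reading, not the code of US 2002/0004860. The helper names, the byte-based x offset and the use of per-buffer row strides are assumptions; only `memcpy` itself is the standard ANSI C function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the whole image with a single memcpy call. Valid only when the
 * image is stored contiguously (row stride equals row length). */
static void copy_image_whole(uint8_t *dst, const uint8_t *src,
                             size_t row_bytes, size_t rows)
{
    memcpy(dst, src, row_bytes * rows);
}

/* Copy a rectangular subset one line at a time, one memcpy call per line.
 * x0_bytes and y0 give the top-left corner of the subset in the source;
 * the strides allow source and destination to have different row pitches. */
static void copy_region_by_lines(uint8_t *dst, size_t dst_stride,
                                 const uint8_t *src, size_t src_stride,
                                 size_t x0_bytes, size_t y0,
                                 size_t region_bytes, size_t region_rows)
{
    for (size_t r = 0; r < region_rows; ++r) {
        memcpy(dst + r * dst_stride,
               src + (y0 + r) * src_stride + x0_bytes,
               region_bytes);
    }
}
```

The line-by-line variant can extract a sub-rectangle from a larger image, but it issues one transfer per row. The transfer size is then fixed by the image geometry rather than by the available memory bandwidth.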