As computing technology progresses, it is apparent that the single-processor model of executing at most one instruction in every machine cycle will be supplanted in many applications by parallel-processor arrangements. One such application for parallel processors is in the computer graphics field, where image processing requires picture elements (pixels) to be read, written and processed simultaneously. Image processing constantly requires faster machines to process ever-increasing amounts of data and to render higher-quality images in less time.
Systems used to generate and manipulate 3D (three-dimensional) images typically include a transformation processing subsystem, a rendering subsystem, a frame buffer, and a raster display subsystem. The transformation processing subsystem performs operations including 3D transformations, clipping, shading and projection. These operations convert the data from 3D "world coordinates" into the coordinates needed to render an object on the raster display.
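The conversion from world coordinates to raster display coordinates can be illustrated with a minimal sketch. The function name `world_to_screen`, the pinhole-style projection, and the parameter `focal_length` are assumptions made for illustration only; the text above does not specify a particular projection, and a real transformation processing subsystem would also perform clipping and shading.

```python
def world_to_screen(point, focal_length, width, height):
    """Project a 3D point onto a raster of width x height pixels.

    Assumption: the viewer sits at the origin looking along +z, and the
    visible region maps to the range [-1, 1] on both axes after projection.
    """
    x, y, z = point
    if z <= 0:
        # A point at or behind the viewer would be removed by clipping.
        raise ValueError("point is not in front of the viewer")

    # Perspective projection: more distant points move toward the center.
    x_proj = focal_length * x / z
    y_proj = focal_length * y / z

    # Viewport mapping: [-1, 1] becomes pixel coordinates. Raster rows are
    # numbered from the top, so the y axis is flipped.
    screen_x = (x_proj + 1.0) * 0.5 * width
    screen_y = (1.0 - y_proj) * 0.5 * height
    return screen_x, screen_y
```

Under these assumptions, a point on the viewing axis such as `(0, 0, 5)` lands at the center of a 640 by 480 raster.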
The rendering subsystem takes the output of the transformation processing subsystem and renders the objects as pixels in the frame buffer. This process includes operations such as raster conversion, shading calculations, texture mapping and anti-aliasing.
The frame buffer is a memory subsystem that stores the image being generated. The information in the frame buffer can be read and written by the rendering subsystem, and is read by the raster display subsystem to generate the data for the next displayed image.
The raster display subsystem converts the data in the frame buffer into a video output image signal. The raster display subsystem must be able to read sequential pixels from the frame buffer very quickly in order to generate the video output signal.
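A rough sense of the read rate involved can be had from simple arithmetic. The sketch below is illustrative only: the function name `scanout_rate` and the example resolution, refresh rate and pixel size are assumed values that do not come from the text, and blanking intervals are ignored.

```python
def scanout_rate(width, height, refresh_hz, bytes_per_pixel):
    """Return (pixels per second, bytes per second) read for display refresh.

    Assumption: every pixel is read exactly once per refresh, with no
    allowance for blanking intervals.
    """
    pixels_per_second = width * height * refresh_hz
    bytes_per_second = pixels_per_second * bytes_per_pixel
    return pixels_per_second, bytes_per_second
```

For an assumed 1280 by 1024 display refreshed 60 times per second with 3 bytes per pixel, this gives 78,643,200 pixels per second, or 235,929,600 bytes per second, for display refresh alone.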
The implementation of a rendering subsystem frequently includes a number of processing elements operating in parallel to satisfy the computational demands of the rendering task. It is important for the processing elements of the rendering subsystem to be able to quickly read pixels from and write pixels to the frame buffer. Frame buffer access for parallel rendering subsystems is difficult because of the very high bandwidth required of the frame buffer interface.
A prior art solution to the frame buffer access problem is to make each processing element responsible for generating a specified set of the pixels in the frame buffer, and to distribute the frame buffer memory among the processing elements such that each processing element has sufficient memory to store the pixels for which it is responsible. Frame buffers that utilize this technique are referred to as distributed frame buffers. Pixels can be allocated to distributed frame buffer elements in many different ways, including contiguous block, 2D interleaved, scan line interleaved, and column interleaved allocation.
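The four allocation schemes named above can be sketched as functions that map a pixel position to the index of the processing element that owns it. The function names, the row-major numbering of processing elements, and the exact form of each mapping are assumptions made for illustration; the text names the schemes without defining them.

```python
def contiguous_block(x, y, width, height, n_x, n_y):
    """Screen is cut into n_x by n_y rectangular tiles, one per element."""
    block_w = -(-width // n_x)   # ceiling division
    block_h = -(-height // n_y)
    return (y // block_h) * n_x + (x // block_w)


def interleaved_2d(x, y, n_x, n_y):
    """Neighboring pixels in both directions belong to different elements."""
    return (y % n_y) * n_x + (x % n_x)


def scan_line_interleaved(y, n):
    """Each element owns every n-th scan line (row)."""
    return y % n


def column_interleaved(x, n):
    """Each element owns every n-th column."""
    return x % n


def pixels_per_processor(owner, region, n_processors):
    """Count how many pixels of a region fall to each processing element.

    `owner` is a function taking (x, y) and returning an element index.
    """
    counts = [0] * n_processors
    for x, y in region:
        counts[owner(x, y)] += 1
    return counts
```

As a usage example under these assumptions, consider a 4 by 4 pixel object in the top-left corner of an 8 by 8 screen shared by four processing elements. With `contiguous_block` and a 2 by 2 tiling, `pixels_per_processor` reports `[16, 0, 0, 0]`, so one element does all the work. With `interleaved_2d` it reports `[4, 4, 4, 4]`, which suggests why the choice of allocation affects load balancing.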
Ever-increasing computer performance requirements pose a continuing challenge to improve distributed frame buffer organization and operation in view of competing factors such as load balancing, algorithm efficiency, scalability, and raster display complexity and cost.