In the field of computer graphics processing, a high level of data parallelism is often imperative, because identical or similar operations regularly need to be performed on a large set of data. For example, the same operation or operations may need to be performed for each pixel of a computer graphic. With contemporary graphics resolutions reaching thousands or even millions of pixels, and with the desire to avoid lag or delay, the advantage of parallelizing and thus speeding up graphics processing becomes readily apparent.
A typical parallel processor combines a single central control processor with an array of tiles. Each tile comprises a fixed number of processing elements as well as shift registers and local frame memory associated with each processing element or group of processing elements. The tiles are usually arranged in a cascade, such that the input data is shifted down the line of registers until each tile has obtained its part of the input data in its shift registers. Each of the processing elements then executes an identical sequence of instructions operating on that data, with the local memory used for storing intermediate and temporary data. Each processing element will regularly also be able to access the data of neighboring processing elements or of other processing elements of the same tile. Once the processing for this set of data is complete, the current data, which is now the output data, is shifted down the line to the output, and new input data is shifted into the shift registers of the tiles. The role of the central control processor is to provide the identical instruction to each of the processing elements, to control the shifting operations, and to perform other global management functions of the parallel processor.
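By way of illustration only, the shift-in, broadcast and shift-out cycle described above may be modeled in software roughly as follows. This is a minimal sketch under simplifying assumptions, not a description of any particular hardware: the function name `run_simd_pass` and its parameters are hypothetical, the cascade is modeled as one long chain of registers, shift-out and the loading of new input data are not overlapped, and access to neighboring processing elements and local frame memory is omitted.

```python
from typing import Callable, List


def run_simd_pass(
    pixels: List[int],
    num_tiles: int,
    pes_per_tile: int,
    instruction: Callable[[int], int],
) -> List[int]:
    """Model one shift-in / execute / shift-out cycle (hypothetical sketch)."""
    capacity = num_tiles * pes_per_tile
    if len(pixels) != capacity:
        raise ValueError("this sketch expects exactly one value per processing element")

    # Shift-in: each step pushes one value into the first register and moves
    # every value already in the chain one position further down the line.
    registers = [0] * capacity
    for value in pixels:
        registers = [value] + registers[:-1]

    # Each tile now holds its part of the input data in its shift registers.
    tiles = [
        registers[t * pes_per_tile:(t + 1) * pes_per_tile]
        for t in range(num_tiles)
    ]

    # Broadcast: the control processor issues the identical instruction to
    # every processing element; only the data differs between elements.
    tiles = [[instruction(value) for value in tile] for tile in tiles]

    # Shift-out: results leave from the far end of the chain, so they appear
    # at the output in the same order in which the inputs were shifted in.
    registers = [value for tile in tiles for value in tile]
    output = []
    for _ in range(capacity):
        output.append(registers[-1])
        registers = [0] + registers[:-1]
    return output
```

For example, calling `run_simd_pass([1, 2, 3, 4], num_tiles=2, pes_per_tile=2, instruction=lambda v: v * 2)` applies the same doubling operation to all four values in a single pass. Note that the model offers no way of addressing one particular value, since the instruction is common to all processing elements.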
Because the same instruction is executed simultaneously in each processing element, even though the data operated on may differ, it is not readily possible to manipulate the data for an individual pixel that is currently being processed by the parallel processing elements. Doing so may, however, be advantageous in a number of circumstances, for example when an individual pixel is known beforehand to be defective or otherwise special. It may also be advantageous for marking or masking purposes to be able to modify the data for an individual pixel. This is particularly relevant for text and sprites as used, for example, in gaming applications.
It is therefore an object of the invention to provide a parallel processor architecture which allows for manipulating the data for an individual pixel.