The present invention relates to parallel data processing apparatus, and in particular to SIMD (single instruction multiple data) processing apparatus.
Increasingly, data processing systems are required to process large amounts of data. In addition, users of such systems are demanding increased data processing speeds. One particular example of the need for high speed processing of massive amounts of data is in the computer graphics field. In computer graphics, large amounts of data are produced that relate to, for example, the geometry, texture, and colour of objects and shapes to be displayed on a screen. Users of computer graphics are increasingly demanding more lifelike and faster graphical displays, which increases both the amount of data to be processed and the speed at which that data must be processed.
A previously proposed architecture for processing large amounts of data in a computer system uses a Single Instruction Multiple Data (SIMD) array of processing elements. In such an array, all of the processing elements receive the same instruction stream, but operate on different respective data items. Such an architecture can thereby process data in parallel without the need to produce parallel instruction streams. This can be an efficient and relatively simple way of obtaining good performance from a parallel processing machine.
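By way of illustration only, the behaviour of such an array can be sketched in software as follows. The sketch assumes a toy instruction set of two operations ("add" and "mul") and models each processing element as a single register; the function name run_simd and the instruction encoding are hypothetical and are not taken from any particular apparatus.

```python
# Hypothetical software model of a SIMD array: one shared instruction
# stream is applied to a different data item in every processing element.

def run_simd(instructions, data_items):
    """Apply the same instruction stream to each processing element's data."""
    registers = list(data_items)  # one register per processing element
    for op, operand in instructions:  # the single, shared instruction stream
        if op == "add":
            registers = [r + operand for r in registers]
        elif op == "mul":
            registers = [r * operand for r in registers]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


# Four processing elements, each holding a different data item.
program = [("mul", 2), ("add", 1)]
result = run_simd(program, [0, 1, 2, 3])
print(result)  # [1, 3, 5, 7]
```

Only one program is supplied, yet every data item is transformed, which reflects the absence of parallel instruction streams described above.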
However, the SIMD architecture can be inefficient when a system has to process a large number of relatively small groups of data items. For example, when a SIMD array is processing data relating to a graphical display screen, only relatively few processing elements of the array will be enabled to process data relating to a small graphical primitive such as a triangle. In that case, a large proportion of the processing elements may remain unused while data is being processed for a particular group.
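The scale of this inefficiency can be illustrated with a simple calculation. The sketch below assumes, purely for the purposes of example, an array of 8 x 8 processing elements with one processing element per pixel, and a small right-angled triangle covering the pixels for which x + y <= 3. The array size, the pixel mapping, and the names used are all hypothetical.

```python
# Hypothetical example: fraction of a SIMD array enabled by a small primitive.
# Assumes one processing element per pixel of an 8 x 8 screen region.

ARRAY_WIDTH = 8   # assumed array dimensions, for illustration only
ARRAY_HEIGHT = 8


def enable_mask(inside):
    """Return a per-element enable flag: True where the primitive covers the pixel."""
    return [[inside(x, y) for x in range(ARRAY_WIDTH)]
            for y in range(ARRAY_HEIGHT)]


def utilisation(mask):
    """Fraction of processing elements that are enabled."""
    enabled = sum(flag for row in mask for flag in row)
    return enabled / (ARRAY_WIDTH * ARRAY_HEIGHT)


def small_triangle(x, y):
    """Assumed primitive: triangle with vertices (0, 0), (3, 0) and (0, 3)."""
    return x + y <= 3


mask = enable_mask(small_triangle)
enabled_count = sum(flag for row in mask for flag in row)
print(enabled_count, utilisation(mask))  # 10 0.15625
```

Under these assumptions only 10 of the 64 processing elements are enabled, so more than 84% of the array is idle while the triangle is being processed.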
It is therefore desirable to produce a system which can overcome or alleviate this problem.