As a data processor for performing three dimensional graphic processing, MICROPROCESSOR REPORT, vol. 13, no. 5, Apr. 19, 1999, pp. 1, 6-11 discloses a processor having two single instruction multiple data (SIMD) type floating-point units, each of which executes four floating-point multiply-add operations with one instruction. Since one floating-point multiply-add operation comprises two operations, a multiplication and an addition, each unit executes eight operations in its four floating-point multiply-add operations, for a total of 16 operations with the two units. As the processor also has two conventional floating-point multiply-add execution units in addition to the aforementioned two units, it can perform four additional operations, or a total of 20 operations in a single cycle.
Other such data processors include one disclosed in IEEE Micro, vol. 18, no. 2, March/April 1998, pp. 26-34, which has an instruction to calculate floating-point inner products and can calculate the inner product of two length-4 vectors by executing four multiplications and three additions with one instruction. It thus performs seven operations in one cycle when executing the inner product instruction.
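The per-cycle operation counts quoted for these two references follow from simple arithmetic, which may be sketched as follows. This is only an illustrative tally using the figures given in the text; the function and parameter names are hypothetical and do not come from either reference.

```python
# Illustrative tally of operations per cycle, using only figures quoted in the text.
# All names here are hypothetical and chosen for this sketch.
OPS_PER_MULTIPLY_ADD = 2  # one multiplication plus one addition


def simd_reference_ops(simd_units=2, multiply_adds_per_unit=4, scalar_units=2):
    """Return (SIMD ops, conventional-unit ops, total ops) per cycle."""
    simd_ops = simd_units * multiply_adds_per_unit * OPS_PER_MULTIPLY_ADD
    scalar_ops = scalar_units * OPS_PER_MULTIPLY_ADD
    return simd_ops, scalar_ops, simd_ops + scalar_ops


def inner_product_ops(length=4):
    """Operations in one inner product of two vectors of the given length."""
    multiplications = length
    additions = length - 1  # summing `length` products takes length - 1 additions
    return multiplications + additions


print(simd_reference_ops())  # (16, 4, 20)
print(inner_product_ops())   # 7
```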
Further, Japanese Published Unexamined Patent Application No. Hei 10-124484 discloses a data processor which can calculate an inner product by providing eight floating-point numbers to four multipliers and adding the four resulting products in parallel with a four-input adder, i.e. in one round of parallel multiplications and additions.
The present inventor studied data processors and data processing systems which could accomplish graphic processing using floating-point numbers in multimedia equipment faster than conventional processors or systems.
Three dimensional graphic processing and image processing are particularly important and heavy-load modes of processing for a data processor for use with multimedia equipment and for a data processing system for multimedia processing. Of these modes, image processing is standardized, and therefore the approach involving the least manufacturing cost is to mount dedicated hardware. Conventional processors mounted with dedicated hardware for image processing are already available.
On the other hand, three dimensional graphic processing requires geometric processing such as coordinate transformation and rendering such as color scheming. Since rendering consists largely of fixed-format processing and is not well suited to a general purpose processor, it is a common practice to use dedicated hardware where fast processing is required. By contrast, geometric processing such as coordinate transformation, which has greater freedom and handles floating-point data, is usually carried out by the floating-point units of the processor. The most frequent mode of geometric processing is the length-4 vector inner product operation. Intensity calculation is processed by calculating an inner product; coordinate transformation, by calculating the product of a 4×4 matrix and a length-4 vector; and transformation matrix generation, by calculating the product of two 4×4 matrices. These modes of processing can be accomplished by one length-4 vector inner product operation for intensity calculation, four length-4 vector inner product operations for coordinate transformation, and 16 length-4 vector inner product operations for transformation matrix generation. There are also conventional processors specialized in length-4 vector inner product operations to achieve faster processing, resulting in an efficient speed increase in geometric processing.
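The decomposition of these three modes of geometric processing into length-4 vector inner products may be sketched as follows. This is a minimal software illustration, not a description of any processor discussed here; the names inner4, transform, matmul4, and counted, as well as the sample translation matrix, are assumptions introduced only for this sketch.

```python
# Minimal sketch: geometric processing expressed as length-4 inner products.
# All names and sample data are hypothetical and used for illustration only.
calls = 0  # number of length-4 inner products performed


def inner4(a, b):
    """Length-4 inner product: four multiplications and three additions."""
    global calls
    calls += 1
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


def transform(m, v):
    """Product of a 4x4 matrix and a length-4 vector: one inner product per row."""
    return [inner4(row, v) for row in m]


def matmul4(a, b):
    """Product of two 4x4 matrices: one inner product per output element."""
    columns = list(zip(*b))  # columns of b
    return [[inner4(row, col) for col in columns] for row in a]


def counted(fn, *args):
    """Run fn and return (result, number of inner products it used)."""
    global calls
    calls = 0
    result = fn(*args)
    return result, calls


# Sample data: a translation by (10, 20, 30) in homogeneous coordinates.
T = [[1, 0, 0, 10], [0, 1, 0, 20], [0, 0, 1, 30], [0, 0, 0, 1]]
p = [1, 2, 3, 1]
normal = [0, 0, 1, 0]
light = [0, 0, 1, 0]

print(counted(inner4, normal, light))  # (1, 1)
print(counted(transform, T, p))        # ([11, 22, 33, 1], 4)
print(counted(matmul4, T, T)[1])       # 16
```

The counts of 1, 4, and 16 printed by the sketch correspond to intensity calculation, coordinate transformation, and transformation matrix generation, respectively.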
However, there is a stringent requirement for higher speed in three dimensional graphic processing, and a further increase in processing speed is needed to make moving pictures more realistic. Yet, since the basic data of graphic processing are length-4 vectors, it is difficult for any conventional processor arrangement to further raise the level of parallelism. There are many applications whose processing speed can be enhanced by defining a hypercomplex vector inner product instruction, such as finite impulse response (FIR) filtering, but what requires the highest floating-point operation performance in the field of consumer multimedia is three dimensional graphic processing. Even if a known processor having a length-4 vector instruction can efficiently enhance the level of parallelism, this will be meaningless unless it contributes to increasing the speed of three dimensional graphic processing.
On the other hand, as a matter of principle, it is easy to enhance the level of parallelism with the SIMD system. However, the SIMD system also has inefficient aspects, and its cost tends to increase significantly with a rise in the level of parallelism. The SIMD part already occupies a large area in a conventionally available processor, and expanding it further by a factor of several cannot be considered a realistic solution. For instance, the data processor disclosed in the first reference cited as an example of the prior art has as many as 10 floating-point multiply-add execution units built into it, and its chip area would amount to as much as 240 square millimeters even if produced in a 0.25 μm process. Out of this total chip area, the area of one parallel SIMD type floating-point unit, which executes four floating-point multiply-add operations, is estimated at about 22 square millimeters from the chip photograph. Since dividers are not fully replicated in a parallel SIMD configuration and four separate control circuits are not necessarily needed, the required area will be about three times as large as that of a usual floating-point unit.
The chip area of the data processor disclosed in the second reference cited as another example of the prior art will be about 56 square millimeters if produced in a 0.25 μm process. Out of this total chip area, the area of the floating-point unit is estimated at about 10 square millimeters from the chip photograph, and its area excluding the part for executing the inner product instruction is about 7.5 square millimeters. This means that the addition of the inner product instruction enlarges the floating-point unit by a factor of about 1.3.
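The factor of about 1.3 follows directly from the two area estimates given for the second reference, as the following short calculation illustrates. The variable and function names are hypothetical, and the inputs are the approximate figures quoted above, not measured values.

```python
# Illustrative arithmetic on the approximate area estimates quoted in the text.
# Names are hypothetical; inputs are estimates read from a chip photograph.
FPU_WITH_INNER_PRODUCT_MM2 = 10.0    # floating-point unit including inner product support
FPU_WITHOUT_INNER_PRODUCT_MM2 = 7.5  # same unit excluding the inner product part


def area_growth(with_feature_mm2, without_feature_mm2):
    """Return (added area in mm^2, ratio of enlarged area to base area)."""
    added = with_feature_mm2 - without_feature_mm2
    ratio = with_feature_mm2 / without_feature_mm2
    return added, ratio


added, ratio = area_growth(FPU_WITH_INNER_PRODUCT_MM2, FPU_WITHOUT_INNER_PRODUCT_MM2)
print(added)            # 2.5
print(round(ratio, 2))  # 1.33
```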
An object of the present invention is to provide a data processor and a data processing system efficiently improved in the level of operation parallelism.
Another object of the invention is to provide a data processor and a data processing system which are minimized in circuit dimensions and yet capable of performing floating-point operations with high accuracy and at high speed.