1. Technical Field to Which the Invention Belongs
The present invention relates to a graphic translate engine and particularly relates to a graphic translate engine for performing geometrical processing used in computer graphic processing, etc.
The present invention also relates to a floating point multiply-add calculation unit and particularly relates to a floating point multiply-add calculation unit capable of performing processing about a continuous multiply-accumulation operation in a dependent relation at high speed.
2. Prior Art
Recently, computer graphics (CG) have become very important owing to the rapid spread of multimedia, the thoroughness of WYSIWYG (What You See Is What You Get), and the spread of high-grade GUIs (Graphical User Interfaces) and TV games using graphics. In particular, requirements for three-dimensional computer graphics (3D-CG) as an application for high-performance processors, and especially requirements for high-quality moving images, have increased with the rapid spread of personal computers to homes and the spread of TV game machines. To process a moving image, one frame must be processed in 1/30 to 1/60 second. Accordingly, a large amount of computation and high computing ability are required for this processing.
Graphic processing using a computer is mainly divided into two phases, namely, geometrical processing and rendering processing. In the geometrical processing, processing for generating an image displayed on a CRT is geometrically performed by a coordinate transformation such as a movement of modeled data themselves, a movement conformed to a view point, etc. and projection. In the rendering processing, an image is really drawn on the CRT. A matrix calculation and a vector calculation are made in the geometrical processing as a phase for performing transformation processing of a geometrical graphic model such as a coordinate transformation, a view point transformation, etc. and light irradiating processing. Therefore, the calculation of an inner product is used in many cases. The coordinate transformation is variously introduced in detail in literatures of computer graphics.
FIG. 1 shows the construction of a typical graphic translate engine (GTE). The GTE is constructed by an arithmetic unit section 801, a register file 802, an input output interface 804, etc. The arithmetic unit section 801 is a data path for making a matrix calculation and is constructed by an adder-subtracter, a multiplier, a divider, a square root extracting arithmetic unit, etc. The input output interface 804 is an interface between an external memory unit and the register file 802 and arithmetic unit section 801.
1: Data Transfer
Data of 3D computer graphics depend on modeling, but are generally treated as a set of independent triangles. Three vertexes of an independent triangle are represented by homogeneous coordinates and are stored to the external memory unit.
No memory unit having a large capacity is mounted to the interior of the conventional graphic translate engine in many cases. Therefore, graphic data are read from the external memory unit and are sent to a data path such as an arithmetic unit, a register file, etc. through a FIFO, etc. In this method, a fluctuation of a data transfer speed caused by a latency of a bus, an access speed of the memory unit, etc. is hidden by using the FIFO as a buffer for an input or an output. However, this fluctuation is rate-determined by the access speed of the memory unit and a responsive speed of the bus so that no sufficient transfer band width can be secured.
In contrast to this, there is a system in which an internal memory unit is mounted to a certain extent and data are taken in at a high speed and a calculation is made by a DMA (Direct Memory Access) system. In an arithmetic unit of such a system, the internal memory unit is adapted to be accessed by the external memory unit, an internal arithmetic unit and a register file. Therefore, it is difficult to execute data transfer and an arithmetic operation in parallel with each other. Accordingly, two phase processings of data transfer and data processing are alternately performed so that no processings can be efficiently executed as a pipeline. Data are transferred at a high speed by the DMA, but no entire processing can be sufficiently performed at a high speed.
It is considered that the transfer and arithmetic operations are executed in parallel with each other by a similar construction and a memory unit having plural ports is mounted to increase processing efficiency. However, in this case, control greatly becomes complicated in mediation of an access conflict to the same memory unit, etc., and cost of the memory unit is also increased. Accordingly, no memory unit having a large capacity capable of obtaining sufficient processing performance can be mounted to the graphic translate engine.
2: Transformation Processing
Here, an example of a simple perspective transformation is shown before a conventional example is shown. The perspective transformation is a transformation for projecting a three-dimensional graphic model onto two dimensions in consideration of perspective. Assuming that an input (x, y, z, 1) is a vertex coordinate to be transformed, the perspective transformation is performed on the basis of the following formulas (1) to (3), and X and Y coordinates on a screen are outputted after (X, Y) perspective transformation.
$$
\begin{aligned}
(x', y', w') &= (x, y, z, 1) \times \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \\ j & k & l \end{pmatrix} \\
&= (ax + dy + gz + j,\; bx + ey + hz + k,\; cx + fy + iz + l)
\end{aligned} \tag{1}
$$
$$ W = 1/w' \tag{2} $$
$$ (X, Y) = (x', y') \times W \tag{3} $$
Thus, in the perspective transformation, it is necessary to make a multiply-accumulation operation caused by a matrix calculation and further make a divisional calculation by using results of this multiply-accumulation operation. Calculations with respect to respective coordinates of x, y, z and w are approximately the same and are independent of each other so that there are features in that the perspective transformation has high parallel and symmetrical properties with respect to these calculations.
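As an illustration only, formulas (1) to (3) can be sketched in Python as follows. The function name `perspective_transform` and the sample matrix values are hypothetical and chosen for this example; they do not come from the specification. The sketch shows that the three multiply-accumulation chains of formula (1) are independent of each other, while formulas (2) and (3) need only a single division.

```python
def perspective_transform(vertex, m):
    """Apply formulas (1) to (3) to one vertex.

    vertex: (x, y, z); m: 4x3 matrix with rows (a,b,c), (d,e,f), (g,h,i), (j,k,l).
    """
    x, y, z = vertex
    h = (x, y, z, 1.0)  # homogeneous input (x, y, z, 1)
    # Formula (1): three independent multiply-accumulation chains, one per column
    xp, yp, wp = (sum(h[r] * m[r][c] for r in range(4)) for c in range(3))
    # Formula (2): the only division in the whole transformation
    W = 1.0 / wp
    # Formula (3): scale x' and y' by W
    return (xp * W, yp * W)


# Hypothetical sample matrix: w' = 0.5 * z + 1
M = [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 0.5),
    (0.0, 0.0, 1.0),
]
X, Y = perspective_transform((2.0, 4.0, 2.0), M)
print(X, Y)
```

With the sample values, w' = 2, so the vertex (2, 4, 2) maps to the screen coordinates (1, 2).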
In the typical conventional example of FIG. 1, one multiply-add calculation unit and one adder-subtracter unit are mounted. In such a transformation processor, the above-mentioned arithmetic operations can only be processed sequentially by simple pipeline processing. Accordingly, the high parallel and symmetrical properties of the calculations are exploited only in scheduling instructions.
FIG. 2 shows a construction to which the features with respect to calculations are applied. In this construction, a register file and a multiply-add calculation unit are set to correspond to each of coordinates of x, y, z and w so that these calculations can be independently made. Namely, ax+dy+gz+j, bx+ey+hz+k and cx+fy+iz+l in the formula (1) are respectively allocated to first, second and third arithmetic units and are independently calculated. Thus, a high speed arithmetic calculation can be performed in consideration of arithmetic characteristics. However, in such a construction, the calculations of the above formulas (2) and (3) cannot be made efficiently. The division needs to be made only once. Accordingly, while the division is being made, the plural arithmetic units cannot be effectively utilized. Further, since the division has a large latency in comparison with the other arithmetic calculations, the expensive plural arithmetic units, in particular, cannot be operated effectively. Accordingly, in such a construction, performance commensurate with the invested hardware cannot be obtained.
3: Light Irradiating Processing
Light irradiating processing is performed with respect to an object to obtain an image of a real feeling. In the following example, a color is represented by synthesis of red (R), green (G) and blue (B) and the light irradiating processing is set to be performed by each of these colors. The calculation of brightness depends on modeling of light, but is generally made as follows. Namely, a vertex color is calculated by adding reflection of light from a material at its vertex, whole environmental light enlarged and reduced in size by environmental optical characteristics of the material at its vertex, and influences of diffused light, a mirror surface light and environmental light suitably damped from all light sources. This light irradiating processing is schematically shown in the following description.
Processing Start
(a) A light beam and a normal line at the vertex are normalized if necessary.
(b) Radiated light and environmental light in a light source nonexistent state are set to constants.
(c) The environmental light, diffused light and mirror surface light every light source are calculated with respect to the individual light source and are added together in the following procedures.
(i) A vector (a light incident vector: a light direction vector) from the vertex to the light source is calculated.
(ii) The distance between the vertex and the light source is calculated from this vector, and the vector from the vertex to the light source is also normalized.
(iii) A damping factor is calculated from the distance.
(iv) An inner product (cos θ) of the light source vector and the vertex normal line is calculated.
(v) A spot light effect is considered.
(vi) An influence of the environmental light every light source is considered on the basis of the following formula (4).
Environmental influence = light source environmental coefficient × substance (vertex) environmental coefficient  (4)
(vii) An influence of the diffused light every light source is considered on the basis of the following formula (5).
Diffusive influence = (light source vector · normalized line at vertex) × light source diffusion coefficient × substance (vertex) diffusion coefficient  (5)
(viii) An influence of the mirror surface light every light source is calculated as follows.
Assume that L is a unit vector in the incident direction of light and V is a unit vector in the viewing direction. Also, N is a unit vector in the normal line direction and θ is the incident angle. Further, α is the angle formed between the viewing vector and the reflecting vector. In this case, the following relation of formula (6) is formed.
$$
\begin{aligned}
(L - V) \cdot N &= L \cdot N - V \cdot N \\
&= \cos\theta - \cos(\theta + \alpha) \\
&\approx -\cos\alpha \\
&= \cos\alpha
\end{aligned} \tag{6}
$$
When the viewing vector is calculated from a vertex vector, S(sx, sy, sz) is calculated from the following formulas (7) to (9) and an inner product of S and norm is calculated.
sx = lx − vx  (7)
sy = ly − vy  (8)
sz = lz − vz  (9)
When it is assumed that the viewing vector is compulsorily directed to a −Z axis direction, S(sx, sy, sz) is calculated by the following formulas (10) to (12).
sx = lx  (10)
sy = ly  (11)
sz = lz + 1  (12)
The result of the inner product is raised to the power of the mirror surface coefficient Shininess[i] for each light source i, so that spec_coef is calculated.
Accordingly, the influence of the mirror surface light every light source is calculated by the following formula (13).
Mirror surface influence = spec_coef × light source mirror surface coefficient × substance (vertex) mirror surface coefficient  (13)
(ix) All the influences are calculated by the following formula (14).
All the influences = damping factor × spot light effect × (environmental light influence + diffused light influence + mirror surface light influence)  (14)
(d) All the influences of the light source i are added to red (R), green (G) and blue (B).
(e) After the influences of all the light sources are added, R, G and B are clamped between 0 and 1.
Processing Termination
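The flow from "Processing Start" to "Processing Termination" can be sketched for a single color channel as follows. This is a minimal sketch under several assumptions that are not in the text: the damping model `1 / (1 + distance)`, a spot light effect fixed at 1.0, negative inner products treated as 0, and all function names, dictionary keys and sample coefficients are hypothetical. The viewing vector is taken to point in the −Z direction, so S follows formulas (10) to (12) and is used unnormalized, exactly as written there.

```python
import math


def light_contribution(vertex, normal, light_pos, light, material, shininess):
    """Influence of one light source on one color channel, steps (c)(i) to (c)(ix)."""
    # (i) vector from the vertex to the light source
    lv = [p - v for p, v in zip(light_pos, vertex)]
    # (ii) distance, and normalization of the light vector
    dist = math.sqrt(sum(c * c for c in lv))
    l = [c / dist for c in lv]
    # (iii) damping factor; this particular model is an assumption
    damping = 1.0 / (1.0 + dist)
    # (iv) inner product (cos theta); clamping at 0 is an assumption
    cos_theta = max(0.0, sum(a * b for a, b in zip(l, normal)))
    # (v) spot light effect; assumed to be absent (factor 1.0)
    spot = 1.0
    # (vi) formula (4)
    ambient = light["ambient"] * material["ambient"]
    # (vii) formula (5)
    diffuse = cos_theta * light["diffuse"] * material["diffuse"]
    # (viii) formulas (10) to (12), then formula (13)
    s = (l[0], l[1], l[2] + 1.0)
    spec_coef = max(0.0, sum(a * b for a, b in zip(s, normal))) ** shininess
    specular = spec_coef * light["specular"] * material["specular"]
    # (ix) formula (14)
    return damping * spot * (ambient + diffuse + specular)


def shade_channel(base, contributions):
    """Steps (d) and (e): add all light influences, then clamp to [0, 1]."""
    return min(1.0, max(0.0, base + sum(contributions)))


# Hypothetical sample data: one light directly above a vertex facing +Z
light = {"ambient": 0.25, "diffuse": 0.5, "specular": 0.125}
material = {"ambient": 1.0, "diffuse": 1.0, "specular": 1.0}
one = light_contribution((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0),
                         light, material, 2)
print(one, shade_channel(0.0, [one, one]))
```

With two such lights the unclamped sum exceeds 1, which is the case the clamping in step (e) handles.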
As mentioned above, each of the calculations in the light irradiating processing depends on modeling of light. Therefore, these calculations are slightly different from each other in detail. However, it is important here that brightness is defined by each of values of [0,1] and a calculated brightness is clamped to each of these values. Here, [0,1] shows value n in a range of 0 ≤ n ≤ 1.
In a conventional arithmetic unit, as shown in the following processing flow, a brightness value and ‘0’ and ‘1’ are compared with each other by a comparison instruction, and a branching operation is performed by a conditional branching instruction if necessary. Thus, the clamping processing is performed by outputting constants ‘0’ and ‘1’.
/* Clamping flow of R, G and B values */
if (R < 0.0) { R = 0.0; }
if (R > 1.0) { R = 1.0; }
if (G < 0.0) { G = 0.0; }
if (G > 1.0) { G = 1.0; }
if (B < 0.0) { B = 0.0; }
if (B > 1.0) { B = 1.0; }
In such a method, execution of the branching instruction is caused in clamping so that a disturbance of an arithmetic pipeline is caused. In the calculation of brightness, the three primary colors of R, G and B are calculated at each of vertexes constituting a picture so that a large processing amount is required. Accordingly, in the above-mentioned conventional flow, the pipeline disturbance is often caused so that processing performance of the brightness calculation is greatly deteriorated.
As mentioned above, there were the following problems in the conventional graphic translate engine (GTE).
(1) No graphic data to be transformed can be efficiently transferred to an arithmetic unit and a register file.
(2) It is impossible to efficiently execute the inner product calculation caused by a matrix calculation for performing the perspective transformation and the divisional calculation by ‘depth’.
(3) It is impossible to execute the clamping processing of R, G and B brightnesses in the light irradiating processing at high speed.
The geometrical processing in the computer graphics (CG) is a phase for performing transforming processing of a geometrical graphic model such as a coordinate transformation, a perspective transformation, etc and for performing light irradiating processing. Therefore, in these processings, a matrix calculation and a vector calculation are made so that calculations of inner products are used in many cases. The calculations of inner products are similarly used in many cases in a numerical calculation in conventional science and technology calculations except for the above 3D-CG processing.
Accordingly, realization of a high speed multiply-add calculation unit is desired by the above requirements. The construction of a conventional floating point multiply-add calculation unit will next be explained concretely. A method for constructing the multiply-add calculation unit is generally divided into two methods.
In a first constructing method of the conventional floating point multiply-add calculation unit, the multiply-add calculation unit is directly constructed. FIG. 3 shows a block diagram of a mantissa arithmetic unit and an exponent part arithmetic unit in the first conventional floating point multiply-add calculation unit. The mantissa arithmetic unit is constructed by multiplication trees 301, 302 for calculating a product of first and second operands, a bidirectional shifter 303 for performing a digit alignment of a third operand, adders 304, 305 for calculating a sum of a multiplied result and a digit-aligned result(i.e. multiply-add), a normalizing circuit 307 for normalizing results of the multiply-accumulation operation obtained by the adders, and a leading zero anticipation circuit 306. The exponent part arithmetic unit is constructed by an adder 308 for calculating the value of an exponent part of the product of the first and second operands (i.e. a sum of exponent parts), a selecting circuit 318 for calculating an exponent part (a larger value of an exponent part of the third operand and the exponent part of the product of the above first and second operands) of a sum of the third operand and the product of the first and second operands (i.e. multiply-add), a subtracter 309 for calculating an aligned digit number (the difference between the exponent part of the third operand and the exponent part of the product of the above first and second operands), and a subtracter 312 for performing normalization.
This arithmetic unit is an arithmetic unit of four operands in total constructed by three source operands and one destination. The multiply-accumulation operation is executed as follows. Namely, the first and second operands are inputted to the multiplication trees 301 and 302 for calculating the product of the first and second operands and are multiplied. The digit alignment of the third operand is performed by the bidirectional shifter 303 in parallel with this multiplying processing. The number of shifts to the left-hand or right-hand side is calculated as the difference between an exponent sum of the first and second operands and the exponent of the third operand. A sum of the product of the first and second operands and a digit-aligned result of the third operand (multiply-add) is calculated by the adders 304 and 305. The multiply-add calculation result obtained by the adders is normalized by the normalizing circuit 307.
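The data flow described above can be modeled roughly in Python as follows. This is only a sketch under stated assumptions: it uses `math.frexp` and `math.ldexp` to split and recombine mantissa and exponent, works on ordinary Python floats rather than on the bit-level mantissa of the hardware, and folds the exponent selection into the final normalization. The function name `multiply_add_model` is hypothetical, and the comments naming circuits of FIG. 3 only indicate which step is being imitated.

```python
import math


def multiply_add_model(a, b, c):
    """Compute a * b + c by separate mantissa and exponent steps."""
    ma, ea = math.frexp(a)  # a == ma * 2**ea
    mb, eb = math.frexp(b)
    mc, ec = math.frexp(c)
    # Mantissa product (role of the multiplication trees 301, 302)
    prod_mant = ma * mb
    # Exponent of the product: sum of exponents (role of the adder 308)
    prod_exp = ea + eb
    # Shift amount: exponent sum minus third exponent (role of the subtracter 309)
    shift = prod_exp - ec
    # Digit alignment of the third operand, left or right depending on the
    # sign of the shift (role of the bidirectional shifter 303)
    c_aligned = math.ldexp(mc, -shift)
    # Sum of the product and the aligned third operand (role of the adders 304, 305)
    total = prod_mant + c_aligned
    # Normalization (role of the normalizing circuit 307)
    norm_mant, adjust = math.frexp(total)
    return math.ldexp(norm_mant, prod_exp + adjust)


print(multiply_add_model(1.5, 2.0, 0.25))  # third operand shifted right
print(multiply_add_model(0.5, 0.5, 8.0))   # third operand shifted left
```

The two calls exercise both shift directions: in the first the third operand has the smaller exponent, in the second the larger one.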
A series of these processings is executed by pipeline processings at two stages. Accordingly, calculation results of a certain operand can be used as an operand in the next arithmetic operation only after two clocks. Namely, an arithmetic operation having a dependent relation can be executed only every two clocks. FIG. 4A shows an instruction sequence of an inner product calculation and FIG. 4B shows execution timing of this instruction sequence. In this timing chart, F, D, E1, E2 and WB show respective stages of a pipeline, namely, an F/instruction fetch stage, a D/instruction decode stage, E1, E2/ arithmetic executing stages, and a WB/write back stage.
In a second constructing method of the conventional floating point multiply-add calculation unit, an independent multiplier and an independent adder-subtracter are mounted, and a multiply-accumulation operation is realized by longitudinally connecting these arithmetic units to each other, or by bypassing calculation results as an operand. There are two approaches: providing a dedicated multiply-add instruction, and realizing the multiply-accumulation operation by multiplying and adding calculations using bypass. FIG. 5 shows a block diagram of a mantissa arithmetic unit of the second conventional floating point multiply-add calculation unit. The multiplier is constructed by multiplication trees 501, 502 for calculating a product of first and second operands, an adder 505 for finally adding partial products to each other, a normalizing circuit 507 and a leading zero anticipation circuit 506. The adder-subtracter is constructed by a shifter 503 for aligning digits of the operands with each other, an adder 505a for calculating a sum, a normalizing circuit 507a and a leading zero anticipation circuit 506a.
In such a construction, much time is required in comparison with the first constructing method until results of the multiply-accumulation operation are obtained. The floating point arithmetic unit mounted to a general MPU is designed such that 2 to 5 cycles are required to make multiplying, adding and subtracting calculations. For example, assuming that both the multiplying calculation and the adding and subtracting calculations can be executed by two clocks, an instruction can be issued every two cycles, but four clocks are required to obtain the results of a multiply-add. FIG. 6A shows an instruction sequence of an inner product calculation and FIG. 6B shows execution timing of this instruction sequence.
A multiply-accumulation operation having a dependent relation is required to execute the inner product. As explained in FIGS. 4A and 4B or FIGS. 6A and 6B, no multiply-accumulation operation having the dependent relation can be continuously executed when the instruction sequence for calculating the inner product is executed by using the first or second construction.
In such a case, as generally shown by FIGS. 7A and 7B, an independent instruction is executed by a scheduling technique of instructions and an arithmetic latency is hidden. However, when there is no independently executable instruction, the arithmetic unit must wait for termination of calculation results. In particular, when the latency is large as in the second conventional construction, this tendency is increased.
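The effect of latency on a chain of dependent multiply-accumulation operations, and the way instruction scheduling hides it, can be illustrated with a small timing model. The model is an assumption made for illustration: it issues at most one operation per clock in program order, counts only the executing stages (not the fetch, decode or write back stages), and is not a reproduction of the timing charts in the figures. The function name `schedule` is hypothetical.

```python
def schedule(deps, latency):
    """Return (issue_clock, result_clock) for each operation.

    deps[i] is the index of the earlier operation whose result operation i
    needs, or None if it has no dependency. Issue is in order, one per clock.
    """
    times = []
    next_issue = 0
    for d in deps:
        ready = times[d][1] if d is not None else 0
        issue = max(next_issue, ready)  # wait for the operand if necessary
        times.append((issue, issue + latency))
        next_issue = issue + 1
    return times


# One inner product: four multiply-adds, each depending on the previous one
chain = [None, 0, 1, 2]
print(schedule(chain, 2)[-1][1])  # two-clock latency
print(schedule(chain, 1)[-1][1])  # one-clock latency

# Two independent inner products interleaved by instruction scheduling
interleaved = [None, None, 0, 1, 2, 3, 4, 5]
print(schedule(interleaved, 2)[-1][1])
```

In this model a two-clock latency doubles the time of a single dependent chain (8 clocks instead of 4), whereas interleaving two independent chains finishes both in 9 clocks. When no independent work is available, only a shorter latency helps.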
When no multiply-accumulation operation having a dependent relation can be continuously executed, the following problems are caused.
(1) No overhead can be hidden when there is no executable calculation between continuous multiply-accumulation operations having a dependent relation.
(2) A technique such as scheduling, etc. is required and programming is difficult.
(3) Many registers are required since intermediate values are stored.
As mentioned above, unlike an integer multiply-add calculation unit, the conventional floating point multiply-add calculation units have large latencies, and no multiply-accumulation operation having a dependent relation can be executed every clock.
In consideration of these problems, an object of the present invention is to realize the following three items: (1) data are efficiently transferred between a memory unit of graphic data and a graphic translate engine, (2) an inner product calculation caused by a matrix calculation for performing a perspective transformation and a divisional calculation by ‘depth’ are efficiently executed, and (3) clamping processing of red (R), green (G) and blue (B) brightnesses in light irradiating processing is executed at high speed.
Another object of the present invention is to provide a floating point multiply-add calculation unit capable of shortening a processing time of a continuous multiply-accumulation operation and particularly starting execution of a dependent multiply-accumulation operation and terminating the dependent multiply-accumulation operation every clock.
To achieve the above objects, there is provided a graphic translate engine for performing a predetermined geometrical arithmetic processing with respect to vertex data of a figure stored in an external memory unit and represented by homogeneous coordinates, the graphic translate engine comprising: an internal memory section divided into plural memory blocks and capable of inputting and outputting data for every memory block, inputting predetermined vertex data from the external memory unit to each memory block and holding these vertex data, and outputting the vertex data by switching a connection destination to a data holding section; the data holding section for temporarily storing one portion of the vertex data stored in each memory block of the internal memory section; and an arithmetic section for inputting the vertex data stored in this data holding section and generating graphic data by performing predetermined processing of the vertex data; wherein each memory block of the internal memory section inputs the graphic data generated in the arithmetic section and outputs the graphic data by switching the connection destination to the external memory unit.
In the construction of the above invention, the internal memory section is divided into plural memory blocks able to be independently accessed. Some of these memory blocks are connected to the external memory unit so that graphic data are transferred at high speed. The memory blocks not connected to the external memory unit are connected to the data holding section and the arithmetic section so that required processing is performed on the graphic data stored in those memory blocks. When the required processing and the data transfer are terminated, the memory blocks that were connected to the data holding section and the arithmetic section are next connected to the external memory unit and transfer the graphic data at high speed. Conversely, the memory blocks that were connected to the external memory unit and were transferring the graphic data at high speed by a data transfer device are connected to the data holding section and the arithmetic section, and the required processing is performed on the graphic data stored in them. Thus, each memory block is exclusively connected either to the external memory unit or to the data holding section and the arithmetic section, so that transfer of a large amount of data and arithmetic processing can be executed in parallel with each other at high speed.
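The benefit of switching memory blocks between the external memory unit and the arithmetic section can be illustrated with a simple timing model. This is a sketch under assumptions not stated in the text: exactly two memory blocks, a fixed transfer time and a fixed processing time per batch of vertex data, and no separate modeling of writing results back. The function names are hypothetical.

```python
def alternating_time(n_batches, t_transfer, t_compute):
    """Single memory: transfer and processing phases strictly alternate."""
    return n_batches * (t_transfer + t_compute)


def double_buffered_time(n_batches, t_transfer, t_compute):
    """Two memory blocks: one is filled while the other is being processed."""
    transfer_done = 0
    compute_done = 0
    compute_history = []
    for i in range(n_batches):
        # Block i % 2 can be refilled only after its previous batch was processed
        block_free = compute_history[i - 2] if i >= 2 else 0
        transfer_start = max(transfer_done, block_free)
        transfer_done = transfer_start + t_transfer
        # Processing needs the data to have arrived and the arithmetic section to be free
        compute_start = max(compute_done, transfer_done)
        compute_done = compute_start + t_compute
        compute_history.append(compute_done)
    return compute_done


print(alternating_time(4, 3, 5))
print(double_buffered_time(4, 3, 5))
```

For four batches with a transfer time of 3 and a processing time of 5 (arbitrary units), the alternating scheme takes 32 units and the two-block scheme takes 23, because all transfers after the first overlap with processing.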
To achieve the above objects, there is also provided a graphic translate engine comprising: multiply-add calculation units for making a multiply-accumulation operation of x, y and z corresponding to x, y and z of at least a homogeneous coordinate system; at least one divider; first, second and third register files for storing vertex data of a figure corresponding to x, y and z of at least the homogeneous coordinate system; a first bus network for connecting the multiply-add calculation units, the divider and the register files to each other, and supplying first operand data to the multiply-add calculation units and the divider; a second bus network for connecting the multiply-add calculation units, the divider and the register files to each other, and supplying second operand data to the multiply-add calculation units and the divider; and a third bus network for connecting the multiply-add calculation units, the divider and the register files to each other, and writing back calculation results of the multiply-add calculation units and the divider to the register files; wherein each of first reading ports of the first, second and third register files is connected to corresponding input terminals of the first operand of the first, second and third multiply-add calculation units and the divider by the first bus network; each of second reading ports of the first, second and third register files is connected to an input terminal of the second operand of each of the first, second and third multiply-add calculation units and an input terminal of the second operand of the divider by the second bus network including a crossbar switch; the input terminals of the second operand of each of the first, second and third multiply-add calculation units and the divider and the respective second reading ports of the first, second and third register files can be connected to each other in a mutual connection for providing one-to-one correspondence of the registers and the arithmetic 
units exclusively combined with each other and a one-to-multiple mutual connection for connecting a specific register to plural arithmetic units; output terminals of the first, second and third multiply-add calculation units and the divider are connected to respective writing ports of the first, second and third register files; and at least one of the output terminals of the first, second and third multiply-add calculation units and the output terminal of the divider can be exclusively connected to any writing port of the first, second and third register files, and a writing operation to a predetermined address of the registers can be performed.
In the construction of the above invention, graphic vertex data are inputted to the corresponding first, second and third arithmetic units from the first, second and third register files by using the first and second bus networks so that a required calculation is made. The graphic vertex data are written back to the corresponding first, second and third register files by using the third bus network. Thus, an inner product calculation caused by a matrix calculation for performing a perspective transformation and a divisional calculation by ‘depth’ can be efficiently executed.
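The two calculations named above, the inner products of a matrix calculation and the division by depth, can be illustrated by the following minimal sketch. It is an illustration only and not the circuit of the invention: the function names `dot4` and `perspective_transform`, the row-major 4x4 matrix layout, and the use of the fourth homogeneous component as the depth divisor are all assumptions made for the example.

```python
def dot4(row, v):
    """Inner product of one matrix row with a homogeneous vertex.

    Each call corresponds to a chain of multiply-accumulation operations
    of the kind one multiply-add calculation unit would perform.
    """
    return row[0] * v[0] + row[1] * v[1] + row[2] * v[2] + row[3] * v[3]


def perspective_transform(matrix, vertex):
    """Apply a 4x4 matrix (assumed row-major) and divide by depth.

    The divisor is assumed to be the transformed fourth component;
    the three divisions correspond to the work given to the divider.
    """
    x, y, z, w = (dot4(row, vertex) for row in matrix)
    return (x / w, y / w, z / w)


# Hypothetical matrix whose last row copies z into w, so that the
# division is a division by depth.
example_matrix = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
]
projected = perspective_transform(example_matrix, (2.0, 4.0, 2.0, 1.0))
```

In this sketch the four inner products are independent of each other, which is why they could be assigned to separate arithmetic units operating in parallel, while the divisions depend on all of them.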
In a preferred embodiment of the present invention, the graphic translate engine further comprises: a first bypass network for directly connecting the output terminals of the first, second and third multiply-add calculation units and the divider to the input terminals of the first operand of the first, second and third multiply-add calculation units and the divider, and directly supplying output results of the first, second and third multiply-add calculation units and the divider to the input terminals of the second operand of the first, second and third multiply-add calculation units and the divider as operands of the first, second and third multiply-add calculation units and the divider before calculation results are written back to the register files, or in parallel with writing back processing; and a second bypass network for directly connecting the output terminals of the first, second and third multiply-add calculation units and the divider to the first and second bus networks, and directly supplying output results of the first, second and third multiply-add calculation units and the divider to the input terminals of the first or second operand of the first, second and third multiply-add calculation units as operands of the first, second and third multiply-add calculation units and the divider in parallel with processing for writing back calculation results.
To achieve the above objects, there is also provided a graphic translate engine comprising: multiply-add calculation units for making a multiply-accumulation operation of x, y, z and w corresponding to x, y, z and w of at least a homogeneous coordinate system; at least one divider; first, second, third and fourth register files for storing vertex data of a figure corresponding to x, y, z and w of at least the homogeneous coordinate system; a first bus network for connecting the multiply-add calculation units, the divider and the register files to each other, and supplying first operand data to the multiply-add calculation units and the divider; a second bus network for connecting the multiply-add calculation units, the divider and the register files to each other, and supplying second operand data to the multiply-add calculation units and the divider; and a third bus network for connecting the multiply-add calculation units, the divider and the register files to each other, and writing back calculation results of the multiply-add calculation units and the divider to the register files; wherein each of first reading ports of the first, second, third and fourth register files is connected to corresponding input terminals of the first operand of the first, second, third and fourth multiply-add calculation units and the divider by the first bus network; each of second reading ports of the first, second, third and fourth register files is connected to an input terminal of the second operand of each of the first, second, third and fourth multiply-add calculation units and an input terminal of the second operand of the divider by the second bus network including a crossbar switch; the input terminals of the second operand of each of the first, second, third and fourth multiply-add calculation units and the divider and the respective second reading ports of the first, second, third and fourth register files can be connected to each other in a mutual connection for providing one-to-one correspondence of the registers and the arithmetic units exclusively combined with each other and a one-to-multiple mutual connection for connecting a specific register to plural arithmetic units; output terminals of the first, second, third and fourth multiply-add calculation units and the divider are connected to respective writing ports of the first, second, third and fourth register files; and at least one of the output terminals of the first, second, third and fourth multiply-add calculation units and the output terminal of the divider can be exclusively connected to any writing port of the first, second, third and fourth register files, and a writing operation to a predetermined address of the registers can be performed.
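The two connection modes of the second bus network, and the exclusive connection on the write-back side, can be modeled in software roughly as follows. This is a sketch under stated assumptions, not a description of the actual crossbar switch: the function names `route_second_operands` and `route_write_back` and the representation of a switch setting as a list of indices are hypothetical.

```python
def route_second_operands(read_ports, select):
    """Model of the crossbar on the second bus network.

    read_ports[j] is the value on the second reading port of register
    file j; select[i] is the (assumed) switch setting naming which
    register file feeds the second operand of arithmetic unit i.
    """
    return [read_ports[src] for src in select]


def route_write_back(outputs, dest, num_files):
    """Model of the exclusive write-back connection.

    dest[i] is the writing port that receives the output of unit i, or
    None if that output is not written back. A writing port may be
    claimed by at most one unit.
    """
    written = {}
    for unit, port in enumerate(dest):
        if port is None:
            continue
        if not 0 <= port < num_files:
            raise ValueError("no such writing port")
        if port in written:
            raise ValueError("writing port already claimed")
        written[port] = outputs[unit]
    return written


ports = ["x", "y", "z", "w"]
one_to_one = route_second_operands(ports, [0, 1, 2, 3])   # each unit reads its own file
one_to_many = route_second_operands(ports, [3, 3, 3, 3])  # one register feeds every unit
```

The one-to-multiple setting is the case where a single register value, for example a common scale factor, is needed as the second operand of all units in the same cycle.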
In a preferred embodiment of the present invention, the graphic translate engine further comprises: a first bypass network for directly connecting the output terminals of the first, second, third and fourth multiply-add calculation units and the divider to the input terminals of the first operand of the first, second, third and fourth multiply-add calculation units and the divider, and directly supplying output results of the first, second, third and fourth multiply-add calculation units and the divider to the input terminals of the second operand of the first, second, third and fourth multiply-add calculation units and the divider as operands of the first, second, third and fourth multiply-add calculation units and the divider before calculation results are written back to the register files, or in parallel with writing back processing; and a second bypass network for directly connecting the output terminals of the first, second, third and fourth multiply-add calculation units and the divider to the first and second bus networks, and directly supplying output results of the first, second, third and fourth multiply-add calculation units and the divider to the input terminals of the first or second operand of the first, second, third and fourth multiply-add calculation units as operands of the first, second, third and fourth multiply-add calculation units and the divider in parallel with processing for writing back calculation results.
To achieve the above objects, there is further provided a floating point arithmetic unit comprising: sign part judging means for inputting a sign part of a normalized floating point number represented by three fields of the sign part, an exponent part and a mantissa, and judging on the basis of a value of this sign part whether the floating point number is positive or negative; and constant generating means for outputting the floating point number showing ‘0’ when the floating point number is negative as a judging result of this sign part judging means.
In a preferred embodiment of the present invention, the floating point arithmetic unit further comprises exponent part judging means for inputting the exponent part of the normalized floating point number represented by the three fields of the sign part, the exponent part and the mantissa, and judging whether or not a value of this exponent part is equal to or greater than a first predetermined constant; and the constant generating means outputs the floating point number showing a second predetermined constant when it is judged as a judging result of the exponent part judging means that the value of the exponent part is equal to or greater than the first predetermined constant and the floating point number is positive as the judging result of the sign part judging means.
Accordingly, clamping processing of R, G and B brightnesses in light irradiating processing can be executed at high speed.
In a preferred embodiment of the present invention, the first predetermined constant and the second predetermined constant are each 1.
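The clamping described above, which decides the output from the sign part and the exponent part alone, can be sketched in software as follows. The sketch assumes the IEEE 754 single precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 mantissa bits), which the text does not specify; the name `clamp_unit_interval` and the constant `ONE_EXPONENT` are hypothetical, and the input is assumed to be representable in single precision.

```python
import struct

# Assumption: IEEE 754 single precision, where the biased exponent
# field of 1.0 is 127. Any positive number whose exponent field is at
# least this value is greater than or equal to 1.0.
ONE_EXPONENT = 127


def clamp_unit_interval(x):
    """Clamp x to [0, 1] by inspecting only the sign and exponent fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # sign part judging
    exponent = (bits >> 23) & 0xFF    # exponent part judging
    if sign == 1:
        return 0.0                    # negative input: output the constant 0
    if exponent >= ONE_EXPONENT:
        return 1.0                    # positive and >= 1: output the constant 1
    # Otherwise pass the (single precision) value through unchanged.
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

Because no subtraction or full magnitude comparison is needed, a decision of this kind only requires examining a few bits, which is consistent with the high-speed clamping of R, G and B brightnesses stated above.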
To achieve the above objects, there is provided a graphic translate engine for performing a predetermined geometrical arithmetic processing with respect to vertex data of a figure stored to an external memory unit and represented by homogeneous coordinates, the graphic translate engine comprising the above floating point arithmetic unit.
To achieve the above objects, there is provided a floating point multiply-add calculation unit for inputting first, second and third operands and calculating a multiply-add of these operands, the floating point multiply-add calculation unit comprising: multiplying means for calculating a product of the first and second operands; digit-aligning means for inputting a shifting amount as information for a digit alignment and digit-aligning the third operand or a first preceding calculation result and a multiplying result calculated by the multiplying means; adding means for calculating a sum of the multiplying result and a digit-aligning result of the digit-aligning means; normalizing means for normalizing a mantissa of a multiply-add calculation result of the adding means; exponent part arithmetic means for calculating an exponent of the product of the first and second operands; shifting amount calculating means for calculating the shifting amount of the third operand or a second preceding calculation result and outputting the shifting amount to the digit-aligning means; and exponent part normalizing arithmetic means for normalizing an exponent part of the multiply-add calculation result; wherein when a second multiply-accumulation operation is continuously executed after a first multiply-accumulation operation, at the executing time of the second multiply-accumulation operation, the digit-aligning means inputs the multiply-add calculation result of the first multiply-accumulation operation made by the adding means as the first preceding calculation result, and the digit-aligning means performs digit-aligning processing with the product of the first and second operands of the second multiply-accumulation operation; and the shifting amount calculating means calculates the difference between an exponent of the product of the first and second operands in the second multiply-accumulation operation and the value of an exponent part of the first multiply-accumulation operation, and sets the difference as the shifting amount.
In a preferred embodiment of the present invention, the adding means transmits a calculation intermediate value of the first multiply-accumulation operation to the digit-aligning means as an operand of the second multiply-accumulation operation prior to completion of the first multiply-accumulation operation so that the second multiply-accumulation operation is started without waiting for termination of the first multiply-accumulation operation; and the digit-aligning means receives the intermediate value of the first multiply-accumulation operation as the operand of the second multiply-accumulation operation after the second multiply-accumulation operation is started.
In the construction of the above invention, when a multiply-accumulation operation is continuously executed, the result of a preceding multiply-accumulation operation is fed back, without being normalized, as the third operand of the multiply-accumulation operation executed at present, and a digit alignment is executed. The shifting amount for the digit alignment is the difference between an exponent part of the product of the first and second operands in the multiply-accumulation operation executed at present and an exponent part (unnormalized) in the preceding multiply-accumulation operation.
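The feedback of an unnormalized result can be modeled numerically as follows. This is a behavioral sketch only, under several assumptions: values are held as an integer mantissa and an exponent, Python's arbitrary precision integers stand in for the adder so that no bits are lost in alignment (a real datapath has a fixed width), and the names `decompose` and `chained_multiply_add` are hypothetical. Normalization is performed once, after the final operation of the chain.

```python
import math


def decompose(x):
    """Split a float into (m, e) with x == m * 2**e and m an integer."""
    f, e = math.frexp(x)              # x == f * 2**e with 0.5 <= |f| < 1
    return int(f * (1 << 53)), e - 53


def chained_multiply_add(c, pairs):
    """Compute c + sum(a * b for a, b in pairs) without normalizing
    the intermediate results.

    Returns the final value and the shifting amount used at each step.
    """
    acc_m, acc_e = decompose(c)       # third operand of the first operation
    shifts = []
    for a, b in pairs:
        am, ae = decompose(a)
        bm, be = decompose(b)
        prod_m, prod_e = am * bm, ae + be   # product mantissa and exponent
        # Shifting amount: exponent of the current product minus the
        # (unnormalized) exponent of the preceding result.
        shift = prod_e - acc_e
        shifts.append(shift)
        if shift >= 0:
            acc_m += prod_m << shift        # align the product to the accumulator
        else:
            acc_m = (acc_m << -shift) + prod_m  # align the accumulator to the product
            acc_e = prod_e
    # Normalization happens only here, after the whole chain.
    return math.ldexp(acc_m, acc_e), shifts


result, shifts = chained_multiply_add(0.0, [(1.5, 2.0), (2.0, 0.5), (-0.25, 4.0)])
```

The point of the sketch is that each step needs only the exponent carried with the preceding unnormalized result to compute its shifting amount, so a dependent operation does not have to wait for a normalization step between operations.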
Accordingly, in accordance with the above construction of the present invention, it is possible to shorten a processing time of the continuous multiply-accumulation operation and particularly start execution of a dependent multiply-accumulation operation and terminate the dependent multiply-accumulation operation every clock.
There is also provided a graphic translate engine for performing a predetermined geometrical arithmetic processing with respect to vertex data of a figure stored to an external memory unit and represented by homogeneous coordinates, the graphic translate engine comprising the above floating point multiply-add calculation unit.
The nature, principle and utility of the invention will become more apparent from the following detailed description when read in conjunction with the accompanying drawings.