1. Field of the Invention
The present invention relates to a graphic image processing apparatus and, more specifically, to the arrangement and interconnection of a built-in memory in the case where a DRAM or other memory and a logic circuit are provided together.
2. Description of the Related Art
Computer graphics are often used in a variety of CAD (computer aided design) systems and amusement machines. In particular, along with the recent advances in image processing techniques, systems using three-dimensional computer graphics are rapidly becoming widespread.
In three-dimensional computer graphics, the color value of each pixel is calculated when the color of that pixel is decided. Rendering is then performed to write the calculated value to the address of a display buffer (frame buffer) corresponding to the pixel.
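The write step described above can be sketched as follows. This is a minimal illustration only: the row-major layout, the 4-byte RGBA pixel format, and the names `framebuffer_address` and `write_pixel` are assumptions made for the example and are not taken from the apparatus described here.

```python
def framebuffer_address(x, y, width, bytes_per_pixel=4, base=0):
    """Byte address of pixel (x, y) in an assumed row-major frame buffer."""
    if not (0 <= x < width) or y < 0:
        raise ValueError("pixel outside the frame buffer")
    return base + (y * width + x) * bytes_per_pixel


def write_pixel(framebuffer, x, y, width, rgba):
    """Write one calculated color value to the address matching the pixel."""
    addr = framebuffer_address(x, y, width)
    framebuffer[addr:addr + 4] = bytes(rgba)


# A hypothetical 4 x 4 display with 4 bytes per pixel.
WIDTH, HEIGHT = 4, 4
fb = bytearray(WIDTH * HEIGHT * 4)
write_pixel(fb, 2, 1, WIDTH, (255, 128, 0, 255))
```

In this sketch each pixel owns exactly one address range, so deciding a pixel's color and writing it to the frame buffer are a one-to-one pair of operations.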
One of the rendering methods is polygon rendering. In this method, a three-dimensional model is expressed as a composite of triangular unit graphics (polygons). By drawing using the polygons as units, the colors of the pixels of the display screen are decided.
In polygon rendering, coordinates (x, y, z), color data (R, G, B), homogeneous coordinates (s, t) of texture data indicating a composite image pattern, and a value of the homogeneous term q for the respective vertexes of the triangle in a physical coordinate system are input and processing is performed for interpolating these values inside the triangle.
Here, coordinates in a UV coordinate system of an actual texture buffer, namely, texture coordinate data (u, v), are comprised of the homogeneous coordinates (s, t) divided by the homogeneous term q to give "s/q" and "t/q" which in turn are multiplied by texture sizes USIZE and VSIZE, respectively.
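The interpolation and division described above can be illustrated with a short sketch. It assumes barycentric weights as the interpolation method, which the text does not specify, and the function names and the vertex tuple layout `((x, y), s, t, q)` are hypothetical.

```python
def barycentric_weights(p, a, b, c):
    """Weights of point p with respect to triangle (a, b, c) in screen space."""
    px, py = p
    ax, ay = a
    bx, by = b
    cx, cy = c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    w0 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w1 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w0, w1, 1.0 - w0 - w1


def texture_coords(p, verts, usize, vsize):
    """Interpolate (s, t, q) at p, then form u = (s/q)*USIZE and v = (t/q)*VSIZE."""
    w = barycentric_weights(p, verts[0][0], verts[1][0], verts[2][0])
    s = sum(wi * v[1] for wi, v in zip(w, verts))
    t = sum(wi * v[2] for wi, v in zip(w, verts))
    q = sum(wi * v[3] for wi, v in zip(w, verts))
    return (s / q) * usize, (t / q) * vsize


# Hypothetical triangle; the second vertex has q = 2, so s/q there equals 1.
tri = [((0, 0), 0.0, 0.0, 1.0), ((4, 0), 2.0, 0.0, 2.0), ((0, 4), 0.0, 1.0, 1.0)]
u, v = texture_coords((2, 0), tri, 256, 256)
```

Because s, t, and q are interpolated separately and divided afterwards, the midpoint of an edge does not in general map to the midpoint of the texture when q differs between the vertexes.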
FIG. 11 is a view of the system configuration of the basic concept of a three-dimensional computer graphic system.
In the three-dimensional computer graphic system, data for drawing a graphic image is given from a main memory 2 of a main processor 1 or an I/O interface circuit 3 for receiving external graphic data to a rendering circuit 5 having a rendering processor 5a and a frame buffer 5b via a main bus 4.
The rendering processor 5a is connected to a frame buffer 5b intended to hold data for display and a texture memory 6 for holding texture data to be applied on the surface of a graphic element to be drawn (for example, a triangle).
The rendering processor 5a is used to perform the processing for drawing a graphic element with a texture applied to its surface in the frame buffer 5b for every graphic element.
The frame buffer 5b and the texture memory 6 are generally composed of dynamic random access memories (DRAMs).
In the system shown in FIG. 11, the frame buffer 5b and the texture memory 6 are configured as physically separate memory systems.
Recently, it has become possible to provide a DRAM and a logic circuit together. Among graphic drawing image processing apparatuses, as shown in FIG. 12, there are ones attempting to build a DRAM or other large capacity memory 7a on the same semiconductor chip 7 as a logic circuit 7b used for drawing, while keeping the previous structure designed for use of an external memory as it is.
In this case, a DRAM core having a control mechanism equivalent to that of a general-purpose DRAM is simply arranged next to the prior graphic drawing image processing logic circuit, and the two are interconnected by a single path.
In the case of graphic drawing image processing apparatuses, only the above types exist.
Below, although the technical field is different from that of a graphic drawing image processing apparatus, the trends in the field of microprocessors will be described.
In the past, it has been proposed to provide a microprocessor and a memory on a single chip. Proposals have also been made regarding the arrangement of the memory on the chip.
For example, in a PPRAM (ISSCC97/SESSION14/Parallel Processing RAM), as shown in FIG. 13, DRAMs 8a-1 to 8a-4 serving as main memories and microprocessors (P) 8b-1 to 8b-4 are built in on the same semiconductor chip 8.
Note that, in FIG. 13, reference numerals 8c-1 to 8c-4 indicate memory controllers (Mem CTL) of the DRAMs 8a-1 to 8a-4, and 8d-1 to 8d-4 indicate caches.
In this semiconductor chip 8, the DRAMs 8a-1 to 8a-4 serving as the main memories are arranged in only one direction with respect to the microprocessors 8b-1 to 8b-4.
Also, FIG. 13 shows a configuration wherein a plurality of microprocessors 8b-1 to 8b-4 access single DRAMs via the caches 8d-1 to 8d-4.
Turning to the problems to be solved by the invention, in the above conventional so-called built-in DRAM system, when a frame buffer memory and a texture memory are separated into different memory systems, there is a disadvantage that the frame buffer area freed by a change of the display resolution cannot be used for texture. Alternatively, when the frame memory and the texture memory are physically combined, the overhead of the page exchange of the DRAM etc. becomes large at the time of simultaneous access of the frame memory and the texture memory, so there is a disadvantage that performance has to be sacrificed.
Also, with a method of interconnection wherein a DRAM core having a control mechanism equivalent to a general-purpose DRAM is arranged next to a graphic image processing logic circuit and the two are connected by a single path, the bandwidth for accessing is not improved at all in spite of the trouble of building in the DRAM and becomes a bottleneck in system performance.
Furthermore, a built-in main memory type microprocessor has the following disadvantages:
Namely, the semiconductor chip 8 has four units of the same functional configuration aligned with each other and transfers data through the memory controllers. The bandwidths of the transfer are determined by the path widths of the memory controllers and the operating speeds. The longest path is one cutting straight across the chip. The operating speed is determined by the longest path. Therefore, improvement of the operating speed becomes difficult. Long paths naturally occupy a greater area in the layout.
The trend has been for the speed of microprocessors to double every 18 months and for the memory capacity to also double about every 18 months.
In spite of this situation, the access time improves by only about 7% per year. How to make the access time faster is now becoming the key to improving the system performance.
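The size of the gap implied by these figures can be worked out with simple compound-growth arithmetic. The sketch below assumes that the figure of about 7% per year refers to the yearly improvement in access time and that both trends compound steadily; the function names are hypothetical.

```python
def processor_speedup(years, doubling_months=18.0):
    """Speed factor after `years` if speed doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def access_speedup(years, yearly_rate=0.07):
    """Access-time improvement factor, assuming ~7% compounded per year."""
    return (1.0 + yearly_rate) ** years


years = 6
cpu = processor_speedup(years)   # 2 ** 4 = 16
mem = access_speedup(years)      # about 1.5
gap = cpu / mem                  # processor outpaces memory access by about 10x
```

Under these assumptions, after six years the processor is 16 times faster while memory access is only about 1.5 times faster, which is why the access time becomes the limiting factor.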
In the above conventional method, the larger the chip, the longer the critical path and therefore the more the operating speed ends up being hampered.
Accordingly, the access time between DRAMs is left unimproved, so the merits of building in DRAMs are not fully realized.
An object of the present invention is to provide an image processing apparatus capable of effectively utilizing a storage circuit provided together with a logic circuit and enabling an increase of the operating speed and reduction of the power consumption without causing a deterioration of performance.
According to a first aspect of the present invention, there is provided an image processing apparatus comprising a storage circuit divided into a plurality of storage modules, each storage module storing image data of different pixels and a logic circuit for performing predetermined processing on the image data based on the stored data of the storage circuit, the storage circuit and the logic circuit being both accommodated on one semiconductor chip, and the plurality of divided storage modules arranged at peripheral portions of the logic circuit.
According to a second aspect of the invention, there is provided an image processing apparatus for performing rendering by receiving polygon rendering data including three-dimensional coordinates (x, y, z), R (red), G (green), and B (blue) data, homogeneous coordinates (s, t) of texture, and a homogeneous term q for vertexes of a unit graphic; comprising a storage circuit divided into a plurality of storage modules, each storage module storing display data of different pixels and texture data required by at least one graphic element and a logic circuit comprising at least an interpolation data generating circuit for performing interpolation on the polygon rendering data of the vertexes of the unit graphic to generate interpolation data of pixels positioned inside the unit graphic and a texture processing circuit for dividing the homogeneous coordinates (s, t) of texture included in the interpolation data by the homogeneous term q to generate "s/q" and "t/q", using texture addresses in accordance with the "s/q" and "t/q" to read texture data from the storage circuit, and performing processing for applying the texture data to the surface of the graphic elements of the display data, and the storage circuit and the logic circuit being both accommodated on one semiconductor chip, and having the plurality of divided storage modules arranged at peripheral portions of the logic circuit.
Preferably, the logic circuit is divided into a plurality of pixel processing blocks corresponding to the storage modules and each corresponding pixel processing block is closely arranged to each storage module.
Preferably, further provision is made of a secondary memory capable of storing stored data of a storage module and the secondary memory is closely arranged to the storage module.
Preferably, a pixel processing block performs at least one stage of pipeline processing therein.
Preferably, the storage modules are arranged at peripheral portions of the logic circuit so as to surround the logic circuit, and input/output terminals of the storage modules are arranged at the inside edges facing the logic circuit.
Preferably, the plurality of pixel processing blocks, even if for modules having the same function, are changed in the positions of their terminals for taking out paths so as to enable paths to be optimally laid to pixel processing blocks using paths from the storage modules.
Preferably, there is further provided a control block equivalently connected to all of the storage modules for controlling the operations of the above plurality of storage modules and that control block is arranged close to a center point surrounded by the storage modules.
Preferably, the storage circuit is accessed based on a row address and a column address; the logic circuit is divided into a plurality of pixel processing blocks corresponding to the storage modules, a corresponding pixel processing block being closely arranged at each storage module; there is a secondary memory capable of storing the stored data of a storage module, which secondary memory is arranged close to a storage module; the storage module is arranged so that its longitudinal direction is the column direction of a core; and the pixel processing block and the secondary memory are arranged close to each other on the same side of the long side of the storage module.
Explained from another angle, in the present invention, the storage circuit is composed of a plurality of independent modules. Due to this, the ratio of valid data held in a bit line in one access increases compared with the case where accesses have to be made simultaneously.
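One way to picture storage modules that each hold the data of different pixels is an interleaved assignment of screen pixels to modules. The sketch below is an illustration under stated assumptions: the count of four modules and the 2 x 2 interleave pattern are chosen for the example and are not specified in the text, and the function names are hypothetical.

```python
NUM_MODULES = 4  # assumed module count, for illustration only


def module_for_pixel(x, y):
    """Assumed 2 x 2 interleave: neighboring pixels land in different modules."""
    return (x % 2) + 2 * (y % 2)


def modules_touched(x0, y0, width, height):
    """Set of modules holding the pixels of a small screen rectangle."""
    return {
        module_for_pixel(x, y)
        for y in range(y0, y0 + height)
        for x in range(x0, x0 + width)
    }


# Any 2 x 2 block of pixels is spread over all four independent modules.
touched = modules_touched(5, 7, 2, 2)
```

With an assignment of this kind, a small group of neighboring pixels can be served by separate modules, each accessed independently for only the data it actually holds.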
The plurality of divided storage modules are arranged at the peripheral portions of the logic circuit portion for carrying out graphic drawing processing etc.
As a result, the distances from the respective storage modules to the logic circuit portion become uniform and the length of the longest path interconnection is shortened compared with the case where the modules are all arranged in one direction. Therefore, the operating speed as a whole is improved.
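The effect of the arrangement on the longest path can be illustrated with a toy geometric model. The coordinates below are arbitrary units chosen for the example, Manhattan distance stands in for interconnection length, and the layouts are hypothetical simplifications rather than the actual chip floor plan.

```python
def manhattan(a, b):
    """Manhattan distance, used here as a stand-in for interconnection length."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


LOGIC = (0.0, 0.0)  # assumed center of the logic circuit portion

# Four modules lined up on one side of the logic circuit (all in one direction).
ONE_SIDE = [(-1.5, 1.0), (-0.5, 1.0), (0.5, 1.0), (1.5, 1.0)]

# Four modules placed around the logic circuit, one per side.
SURROUND = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]


def path_lengths(modules, logic=LOGIC):
    return [manhattan(m, logic) for m in modules]


one_side = path_lengths(ONE_SIDE)   # lengths differ from module to module
surround = path_lengths(SURROUND)   # lengths are uniform
longest_one_side = max(one_side)
longest_surround = max(surround)
```

In this toy model the one-sided layout has unequal paths with a longest length of 2.5 units, while the surrounding layout gives every module the same 1.0 unit path, matching the qualitative claim above.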
Also, a function block for controlling pixel processing in the graphic drawing is arranged close to each of the storage modules of the storage circuit.
Therefore, read/modify/write processing, which is carried out an extremely large number of times in graphic processing, can be performed in a very short interconnection region. As a result, the operating speed is strikingly improved.
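A read/modify/write cycle of the kind referred to above can be sketched as follows. Alpha blending is used only as a familiar example of such an operation; the text does not say which operations the function block performs, and the dictionary standing in for a storage module and the name `blend_rmw` are hypothetical.

```python
def blend_rmw(module, addr, src, alpha):
    """One read/modify/write cycle on a pixel held in a storage module.

    `module` maps addresses to (R, G, B) tuples; `alpha` is 0..255.
    """
    dst = module[addr]                                   # read
    out = tuple(
        (alpha * s + (255 - alpha) * d + 127) // 255     # modify (blend, rounded)
        for s, d in zip(src, dst)
    )
    module[addr] = out                                   # write
    return out


# Hypothetical module holding a single black pixel at address 0.
module = {0: (0, 0, 0)}
result = blend_rmw(module, 0, (255, 0, 255), 128)
```

Every drawn pixel that is blended or depth-tested goes through a cycle like this, so the read and the write both traverse the path between the function block and its storage module.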
At each storage module, a secondary memory is closely arranged to the module.
Due to this, even when data is transferred from a storage circuit to a secondary memory by a path having a very wide width, there is little effect of so-called cross talk. Also, since the interconnection length is naturally short, the operating speed is improved. Further, the area occupied by the interconnections becomes small as well.
By having a function block for controlling the pixel processing in the graphic drawing perform at least one stage of pipeline processing therein, even if the distance to a block carrying out other graphic processing arranged at the center becomes long on average, it is possible to eliminate the effect on the throughput for processing data and therefore the processing speed is improved.
Further, the modules are arranged at the peripheral portions of the logic circuit portion for carrying out the graphic drawing processing etc. so as to surround it, and their input/output terminals are arranged at the inner sides facing the logic circuit portion.
Due to this, the interconnection region is orderly and the average interconnection length becomes shorter.
Also, a plurality of function blocks for controlling the pixel processing, even if they are for modules having the same function, are changed in the positions of their terminals for taking out paths so as to enable paths to be optimally laid to function blocks using paths from the modules.
Due to this, even if the same in function, the terminals of the blocks can be arranged at the optimal positions for the locations of arrangement of the blocks, so the average interconnection length becomes shorter.
Also, the block having the largest number of interconnections among blocks equally connected to all of the storage modules is arranged close to the center point surrounded by the storage circuits.
As a result, the area occupied by the interconnections becomes smaller and the longest interconnection length becomes shorter. Therefore, the operating speed can be simultaneously improved as well.
When, for every module, a function block for controlling the pixel processing in the graphic drawing and a secondary memory are closely arranged to the storage module, the storage modules are arranged so that their longitudinal directions become the same as the column direction of a core of the storage circuit (for example, DRAM).
As a result, compared with arrangement in the row direction, by just specifying the row address, the one row's worth of data corresponding to that row address can be loaded into the secondary memory at one time, that is, the number of bits is dramatically increased.
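The benefit of loading one row's worth of data into the secondary memory at one time can be sketched with a small model. The class below is a simplified illustration, not the circuit itself: the class name, the list-based core, and the counting of row loads are assumptions made for the example.

```python
class StorageModule:
    """Toy model of a storage module core with a one-row secondary memory."""

    def __init__(self, rows, cols):
        self.core = [[r * cols + c for c in range(cols)] for r in range(rows)]
        self.secondary = []      # holds a copy of one full row
        self.loaded_row = None   # row address currently held
        self.row_loads = 0       # number of core row accesses performed

    def read(self, row, col):
        """Read one entry, loading the whole row only when the row changes."""
        if row != self.loaded_row:
            self.secondary = list(self.core[row])  # whole row in one access
            self.loaded_row = row
            self.row_loads += 1
        return self.secondary[col]


module = StorageModule(rows=4, cols=8)
row_values = [module.read(2, c) for c in range(8)]  # eight reads, one row load
loads_after_one_row = module.row_loads
module.read(3, 0)                                   # a new row needs a new load
```

In this model eight consecutive reads from the same row cost a single core access, which illustrates why a wider row transfer into the secondary memory raises the number of bits delivered per row address.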
The pixel processing block and the secondary memory are closely arranged to each other on the same side of a longitudinal side of the storage module.
As a result, data to the pixel processing block and the secondary memory can use the same sense amplifier. Therefore, the increase of the area of the core of the storage circuit can be kept to a minimum and two ports become possible.