1. Field of the Invention
The present invention relates to a three-dimensional (3D) graphic accelerator, and more particularly to a buffer structure for retaining consistency in a 3D graphic accelerator that processes primitives in parallel inside a rendering processor.
2. Description of the Related Art
3D graphics is one of the most critical parts of a multimedia environment, and necessitates a 3D graphic accelerator.
To display 3D graphics, the 3D graphic accelerator needs to carry out a rather complicated computing procedure, i.e., it performs in hardware the calculations that would otherwise be performed in software.
The rendering chips of 3D graphic accelerators mounted on PCs generally have a structure that processes a single primitive at high speed. Therefore, a considerable amount of time is consumed in processing a great number of primitives.
For this reason, a high-speed rendering chip structure, which can simultaneously process a plurality of primitives by exploiting the parallelism of the primitives, has recently been suggested.
FIG. 1 is a block diagram illustrating the 3D graphic processing steps according to the conventional technology.
Referring to FIG. 1, when data from 3D application software 1 is transferred to a 3D graphic accelerator 3 through an application program interface (API) 2, the 3D graphic accelerator 3 performs real-time hardware acceleration and transfers the result to a display 4.
At this stage, a rendering processor 3b in most 3D graphic accelerators 3 mainly uses primitives of triangular shape for high-speed processing, because the triangular shape is easy to process in hardware.
The primitives are the 3D data inputted to the 3D graphic accelerator 3, and are mainly composed of points, lines and polygons.
However, processing a plurality of primitives by using the parallelism thereof in the course of 3D graphic processing poses a problem of inconsistency when the primitives overlap on a screen.
The following is a description of the parallelism and inconsistency of the primitives made with reference to an embodiment of the rendering chip structure according to the conventional technology.
When the primitives have no overlapping region in the coordinates of a screen, parallel processing can be performed irrespective of the order in which the primitives are inputted to a rendering processor. This is referred to as out-of-order execution.
FIG. 2 is a diagram illustrating five primitives displayed on a single frame according to an embodiment of the conventional technology.
The five triangular primitives are inputted to the rendering processor in order from the triangle No. 1 to the triangle No. 5. Referring to FIG. 2, a first region comprises the triangle Nos. 1 to 3, and a second region comprises the triangle Nos. 4 and 5. Here, the first region does not overlap the second region. In the first region, the triangle No. 1 does not overlap the triangle No. 2, while the triangle No. 3 overlaps the triangle Nos. 1 and 2. In the second region, the triangle No. 4 overlaps the triangle No. 5.
FIGS. 3(a) and 3(b) are diagrams dividing the triangles in FIG. 2 into two groups so that the triangles within each group do not overlap one another.
If the rendering processor processes the plurality of primitives in parallel, the triangle Nos. 3 and 5 in FIG. 3(a) do not overlap one another, and neither do the triangle Nos. 1, 2 and 4 in FIG. 3(b). Thus, the triangles within each group can be processed in parallel irrespective of the inputted order.
However, the group of FIG. 3(a) and the group of FIG. 3(b) need to be processed in sequential order because they overlap each other. If the two groups are processed in parallel, the final value may not be correct with respect to the overlapping regions. The following is an explanation of that occasion made with reference to an embodiment.
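The grouping of FIG. 2 into the two groups of FIGS. 3(a) and 3(b) can be illustrated by the following minimal sketch. It assumes that the overlap relation is already known as a set of pairs taken from the description of FIG. 2; the function name `schedule_waves` and the greedy rule (a primitive joins the current group only if it overlaps no earlier primitive that is still unfinished) are hypothetical and are not taken from the conventional technology itself.

```python
# Overlap relation described for FIG. 2: No. 3 overlaps Nos. 1 and 2,
# and No. 4 overlaps No. 5. Each pair is stored as an unordered frozenset.
OVERLAPS = {frozenset((1, 3)), frozenset((2, 3)), frozenset((4, 5))}


def schedule_waves(order, overlaps):
    """Split primitives into groups ("waves") that can each run in parallel.

    Hypothetical greedy rule: scanning in input order, a primitive joins the
    current wave only if it overlaps no earlier primitive that is still
    unfinished (either already in this wave or deferred to a later one).
    """
    pending = list(order)
    waves = []
    while pending:
        wave, blocked = [], []
        for p in pending:
            earlier = wave + blocked  # all unfinished primitives input before p
            if any(frozenset((p, q)) in overlaps for q in earlier):
                blocked.append(p)     # must wait for an earlier primitive
            else:
                wave.append(p)        # safe to render in parallel
        waves.append(wave)
        pending = blocked
    return waves


waves = schedule_waves([1, 2, 3, 4, 5], OVERLAPS)
print(waves)  # [[1, 2, 4], [3, 5]]
```

Under these assumptions the first group corresponds to FIG. 3(b) and the second group to FIG. 3(a); the two groups are rendered one after the other, while the triangles inside a group may be rendered in any order.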
In FIG. 2, assume that, at a pixel A in the overlapping region, the depth of the triangle No. 4 is 50 and the depth of the triangle No. 5 is 20.
Herein, the pixel A initially holds, as a background value stored in the memory, the maximum depth value that can be represented by the number of bits of the depth value. Hereinafter, this background depth value of the pixel A will be referred to as “MAX.”
If the triangle Nos. 4 and 5 are rendered and displayed on a screen, the overlapping region between the triangle Nos. 4 and 5 must be displayed with a final depth value of 20 at the pixel A, which is the depth value of the triangle No. 5.
To be specific, when the triangle No. 4 is rendered with respect to the pixel A, the depth value 50 of the triangle No. 4 is compared with the background value MAX, and the smaller value 50 is stored in the memory. Then, when the triangle No. 5 is rendered, the depth value 20 of the triangle No. 5 is compared with the stored depth value 50, and the smaller value 20 is stored in the memory. Therefore, the depth value 20 of the triangle No. 5 is ultimately stored in the memory.
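The sequential depth comparison described above can be sketched as follows. The value of `MAX` assumes a 16-bit depth value, which the description does not specify, and the function names are hypothetical.

```python
# Assumption: a 16-bit depth value. The text only calls the background "MAX".
MAX = 2**16 - 1


def depth_test(stored, incoming):
    """Keep the smaller depth value, as in the comparison described above."""
    return min(stored, incoming)


def render_sequential(depths, background=MAX):
    """Apply the depth test one primitive at a time, in input order."""
    stored = background
    for d in depths:
        stored = depth_test(stored, d)
    return stored


# Pixel A: triangle No. 4 (depth 50) is rendered first, then No. 5 (depth 20).
print(render_sequential([50, 20]))  # 20
```

When each comparison sees the value written by the previous one, the result is 20 regardless of which triangle is rendered first.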
However, a problem is posed when the triangle Nos. 4 and 5 are processed in parallel.
If the depths of the triangle No. 4 and the triangle No. 5 are simultaneously compared with respect to the pixel A, the depth value 50 of the triangle No. 4 is compared with the background value MAX, and the depth value to be stored in the memory is determined to be 50. At the same time, the depth value 20 of the triangle No. 5 is also compared with the background value MAX, and the depth value to be stored in the memory is determined to be 20. Here, a conflict is generated between the depth values 20 and 50 to be stored in the memory.
If the depth value 20 is written first and the depth value 50 is written later, the depth value stored in the memory ends up being 50. In that case, an incorrect outcome is generated. This problem is called a “consistency problem.”
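The conflict can be reproduced with a small deterministic model, given here only as an illustration. It assumes that every parallel unit reads the same background value before any unit writes, and that the last write wins; `MAX` again assumes a 16-bit depth value, and `render_parallel_unsafe` and `write_order` are hypothetical names.

```python
# Assumption: a 16-bit depth value for the background "MAX".
MAX = 2**16 - 1


def render_parallel_unsafe(depths, write_order, background=MAX):
    """Model parallel depth tests that all read the same stale stored value.

    Phase 1: every unit compares its depth against the background value.
    Phase 2: the results are written back in `write_order`; the last write wins.
    """
    decided = [min(background, d) for d in depths]
    stored = background
    for i in write_order:
        stored = decided[i]
    return stored


depths = [50, 20]  # triangle No. 4, triangle No. 5 at pixel A

# Depth 20 is written first and depth 50 later: the wrong value remains.
print(render_parallel_unsafe(depths, write_order=[1, 0]))  # 50

# Depth 50 is written first and depth 20 later: correct only by chance.
print(render_parallel_unsafe(depths, write_order=[0, 1]))  # 20
```

The final value depends on the write order rather than on the depth values, which is the inconsistency described above.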
To resolve the consistency problem generated when the primitives overlap on a screen due to the parallel structure, a separate unit is required for checking and management of the overlapping regions. A superscalar method used by S3 Company has been suggested for this.
FIG. 4 is a block diagram illustrating an overall structure of a 3D rendering processor according to the conventional technology, which uses the superscalar method recently published by S3 Company and can operate n rendering accelerators simultaneously in parallel.
Referring to FIG. 4, the rendering processor comprises a fetch unit 10 for receiving and transmitting the primitives to be processed to a vacant region of the buffer, an issue unit 20 for allotting and managing so as to retain consistency in processing the plurality of primitives transmitted from the fetch unit 10 in parallel, a rendering accelerator 30 for rendering by means of a texture cache after receiving the plurality of primitives allotted and managed by the issue unit 20, and a memory interface unit 40 for processing the memory command by using the command defined by the rendering accelerator 30.
The following is a description of an operation of the rendering processor constituted above.
The fetch unit 10 fetches the primitives to be processed by the rendering accelerator 30. If a first buffer of the issue unit 20 has a vacant region, the primitives are transferred to the first buffer of the issue unit 20.
The information on the primitives allotted to the rendering accelerator 30 for rendering is transferred from the first buffer to a second buffer and is stored in the second buffer.
The issue unit 20 checks whether any overlapping region exists by using two kinds of information: the information on the primitives that have been transmitted from the fetch unit 10 and are buffered in the first buffer but have not yet been inputted to the rendering accelerator 30, and the information on the primitives in the second buffer, which are being rendered by the rendering accelerator 30.
As a result of the checking, it is determined whether or not to process in parallel according to the overlapping region. Depending on the determination, the corresponding primitives are rendered by the respective rendering accelerator 30.
Once the rendering of the primitives allotted to each rendering accelerator 30 is completed, the information on the primitives buffered in the issue unit 20 is re-adjusted.
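The operation described above can be approximated by the following minimal sketch. It is not the structure published by S3 Company: the class `IssueUnit` and its methods are hypothetical, the second buffer is modeled as a single table, the size limit of the first buffer is omitted, and the overlap check is assumed to be a comparison of rectangular boxes given as (xmin, ymin, xmax, ymax).

```python
from collections import deque


def boxes_overlap(a, b):
    """Boxes are (xmin, ymin, xmax, ymax); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class IssueUnit:
    """Hypothetical model of an issue unit feeding n rendering accelerators."""

    def __init__(self, n_accelerators):
        self.n = n_accelerators
        self.candidates = deque()  # first buffer: primitives waiting to issue
        self.in_flight = {}        # second buffer: id -> box being rendered

    def fetch(self, pid, box):
        """Receive a primitive from the fetch unit (buffer limit omitted)."""
        self.candidates.append((pid, box))

    def issue(self):
        """Issue every waiting primitive that is safe to render now."""
        issued, waiting = [], deque()
        for pid, box in self.candidates:
            # Boxes being rendered plus boxes of earlier, still-waiting inputs.
            ahead = list(self.in_flight.values()) + [b for _, b in waiting]
            has_free_unit = len(self.in_flight) < self.n
            if has_free_unit and not any(boxes_overlap(box, b) for b in ahead):
                self.in_flight[pid] = box
                issued.append(pid)
            else:
                waiting.append((pid, box))
        self.candidates = waiting
        return issued

    def complete(self, pid):
        """Re-adjust the buffered information once rendering has finished."""
        del self.in_flight[pid]


unit = IssueUnit(n_accelerators=2)
unit.fetch(1, (0, 0, 4, 4))
unit.fetch(2, (3, 3, 6, 6))       # overlaps primitive 1
unit.fetch(3, (10, 10, 12, 12))   # overlaps nothing
print(unit.issue())  # [1, 3]
unit.complete(1)
print(unit.issue())  # [2]
```

In this sketch a waiting primitive is also checked against earlier primitives that are still waiting, so that overlapping primitives are never rendered out of their input order.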
FIG. 6 is a block diagram illustrating a structure of a register used as a buffer in an issue unit of FIG. 4. Referring to FIG. 6, a candidate buffer is equivalent to the first buffer, while a destination reservation station and a source reservation station are equivalent to the second buffer.
The register structures shown in FIG. 6 hold information on the triangles that are either waiting at the issue unit to be rendered by the rendering accelerator or being rendered by the rendering accelerator.
Accordingly, the issue unit 20 computes whether or not there exists any overlapping region based on the above information, and controls the rendering performed in parallel according to the computed result. In this regard, it is quite difficult to compute the overlapping region with respect to the primitives that are buffered in the first buffer without having been inputted to the rendering accelerator 30 as well as the primitives buffered in the second buffer.
In other words, most of the primitives are based on triangles for simplification of the rendering, and it is difficult to compute accurately in hardware, from the coordinate values of the triangles, whether or not there exists any overlapping region.
Accordingly, the calculation of the overlapping region between the plurality of primitives is generally made by forming a rectangular bounding box outside of the triangle, as shown in FIG. 5.
The reason for computing the overlapping region by forming a rectangle outside of the triangle is as follows. In the case of a rectangle, the maximum and minimum values of a primitive in each line unit can be computed based only on the coordinate values of two diagonally opposite vertexes. In the case of a triangle, by contrast, the maximum and minimum values of a primitive need to be computed from the positions of the lines connecting the vertexes in addition to the three vertexes themselves. Thus, far more maximum and minimum values need to be computed per line unit in the case of a triangle than in the case of a rectangle, and consequently far more calculation is required to determine whether any overlapping region exists.
The 3D rendering processor using the superscalar method performs the overlap check by forming a rectangular bounding box outside of each triangular primitive to be rendered.
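The difference in calculation between a rectangle and a triangle can be illustrated by the following sketch, which computes the horizontal extent of each shape on one line unit (scanline). The function names are hypothetical, and the edge interpolation used for the triangle is one common software approach, assumed here only for illustration.

```python
def bounding_box(tri):
    """Rectangle enclosing a triangle given as three (x, y) vertexes."""
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    return (min(xs), min(ys), max(xs), max(ys))


def box_span(box, y):
    """Extent of a rectangle on line y: fixed by two diagonal vertexes."""
    xmin, ymin, xmax, ymax = box
    return (xmin, xmax) if ymin <= y <= ymax else None


def triangle_span(tri, y):
    """Extent of a triangle on line y: each edge must be intersected."""
    xs = []
    for i in range(3):
        (x1, y1), (x2, y2) = tri[i], tri[(i + 1) % 3]
        if y1 == y2:
            if y == y1:               # horizontal edge lying on the line
                xs += [x1, x2]
        elif min(y1, y2) <= y <= max(y1, y2):
            t = (y - y1) / (y2 - y1)  # interpolate along the edge
            xs.append(x1 + t * (x2 - x1))
    return (min(xs), max(xs)) if xs else None


tri = [(0, 0), (4, 0), (0, 4)]
box = bounding_box(tri)
print(box)                    # (0, 0, 4, 4)
print(box_span(box, 2))       # (0, 4)
print(triangle_span(tri, 2))  # (0.0, 2.0)
```

The rectangle yields the same extent on every line it covers, whereas the triangle requires a division and a multiplication per edge on each line.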
However, the rendering processor of a 3D graphic accelerator according to the conventional technology poses the following problems.
First, since the rendering processor using the superscalar method computes an overlapping region by using a bounding box, the calculation is made as if an overlapping region exists between primitives even though no overlapping region exists in fact. This is because the bounding boxes can overlap where the primitives themselves do not. As a consequence, the performance of rendering is deteriorated.
Second, if any overlapping region exists in a triangular primitive, the rendering must be performed in sequential order even for the regions other than the overlapping region. Therefore, the performance of rendering is also deteriorated.
Third, the overall design becomes complex due to the plurality of buffers and complicated control inside of the issue unit.
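The first problem above can be illustrated with two triangles whose bounding boxes overlap although the triangles themselves do not. The coordinates below are chosen only for illustration, and the exact test uses the separating axis theorem, which is an assumption of this sketch and not a method attributed to the conventional technology. Degenerate (zero-area) triangles are not handled.

```python
def bounding_box(tri):
    """Rectangle enclosing a triangle given as three (x, y) vertexes."""
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """Boxes are (xmin, ymin, xmax, ymax); touching edges count as overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def triangles_overlap(t1, t2):
    """Exact test via the separating axis theorem (illustrative assumption)."""
    for tri in (t1, t2):
        for i in range(3):
            (x1, y1), (x2, y2) = tri[i], tri[(i + 1) % 3]
            nx, ny = y2 - y1, x1 - x2  # a normal of this edge
            p1 = [nx * x + ny * y for x, y in t1]
            p2 = [nx * x + ny * y for x, y in t2]
            if max(p1) < min(p2) or max(p2) < min(p1):
                return False           # a separating axis exists
    return True


tri_a = [(0, 0), (4, 0), (0, 4)]  # lies on or below the line x + y = 4
tri_b = [(4, 1), (4, 4), (1, 4)]  # lies on or above the line x + y = 5

print(boxes_overlap(bounding_box(tri_a), bounding_box(tri_b)))  # True
print(triangles_overlap(tri_a, tri_b))                          # False
```

A check based on bounding boxes would render these two triangles sequentially even though they could safely be rendered in parallel.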