This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 11-294253, filed Oct. 15, 1999, the entire contents of which are incorporated herein by reference.
The present invention relates to a graphic computing apparatus for drawing high-quality three-dimensional computer graphics (CG) in real time.
A system such as a game machine using real-time three-dimensional (3D) CG is required to execute a graphic process that receives and draws data called a primitive, which represents a unit shape of the surface of an object present in a 3D space, i.e., a 3D object (to be simply referred to as an object hereinafter). In order to execute this process at high speed, a graphic computing apparatus implemented as hardware is used.
In a conventional graphic computing apparatus, a plane polyhedron called a polygon is used as a primitive of an object, and undergoes drawing to express a 3D space. More specifically, the conventional graphic computing apparatus is roughly comprised of three elements, i.e., a “geometry processor”, “rasterization processor”, and “frame memory”, and processes are done in a pipeline manner.
The geometry processor executes coordinate conversion and a lighting process of a polygon as a primitive in units of vertexes. The geometry processor also computes texture coordinates corresponding to vertexes as needed, but does not read any texture image itself from the frame memory. The geometry processor obtains screen coordinate values, colors, and texture coordinate values of the vertexes of a polygon as processing results and passes them to the rasterization processor.
The rasterization processor executes a process for drawing a polygon on the frame memory in units of pixels. The color of each pixel is determined by linear interpolation of colors assigned to individual vertexes using a method called smooth shading. The rasterization processor uses a scheme for hiding (not drawing) an object which is hidden or occluded by another object by a hidden-surface removal algorithm called Z-buffering using a Z buffer assured on the frame memory, upon drawing. Furthermore, the rasterization processor uses a technique called texture mapping for mapping a two-dimensional (2D) picture using a texture image stored in the frame memory upon executing a drawing process in units of pixels.
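The smooth shading and Z-buffering described above can be illustrated with a minimal sketch. The function names, the one-row frame layout, and the convention that a smaller z value is nearer to the viewer are assumptions made for illustration only; the source does not specify them.

```python
def lerp(a, b, t):
    """Linearly interpolate between a and b for t in [0, 1]."""
    return a + (b - a) * t


def draw_span(frame, zbuf, y, x0, x1, c0, c1, z0, z1):
    """Draw one horizontal span with interpolated color and a depth test.

    c0/c1 and z0/z1 are the colors and depths at the two span ends.
    Assumption: a smaller z is nearer to the viewer.
    """
    n = x1 - x0
    for x in range(x0, x1 + 1):
        t = (x - x0) / n if n else 0.0
        z = lerp(z0, z1, t)
        if z < zbuf[y][x]:  # hidden-surface removal: keep only the nearest pixel
            zbuf[y][x] = z
            frame[y][x] = tuple(lerp(a, b, t) for a, b in zip(c0, c1))


W, H = 5, 1
frame = [[(0.0, 0.0, 0.0)] * W for _ in range(H)]
zbuf = [[float("inf")] * W for _ in range(H)]

# A red-to-blue span at constant depth 0.5.
draw_span(frame, zbuf, 0, 0, 4, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.5, 0.5)
# A green span that starts behind the first span and ends in front of it.
draw_span(frame, zbuf, 0, 1, 4, (0.0, 1.0, 0.0), (0.0, 1.0, 0.0), 0.75, 0.25)
```

In this sketch the green span overwrites only the two rightmost pixels, where its interpolated depth is smaller than the stored depth.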
In the texture mapping process, the positions of corresponding texture image elements in a texture image region on the frame memory are obtained in units of pixels on the basis of the texture coordinate values from the geometry processor, and color data at those positions are read from the texture image region, and undergo an arithmetic process with colors in units of pixels determined by linear interpolation mentioned above, thus determining colors to be written in the frame memory. Conventionally, arithmetic sections in units of pixels in the texture mapping process are built in the rasterization processor as a hardware circuit, and can only execute a very simple arithmetic process.
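As a rough illustration of the per-pixel texture step described above, the following sketch looks up a texel from texture coordinates and combines it with the interpolated color. Nearest-neighbor sampling and multiplicative modulation are assumptions; the source states only that "an arithmetic process" is applied, and the helper names are hypothetical.

```python
def sample_nearest(texture, u, v):
    """Return the texel nearest to texture coordinates (u, v) in [0, 1]."""
    h = len(texture)
    w = len(texture[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return texture[y][x]


def modulate(texel, shade):
    """Combine a texel with an interpolated color by per-channel multiplication."""
    return tuple(t * s for t, s in zip(texel, shade))


# A 2x2 texture image; each texel is an (r, g, b) tuple.
texture = [
    [(1.0, 1.0, 1.0), (1.0, 0.5, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)],
]

texel = sample_nearest(texture, 0.75, 0.25)
pixel = modulate(texel, (0.5, 0.5, 0.5))
```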
In actual system arrangements, for example, the process of the geometry processor is implemented by a program of a CPU, the geometry processor is included in the CPU, the geometry processor and rasterization processor are formed by a single LSI, or the rasterization processor and frame memory are formed by a single LSI. In any of these arrangements, however, the process from the geometry processor to the rasterization processor is basically done by a one-way pipeline process.
On the other hand, as a more advanced 3D CG technique, a parallel type graphics architecture based on a pixel computing scheme is known. As an example of this architecture, Pixel Flow/Pixel Plane disclosed in Molnar, S. et al., “Pixel Flow: High-Speed Rendering Using Image Composition”, Computer Graphics (Proc. of SIGGRAPH '92), Vol. 26, No. 2, pp. 231-240 (reference 1), U.S. Pat. No. 4,590,465 (reference 2), U.S. Pat. No. 4,783,649 (reference 3), and the like is known.
This Pixel Flow/Pixel Plane is characterized in that SIMD processors assigned in units of pixels execute exchangeable programs upon rasterizing a polygon to determine colors by complicated procedural arithmetic operations in units of pixels and to write them in the frame memory, thus achieving elaborate picture expression. However, since processes must be done in units of pixels, arithmetic operations using many SIMD processors are required to draw a large polygon which has only simple surface properties, and a large number of SIMD processors are required to implement such process at high speed, resulting in a bulky system. Also, this technique can hardly implement displacement mapping in which the surface position of an object is displaced.
Real-time 3D CG such as a game or the like is required to display pictures with the highest possible quality within a limited time called a frame time, represented by 1/60 sec, so as to display animation that moves smoothly.
The balance between high speed and high quality of image generation is the most important point for application software creators of, e.g., games and the like, and a graphic computing apparatus for real-time 3D CG is required to have an arrangement with which the application creators can freely control the speed and image quality.
However, in the conventional graphic computing apparatus, since a flexible vertex process as a procedural process in the geometry processor and a texture process in the rasterization processor using the frame memory are independently shared and expressions that can be achieved by the respective portions are fixed, the control method of the speed and image quality is limited.
As a technique required to provide higher-quality pictures than conventional ones in real-time 3D CG, techniques currently used to generate very high-quality pictures in the fields of “non-real-time 3D CG” such as movies and the like are known. These techniques include:
(1) a scheme for displaying objects such as persons, living bodies, and the like with high reality by modeling based on curved surface definition;
(2) displacement mapping for displacing the surface shape of each object;
(3) a scheme for drawing by computing shadows to make the layout of objects in a space easy to understand;
(4) image-based rendering for generating 3D CG by arithmetic operations from actually sensed images; and
(5) a non-photo-realistic rendering scheme for generating a sketch-style picture, illustration-style picture, and the like by procedural shading.
In the field of “non-real-time CG”, the time upon displaying pictures on a screen is determined, but the image generation processing time is not limited when pictures to be displayed are obtained one by one by computations. Hence, in order to implement these schemes in real-time 3D CG, a mechanism for executing graphic processes at higher speed is required.
However, in the structure of the conventional graphic computing apparatus, since the vertex process in a geometry section and the texture process in a rendering section are separated and shared by the geometry and rendering units, and possible expressions in the individual processors are fixed, elaborate, real pictures cannot be efficiently drawn using the aforementioned schemes.
As an example to which the aforementioned schemes in the “non-real-time CG” field can be applied, a REYES architecture proposed by Robert L. Cook et al., “The Reyes Image Rendering Architecture”, Computer Graphics (Proc. of SIGGRAPH '87), Vol. 21, No. 4, pp. 95-102 (reference 4) is known. This architecture is implemented by software, and is commercially available as “PHOTOREALISTIC RENDERMAN” software from Pixar Animation Studios, USA. This architecture divides an input primitive into polygons called micropolygons equal to or smaller than the pixel size, and programmably executes elaborate processes including displacement mapping in units of vertexes of micropolygons.
However, this REYES architecture attaches importance to the creation of very high-quality pictures. Hence, this architecture requires a long time for arithmetic operations since it is not devised to shorten the drawing time, which is strictly required in real-time 3D CG, and is not suitable for real-time hardware. In particular, since all primitives are basically processed by dividing them into small micropolygons equal to or smaller than the pixel size, a huge number of micropolygons are generated (for example, in the example described in reference 4, the number of micropolygons is 6.8 million), resulting in poor adaptability to real-time hardware.
It is an object of the present invention to provide a graphic computing apparatus which allows an application creator to freely control the speed and image quality and can implement a high-quality image generation scheme used in non-real-time CG in real time.
The present invention provides a graphic computing apparatus comprising a shape divider which divides a unit shape of a surface of an object present in a three-dimensional space into a plurality of subpolygons arranged two-dimensionally and having an arbitrary size, to generate a subpolygon mesh, a vertex processor which computes parameters required for drawing in units of pixels with respect to subpolygons for each vertex of the subpolygon mesh generated by the shape divider, a rendering processor which computes drawing data in units of pixels on the basis of the parameters computed by the vertex processor and picture data for texture mapping, and a frame memory which stores the drawing data as picture data together with at least data for texture mapping required for the rendering processor to compute the drawing data.
More specifically, each subpolygon mesh generated by the shape divider has a 2D structure of subpolygons, and the vertex processor computes lighting and the like in units of 3D vertexes of respective subpolygons. Upon dividing in the shape divider, since the size of each subpolygon obtained by breaking up a primitive can be designated by various methods, the number of subpolygons is controlled to control the computation time and picture quality.
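One way to picture how a designated subpolygon size controls the number of subpolygons is the sketch below. It divides a bilinear quadrilateral primitive into a 2D grid of vertexes whose cells are at most about `max_edge` wide. The bilinear primitive, the `max_edge` parameter, and the function name are assumptions for illustration; the source says only that the size can be designated by various methods.

```python
import math


def divide_patch(p00, p10, p01, p11, max_edge):
    """Divide a bilinear quad into a 2D vertex grid (a subpolygon mesh).

    p00, p10, p01, p11 are the corner positions as (x, y, z) tuples.
    max_edge is the designated upper bound on the subpolygon edge length.
    Returns a list of rows, each row a list of vertex positions.
    """
    nu = max(1, math.ceil(max(math.dist(p00, p10), math.dist(p01, p11)) / max_edge))
    nv = max(1, math.ceil(max(math.dist(p00, p01), math.dist(p10, p11)) / max_edge))
    mesh = []
    for j in range(nv + 1):
        v = j / nv
        row = []
        for i in range(nu + 1):
            u = i / nu
            # Bilinear blend of the four corners.
            row.append(tuple(
                (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
                for a, b, c, d in zip(p00, p10, p01, p11)
            ))
        mesh.append(row)
    return mesh


corners = ((0, 0, 0), (4, 0, 0), (0, 2, 0), (4, 2, 0))
fine = divide_patch(*corners, max_edge=1.0)    # more subpolygons, higher quality
coarse = divide_patch(*corners, max_edge=2.0)  # fewer subpolygons, less computation
```

Under these assumptions, halving `max_edge` roughly quadruples the number of subpolygons, which is the speed and quality tradeoff the paragraph describes.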
Since the vertex processor can programmably process in units of vertexes of subpolygons, processes finer than the vertex unit of a primitive can be done. More specifically, drawing can be done for a primitive to which displacement mapping is applied.
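A minimal sketch of displacement mapping applied per subpolygon vertex follows. It moves each vertex along a normal by a height value. The single shared normal and the `height_fn` callback are simplifying assumptions; in the apparatus described here the height could plausibly come from picture data, but the source does not fix the method.

```python
def displace(mesh, normal, height_fn):
    """Move every vertex of a 2D mesh along `normal` by height_fn(i, j).

    mesh is a list of rows of (x, y, z) tuples; i is the column index
    and j is the row index of the vertex being displaced.
    """
    return [
        [
            tuple(p + height_fn(i, j) * n for p, n in zip(vertex, normal))
            for i, vertex in enumerate(row)
        ]
        for j, row in enumerate(mesh)
    ]


flat = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
]
# Hypothetical height function: the surface rises toward one corner.
bumpy = displace(flat, (0.0, 0.0, 1.0), lambda i, j: 0.5 * (i + j))
```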
The apparatus further comprises a frame memory readout route which reads out data at least for texture mapping held by the frame memory and transfers the readout data to the vertex processor. The vertex processor reads data at least for texture mapping corresponding to the vertexes of polygon meshes via the frame memory readout route, and computes parameters required for drawing in units of pixels of subpolygons in units of vertexes of polygon meshes using the read data.
By adding the frame memory readout route from the frame memory to the vertex processor, arithmetic operations in units of vertexes can use texture data and picture data such as depth map data. This allows coarse, high-speed texture mapping and shading for coarsely divided subpolygon meshes, and allows high-quality texture mapping and shading at a granularity equal to or smaller than a pixel unit for subpolygon meshes with a size smaller than a pixel. A cache may be added to this frame memory data readout route, thus reducing the number of accesses to the frame memory and further improving the processing speed.
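The effect of a cache on the readout route can be sketched as follows. The class names, the address-indexed memory model, and the least-recently-used replacement policy are all assumptions; the source states only that a cache may be added.

```python
from collections import OrderedDict


class FrameMemory:
    """Toy frame memory that counts how often it is actually read."""

    def __init__(self, texels):
        self.texels = texels
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.texels[addr]


class CachedReadout:
    """Readout route with a small cache (LRU replacement is an assumption)."""

    def __init__(self, memory, capacity):
        self.memory = memory
        self.capacity = capacity
        self.lines = OrderedDict()

    def read(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        value = self.memory.read(addr)    # miss: go to the frame memory
        self.lines[addr] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value


memory = FrameMemory(["red", "green", "blue"])
route = CachedReadout(memory, capacity=2)
values = [route.read(a) for a in (0, 1, 0, 1, 2, 0)]
```

Here six vertex-side reads cause only four frame memory accesses, because neighboring vertexes tend to reference the same data.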
The vertex processor has a plurality of processing elements which respectively make arithmetic operations for computing parameters required for drawing in units of pixels of subpolygons in units of vertexes of polygon meshes, and simultaneously make arithmetic operations for a plurality of vertexes in accordance with an identical program.
Furthermore, the plurality of processing elements repeat a process for simultaneously making arithmetic operations for each row of a subpolygon mesh in correspondence with the number of rows of the subpolygon mesh. That is, upon executing the vertex process for a subpolygon mesh as a 2D structure in the vertex processor, the plurality of processing elements which are arranged linearly are assigned to each row of a subpolygon mesh, and make arithmetic operations in units of rows, thus improving the use efficiency of the processing elements, and improving the total processing speed.
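The row-by-row execution can be modeled sequentially as below. Each inner batch stands for one step in which all processing elements run the identical program on different vertexes. Splitting a row into several batches when it is wider than the number of processing elements is an assumption, as are the function and parameter names.

```python
def run_vertex_program(mesh, program, num_pe):
    """Apply `program` to every vertex, one mesh row at a time.

    Returns the processed mesh and the number of simultaneous steps used.
    Assumption: a row wider than num_pe is handled in several steps.
    """
    steps = 0
    result = []
    for row in mesh:
        out_row = []
        for start in range(0, len(row), num_pe):
            batch = row[start:start + num_pe]
            # One step: every processing element runs the same program.
            out_row.extend(program(vertex) for vertex in batch)
            steps += 1
        result.append(out_row)
    return result, steps


mesh = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
doubled, steps_wide = run_vertex_program(mesh, lambda v: v * 2, num_pe=4)
_, steps_narrow = run_vertex_program(mesh, lambda v: v * 2, num_pe=2)
```

When the row width matches the number of processing elements, every element is busy in every step and the step count equals the number of rows.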
The plurality of processing elements linearly line up, and neighboring processing elements in the lineup direction are connected via data transfer routes, and all the processing elements simultaneously transfer at least some of internal data to neighboring processing elements in the arrangement direction. In this manner, when the vertex processor uses arithmetic operation results for neighboring vertexes, a normal vector or the like can be easily computed.
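The following sketch suggests how simultaneous transfers between neighboring processing elements could support a normal vector computation. Each list index stands for one processing element. The clamped shifts at the two ends, the use of the next mesh row for the second tangent, and the unnormalized cross product are assumptions, not details taken from the source.

```python
def shift_from_right(values):
    """Every element receives its right neighbor's value; the last keeps its own."""
    return values[1:] + values[-1:]


def shift_from_left(values):
    """Every element receives its left neighbor's value; the first keeps its own."""
    return values[:1] + values[:-1]


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def row_normals(row, next_row):
    """Compute an unnormalized normal per vertex from neighboring positions."""
    right = shift_from_right(row)
    left = shift_from_left(row)
    return [
        cross(sub(r, l), sub(d, p))  # tangent along the row x tangent across rows
        for p, r, l, d in zip(row, right, left, next_row)
    ]


row = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
next_row = [(0, 1, 0), (1, 1, 0), (2, 1, 0)]
normals = row_normals(row, next_row)
```

For this flat mesh in the xy plane, every normal points along the z axis.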
Furthermore, by providing a triangle strip construction section which constructs arithmetic operation results in units of vertexes by the plurality of processing elements into a successive triangle strip, and transfers it to the rendering processor that executes frame painting, the processing efficiency can be improved.
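A triangle strip can be built from two adjacent rows of vertex results by interleaving them, as in the sketch below. The interleaving order and the helper names are assumptions; the source does not describe the construction section at this level of detail.

```python
def row_pair_to_strip(top, bottom):
    """Interleave two vertex rows into one triangle strip vertex sequence."""
    strip = []
    for a, b in zip(top, bottom):
        strip.extend((a, b))
    return strip


def strip_triangles(strip):
    """List the triangles encoded by a strip: each new vertex adds one triangle."""
    return [tuple(strip[i:i + 3]) for i in range(len(strip) - 2)]


top = ["t0", "t1", "t2"]
bottom = ["b0", "b1", "b2"]
strip = row_pair_to_strip(top, bottom)
triangles = strip_triangles(strip)
```

Six vertexes describe four triangles here, whereas sending the triangles separately would require twelve vertexes, which is one reason a strip can improve transfer efficiency.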
According to the graphic computing apparatus of the present invention, various high-quality drawing processes which are hard for the conventional graphic computing apparatus to implement can be flexibly done, high-speed drawing as in the conventional graphic computing apparatus can be achieved, and their tradeoff can be easily controlled.
Another graphic computing apparatus according to the present invention has a plurality of vertex processors and an exchanger which arbitrarily exchanges and connects the output of the shape divider and the inputs of the plurality of vertex processors. In this way, parallel processes of a plurality of subpolygon meshes can be done to improve the processing efficiency, and the total processing time can be shortened.
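As a loose illustration of distributing subpolygon meshes over several vertex processors, the sketch below sends each mesh to the processor with the least accumulated work. The least-loaded policy, the per-mesh cost values, and the function name are purely hypothetical; the source says only that the exchanger connects the shape divider output to the vertex processor inputs arbitrarily.

```python
import heapq


def dispatch(mesh_costs, num_processors):
    """Assign each mesh to the currently least-loaded vertex processor.

    mesh_costs lists an estimated processing cost per mesh, in arrival order.
    Returns the processor index chosen for each mesh and the largest total load.
    """
    heap = [(0, p) for p in range(num_processors)]  # (load, processor index)
    assignment = []
    for cost in mesh_costs:
        load, p = heapq.heappop(heap)
        assignment.append(p)
        heapq.heappush(heap, (load + cost, p))
    return assignment, max(load for load, _ in heap)


assignment, busiest = dispatch([4, 2, 2, 1], num_processors=2)
```

With these example costs, two processors finish in 5 cost units, compared with 9 units for a single processor handling all four meshes.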
Still another graphic computing apparatus according to the present invention has a plurality of shape dividers, a plurality of vertex processors, and an input distributor which distributes primitive data to the plurality of shape dividers. With this arrangement, since parallel processes which divide a plurality of primitives into subpolygon meshes can be done, the processing efficiency can be improved, and the total processing time can be shortened.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.