The present invention relates to a data processor having various modules, and more specifically to a graphic module for processing three dimensional, 3D, images.
In recent years, the demand for high quality 3D graphics and animation has been steadily increasing as the cost of computer hardware has fallen. Two important design constraints in implementing a 3D graphics architecture are memory bandwidth and system latency. A third factor is memory cost.
The bandwidth requirements of a 3D graphics system depend on the complexity of the system. Typically, 3D graphics systems include multiple modules in a pipelined architecture, such as geometry transformation, lighting transformation, shading or rasterizing by interpolation, texture mapping and texture filtering.
The geometry transformation is the process of transforming the model of a three dimensional object in a three dimensional space into a two dimensional screen space. This process includes the steps of defining the three dimensional model by a plurality of polygons, such as triangles, and transforming these polygons into a two dimensional space.
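As an illustration only, the transformation of triangle nodes into screen space can be sketched with a simple perspective projection. The function names, the focal_length parameter and the 1024 by 768 default screen size are assumptions made for this sketch; the text above does not prescribe a particular projection.

```python
def project_vertex(x, y, z, focal_length=1.0, width=1024, height=768):
    """Perspective-project a camera-space point onto a width x height screen.

    Hypothetical helper: the projection model and parameters are assumptions.
    """
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Perspective divide: distant points move toward the screen center.
    ndc_x = focal_length * x / z
    ndc_y = focal_length * y / z
    # Map the [-1, 1] range to pixel coordinates; screen y grows downward.
    sx = (ndc_x + 1.0) * 0.5 * width
    sy = (1.0 - ndc_y) * 0.5 * height
    return sx, sy


def project_triangle(vertices, **kwargs):
    """Project the three nodes of a triangle given as (x, y, z) tuples."""
    return [project_vertex(*v, **kwargs) for v in vertices]
```

Only the three nodes are projected here; the pixels inside the triangle are filled in by later stages.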
The lighting transformation, or lighting, is the process of representing the intensity of light reflections from the three dimensional model in a two dimensional screen space. A three dimensional model receives light from a plurality of light sources and colors. It is necessary to mathematically express color reflections and lighting. The reflection intensity depends, among other things, on such parameters as the distance of the light source, the angle of light incidence, and the luminance and color effects of the light sources. Typically, the transformation is accomplished only at the three nodes of each triangle and not at the pixels within each triangle. Once a triangle and its lighting at the three nodes are defined in the two dimensional screen space, the pixels within the triangle are defined by an interpolation process, wherein the three nodes function as boundary conditions. This shading by interpolation technique is also referred to as Gouraud shading.
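The interpolation step can be sketched as follows. This sketch uses barycentric weights, which give the same linear interpolation as slope-based hardware but are not necessarily how any particular chip computes it; the function names are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of the triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def gouraud_color(p, tri, colors):
    """Interpolate the per-node colors at point p of a screen-space triangle.

    tri is three (x, y) nodes; colors is three (R, G, B) tuples, one per node.
    """
    (ax, ay), (bx, by), (cx, cy) = tri
    area = edge(ax, ay, bx, by, cx, cy)
    if area == 0:
        raise ValueError("degenerate triangle")
    px, py = p
    # Each weight is the sub-area opposite a node divided by the full area.
    wa = edge(bx, by, cx, cy, px, py) / area
    wb = edge(cx, cy, ax, ay, px, py) / area
    wc = edge(ax, ay, bx, by, px, py) / area
    # zip(*colors) regroups the node colors channel by channel.
    return tuple(wa * ca + wb * cb + wc * cc for ca, cb, cc in zip(*colors))
```

At a node the result equals that node's color, and along an edge it blends only the two nodes of that edge, which is the boundary-condition behavior described above.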
The texture mapping process provides a mechanism to represent the texture of a three dimensional model. A texture space is defined in two dimensions by two coordinates, referred to as the horizontal coordinate (u) and the vertical coordinate (v). Each pixel in the texture space is called a texel. Information relating to each texel is stored in an external memory and can be mapped to the nodes of a corresponding triangle in response to a fetch texel command. The texel color is then blended with the shading color described above, resulting in the final color for each node of each triangle. Again, shading by interpolation is employed to find the shades of the pixels inside each triangle.
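A minimal sketch of a texel fetch followed by blending is given below. The nearest-neighbor lookup, the normalized [0, 1] range for u and v, and the multiplicative blend are all assumptions chosen for illustration; the text above does not specify the filtering or blending mode.

```python
def fetch_texel(texture, u, v):
    """Nearest-neighbor texel lookup.

    texture is a list of rows of (R, G, B) tuples; u and v are assumed to be
    normalized to the range [0, 1].
    """
    height = len(texture)
    width = len(texture[0])
    # Clamp so that u == 1.0 or v == 1.0 still addresses the last texel.
    col = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][col]


def blend(texel, shade):
    """Modulate the texel color by the shading color, channel by channel."""
    return tuple(t * s for t, s in zip(texel, shade))
```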
As mentioned above, conventional microprocessor based systems employing 3D graphics processing have experienced bandwidth limitations. For example, a microprocessor, such as an x86 processor, is coupled to a 3D graphics chip via a PCI bus. An external memory stores information relating to the 3D model. The microprocessor performs the geometry and lighting calculations and transfers the results, which are information relating to the nodes of each triangle, via the PCI bus to the 3D graphics chip.
The 3D graphics chip includes a slope calculator that measures the slope of each side of the triangle. An interpolator calculates the shading colors of each pixel within a triangle based on the measured slopes. A texturing unit measures the textures of each pixel within the triangle based on the measured slopes and on the information stored in the texture map.
A separate frame buffer memory is employed to store the texture maps described above. Each texture map corresponds to the texture of an element used in the image. Furthermore, the frame buffer memory includes a separate buffer space referred to as a Z-buffer. The Z-buffer is employed to remove the hidden parts of a triangle that are not intended to be displayed. Thus, when a plurality of objects overlap, invisible planes need to be removed in order to determine which edges and which planes of which objects are visible, and to display only the visible planes. Conventionally, various algorithms are employed for removing invisible planes, as described in Fundamentals of Interactive Computer Graphics, J. D. Foley and A. van Dam (Addison-Wesley 1982), incorporated herein by reference.
The Z-buffer stores a Z-value, or depth value, for each pixel that needs to be displayed on a screen. A Z-value is then calculated for each point having x, y coordinate values within a triangle, and the result is compared with the stored Z-value corresponding to the same x, y coordinates. When the Z-value of a point is larger than the stored Z-value, that point is considered to be hidden.
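The depth comparison can be sketched as below. The function names are hypothetical, and the handling of equal Z-values (treated here as visible) is an assumption, since the text only states that a larger Z-value is hidden.

```python
def make_z_buffer(width, height):
    """Create a Z-buffer in which every pixel starts infinitely far away."""
    return [[float("inf")] * width for _ in range(height)]


def z_test_and_update(z_buffer, x, y, z):
    """Return True and store z if the point at (x, y) is visible.

    A point whose Z-value is larger than the stored value is hidden.
    Assumption: a point with an equal Z-value is treated as visible.
    """
    if z > z_buffer[y][x]:
        return False
    z_buffer[y][x] = z
    return True
```

Each visible point costs one read and one write of the Z-buffer, which is why Z-buffering adds to the memory traffic.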
Thus, the microprocessor based system described above divides the graphics processing functions between the microprocessor and the 3D graphics chip. The microprocessor performs the geometry and lighting steps and provides triangle data to the graphics chip via the PCI bus. A typical graphics processing operation requires the processing of 1M triangles/sec. Each triangle contains about 50-60 bytes of information. This information includes the x, y, z coordinates, the color values R, G, B, and the texture values u, v for each of the three nodes. Thus, when each coordinate, color and texture value is represented by 4 bytes of information, the three nodes of each triangle may be defined by 96 (32×3) bytes of information. This corresponds to a data transfer of 96 Mbytes/sec from the microprocessor to the 3D graphics chip. Thus, the PCI bus may experience severe bottlenecking.
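The 96 Mbytes/sec figure follows directly from the numbers given above. The short calculation below reproduces it; the constant names are chosen for this sketch, and M is taken to mean one million.

```python
BYTES_PER_VALUE = 4               # each coordinate, color or texture value
VALUES_PER_NODE = 8               # x, y, z, R, G, B and two texture values
NODES_PER_TRIANGLE = 3
TRIANGLES_PER_SECOND = 1_000_000  # 1M triangles/sec


def bytes_per_triangle():
    """Bytes needed to describe the three nodes of one triangle."""
    return BYTES_PER_VALUE * VALUES_PER_NODE * NODES_PER_TRIANGLE


def bus_bytes_per_second():
    """Bytes per second sent from the microprocessor to the graphics chip."""
    return bytes_per_triangle() * TRIANGLES_PER_SECOND
```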
Another bandwidth limitation in implementing 3D graphics processing is data transfers from frame buffers to the 3D graphics chip. Usually, a typical model space may include 2-4 objects overlapping in each area. Thus, shading and texturing may be done 2-4 times in conjunction with Z-buffering. For a 60 frame per second display, the data transfer rate between the frame buffer and the 3D graphics chip is about 720 Mbytes/sec (96 bytes/triangle × 1024 pixels/line × 768 lines × 3 (shadings) × 60 frames/sec), without the Z-buffering. For Z-buffering, this transfer rate is twice as high (1440 Mbytes/sec) because of the read and write operations involved in Z-buffering. Texel fetching also requires a data transfer rate of 360 Mbytes/sec. Such data transfer rates are not feasible with the current memory technology. Thus, current 3D graphics arrangements employ substantially lower resolutions, which do not lead to realistic images.
Thus, there is a need to reduce the bandwidth delays associated with transfer of data from a microprocessor to a 3D graphics chip and bandwidth delays associated with transfer of data from an external memory to the microprocessor local memory.
In accordance with one embodiment of the invention in an integrated circuit, a multimedia processor for performing three dimensional graphics processing includes a microprocessor that generates triangle set-up information corresponding to a plurality of triangles that define a three dimensional object displayed on a screen. The screen is defined by a plurality of bins having a predetermined number of pixels. A data cache is coupled to the microprocessor configured to store the set-up information. A three dimensional triangle rasterizer is coupled to the data cache so as to perform bin allocation to the triangles for the purpose of identifying all bins that intersect with a triangle on the screen.
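To illustrate the idea of a screen divided into bins, the sketch below maps a pixel to a bin and lists the bins touched by a triangle's bounding box. The 16 by 16 bin size, the 1024-pixel screen width, the row-major bin numbering and the function names are assumptions made for this sketch. The bounding-box test is a simpler, conservative stand-in that can report bins the triangle does not actually intersect; it is not the rasterizer's own allocation method.

```python
def bin_index(x, y, bin_w=16, bin_h=16, screen_w=1024):
    """Return the row-major index of the bin containing pixel (x, y)."""
    bins_per_row = (screen_w + bin_w - 1) // bin_w
    return (y // bin_h) * bins_per_row + (x // bin_w)


def candidate_bins(tri, bin_w=16, bin_h=16, screen_w=1024):
    """Return the set of bins overlapped by the bounding box of a triangle.

    tri is three integer (x, y) pixel coordinates. The result is a superset
    of the bins that the triangle itself intersects.
    """
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    bins = set()
    for row_y in range(min(ys) // bin_h, max(ys) // bin_h + 1):
        for col_x in range(min(xs) // bin_w, max(xs) // bin_w + 1):
            bins.add(bin_index(col_x * bin_w, row_y * bin_h,
                               bin_w, bin_h, screen_w))
    return bins
```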
In accordance with another embodiment of the invention, the data cache includes a tile index buffer that stores information relating to each one of the bins. The three dimensional triangle rasterizer includes a binning unit that provides tile data information to a local memory unit. It also includes a screen coordinate interpolator that provides the coordinates of intersecting pixels along the sides of each triangle that cross a span line as defined by the binning unit. The binning unit provides tile data information corresponding to the identification of each triangle in a bin. The binning unit divides each one of the triangles into an upper and a lower sub-triangle along a horizontal line crossing the middle vertex of each one of the triangles. Thereafter, the binning unit identifies the bins in which each one of the upper triangles is located by employing the condition
X = [min2(CrossXAC, CrossXAC + dxdyAC), max2(CrossXAB, CrossXAB + dxdyAB)]
wherein CrossXAC is the x coordinate of the cross point between the edge AC of a triangle ABC and the next span, and CrossXAB is the x coordinate of the cross point between the edge AB and the next span.
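The split at the middle vertex and the upper-triangle condition can be sketched as follows. Several points are assumptions made for this sketch: screen y is taken to grow downward so that A is the top vertex, dxdy is read as the change in x from one span to the next along an edge, bins are taken to be bin_width pixels wide, and all function names are hypothetical.

```python
def split_at_middle_vertex(tri):
    """Split a triangle along the horizontal line through its middle vertex.

    Returns (upper, lower), each a tuple of three (x, y) points. D is the
    point where the horizontal line through B meets the edge AC.
    """
    a, b, c = sorted(tri, key=lambda p: p[1])  # A top, B middle, C bottom
    if c[1] == a[1]:
        raise ValueError("degenerate triangle")
    t = (b[1] - a[1]) / (c[1] - a[1])
    d = (a[0] + t * (c[0] - a[0]), b[1])
    return (a, b, d), (b, d, c)


def upper_span_x_range(cross_x_ac, dxdy_ac, cross_x_ab, dxdy_ab):
    """X extent of one span of the upper sub-triangle, per the condition."""
    left = min(cross_x_ac, cross_x_ac + dxdy_ac)
    right = max(cross_x_ab, cross_x_ab + dxdy_ab)
    return left, right


def bins_in_x_range(left, right, bin_width):
    """Column indices of the bins overlapped by the range [left, right]."""
    return list(range(int(left // bin_width), int(right // bin_width) + 1))
```

Taking the minimum and maximum over both ends of the span covers the full width of the edges within that span, whichever way each edge slopes.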
Furthermore, the binning unit identifies the bins in which each one of the lower triangles is located by employing the condition
X = [min2(CrossXAC, CrossXAC + dxdyAC), max3(CrossXAB, Bx, CrossXBC)],
wherein CrossXAC is the x coordinate of the cross point between the edge AC of a triangle ABC and the next span, and CrossXBC is the x coordinate of the cross point between the edge BC and the next span.
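The lower-triangle condition can be sketched in the same way. The function name and the argument names are hypothetical, Bx is taken to be the x coordinate of the middle vertex B, and dxdy is again read as the change in x from one span to the next along an edge.

```python
def lower_span_x_range(cross_x_ac, dxdy_ac, cross_x_ab, b_x, cross_x_bc):
    """X extent of one span of the lower sub-triangle, per the condition.

    The left bound follows the edge AC; the right bound is the largest of
    the AB crossing, the middle vertex B and the BC crossing.
    """
    left = min(cross_x_ac, cross_x_ac + dxdy_ac)
    right = max(cross_x_ab, b_x, cross_x_bc)
    return left, right
```

Including Bx in the maximum keeps the middle vertex inside the range even when it lies further out than either edge crossing.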
A memory unit is coupled to the data cache to store the tile index information. A data streamer is coupled to the memory unit and the data cache so as to transfer data.