1. Field of the Invention
The present invention relates to graphic processors, and more particularly, to a graphic processor including a primary bus such as a PCI (Peripheral Component Interconnect) bus and a secondary bus, a geometry engine (geometric operation unit) connected to these buses to execute geometric operations, and a device connected to the secondary bus such as a rendering controller (hereinafter referred to as an “RC”) to generate images for display based on the result of geometric operation.
2. Description of the Background Art
In recent years, so-called 3D processing has been used in a very wide range of applications. In 3D processing, the three-dimensional geometric configuration of an object is calculated using a graphic processor, lighting processing is then applied to the surface of the configuration thus obtained, and texture is attached for display. Conventional general-purpose CPUs (Central Processing Units) do not have enough capacity to execute this processing, which creates a bottleneck. Graphic processors therefore typically include a geometry engine dedicated to the complex transformations, lighting calculations and clipping calculations often used in 3D graphic processing.
Referring to FIG. 1, a conventional graphic processor 240 includes a CPU 52 to execute a main routine in graphic processing and other control programs, a main memory 56 to store data and various programs to be executed by CPU 52, a core logic 252 to execute input/output of data to/from main memory 56 under the control of CPU 52, a primary PCI bus 58 to which core logic 252 is connected, a geometry engine 254, which is one of the agents connected to primary PCI bus 58, a secondary bus 64 connected to the output side of geometry engine 254, and a rendering controller 66 connected to secondary bus 64 to execute rendering processing on graphic object data resulting from the calculation of geometry engine 254.
Graphic processor 240 executes various kinds of processing as described above; clipping processing is briefly described here as one example. Clipping processing excludes the part of a graphic object that lies outside the range of display (clipping plane) from graphic operation. For example, assume that there is a line formed of vertices V1 to V9 as shown in FIG. 2. Among these vertices, vertex V4 is outside the clipping plane. Vertex V4 is therefore clipped, and the intersecting points of segments V3V4 and V4V5 with the right side of the clipping plane are processed as new vertices V4′ and V4″, respectively. Further in FIG. 2, if vertices V6 and V7 overlap, one of these vertices, V7, for example, is discarded, and only vertex V6 is processed. Herein, this will be called “reduction”. A graphic corresponding to data pieces obtained as a result of the clipping and reduction processings is shown in FIG. 3.
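The clipping and reduction described above can be illustrated with a minimal sketch. It assumes a two-dimensional polyline and a single clipping boundary at the right side (x = x_max); the function names `clip_right` and `reduce_vertices` and the sample coordinates are hypothetical and do not come from the figures.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def clip_right(points: List[Point], x_max: float) -> List[Point]:
    """Clip a polyline against the right side x = x_max.

    A vertex beyond the boundary is dropped, and the points where its
    adjacent segments cross the boundary are emitted in its place
    (the V4 -> V4', V4'' replacement in the text).
    """
    out: List[Point] = []
    for i, (x, y) in enumerate(points):
        inside = x <= x_max
        if i > 0:
            px, py = points[i - 1]
            prev_inside = px <= x_max
            if inside != prev_inside:
                # The segment crosses the boundary: emit the intersection.
                t = (x_max - px) / (x - px)
                out.append((x_max, py + t * (y - py)))
        if inside:
            out.append((x, y))
    return out


def reduce_vertices(points: List[Point]) -> List[Point]:
    """Discard a vertex that coincides with the one before it ("reduction")."""
    out: List[Point] = []
    for p in points:
        if not out or out[-1] != p:
            out.append(p)
    return out


# Hypothetical line: the fourth vertex lies beyond x_max = 5, and the
# sixth and seventh vertices overlap.
line = [(0, 0), (2, 0), (4, 2), (6, 4), (4, 6), (2, 6), (2, 6), (0, 6)]
result = reduce_vertices(clip_right(line, 5))
```

In this sketch the out-of-range vertex (6, 4) is replaced by the two boundary points (5, 3) and (5, 5), and the duplicated vertex (2, 6) is kept only once.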
These data pieces are prepared on main memory 56 by a device driver for geometry engine 254 operating on CPU 52, and transferred to geometry engine 254 from main memory 56 by DMA (Direct Memory Access) transfer. A DMA sequence at this time is given in FIG. 5 by way of illustration. As shown in FIG. 5, when DMA is triggered (260), data for the entire line is transferred to geometry engine 254 (262), the DMA is reset upon the end of the transfer (264) and then the next processing is executed.
FIG. 4 gives an example of activities observed on primary PCI bus 58, in geometry engine 254 and on secondary bus 64, respectively, at this time.
Depending on the application, a graphic processor including such a geometry engine is sometimes designed for extremely high geometric operation performance rather than low cost, and sometimes for low cost rather than performance. In other words, there is a tradeoff between cost and performance. Hence, a graphic processor of this kind that is flexible enough to address this tradeoff is preferred.
It is therefore one object of the present invention to provide a scalable graphic processor capable of arbitrarily adjusting its operation performance depending upon applications and a data processing method in such a graphic processor.
Another object of the present invention is to provide a scalable graphic processor having a plurality of geometry engines and capable of adjusting its graphic operation performance based on the number of geometry engines, and a data processing method in such a graphic processor.
Yet another object of the present invention is to provide a scalable graphic processor having a plurality of geometry engines and capable of adjusting its graphic operation performance by allocating graphic operation to the geometry engines, and a data processing method in such a graphic processor.
An additional object of the present invention is to provide a scalable graphic processor having a plurality of geometry engines and capable of adjusting its graphic operation performance by allocating graphic operations to the geometry engines and of correctly combining outputs from the geometry engines, and a data processing method in such a graphic processor.
A graphic processor according to the present invention includes first and second buses and a plurality of geometric operation units having an output connected to the second bus. An input of at least one of the plurality of geometric operation units is connected to the first bus. The graphic processor further includes a circuit to allocate a plurality of ordered data blocks formed of data to be operated to the plurality of geometric operation units, and the plurality of geometric operation units each include an output buffer to store a result of processing of the allocated data blocks, and an arbitration circuit to arbitrate the order of output to the second bus with other geometric operation units and to output data resulting from processing onto the second bus in an order corresponding to an order of the plurality of ordered data blocks of the data to be operated.
Since graphic operation processings can be executed in parallel using the plurality of geometric operation units, the operation performance is improved, and the performance of the processor can be scalably adjusted based on the number of geometric operation units. Outputs from the plurality of geometric operation units are provided in a correct order using the arbitration circuit.
Preferably, the graphic processor further includes a main memory device connected to the first bus, and the circuit for allocation includes a direct memory access circuit provided in a geometric operation unit having an input connected to the first bus to transfer a data block provided on the first bus from the main memory device to the plurality of geometric operation units based on a destination address included in the provided data block.
By inserting a destination address in a data block, desired data may be transferred to a target geometric operation unit by DMA transfer. Data may be allocated to a plurality of geometric operation units without having to use special hardware specific for allocating the data.
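As a rough illustration of allocation by destination address, the following sketch routes each data block to a per-unit queue according to an address carried in the block itself. The `DataBlock` layout, the address values and the unit names are assumptions made for the example; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataBlock:
    dest_addr: int      # hypothetical destination address carried in the block
    payload: List[int]  # data to be operated upon


def route_blocks(blocks: List[DataBlock],
                 address_map: Dict[int, str]) -> Dict[str, List[List[int]]]:
    """Deliver each block to the unit selected by its destination address."""
    queues: Dict[str, List[List[int]]] = {name: [] for name in address_map.values()}
    for block in blocks:
        unit = address_map[block.dest_addr]
        queues[unit].append(block.payload)
    return queues


# Hypothetical address map for two geometric operation units.
ADDRESS_MAP = {0x1000: "GE0", 0x2000: "GE1"}
blocks = [
    DataBlock(0x1000, [1, 2]),
    DataBlock(0x2000, [3]),
    DataBlock(0x1000, [4]),
]
queues = route_blocks(blocks, ADDRESS_MAP)
```

Because the routing decision depends only on a field of the block, the software preparing the blocks in main memory decides the allocation, and no separate allocation hardware is modeled.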
More preferably, the plurality of data blocks each has a toggle bit indicating the end of valid information in the data block.
By setting a toggle bit at the end of a data block, a geometric operation unit may be aware of the end of a data block to be processed by itself and take an appropriate operation.
More preferably, the plurality of geometric operation units each output a result of operation to the output buffer each time a toggle bit set in a data block is encountered, and set a toggle bit indicating the end of data at the end of the data in the output buffer.
By setting a toggle bit at the end of data in the output buffer, output may be temporarily withheld at the time of outputting the data in the output buffer, and the data output order may be adjusted by appropriate arbitration.
More preferably, the arbitration circuit surrenders the access right of the second bus to another geometric operation unit each time a toggle bit in data in the output buffer is encountered.
Since the access right of the second bus is transferred from one geometric operation unit to another geometric operation unit each time a toggle bit in an output buffer is encountered, the access right may be transferred at each boundary between data blocks, and outputs may be provided in an order corresponding to the original order of input data blocks.
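The toggle-bit handoff can be sketched in software as follows. This is a simplified model, assuming that blocks were allocated to the units in rotation, that every output buffer is already filled, and that the access right passes to the next unit in a fixed rotation; the `Word` representation and the function name `drain_in_order` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple

# (data, toggle): toggle is True on the last word of a block's results.
Word = Tuple[str, bool]


def drain_in_order(buffers: List[Deque[Word]]) -> List[str]:
    """Model the second bus: the current owner outputs words until it
    reaches a set toggle bit, then surrenders the bus to the next unit."""
    bus: List[str] = []
    owner = 0
    while any(buffers):  # a deque is truthy while it still holds words
        buf = buffers[owner]
        while buf:
            data, toggle = buf.popleft()
            bus.append(data)
            if toggle:
                break  # block boundary: give up the access right
        owner = (owner + 1) % len(buffers)
    return bus


# Blocks A, B, C, D were allocated alternately to two units.
ge0 = deque([("a0", False), ("a1", True), ("c0", True)])
ge1 = deque([("b0", True), ("d0", False), ("d1", True)])
bus_order = drain_in_order([ge0, ge1])
```

The words reach the bus in the order a0, a1, b0, c0, d0, d1, which matches the original order of blocks A to D even though each unit holds only every other block. In real hardware the owner would wait for results that are still being computed, which this sketch does not model.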
According to another aspect of the present invention, a data processing method in a graphic processor includes the steps of dividing data to be operated upon into a plurality of data blocks, allocating the plurality of data blocks to a plurality of geometric operation units through a first bus, processing the allocated data blocks in the plurality of geometric operation units, and arbitrating the order of output among the plurality of geometric operation units thereby outputting data resulting from processing onto a second bus in an order corresponding to the sequence of the plurality of data blocks in the data to be operated.
Since graphic operation processings may be executed in parallel using the plurality of geometric operation units, the operation performance is improved and the performance of the processor can be scalably adjusted based on the number of geometric operation units. A result of processing based on divided processings by the plurality of geometric operation units is obtained on the second bus in a correct order.
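The steps of the method (divide, allocate, process, output in order) have a loose software analogy, sketched below. A thread pool stands in for the plurality of geometric operation units, and `transform` is a placeholder for a geometric operation; both are assumptions for illustration and not the hardware arrangement described here. The sketch relies on the documented behavior of `Executor.map`, which yields results in the order of its inputs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def split_into_blocks(data: List[int], block_size: int) -> List[List[int]]:
    """Divide the data to be operated upon into ordered data blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def transform(block: List[int]) -> List[int]:
    """Placeholder for a geometric operation: scale every value by 2."""
    return [2 * v for v in block]


def process_in_order(data: List[int], block_size: int, num_units: int) -> List[int]:
    """Process blocks on num_units workers and merge results in block order."""
    blocks = split_into_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        # map() returns results in input order, whichever worker finishes first.
        results = pool.map(transform, blocks)
        return [v for block in results for v in block]


output = process_in_order(list(range(7)), block_size=3, num_units=2)
```

Changing `num_units` changes how many blocks are processed concurrently but not the result, which mirrors the idea that performance scales with the number of units while the output order stays fixed.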
According to yet another aspect of the present invention, a geometry engine includes: a geometry operation unit for performing geometric operations; an allocation circuit receiving a plurality of data blocks through a first bus sequentially, for allocating the plurality of data blocks to said geometry operation unit and another operation device, said geometry operation unit processing a data block allocated by said allocation circuit to output a result of processing corresponding to the allocated data block; an output buffer for storing the result of processing; and an arbitration circuit for arbitrating the order of output to a second bus with said another operation device and for outputting the result of processing stored in said output buffer to the second bus in an order corresponding to the sequence of the plurality of data blocks.
The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.