The present invention relates to computer systems, and more particularly to a method and system for providing a hardware sort which is efficient and applicable to a computer graphics system.
Many computer systems must sort items based on the value of a key in order to achieve certain functions. Many such computer systems conventionally employ a software sort. For example, computer graphics systems may utilize a software sort in order to render an image. In current computer graphics systems, images of three-dimensional objects can be depicted on a two-dimensional display. The display typically includes a number of pixels arranged in a grid. To render an image, the image is typically broken into polygons. Each polygon may cover one or more pixels in the display. In order to give the illusion of depth, computer graphics systems use each polygon's "z value," the distance of each polygon to the viewing plane. In particular, the polygons are ordered based on each polygon's z value. Thus, the key for such a sort is the z value. Once the polygons are sorted according to their z values, the computer graphics system can correctly blend the colors of translucent polygons and opaque polygons that can be seen through the translucent polygons to achieve the proper color to be displayed for each pixel.
In a conventional computer graphics system, the software sort occurs when a display list is generated through an application. The display list orders portions of three-dimensional objects, i.e. polygons, based on a key, typically the z value. The display list typically orders translucent polygons from back to front. Thus, the display list sorts translucent polygons. Although they may appear on the display list, opaque polygons are typically sorted using a conventional Z buffer.
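By way of illustration only, the back-to-front ordering described above can be sketched in software as follows. The sketch assumes the commonly used "over" blending formula, which the present description does not specify, and the names Fragment and blend_back_to_front are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    z: float       # distance to the viewing plane (the sort key)
    color: tuple   # (r, g, b), each component in 0.0..1.0
    alpha: float   # 1.0 is opaque, 0.0 is fully transparent


def blend_back_to_front(fragments, background=(0.0, 0.0, 0.0)):
    """Sort fragments by z (farthest first) and blend them onto the background.

    Assumption: the standard "over" operator is used for blending.
    """
    ordered = sorted(fragments, key=lambda f: f.z, reverse=True)
    result = background
    for frag in ordered:
        result = tuple(
            frag.alpha * c + (1.0 - frag.alpha) * r
            for c, r in zip(frag.color, result)
        )
    return result
```

Blending the same fragments in a different order would generally yield a different color, which is why the sort on the z value is needed before translucent polygons are rendered.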
Placing the polygons in the order prescribed by the display list allows the computer system to properly depict the images of the three-dimensional objects on the display. Hardware in the computer graphics system utilizes the display list, a frame buffer, and a z buffer to render the three-dimensional objects in the order dictated by the display list. The frame buffer and z buffer describe a portion of the three-dimensional object that is to be rendered. The frame buffer includes data such as color and alpha values for the polygon, while the z buffer includes the corresponding z values. The conventional computer graphics system provides the polygons described in the frame and z buffers to the display screen in the order prescribed by the display list. Thus, the display list generated by software is used to render the three-dimensional objects.
Although conventional computer graphics systems are capable of depicting three-dimensional objects, the software sort that generates the display list can be relatively slow. If the software sort is optimized, the sort time can be reduced to a limited extent. However, development time for the software sort is significantly increased. Moreover, changes to the display list and the software sort creating the display list may be difficult to implement. Finally, since the hardware requires a display list in order to properly render the objects, the computer system is limited to using those applications which provide a sorted display list. Without the display list and the attendant software sort, the computer system may not be able to properly depict three-dimensional objects.
A method and system for performing a hardware sort is described in co-pending U.S. patent application Ser. No. 09/062,872 entitled "Method and System for Providing a Hardware Sort in a Graphics System" filed on Apr. 20, 1998 and assigned to the assignee of the present application. Applicants hereby incorporate by reference the above-mentioned co-pending patent application. The hardware sort described in the above-mentioned co-pending application can be used to sort polygons for rendering on a display.
FIG. 1 is a block diagram of one embodiment of a hardware sorter 10 described in the above-mentioned co-pending application. The hardware sorter 10 is used by a computer graphics system which preferably renders a graphical image pixel-by-pixel. However, the system 10 can be used in another computer system for other purposes or in a computer graphics system which does not render an image pixel-by-pixel. The hardware sorter 10 sorts based on a particular key associated with a particular item. The key value is the z-value for a fragment. The fragment for a particular polygon includes data for the portion of the polygon that intersects a particular pixel. Such a polygon is termed an intersecting polygon for the particular pixel. More than one intersecting polygon can intersect a particular pixel. Although the hardware sorter 10 is described as sorting based on a z value, nothing prevents the hardware sorter 10 from sorting based on another key or accepting other types of data. Thus, the hardware sorter 10 is applicable to other systems requiring a sort, such as a router in a network.
The hardware sorter 10 includes a plurality of sort cells 11. Note that although only four sort cells 11 are depicted, nothing prevents the hardware sorter 10 from having another number of sort cells 11. In an embodiment disclosed in the above-mentioned co-pending application, the number of sort cells 11 is at least equal to the number of items to be sorted. Thus, in one embodiment, the number of sort cells 11 is the same as the number of processors which are used to process the fragments for intersecting polygons of a particular pixel in parallel. As disclosed in the above-mentioned co-pending application, the number of sort cells is typically sixteen. However, nothing prevents the use of another number of sort cells 11.
The hardware sorter 10 further includes a new input line 12 for providing a new fragment in parallel to each of the sort cells 11. Each sort cell 11 also includes an output 13. The output 13 of a sort cell 11 is coupled to an input of a next sort cell 11. The output 13 of the last sort cell 11 is not coupled to another sort cell 11. Instead, the output 13 of the last sort cell 11 provides the output of the hardware sorter 10.
The hardware sorter 10 generally functions as follows. Each sort cell 11 may have a fragment which corresponds to it ("corresponding fragment"). Each corresponding fragment includes a corresponding z value, which is used to sort the fragment, and corresponding data, such as color and alpha values for the corresponding fragment. A new fragment, including the new z value, is broadcast to each of the plurality of sort cells 11. Generally, where the new fragment is the first fragment for a pixel and the hardware sorter 10 is empty, the first fragment is placed in the first sort cell 11. This may be accomplished by indicating that data in other sort cells 11 is invalid.
The new z value for the new fragment is compared to the corresponding z value in each sort cell 11. Preferably, this function is accomplished using a comparator (not shown). Based on this comparison, each sort cell 11 retains the corresponding fragment, accepts the new fragment, or accepts the fragment corresponding to a previous sort cell 11. If the corresponding fragment is to be retained, then the sort cell 11 keeps the corresponding fragment. If the corresponding fragment is not to be retained, then it is determined whether the sort cell 11 is to take the fragment corresponding to a previous sort cell 11. If the sort cell 11 is to accept this fragment, the sort cell 11 takes the fragment corresponding to the previous cell and passes its corresponding fragment to be accepted by the next sort cell 11. If the corresponding fragment from the previous sort cell 11 is not to be taken by the sort cell 11, the sort cell 11 takes the new fragment and passes its corresponding fragment to be accepted by the next cell. As a result, the new fragment is inserted into the hardware sorter 10 in the appropriate sort cell 11. This process continues to sort all of the fragments provided to the hardware sorter.
For example, the corresponding fragment may be retained by a sort cell 11 if the corresponding z value for the corresponding fragment is greater than the new z value for the new fragment. The sort cell 11 which accepts the new fragment passes its corresponding fragment to the next sort cell 11. Sort cells 11 which are higher than (before) the sort cell 11 accepting the new fragment remain unchanged. The next sort cell 11 receives the corresponding fragment from the previous cell and passes its own corresponding fragment to a next cell in the sorter 10. This occurs even if the z value for the corresponding fragment in the next sort cell 11 is less than the new z value. As a result, the fragments are ordered from lowest to highest z-value by the hardware sorter 10.
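The per-cell decision described above can be modeled in software as a minimal sketch. The sketch assumes that a cell retains its fragment only when its z value is strictly greater than the new z value, that an invalid (empty) cell is represented by None, and that the number of cells is at least the number of items. The function name insert_fragment is hypothetical, and the sequential loop merely stands in for decisions the sort cells would make in parallel.

```python
def insert_fragment(cells, new):
    """Return the cell contents after broadcasting `new` to every cell.

    cells: list whose entries are (z, data) tuples or None for an invalid cell.
    new:   (z, data) tuple for the new fragment.
    """
    if cells[-1] is not None:
        # Assumption: the sorter is never asked to hold more items than cells.
        raise OverflowError("all sort cells already hold valid fragments")
    old = list(cells)
    # Each cell compares its own z value with the broadcast z value.
    retain = [c is not None and c[0] > new[0] for c in old]
    updated = []
    for i, current in enumerate(old):
        if retain[i]:
            updated.append(current)       # keep the corresponding fragment
        elif i > 0 and not retain[i - 1]:
            updated.append(old[i - 1])    # accept the previous cell's fragment
        else:
            updated.append(new)           # accept the new fragment
    return updated
```

Under these assumptions the first cell ends up holding the largest z value, so reading the cells from the last valid cell back to the first yields the fragments from lowest to highest z value.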
Although the hardware sorter 10 functions, one of ordinary skill in the art will realize that the hardware sorter 10 is efficient for sorting only a relatively small number of items. When the number of items grows large, for example beyond approximately 64 items, the hardware sorter 10 becomes costly. In particular, each sort cell is of moderate cost. Using only a few sort cells, for example between eight and sixteen cells, yields an acceptable cost to the system. However, because one sort cell is needed per item, the cost grows linearly with the number of items to be sorted. Thus, the cost becomes unacceptable for larger numbers of items, for example 1024 items. Consequently, for larger numbers of items to be sorted, a different sorting mechanism is desired.
One conventional mechanism for allowing the hardware sorter 10 to sort a larger number of items is to use a number of hardware sorters 10 in parallel, then to sort and combine the results of the hardware sorters 10. For example, suppose that a maximum of two hundred and fifty-six items is to be sorted and that each hardware sorter 10 can sort sixteen items. In such a case, sixteen hardware sorters 10 could be operated in parallel. The output of each hardware sorter 10 would be sorted. However, the outputs of one hardware sorter 10 would not be sorted with respect to the output of another hardware sorter 10. Thus, the outputs of the hardware sorters 10 would then be sorted to provide all of the up to two hundred and fifty-six items in order.
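The parallel arrangement described above can be sketched in software as follows. In this sketch the built-in sorted function stands in for each sixteen-item hardware sorter 10, and heapq.merge stands in for the subsequent step that sorts and combines the outputs; the function name sort_with_parallel_sorters and the parameter sorter_capacity are hypothetical.

```python
import heapq


def sort_with_parallel_sorters(items, sorter_capacity=16):
    """Split items among fixed-capacity sorters, then merge their outputs.

    Returns (fully_sorted_items, per_sorter_outputs).
    """
    # Each slice models the items handed to one hardware sorter.
    runs = [
        sorted(items[i:i + sorter_capacity])
        for i in range(0, len(items), sorter_capacity)
    ]
    # Each run is sorted, but the runs are not sorted with respect to one
    # another, so a final merge is still required.
    return list(heapq.merge(*runs)), runs
```

For two hundred and fifty-six items and a capacity of sixteen, the sketch uses sixteen runs, corresponding to the sixteen hardware sorters 10 of the example.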
Although operating a number of hardware sorters 10 in parallel could provide a hardware sort of a larger number of items, one of ordinary skill in the art will readily realize that such a system would be inefficient. Although each hardware sorter 10 is efficient, multiple hardware sorters 10 would require a large number of gates and consume a relatively large amount of space on a chip. Consequently, implementing a number of hardware sorters 10 in parallel may be an inefficient use of silicon.
Another conventional mechanism for allowing the hardware sorter 10 to sort a larger number of items would be to provide a conventional tree of first-in-first-out buffers (xe2x80x9cFIFOsxe2x80x9d). FIG. 2 depicts a block diagram of such a conventional tree 18 coupled to a hardware sorter 10. For example, suppose a maximum of two hundred and fifty six items were to be sorted. Also suppose that the hardware sorter 10 can efficiently sort sixteen items. The first stage 20 in the conventional tree 18 would include sixteen FIFOs 22, with each FIFO 22 capable of holding sixteen items. Each time the hardware sorter 10 completed sorting sixteen of the possible two hundred and fifty six items, the hardware sorter 10 would provide the sixteen items to one of the FIFOs 22 in the first stage 20. Thus, the contents of each of the FIFOs 22 in the first stage 20 would be sorted. In the second stage 30 of the conventional tree 18, eight FIFOs 32, each capable of holding thirty-two items, would be provided. Each FIFO 32 in the second stage 30 would receive inputs from two FIFOs 22 in the first stage 20. The contents of the two FIFOs 22 in the first stage would be sorted prior to being combined in the FIFO 32 in the second stage 30. Thus, the contents of each of the FIFOs 32 in the second stage 30 would also be sorted. The third stage 40 of the conventional tree 18 would contain four FIFOs 42, each capable of holding sixty-four items. The fourth stage 50 of the conventional tree 18 would contain two FIFOs 52, each capable of holding one hundred and twenty-eight items. The final stage 60 of the conventional tree 18 would hold one FIFO 62 capable of holding two hundred and fifty-six items. Between each stage 30, 40, 50 and 60, the contents of two FIFOs of the previous stage 20, 30, 40 and 50 would be sorted and combined. Thus, the contents of each FIFO in each stage are sorted. 
Before the fifth stage 60, the contents of the two FIFOs 52 in the fourth stage 50 are sorted and combined in order in the FIFO 62 of the fifth stage 60. Consequently, the last FIFO 62 of the conventional tree 18 holds all two hundred and fifty-six items in the desired order.
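The behavior of the conventional tree 18 can be modeled with a short sketch. The sketch assumes that the number of items is the run length times a power of two, uses sorted as a stand-in for the hardware sorter 10, and uses heapq.merge for the sort-and-combine step between stages. The function name conventional_tree_sort is hypothetical.

```python
import heapq
from collections import deque


def conventional_tree_sort(items, run_length=16):
    """Model the FIFO tree: each stage halves the FIFO count and doubles capacity.

    Returns (sorted_items, shapes), where shapes lists
    (number_of_fifos, capacity_per_fifo) for every stage.
    """
    # First stage: one FIFO per sorted run produced by the hardware sorter.
    stage = [
        deque(sorted(items[i:i + run_length]))
        for i in range(0, len(items), run_length)
    ]
    capacity = run_length
    shapes = [(len(stage), capacity)]
    while len(stage) > 1:
        # Merge each pair of FIFOs into one FIFO of the next stage.
        stage = [
            deque(heapq.merge(a, b))
            for a, b in zip(stage[0::2], stage[1::2])
        ]
        capacity *= 2
        shapes.append((len(stage), capacity))
    return list(stage[0]), shapes
```

For two hundred and fifty-six items the sketch reports stages of 16, 8, 4, 2 and 1 FIFOs holding 16, 32, 64, 128 and 256 items respectively, matching the stages 20, 30, 40, 50 and 60 described above.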
Although the conventional tree 18 can provide a hardware sort of a higher number of items, one of ordinary skill in the art will readily realize that the conventional tree 18 is not an efficient implementation of a sort. In particular, a FIFO which contains a smaller number of items, such as a FIFO 22, is not efficient. Each FIFO has a set overhead that is relatively independent of the size of the FIFO. This overhead takes up a larger portion of the FIFO for smaller FIFOs. The number of items that the FIFOs in the conventional tree 18 can hold grows geometrically from the initial number of items to be sorted, which could be as low as one. Consequently, the FIFOs in the early stages of the conventional tree 18 are relatively inefficient.
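The effect of a fixed per-FIFO overhead can be illustrated with simple arithmetic. The overhead and storage figures below are purely hypothetical values chosen for illustration and are not taken from any actual FIFO implementation; only the trend matters.

```python
def overhead_fraction(capacity_items, overhead_units=8.0, units_per_item=1.0):
    """Fraction of a FIFO's total cost taken up by its fixed overhead.

    overhead_units and units_per_item are hypothetical, illustrative costs.
    """
    storage = capacity_items * units_per_item
    return overhead_units / (overhead_units + storage)


# Stage shapes of the conventional tree in the 256-item example:
# (number of FIFOs, items per FIFO).
TREE_SHAPES = [(16, 16), (8, 32), (4, 64), (2, 128), (1, 256)]

for count, capacity in TREE_SHAPES:
    share = overhead_fraction(capacity)
    print(f"{count:2d} FIFOs of {capacity:3d} items: overhead share {share:.1%}")
```

With these illustrative figures, overhead accounts for one third of each sixteen-item FIFO but only about three percent of the single FIFO holding two hundred and fifty-six items, and the fixed overhead is incurred thirty-one times across the tree.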
Accordingly, what is needed is a more efficient system and method for sorting items which does not require a sort performed by software. It would also be beneficial if the system and method could be implemented in a computer graphics system for providing a two-dimensional image of three-dimensional objects. The present invention addresses such a need.
The present invention provides a method and system for sorting a number of items in a computer system. The sort is based on a plurality of values of a key. Each item has a value of the plurality of values. The method and system comprise providing a plurality of stages, providing at least one switch coupled between the plurality of stages, and providing a final switch coupled to a last stage of the plurality of stages. Each of the plurality of stages has a pair of first-in-first-out buffers (FIFOs). The pair of FIFOs in a stage of the plurality of stages stores twice as many of the number of items as the pair of FIFOs in a previous stage of the plurality of stages. Each of the at least one switch is for merging and sorting a first portion of the number of items from the pair of FIFOs in the previous stage based on the key and providing the first portion of the number of items to a first FIFO of the pair of FIFOs of the stage in order. Each of the at least one switch is also for merging and sorting a second portion of the number of items from the pair of FIFOs in the previous stage based on the key and providing the second portion of the number of items to a second FIFO of the pair of FIFOs of the stage in order. The final switch is for merging and sorting a third portion of the number of items to provide the number of items in order.
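One possible reading of the arrangement summarized above can be sketched in software. The sketch is an interpretation offered for illustration only, not a description of the actual hardware. It assumes that sorted runs of equal length arrive one at a time, that each stage holds at most two runs (one per FIFO of the pair), and that the number of runs equals two raised to the number of stages. The function name staged_pair_sort is hypothetical, and heapq.merge stands in for each switch.

```python
import heapq


def staged_pair_sort(runs, num_stages):
    """Merge sorted runs through stages that each hold one pair of FIFOs.

    runs:       sorted lists of equal length; len(runs) == 2 ** num_stages.
    num_stages: number of FIFO pairs; stage k holds runs 2**k times as long
                as the input runs.
    Returns the fully sorted list, or None if too few runs were supplied.
    """
    stages = [[] for _ in range(num_stages)]  # each entry models a FIFO pair
    output = None
    for run in runs:
        level, current = 0, list(run)
        while True:
            stages[level].append(current)      # fill first, then second FIFO
            if len(stages[level]) < 2:
                break                          # wait for the pair to fill
            first, second = stages[level]
            stages[level] = []                 # the pair is drained by the switch
            current = list(heapq.merge(first, second))
            level += 1
            if level == num_stages:
                output = current               # result of the final switch
                break
    return output
```

Under these assumptions, sorting two hundred and fifty-six items from sixteen-item runs uses four stages, that is, eight FIFOs in total, compared with the thirty-one FIFOs of the conventional tree 18 in the earlier example.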
According to the system and method disclosed herein, the present invention provides a more efficient hardware sort.