1. Field of the Invention
This invention relates generally to the field of pipelined computation and, more particularly, to a protocol and corresponding hardware mechanism for guaranteeing a proper ordering of data processed and/or transmitted through separate paths in a hardware device.
2. Description of the Related Art
A host computer may send data (e.g. graphics data) to a hardware device (e.g. a graphics accelerator). The hardware device may include a system of processing units. The processing units may be organized into paths or pipelines. Some of the paths may diverge. Thus, the data items received by the hardware device may be sent down different paths (e.g. some data items going down one path, other data items going down another path, and so on). At some point, the paths may merge (i.e. rejoin). Because the different paths may have different latencies, the stream of data items may have a different order after the merge point than it had prior to the point of divergence. This disturbance of ordering may have adverse effects on system performance (e.g. on the visual quality of the video output generated by a graphics accelerator).
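The reordering described above may be illustrated with a minimal sketch. The two path names, their latencies in cycles, and the item names are hypothetical values chosen for illustration only. The model assumes one data item enters the device per cycle and ignores any contention inside a path.

```python
from typing import Dict, List, Tuple


def merge_order(items: List[Tuple[str, str]], latency: Dict[str, int]) -> List[str]:
    """Return item names in the order they reach the merge point.

    Item i is issued at cycle i and arrives at the merge point after
    the latency of the path it was sent down.
    """
    arrivals = []
    for issue_cycle, (name, path) in enumerate(items):
        arrivals.append((issue_cycle + latency[path], issue_cycle, name))
    arrivals.sort()  # earliest arrival first; ties fall back to issue order
    return [name for _, _, name in arrivals]


# Hypothetical latencies: path "A" is slow, path "B" is fast.
LATENCY = {"A": 5, "B": 1}
issued = [("d0", "A"), ("d1", "B"), ("d2", "A"), ("d3", "B")]
merged = merge_order(issued, LATENCY)
# merged == ["d1", "d3", "d0", "d2"]: the items on the fast path overtake
# the items on the slow path.
```

In this sketch the items leave the merge point in a different order than the order in which the host sent them, which is the disturbance of ordering at issue.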
One possible mechanism for preserving the order of data items after the merge point (relative to the order they had prior to the point of divergence of the paths) may be referred to as “waiting for idle”. The host software may (a) write an attribute appropriate for a first data item targeted for a first path, (b) write the first data item, and (c) poll a “busy” bit. When the hardware device is not busy (i.e. idle), the host computer may change attributes, i.e. may write a second attribute value appropriate for a second data item targeted for a second path.
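Steps (a) through (c) may be sketched as follows. The `Device` class, its method names, and the latency values are hypothetical and stand in for register writes and reads on a real device. For simplicity the sketch polls after every data item, which matches the described method when every item changes the target path.

```python
class Device:
    """Toy hardware model: busy while any data item is still in flight."""

    def __init__(self, latency):
        self.latency = latency   # cycles per path (hypothetical values)
        self.attribute = None    # currently selected path
        self.in_flight = []      # entries of [remaining_cycles, name]
        self.output = []         # order observed after the merge point

    def write_attribute(self, path):
        self.attribute = path

    def write_data(self, name):
        self.in_flight.append([self.latency[self.attribute], name])

    def busy(self):
        return bool(self.in_flight)

    def tick(self):
        """Advance one cycle and retire items whose latency has elapsed."""
        for entry in self.in_flight:
            entry[0] -= 1
        done = [e for e in self.in_flight if e[0] <= 0]
        self.in_flight = [e for e in self.in_flight if e[0] > 0]
        self.output.extend(name for _, name in done)


def send_wait_for_idle(device, items):
    """Host side of "waiting for idle"; returns the number of polls spent."""
    polls = 0
    for name, path in items:
        device.write_attribute(path)  # (a) write the attribute
        device.write_data(name)       # (b) write the data item
        while device.busy():          # (c) poll the busy bit until idle
            device.tick()
            polls += 1
    return polls


device = Device({"A": 5, "B": 1})
polls = send_wait_for_idle(
    device, [("d0", "A"), ("d1", "B"), ("d2", "A"), ("d3", "B")]
)
# device.output == ["d0", "d1", "d2", "d3"] and polls == 12
```

In this model the order is preserved, but only because the host drains the device completely before each attribute change, so the latencies of the two paths never overlap.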
Each processing unit in a path may generate a status bit which indicates whether it is busy or idle. Complex logic may be required to ensure that a path always reports busy whenever any data is in transit anywhere in the path. This complex logic may cause timing problems that limit performance. The status bit of each processing unit in a path may have a logical OR connection with a centralized busy reporting register. These connections to the centralized register take up chip space (or board space) which could have been used for other purposes. Thus, the “waiting for idle” method is inefficient. Therefore, there exists a need for a more efficient mechanism of preserving the order of data items after a merge point in a hardware device (relative to the order the data items had prior to a point of divergence).
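The centralized busy reporting discussed above may be sketched as a logical OR over the per-unit status bits. The function name and the example bit patterns are hypothetical. The sketch models only the OR reduction, not the additional logic needed to cover data in transit between units.

```python
from functools import reduce
from operator import or_


def centralized_busy(unit_busy_bits):
    """Logical OR of the per-unit status bits (1 = busy, 0 = idle)."""
    return reduce(or_, unit_busy_bits, 0)


# Hypothetical status bits for the units of two paths.
path_a = [0, 1, 0]  # the second unit of path A still holds data
path_b = [0, 0]     # path B is empty

device_busy = centralized_busy(path_a + path_b)  # 1: the device reports busy
all_idle = centralized_busy([0, 0, 0, 0, 0])     # 0: the device reports idle
```

Every unit contributes one input to the OR, so the number of connections routed to the centralized register grows with the number of processing units.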