There is an ever-present need for enhanced performance of computing systems. The increasing computational intensity of higher-level software applications drives the need for faster and more efficient lower-level computing systems to carry out the computations. Consider, for example, the computing system shown in FIG. 1. The system includes a host computer 110 and a computing subsystem 120. The subsystem 120 may be any of a variety of subsystems that are utilized to assist in the computation or execution of applications and programs that are being executed by the host computer 110.
As a particular illustration, in many situations a host computer 110 executes a single-threaded application (STA) 112, which consists of a linear sequence of state-sequenced instructions. These instructions are often arranged linearly in a command buffer 114 for communication to a subsystem 120 for processing. Frequently, the mechanism for communicating the state-sequenced information from the host computer 110 to the subsystem 120 includes a direct memory access (DMA) transfer. As is known, there are tradeoffs involved in structuring DMA transfers of this nature. As larger amounts of data are grouped into a single DMA transfer, the subsystem 120 may remain idle for longer periods while it awaits receipt of the data. Conversely, if the data is broken up into many smaller chunks or segments, more overhead is expended in setting up and tearing down the DMA transfers. Strategies and methods for balancing these tradeoffs and implementing such DMA transfers are well known.
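By way of illustration only, the tradeoff described above can be sketched with a simple timing model. The function name, parameter names, and numeric values below are hypothetical and are not taken from any particular system; the model assumes a fixed setup/teardown cost per DMA transfer, a constant channel bandwidth, and transfers issued one after another.

```python
import math


def dma_tradeoff(total_bytes, chunk_bytes, setup_s, bandwidth_bytes_per_s):
    """Illustrative model only (all parameters are assumptions).

    Returns a tuple of:
      - time until the first chunk has fully arrived, i.e. how long the
        subsystem may sit idle before it has anything to process
      - total time spent on per-transfer setup/teardown overhead
      - total time to move all of the data
    """
    num_transfers = math.ceil(total_bytes / chunk_bytes)
    first_chunk = min(chunk_bytes, total_bytes)

    # The subsystem cannot start until the first transfer completes.
    first_data_latency = setup_s + first_chunk / bandwidth_bytes_per_s

    # Each transfer pays the fixed setup/teardown cost once.
    total_setup = num_transfers * setup_s

    total_time = total_setup + total_bytes / bandwidth_bytes_per_s
    return first_data_latency, total_setup, total_time


# Hypothetical numbers: 1 MB of commands, 100 MB/s channel, 10 us per setup.
one_large = dma_tradeoff(1_000_000, 1_000_000, 10e-6, 100_000_000)
many_small = dma_tradeoff(1_000_000, 1_000, 10e-6, 100_000_000)
```

Under these assumed values, the single large transfer keeps the subsystem waiting far longer for its first data, while the many small transfers accumulate far more setup overhead, which is the balance that the known strategies attempt to strike.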
In many systems, a bottleneck occurs between the host computer 110 and the subsystem 120 (or 130), where the bandwidth of the communication channel between the host computer and the subsystem is lower than the respective bandwidths or processing capabilities of the host computer and the subsystem. In this regard, DMA transfers to a subsystem are often limited by the bandwidth provided by industry-standard interfaces, such as PCI (peripheral component interconnect) or AGP (accelerated graphics port).
Accordingly, there is a desire to provide improved systems having enhanced performance to overcome these and other shortcomings of the prior art.