1. Field of the Invention
This invention relates in general to computer buses, and more particularly to a method, apparatus and program storage device for managing dataflow through a processing system.
2. Description of Related Art
A conventional computer system typically includes one or more central processing units (CPUs) and one or more memory subsystems. Computer systems also include peripheral devices for inputting and outputting data. Some common peripheral devices include, for example, monitors, keyboards, printers, modems, hard disk drives, floppy disk drives, and network controllers.
One of the important factors in the performance of a computer system is the speed at which the CPU operates. Generally, the faster the CPU operates, the faster the computer system can complete a designated task. One method of increasing the speed of a computer is using multiple CPUs, commonly known as multiprocessing. However, the addition of a faster CPU or additional CPUs can result in different increases in performance among different computer systems. Although it is the CPU that executes the algorithms required for performing a designated task, in many cases it is the peripherals that are responsible for providing data to the CPU and storing or outputting the processed data from the CPU. When a CPU attempts to read or write to a peripheral, the CPU often “sets aside” the algorithm that is currently executing and diverts to executing the read/write transaction (also referred to as an input/output transaction or an I/O transaction) for the peripheral. As can be appreciated by those skilled in the art, the length of time that the CPU is diverted is typically dependent on the efficiency of the I/O transaction.
Although a faster CPU may accelerate the execution of an algorithm, a slow or inefficient I/O transaction process associated therewith can create a bottleneck in the overall performance of the computer system. As the CPU becomes faster, the amount of time executing algorithms becomes less of a limiting factor compared to the time expended in performing an I/O transaction. Accordingly, the improvement in the performance of the computer system that could theoretically result from the use of a faster CPU or the addition of additional CPUs may become substantially curtailed by the bottleneck created by the I/O transactions. Moreover, it can be readily appreciated that any performance degradation due to such I/O bottlenecks in a single computer system may have a stifling affect on the overall performance of a computer network in which the computer system is disposed.
As CPUs have increased in speed, the logic controlling I/O transactions has evolved to accommodate these transactions. Thus, most I/O transactions within a computer system are now largely controlled by application specific integrated circuits (ASIC). These ASICs contain specific logic to perform defined functions. For example, Peripheral Component Interconnect (PCI) logic is instilled within buses and bridges, which govern I/O transactions between peripheral devices and the CPU.
Peripheral component interconnect (PCI) provides for communicating between a host computer, systems memory and various devices or adapters, such as devices on the bus, plug-in cards, or integrated adapters. A PCI bus system typically interconnects a large number of electronic devices. The system must maintain, manage and communicate bidirectional data from one device to another device or several devices at once. Each device may output different voltage levels while maintaining capability to read data on the bus. One reason for the difficulty of continuously increasing bus speeds to match the continuously increasing processor speeds is that input/output buffers coupled to the busses must often operate across a wide variety of operating conditions. For instance, the performance of an input/output buffer changes with respect to conditions such as process, voltage and temperature.
Today, PCI logic has evolved into the Peripheral Component Interconnect Extended (PCI-X) to form the architectural backbone of the computer system. PCI-X logic has features that improve upon the efficiency of communication between peripheral devices and the CPU. PCI-X 2.0 is a new, higher speed version of the conventional PCI standard, which supports signaling speeds up to 533 megatransfers per second (MTS). Revision 1.0 of the PCI-X specification defined PCI-X 66 and PCI-X 133 devices that transferred data up to 133 MTS, or over 1 Gbyte per second for a 64-bit device. The PCI-X 2.0 revision adds two new speed grades: PCI-X 266 and PCI-X 533, offering up to 4.3 gigabytes per second of bandwidth, 32 times faster than the first generation of PCI.
PCI-X 2.0 is built upon the same architecture, protocols, signals, and connector as traditional PCI. The reuse of many of the design elements from the conventional PCI and PCI-X1.0b standards eases design and implementation migration. Migration to PCI-X 266 and PCI-X 533 is further simplified by retaining hardware and software compatibility with previous generations of PCI and PCI-X. As a result, new designs can immediately connect with hundreds of PCI and PCI-X products that are currently available. The combination of backwards compatibility and ease of migration provides investment protection for customers, developers, and manufacturers of existing PCI and PCI-X technologies as they migrate to PCI-X 266 and PCI-X 533.
PCI-X 2.0 also includes new features that will enhance applications in the future. It defines a new 16-bit interface width specifically designed for those applications that are constrained by space, such as embedded RAID controllers, or portable applications. PCI-X 2.0 also expands the device configuration space for each device-function to 4 Kbytes, and defines a new Device ID Message transaction to enable simplified peer-to-peer transactions for applications such as streaming-media.
PCI-X capable devices may include a scheduler to implement transaction ordering rules to determine which transaction in a queue will be handled next. To maximize data throughput on the PCI-X bus it has been found that write operations are best. Further, to maximize throughput with writes, small writes may be gathered together in a temporary buffer so that they can be burst on the PCIX bus as one large write. The speed and availability of the PCIX bus with respect to the CPU bus as well as the increase of stored data due to gathering may cause the buffer to fill. This may cause stalls at the processor (CPU) or cause data overwrites. To prevent the full buffer condition, a processor may gather writes in the buffer too slowly or not optimize the bursting of such writes.
It can be seen that there is a need for a method, apparatus and program storage device for managing dataflow through a processing system.