1. Technical Field
The present invention relates generally to an improved data processing system and, in particular, to a method and system for improving data throughput within a data processing system. Specifically, the present invention relates to a method and system for improving performance of input/output processing and bus access regulation.
2. Description of Related Art
Traditionally, symmetric multiprocessors are designed around a common system bus on which all processors and other devices such as memory and I/O are connected by merely making physical contacts to the wires carrying bus signals. This common bus is the pathway for transferring commands and data between devices and also for achieving coherence among the system""s cache and memory. A single-common-bus design remains a popular choice for multiprocessor connectivity because of the simplicity of system organization.
This organization also simplifies the task of achieving coherence among the system""s caches. A command issued by a device gets broadcast to all other system devices simultaneously and in the same clock cycle that the command is placed on the bus. A bus enforces a fixed ordering on all commands placed on it. This order is agreed upon by all devices in the system since they all observe the same commands. The devices can also agree, without special effort, on the final effect of a sequence of commands. This is a major advantage for a single-bus-based multiprocessor.
A single-common-bus design, however, limits the size of the system unless one opts for lower system performance. The limits of technology typically allow only a few devices to be connected on the bus without compromising the speed at which the bus switches and, therefore, the speed at which the system runs. If more master devices, such as processors and I/O agents, are placed on the bus, the bus must switch at slower speeds, which lowers its available bandwidth. Lower bandwidth may increase queuing delays, which result in lowering the utilization of processors and lowering the system performance.
Another serious shortcoming in a single-bus system is the availability of a single data path for transfer of data. This further aggravates queuing delays and contributes to lowering of system performance. Although a single-system-bus design is the current design choice of preference for implementing coherence protocol, it cannot be employed for a large-way SMP with many processors.
Once a decision is made to design a large-way, distributed multiprocessor system with multiple buses, there are several design challenges for ensuring efficient data transfers. The number of connections to centralized control units can become substantial. Pin count on the physical components becomes a significant limitation, especially in a system that supports a large address space with large data transfers. Hence, it is generally desirable to limit the number of signals so as to limit the number of physically separate pins. In addition, an effort should be made to increase the efficiency of bus arbitration and data transfers so as to decrease the number of dead cycles on the bus.
Therefore, it would be advantageous to have a large-way SMP design using bus-based cache-coherence protocols with efficient bus utilization and data transfers.
A distributed system structure for a large-way, symmetric multiprocessor system using a bus-based cache-coherence protocol is provided. The distributed system structure contains an address switch, multiple memory subsystems, and multiple master devices, either processors, I/O agents, or coherent memory adapters, organized into a set of nodes supported by a node controller. The node controller receives transactions from a master device, communicates with a master device as another master device or as a slave device, and queues transactions received from a master device. Since the achievement of coherency is distributed in time and space, the node controller helps to maintain cache coherency. In order to reduce the delays in giving address bus grants, a bus arbiter for a bus connected to a processor and a particular port of the node controller parks the address bus towards the processor. A history of address bus grants is kept to determine whether any of the previous address bus grants could be used to satisfy an address bus request associated with a data bus request. If one of them qualifies, the data bus grant is given immediately, speeding up the data bus grant process by anywhere from one to many cycles depending on the requests for the address bus from the higher priority node controller.