As is known in the art, multi-processor computer systems are designed to accommodate a number of central processing units, coupled via a common system bus or switch to a memory and a number of external Input/Output devices. The purpose of providing multiple central processing units is to increase the performance of operations by sharing tasks between the processors. Such an arrangement allows the computer to simultaneously support a number of different applications while supporting I/O devices that are communicating over a network and displaying images on attached display devices.
To enhance performance, all of the devices coupled to the bus must communicate efficiently. Idle cycles on the system bus represent time periods in which an application is not being supported, and therefore represent reduced performance.
A number of situations arise in multi-processor computer system design in which the bus, although not idle, is not being used efficiently by the processors coupled to the bus. Some of these situations arise due to the differing nature of the devices that are coupled to the bus. For example, central processing units typically include cache logic for temporary storage of data from the memory. A coherency protocol is implemented to ensure that each central processing unit retrieves only the most up-to-date version of data from the cache. Therefore, central processing units are commonly referred to as `cacheable` devices.
However, external Input/Output (I/O) devices are non-cacheable devices. They typically do not implement the same cache coherency protocol that is used by the CPUs, although measures must also be taken to ensure that they only retrieve valid data for their operations. Typically, I/O devices retrieve data from memory, or from a cacheable device, via a Direct Memory Access (DMA) operation, in which data is retrieved in a large block. Typically, I/O devices also store data to memory via DMA; when the block of data to be stored is smaller than a cache block, the bridge in the coherent domain reads the full block, modifies the affected portion of the data, and then writes the block back to memory via a DMA as a large block. One mechanism used to ensure coherency is to place a `lock` on the data block that is used by the I/O device. When a lock is placed on a data block, other cacheable devices in the system do not have access to that data block for the duration of the lock period. If the I/O device is only updating a portion of the block, then restricting the other cacheable devices from using that block results in unnecessary delay that reduces performance. Thus it would be desirable to provide a method for allowing communication between CPUs and I/O devices at increased performance levels.
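The cost of block-level locking described above can be illustrated with a minimal sketch. It is a toy model only: the 64-byte block size, the class name LockedBlockMemory, and its methods are assumptions introduced for illustration and do not describe any particular bridge or memory controller.

```python
CACHE_BLOCK_SIZE = 64  # bytes; assumed block size, chosen only for illustration


class LockedBlockMemory:
    """Toy memory in which a whole cache block is locked during a partial write."""

    def __init__(self, num_blocks):
        self.blocks = [bytearray(CACHE_BLOCK_SIZE) for _ in range(num_blocks)]
        self.locked = set()  # indices of blocks currently locked

    def cpu_read(self, block, offset):
        # A cacheable device is refused access to every byte of a locked block.
        if block in self.locked:
            return None  # the caller would have to retry later
        return self.blocks[block][offset]

    def begin_partial_write(self, block):
        # The bridge locks the block and reads it in full.
        self.locked.add(block)
        return bytearray(self.blocks[block])

    def end_partial_write(self, block, merged):
        # The bridge writes the merged block back and releases the lock.
        self.blocks[block][:] = merged
        self.locked.discard(block)


mem = LockedBlockMemory(num_blocks=2)
copy = mem.begin_partial_write(0)
copy[0:4] = b"\x01\x02\x03\x04"   # the I/O device updates only 4 of 64 bytes
stalled = mem.cpu_read(0, 32)     # byte 32 is untouched, yet access is refused
mem.end_partial_write(0, copy)
after = mem.cpu_read(0, 0)        # readable again once the lock is released
```

In this sketch the read of byte 32 is refused even though the I/O device modifies only bytes 0 through 3, which is the unnecessary delay referred to above.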
Similarly, situations may arise in which one I/O device seeks to communicate with other I/O devices coupled to the system. For example, a graphics device or a network device may require data that is stored on a disk. If that device is coupled to the same I/O bus as the original device, then the transfer may be performed by straightforward transfer between the devices over the I/O bus.
However, in large multi-processor systems there is typically more than one I/O bus coupled to the system to accommodate more I/O devices. When an I/O device wants to communicate with an I/O device on another bus, the communication must be accomplished via a system bus transfer. Typically, in such a situation, the I/O device issues a DMA transaction to the system, which stores the data in system memory temporarily. Then one of the CPUs issues an I/O write to transfer the contents of the system memory to the I/O device on the second I/O bus. Such an arrangement utilizes system bus bandwidth and CPU compute cycles in an undesirable manner.
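The two-step transfer described above may be sketched as follows. The function name, the log format, and the payload are hypothetical; the sketch only counts how many times the same data crosses the system bus and does not model bus timing.

```python
def transfer_via_system_memory(payload, log):
    """Move data between devices on different I/O buses by way of system memory."""
    # Step 1: the source device issues a DMA write, which places the data
    # in system memory temporarily.
    log.append(("system_bus", "dma_write", len(payload)))
    staging_buffer = bytes(payload)
    # Step 2: a CPU issues an I/O write that moves the staged data to the
    # device on the second I/O bus.
    log.append(("system_bus", "cpu_io_write", len(staging_buffer)))
    return staging_buffer


log = []
received = transfer_via_system_memory(b"disk sector", log)
bus_bytes = sum(size for _, _, size in log)  # total bytes moved over the system bus
```

Under these assumptions every byte crosses the system bus twice, and the second crossing also occupies a CPU.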
A further performance problem arises as a result of system interrupts. Interrupts are a mechanism used by the system for indicating to the CPU that an event has occurred that requires attention or repair. Typically, interrupts are used for indicating to the CPU that a transaction has completed, that a service has been requested or, on rare occasions, that a hard or soft error has occurred at the I/O device. In addition, interrupts can be used to mark an occurrence of an event, such as the end of a time interval. When the interrupt event occurs, an interrupt signal is forwarded to the CPU. At the end of an instruction sequence, if the interrupt signal is asserted, the CPU will halt execution of further instructions and service the interrupt.
Usually there are a number of interrupt event conditions, and each of the conditions is saved as one bit of an interrupt vector that is stored in an interrupt register. The occurrence of an interrupt event causes a signal to be asserted, and the signal assertion is logged in the appropriate location of the interrupt register. The interrupt signal is monitored by the CPU to determine which interrupts have occurred and their priority relative to the active process executing on the CPU.
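The one-bit-per-condition interrupt register described above can be sketched as a simple bit mask. The class name, the method names, and the priority rule (a lower bit position means higher priority) are assumptions made for illustration; the text above does not specify how priority is encoded.

```python
class InterruptRegister:
    """Toy interrupt register holding one pending bit per interrupt condition."""

    def __init__(self):
        self.pending = 0  # the interrupt vector, one bit per condition

    def log_event(self, bit):
        # An asserted interrupt signal sets the bit assigned to its condition.
        self.pending |= 1 << bit

    def highest_priority_pending(self):
        # Assumed priority rule: the lowest set bit is the most urgent.
        if self.pending == 0:
            return None
        return (self.pending & -self.pending).bit_length() - 1

    def clear(self, bit):
        # Called once the corresponding interrupt has been serviced.
        self.pending &= ~(1 << bit)


reg = InterruptRegister()
reg.log_event(5)
reg.log_event(2)
first = reg.highest_priority_pending()   # condition 2 under the assumed rule
reg.clear(first)
second = reg.highest_priority_pending()  # condition 5 remains pending
```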
If the interrupt is associated with the CPU, the interrupt register is readily available for examination and determination of the proper interrupt handling process. However, if the interrupt is associated with an I/O device, the interrupt register is stored at the I/O device. The I/O device issues an interrupt signal to the I/O interface, which stores an interrupt status bit for each device. The CPU must periodically examine the interrupt status register of each I/O interface to determine which device raised an interrupt. The CPU then fetches the interrupt vector from the indicated I/O device and handles the interrupt. This process for determining interrupt conditions suffers performance disadvantages because valuable compute cycles are wasted while the CPU fetches the interrupt vector.
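The polling sequence described above may be sketched as follows. The classes IODevice and IOInterface, the function poll_and_fetch, and the remote-read counter are all hypothetical; the counter is only a rough proxy for the compute cycles the CPU spends waiting on accesses to the I/O subsystem.

```python
class IODevice:
    def __init__(self):
        self.interrupt_vector = 0  # the interrupt register stored at the device


class IOInterface:
    def __init__(self, devices):
        self.devices = devices
        self.status = 0  # one interrupt status bit per attached device

    def raise_interrupt(self, slot, vector):
        # The device logs its condition locally and signals the interface.
        self.devices[slot].interrupt_vector |= vector
        self.status |= 1 << slot


def poll_and_fetch(interfaces):
    """Return (serviced interrupts, number of remote reads the CPU performed)."""
    remote_reads = 0
    serviced = []
    for index, iface in enumerate(interfaces):
        status = iface.status            # CPU reads the interface status register
        remote_reads += 1
        slot = 0
        while status:
            if status & 1:
                vector = iface.devices[slot].interrupt_vector  # fetch from device
                remote_reads += 1
                serviced.append((index, slot, vector))
                iface.devices[slot].interrupt_vector = 0
                iface.status &= ~(1 << slot)
            status >>= 1
            slot += 1
    return serviced, remote_reads


interfaces = [IOInterface([IODevice(), IODevice()]) for _ in range(2)]
interfaces[1].raise_interrupt(slot=1, vector=0b1000)
serviced, remote_reads = poll_and_fetch(interfaces)
```

In this sketch a single device interrupt costs the CPU three remote reads: one status read per interface, plus one vector fetch from the interrupting device.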
Accordingly, it can be seen that there are a number of situations that may arise during the operation of a multi-processor computer system that decrease the efficiency of the system bus. Therefore, it would be desirable to have a method or apparatus that provides increased multi-processor performance through improved utilization of system bus bandwidth.