Computer buses are typically used to connect a central processing unit (CPU) with other significant resources, such as memory or I/O devices. The CPU typically acts as a master and the other devices typically serve as targets. Master devices are in charge of significant state change decisions in the system and may make arbitrary and unsolicited bus requests, while targets typically respond only when something is specifically requested of them. To increase system performance, multiple masters may be provided on a common bus. Some masters need not be full CPUs; for example, a Direct Memory Access (DMA) controller typically must be set up by a CPU before it can perform a bus access.
Bus transactions or cycles follow a certain protocol. The most frequent bus cycles are typically for memory reads and memory writes. A typical protocol, for a multiplexed address/data bus, may specify that memory read operations are to proceed generally as follows: (1) a master requests permission from an arbiter to use the bus, and waits for such permission to be granted; (2) the master sends out an address and a bus command that specify a memory read operation; (3) the master gets confirmation that the addressed target can reply in time, or waits; (4) the target puts the requested data on the bus, and the master latches this data and terminates the bus cycle; and (5) the target parks the bus in a known state.
A memory write operation according to a typical protocol may proceed as follows: (1) the master requests permission from an arbiter to use the bus, and waits until such permission is granted; (2) the master sends out an address and a bus command that specify a memory write operation; (3) the master gets confirmation that the addressed target can reply in time, or waits; (4) the master puts the write data on the bus, and the target latches this data and terminates the bus cycle; and (5) the master parks the bus in a known state.
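The two protocols above differ mainly in which device supplies the data in step 4, and that same device parks the bus in step 5. The following Python sketch encodes the five steps for both operations to make that symmetry explicit. It is an illustrative model only: the names `Op` and `transaction_steps` are hypothetical, and waits, clock timing, and bus signals are not represented.

```python
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


def transaction_steps(op: Op) -> list[tuple[int, str, str]]:
    """Return (step number, acting device, action) for one non-split bus cycle.

    Illustrative model only: waits and clock timing are not represented.
    """
    # The device that supplies the data in step 4 also parks the bus in step 5.
    data_source = "target" if op is Op.READ else "master"
    data_sink = "master" if op is Op.READ else "target"
    return [
        (1, "master", "request the bus from the arbiter and wait for a grant"),
        (2, "master", f"drive the address and a memory {op.value} command"),
        (3, "master", "confirm the target can respond in time, or wait"),
        (4, data_source, f"drive the data; the {data_sink} latches it and terminates the cycle"),
        (5, data_source, "park the bus in a known state"),
    ]


if __name__ == "__main__":
    for op in Op:
        print(op.value)
        for number, device, action in transaction_steps(op):
            print(f"  ({number}) {device}: {action}")
```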
In the above protocols, step 1 can typically be performed while another master is using the bus. Steps 2 through 5 tie up the bus for 4 clock cycles, or more if a wait occurs. The bus utilization rate is therefore typically 50% or less, since only 2 of these 4 clock cycles (the address transfer and the data transfer) perform useful work. When memories are embedded on a chip and system clock rates are high, five or more clock cycles will typically be needed to perform a (pipelined) memory access, particularly a memory read, resulting in even lower bus utilization.
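The utilization figures above follow from simple arithmetic: useful cycles divided by the total cycles the bus is occupied. A minimal Python sketch, assuming (as the paragraph does) that exactly two cycles per access carry useful work; the function name `bus_utilization` is hypothetical.

```python
def bus_utilization(total_cycles: int, useful_cycles: int = 2) -> float:
    """Fraction of bus-occupied clock cycles that transfer an address or data.

    Assumes, as in the text, that 2 cycles per access (address and data) are useful.
    """
    if total_cycles < useful_cycles:
        raise ValueError("total_cycles must be at least useful_cycles")
    return useful_cycles / total_cycles


if __name__ == "__main__":
    # 4 cycles: no wait; 5: one wait, or a pipelined embedded-memory access; 6: more waits.
    for cycles in (4, 5, 6):
        print(f"{cycles} cycles -> {bus_utilization(cycles):.0%}")
```

Run as a script, this prints 50%, 40%, and 33% for 4, 5, and 6 cycles respectively, showing how each extra cycle per access erodes utilization.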
In some cases, bus utilization can be improved by permitting burst accesses, in which several consecutive reads or writes are performed without relinquishing control of the bus. However, burst accesses can cause undesirable delays when several CPUs share a memory, particularly if one CPU monopolizes the memory.
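The trade-off described above can be quantified with a rough model. The Python sketch below is hypothetical: it assumes each burst pays one address cycle, one confirmation cycle, and one parking cycle, plus one cycle per data word, with no waits. Under that model a one-word burst reduces to the 50% single-access case, utilization climbs toward 100% as bursts lengthen, and the time another master may be locked out grows linearly with burst length. The function names are illustrative only.

```python
def burst_utilization(words: int) -> float:
    """Useful fraction of bus cycles for one burst of `words` data transfers.

    Hypothetical model: one address cycle, one confirmation cycle and one
    parking cycle per burst, plus one cycle per data word, with no waits.
    """
    if words < 1:
        raise ValueError("a burst carries at least one data word")
    useful = 1 + words  # address cycle + data cycles
    total = useful + 2  # + confirmation cycle + parking cycle
    return useful / total


def blocked_cycles(words: int) -> int:
    """Bus cycles occupied by one burst.

    A master whose request arrives just as the burst starts waits this long,
    ignoring arbitration latency.
    """
    return words + 3


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(f"{n:>2} words: {burst_utilization(n):.0%} utilization, "
              f"others blocked up to {blocked_cycles(n)} cycles")
```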
Another way to improve bus performance, particularly for memory reads, involves the use of a so-called "split-transaction" bus. Once a master is granted permission to use this bus, the address is transferred to a target in one operation, and the data is returned to the master in another operation, leaving the bus free for other use in the intervening time interval. This improves bus utilization but adds considerable complexity because (a) the target ordinarily must become a master in order to return data, (b) the target must know where to return this data, and (c) the arbiter must service more devices. In some implementations, multiple outstanding transactions must be tracked simultaneously in hardware.
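The added complexity of a split-transaction read can be seen in a small model. In the Python sketch below, the target records each outstanding request during the address phase so that, when it later acts as a master, it knows which requester should receive the data. The class and function names are hypothetical, memory is modeled as a plain dictionary, and the target is assumed to return data in request (FIFO) order; real buses may instead tag transactions and complete them out of order. Arbitration is not modeled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ReadRequest:
    requester: str  # master that must eventually receive the data
    address: int


class SplitTarget:
    """Target on a split-transaction bus (illustrative model only)."""

    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        # Outstanding reads: the target must remember where to send each result.
        self.pending: deque[ReadRequest] = deque()

    def address_phase(self, req: ReadRequest) -> None:
        # Only the address moves; the bus is released immediately afterwards.
        self.pending.append(req)

    def data_phase(self) -> tuple[str, int]:
        # The target now acts as a master and returns data for the oldest read.
        req = self.pending.popleft()
        return req.requester, self.memory[req.address]


def demo() -> list[str]:
    target = SplitTarget({0x10: 7, 0x20: 9})
    log = []
    for master, addr in (("cpu0", 0x10), ("cpu1", 0x20)):
        target.address_phase(ReadRequest(master, addr))
        log.append(f"address phase: {master} reads {addr:#x}")
    # Between the address and data phases the bus is free for other traffic.
    while target.pending:
        dest, data = target.data_phase()
        log.append(f"data phase: target returns {data} to {dest}")
    return log


if __name__ == "__main__":
    print("\n".join(demo()))
```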
Unlike read operations, memory write operations ordinarily are not split. To improve write performance, a bus is commonly divided into separate address and data buses, each with its own arbitration. However, this approach can lead to a large number of signal lines and significant chip area when dozens of processors are to be replicated on the chip.
Parallel processor systems are commonly divided into hierarchical sub-systems or clusters so that bus traffic interference can be limited. Processors often communicate with each other through shared registers or memory. Such communications typically occur over a system bus with the features described above. In many cases, a DMA device is programmed to handle such transfers. Bridging devices may be used to interconnect one bus level with another. This may involve translating one bus protocol to another as well as arbitrating across multiple buses. General-purpose bus arbitration that supports a variety of types of bus traffic and software scenarios is typically used. The arbitration logic is typically not integrally tied to a shared device, although it may be intelligent enough to limit the bus access of a processor.
Many of the above-mentioned difficulties and complexities can be ameliorated by the present invention, which, in a preferred embodiment, offers high bus utilization for clusters of processors via an integrated arbitration and shared memory access mechanism with uniformly split (de-coupled) read and write protocols.