This invention relates generally to a digital system comprising a plurality of processors and other devices, and in particular to a bus apparatus for interconnecting a plurality of said processors forming a cluster and for interconnecting a plurality of clusters.
A shared bus is one of the most common multiprocessor interconnection schemes and remains an attractive way of interconnecting small numbers of microprocessors. The simplest form of a shared bus is embodied in standards such as the IEEE-796 standard, which defines an asynchronous non-multiplexed bus. A shared bus typically employs a fixed position-dependent priority scheme and has limited bandwidth because the bus is always allocated for one complete processor memory cycle. If the processors have no local memory, then the bus will saturate with only two or three active processors. Performance is greatly improved if code is kept in local memory, but system bandwidth is still limited by the processor's memory cycle rather than by bus or memory bandwidth.
Microprocessor bus designs in the prior art simply allocated the required busses for the total period required to complete the processor memory cycle. A processor in one cluster reading a global memory in a second cluster occupied its own cluster bus and the system bus from the time they were first granted to transfer the request until the global memory was read and the word transferred back to the requesting processor. Only a small portion of this total time was utilized for transferring the request and the response over the busses. This approach resulted in high bus utilization relative to the actual transfer rates realized.
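The cost of holding the busses for the whole memory cycle can be illustrated with a small calculation. The following is a minimal sketch only; the function name and the cycle counts are hypothetical values chosen for illustration and are not taken from any particular prior art bus.

```python
def held_bus_useful_fraction(request_cycles, memory_wait_cycles, response_cycles):
    """Fraction of the bus-held time that actually moves information.

    Models a prior art read in which the busses stay allocated from the
    moment the request is sent until the response word has returned.
    All cycle counts are illustrative assumptions.
    """
    # Cycles in which the request or the response is on the bus.
    transfer_cycles = request_cycles + response_cycles
    # The busses are also held idle while the memory is being read.
    held_cycles = transfer_cycles + memory_wait_cycles
    return transfer_cycles / held_cycles


# Hypothetical numbers: 1 cycle to send the request, 6 cycles of memory
# access during which the busses sit idle, 1 cycle to return the word.
fraction = held_bus_useful_fraction(1, 6, 1)
print(fraction)  # 0.25
```

Under these assumed numbers the busses are occupied for eight cycles but carry information in only two of them, which is the imbalance between utilization and transfer rate described above.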
Digital systems based on shared busses have been improved by several methods. In a general purpose system where processors are deemed equal, splitting a memory read cycle into halves, that is, a send/request address packet and a receive/response data packet, allows full utilization of the bus with a minor increase in logic at the processor and memory interfaces. Such an approach, combined with rotating priority bus arbitration, maximizes the usefulness of a shared bus, but limits the bus to supporting only tens of processors. The only way to make such a system extensible almost without limit is to allow the busses to be interconnected. However, circuit switching for interconnecting busses has significant deadlock potential.
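Rotating priority arbitration, as mentioned above, can be sketched in a few lines. This is an illustrative model only, assuming that priority rotates so the most recently granted requester becomes the lowest priority; the function name, the request list, and the index convention are hypothetical and do not describe any specific arbiter circuit.

```python
def rotating_priority_grant(requests, last_granted):
    """Return the index of the next requester to be granted the bus.

    requests     -- list of booleans, True where a device is requesting
    last_granted -- index of the device granted most recently; it becomes
                    the lowest priority for this arbitration round
    Returns None when no device is requesting.
    """
    count = len(requests)
    # Scan starting just after the last winner and wrap around, so every
    # requester is reached within one full rotation.
    for offset in range(1, count + 1):
        candidate = (last_granted + offset) % count
        if requests[candidate]:
            return candidate
    return None


# Devices 0, 2 and 3 request continuously; the grant rotates among them
# instead of always favoring the lowest-numbered position.
requests = [True, False, True, True]
grant = 0
order = []
for _ in range(4):
    grant = rotating_priority_grant(requests, grant)
    order.append(grant)
print(order)  # [2, 3, 0, 2]
```

In contrast to a fixed position-dependent scheme, no requester in this model can be starved by a higher-priority neighbor, since each grant pushes the winner to the back of the rotation.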
A synchronous parallel bus is certainly not new in a digital system. However, such a bus is often multiplexed, so that a read or write operation can take up to several adjacent bus cycles to complete.