The bus structure in a multiprocessor computer system has a significant impact on the overall performance, the functionality and the granularity of the system. One of the most important characteristics of an efficient bus is its bandwidth, which describes the ability of the bus to keep pace with the processor speed and with a variable number of processors. The bandwidth depends on the bus width, on how efficiently the bus protocol uses the bus, and on the bus cycle time.
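By way of illustration only, the dependence of the bandwidth on these three factors can be sketched as follows. The function name `bus_bandwidth`, the parameter names, and the modelling of protocol efficiency as a single fraction between 0 and 1 are assumptions made for this sketch and are not taken from any cited document.

```python
def bus_bandwidth(width_bits: int, cycle_time_ns: float, efficiency: float) -> float:
    """Estimate the effective bus bandwidth in bytes per second.

    width_bits    -- number of data lines on the bus
    cycle_time_ns -- duration of one bus cycle in nanoseconds
    efficiency    -- assumed fraction of bus cycles that carry useful data (0..1)
    """
    if width_bits <= 0 or cycle_time_ns <= 0:
        raise ValueError("width and cycle time must be positive")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    bytes_per_cycle = width_bits / 8
    cycles_per_second = 1e9 / cycle_time_ns
    return bytes_per_cycle * cycles_per_second * efficiency
```

Under this simplified model, halving the cycle time or doubling the bus width each doubles the bandwidth, whereas a protocol that leaves half of the cycles idle halves it.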
The bus width is limited by physical constraints of the package, such as the number of chip pads and module pins or the wireability of cards and boards. In prior art bus systems, therefore, either one large bus or a set of smaller busses is constructed.
In other prior art approaches, a smart bus protocol allows bus interleaving between processors to fill the time gap on the bus caused by the access time of the addressed memory bank. "Bus interleaving" is a known method of using wait cycles on a bus to issue commands to other memory banks. It allows several memory banks per bus to be operated simultaneously in order to increase the bus and memory bandwidth.
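The effect of bus interleaving can be sketched with a small cycle-count model. The model is an assumption made for illustration: each command occupies the bus for one cycle, the addressed bank is then busy for `bank_access_cycles` further cycles, and the return of data over the bus is not modelled. The function name `total_cycles` is hypothetical.

```python
def total_cycles(requests, bank_access_cycles, interleaved):
    """Return the cycle at which the last memory access completes.

    requests           -- list of bank indices, in the order commands are issued
    bank_access_cycles -- assumed access time of a bank, in bus cycles
    interleaved        -- if True, the bus may issue the next command while
                          earlier banks are still busy; if False, it waits
    """
    bank_free_at = {}   # bank index -> first cycle at which the bank is idle
    bus_free_at = 0     # first cycle at which the bus may issue a command
    finish = 0
    for bank in requests:
        # A command needs both a free bus and an idle target bank.
        start = max(bus_free_at, bank_free_at.get(bank, 0))
        done = start + 1 + bank_access_cycles
        bank_free_at[bank] = done
        finish = max(finish, done)
        # Without interleaving the bus idles until the access has completed.
        bus_free_at = start + 1 if interleaved else done
    return finish
```

In this model, four requests to four different banks with an access time of three cycles complete after 16 cycles without interleaving and after 7 cycles with it. Consecutive requests to the same bank gain nothing, since the bank itself remains the bottleneck.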
The bus cycle time is the main contributor to the bus bandwidth. An efficient bus structure has to provide excellent electrical characteristics so that the bus cycle time matches the processor cycle time as closely as possible. For example, a simple structure which connects the processors like a laundry line limits the cycle time and the number of processors through its poor electrical characteristics. In contrast, a much more complex structure provides point-to-point busses between each processor and a central switch unit, to which one or several memory banks are connected. That structure allows a short bus cycle but requires extensive wiring. With regard to packaging, a central switch unit also requires a very high pin count, since all busses are routed to the switch.
In addition, in prior art systems the performance of bus structures is improved by concepts for bus arbitration, which may be implemented as either a distributed or a central arbitration function. The central arbitration concept is based on an additional hardware component, a controller, such as a central bus switch. It receives all bus requests, performs the arbitration and delivers the grant to the requesting unit in the next cycle.
The distributed approach requires more wiring, because all request lines have to be wired to all bus participants. Its advantage, however, is that the arbitration can be completed within one cycle, since only one off-chip network is in the path, in contrast to central arbitration. The complete path consists of an off-chip network, i.e. the request lines, and the arbitration logic. If this path limits the bus cycle time, then a two-cycle central bus arbitration becomes the preferred solution.
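The distributed arbitration described above can be sketched as a function that every bus participant evaluates locally on the same request lines, so that all participants arrive at the same grant without a central controller. The round-robin priority rule and the name `arbitrate` are assumptions chosen for this sketch; the text does not prescribe a particular priority scheme.

```python
def arbitrate(request_lines, last_winner):
    """Return the index of the participant granted the bus, or None.

    request_lines -- list of booleans, one per bus participant, as seen
                     identically by every participant
    last_winner   -- index of the participant that held the bus last
    """
    n = len(request_lines)
    # Assumed round-robin rule: search starts just after the last winner.
    for offset in range(1, n + 1):
        candidate = (last_winner + offset) % n
        if request_lines[candidate]:
            return candidate
    return None  # no participant is requesting the bus
```

Because the function is deterministic and its inputs are wired to all participants, each participant can decide within the same cycle whether it has won, which corresponds to the single off-chip network in the arbitration path.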
A further key characteristic of an efficient bus structure is the provision it makes for assuring data consistency in a multiple cache processor structure. A known simple concept is so-called bus snooping, in which each processor monitors the bus operations of all other processors to keep track of the status of its cache lines. The status may be "modified", "exclusive", "shared" or "invalid". The required actions are a line invalidation or a cast-out of modified data with or without updating the memory. A cast-out of modified data means that a processing unit (PU) is no longer the exclusive owner of the line. All those operations are initiated and controlled by the individual processors.
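The snooping behaviour described above can be sketched as a state transition for a single cache line. The transitions shown follow the conventional four-state scheme and are an assumption for illustration; the names `State`, `BusOp` and `snoop` are hypothetical, and whether a cast-out also updates the memory is left open, as in the text.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "modified"
    EXCLUSIVE = "exclusive"
    SHARED = "shared"
    INVALID = "invalid"


class BusOp(Enum):
    READ = "read"    # another processor reads the line
    WRITE = "write"  # another processor intends to modify the line


def snoop(state, op):
    """Return (new_state, action) after observing another processor's bus operation.

    action is "cast-out", "invalidate" or None.
    """
    if state is State.INVALID:
        return State.INVALID, None          # nothing cached, nothing to do
    if op is BusOp.READ:
        if state is State.MODIFIED:
            return State.SHARED, "cast-out"  # supply modified data, lose ownership
        return State.SHARED, None            # exclusive or shared becomes shared
    # Remote write: the local copy can no longer be used.
    if state is State.MODIFIED:
        return State.INVALID, "cast-out"
    return State.INVALID, "invalidate"
```

Each processor applies such a function to its own cache directory for every operation it observes on the bus, which is why the concept needs no central component as long as all operations are visible on a single bus.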
This simple concept reaches its limits, however, if there is more than one bus in the structure, as in a system with a central switch. In that case data consistency has to be controlled by the central switch. It keeps copies of all cache directories of the processors, as well as the status of each line. The central switch monitors the status of the cache lines, initiates cast-outs from processors, if required, and issues the necessary memory commands and data transfers. The advantages are the short cycle time and the high bus/memory bandwidth. A disadvantage of this approach is that it requires a design effort comparable to that of a complete processor, including a cache. Therefore, this costly concept can only be implemented in high-end mainframe designs.
Such a central arbitration concept is disclosed in IBM Technical Disclosure Bulletin, vol. 35, no. 6, November 1992 (FIG. 2), which employs a central crossbar switch. Therein, memory requests are transferred via a processor bus, and data are transferred via the crossbar switch to a memory unit. This division allows central memory control for interleaved access to a number of memory banks.
Another approach to central arbitration is given in U.S. Pat. No. 5,355,455 to the present assignee, which is concerned with the problem of deadlocks in computer systems with several busses connected by a bus adapter. The bus adapter divides the common bus system into two partial busses, wherein the first part contains a processing unit and a memory unit, and the second part contains several I/O control units (FIG. 1 of that document). The underlying problem of that invention arises in computer systems having two data busses, where a bus unit of the first bus wants to communicate with a bus unit of the second bus while a bus unit of the second bus wants to communicate with another bus unit of the first bus, which normally leads to a deadlock situation. This problem originates in a bottleneck caused by there being only one bus adapter and is solved, in particular, by introducing a BUS SUSPEND control signal.
An example of a distributed arbitration approach is disclosed in U.S. Pat. No. 5,093,826, assigned to Siemens Aktiengesellschaft, Germany, which relates to a redundant bus system that is split into two arbitrary halves connected to corresponding halves of a redundant main memory. Identical information is thereby stored in the two different memories. According to the teaching of that patent, during a special operating time, individual processors, one bus system half, and some of the memory sections are split off and interconnected to form an independent special-purpose computer.
European Patent Application No. 0 557 651 concerns a multiprocessor system which allows efficient interleaving on a memory bus. This concept allows multiple accesses to the memory by bus participants, since the memory is logically divided into at least two memory banks.
Further, in German Patent GE-PS 3 708 887, a parallel bus structure is disclosed in which the bus is divided into sections by means of registers inserted between those sections. At a given time, data are transferred in each bus section independently of the other sections, or data are pipelined. Because the data bus is subdivided into individual bus sections with a physical length typically in the centimeter range, the signal transmission time is strongly reduced. Therefore shorter cycle times and higher transfer rates are achievable.
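The pipelined operation of such a sectioned bus can be sketched as a chain of registers through which data words advance by one section per cycle. The model and the name `pipeline_transfer` are assumptions made for illustration and do not reproduce the circuit of the cited patent.

```python
from collections import deque


def pipeline_transfer(words, num_sections):
    """Move words through a bus split into num_sections register stages.

    Returns (delivered_words, cycles_used). Words must not be None,
    because None marks an empty register in this sketch.
    """
    pending = deque(words)
    registers = [None] * num_sections   # one register per bus section
    delivered = []
    cycles = 0
    while pending or any(r is not None for r in registers):
        # The word in the last register leaves the bus in this cycle.
        if registers[-1] is not None:
            delivered.append(registers[-1])
        # All other words advance by one section; a new word enters if available.
        new_word = pending.popleft() if pending else None
        registers = [new_word] + registers[:-1]
        cycles += 1
    return delivered, cycles
```

In this model the first word needs `num_sections` cycles to traverse the bus, after which one word is delivered per cycle. Since each cycle only has to cover the short distance of one section, the cycle itself can be made correspondingly shorter.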