The method relates to modular multi-processor systems.
High-performance computer systems are frequently constructed as multi-processor systems. A large number of processors is of particular use for transaction systems, such as booking and reservation systems, in order to achieve a uniform response-time characteristic. For this purpose, modules which are connected to one another via a common backplane are suitably used. Each module, called a central processing unit in the text which follows, contains one or more processors which are coupled via a bus system on the backplane.
The memory accessible to the processors can be connected to the backplane as a separate memory module. However, the bandwidth of the bus system then restricts the performance of the overall system.
It has therefore been proposed to distribute the memory over the central processing units, that is, to arrange in each central processing unit memory which can be reached both directly by the local processor and by the other processors via the bus system. The respective memories are mapped into the address space shared by all processors by means of address conversion. Programs or data located in the local memory can be accessed especially rapidly and efficiently. For this reason, this memory is also placed into the address space as a single continuous block.
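The mapping of distributed memory into one shared address space can be illustrated with a minimal sketch. It assumes, purely for illustration, that every module contributes one contiguous block of equal size; the constants `MODULE_MEMORY_BYTES` and `NUM_MODULES` and the functions `locate` and `is_local` are hypothetical names and are not taken from the text.

```python
from typing import Tuple

# Assumed values for illustration only; the text does not specify them.
MODULE_MEMORY_BYTES = 64 * 2**20  # one contiguous block per module
NUM_MODULES = 4


def locate(global_address: int) -> Tuple[int, int]:
    """Return (module index, offset inside that module's local memory)."""
    if not 0 <= global_address < NUM_MODULES * MODULE_MEMORY_BYTES:
        raise ValueError("address outside the shared address space")
    # Because each module's memory is a single continuous block,
    # one division is enough to find the owning module.
    return divmod(global_address, MODULE_MEMORY_BYTES)


def is_local(global_address: int, own_module: int) -> bool:
    """True if the access can be served without using the bus system."""
    module, _offset = locate(global_address)
    return module == own_module
```

Under these assumptions, an access for which `is_local` returns `False` is one that has to be handled via the bus system on the backplane.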
Nevertheless, the bus system remains a bottleneck in very high-performance computers comprising a number of processors. Each access outside the local memory is handled via the bus system. Except in statically configurable process control or transaction systems, it is rare for the major proportion of memory accesses to be accesses of the local memory. Even if half the accesses were accesses of the local memory, in a four-processor computer, for example, the bandwidth of the bus system would have to be twice as high as that required by one processor. Without an increase in the bandwidth of the bus system, changing to eight or sixteen processors therefore yields an increase in performance that is far less than proportional.
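The arithmetic behind the four-processor example can be written out as a short sketch. The function names and the simple saturation model in `effective_processors` are assumptions made for illustration; real bus behavior (arbitration, contention, caching) is not modeled.

```python
def required_bus_bandwidth(num_processors: int, local_fraction: float,
                           per_processor_bandwidth: float = 1.0) -> float:
    """Bus bandwidth needed when every non-local access crosses the bus.

    The result is in the same unit as per_processor_bandwidth, so the
    default of 1.0 gives the answer in multiples of one processor.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie between 0 and 1")
    return num_processors * (1.0 - local_fraction) * per_processor_bandwidth


def effective_processors(num_processors: int, local_fraction: float,
                         bus_bandwidth: float) -> float:
    """Idealized upper bound on useful processors for a fixed bus bandwidth.

    Assumption: processors run at full speed until the bus saturates.
    """
    if local_fraction >= 1.0:
        return float(num_processors)
    return min(float(num_processors), bus_bandwidth / (1.0 - local_fraction))
```

With half of all accesses local, `required_bus_bandwidth(4, 0.5)` gives 2.0, matching the factor of two stated above. With the bus fixed at that value, `effective_processors(8, 0.5, 2.0)` and `effective_processors(16, 0.5, 2.0)` both remain at 4.0 in this idealized model.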
A further problem is that fast bus systems cannot be of arbitrary length. At the currently used frequencies and technologies, the bus length is restricted to about 30 cm due to signal delay times and impedances. Given a central processing unit width of 3 cm, which cannot be significantly reduced, this allows a maximum of 10 modules and thus 10 processors. High-performance transaction processing systems, however, require an even larger number of processors. The restriction on the number of modules can be counteracted by having each central processing unit module contain a shared memory and a number of processors. However, since each central processing unit is connected to the memory in the other central processing units via the bus of the backplane, the loading of the bus system again increases to such an extent that the bus becomes the component that limits performance.
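The module limit follows from a simple division, sketched below. The function names are hypothetical, and the model assumes that modules sit side by side along the full bus length with no additional spacing.

```python
def max_modules(bus_length_cm: float, module_width_cm: float) -> int:
    """Number of modules that fit side by side along the bus."""
    if module_width_cm <= 0:
        raise ValueError("module width must be positive")
    return int(bus_length_cm // module_width_cm)


def total_processors(modules: int, processors_per_module: int) -> int:
    """Processor count when each module carries several processors."""
    return modules * processors_per_module
```

For the figures given above, `max_modules(30, 3)` yields 10. Placing, say, four processors on each module (an assumed value) raises the count to `total_processors(10, 4)`, that is 40, but all of their non-local accesses still share the same backplane bus.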
An alternative to a multi-computer system coupled by a bus system is a multi-computer system whose processors are connected in accordance with the IEEE "Standard for Scalable Coherent Interface" (SCI), IEEE STD 1596-1992 (ISBN 1-55937-222-2). SCI uses point-to-point connections from one processor to the next. So that each processor can access the memory of every other processor, SCI defines a transfer protocol in which each processor forwards packets that are not intended for it to its neighbor. This requires a ring structure, as described in Section 1.3.4 of the SCI document, in which each processor can buffer at least one data packet. The forwarding creates delays which, although they allow an arbitrary number of processors to be used, impair their performance. Here, too, there is a bandwidth limitation, even if the limit is higher than with bus systems. It is therefore proposed, inter alia in Section 1.3.5, to build up a two-dimensional lattice structure in which each processor can directly reach only the other processors in the same row or column. Other connections must be handled via intermediaries, called "agents" in Section 1.4.6. However, the SCI protocol entails quite a high expenditure.
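The forwarding delay in a ring and the reachability rule in a two-dimensional lattice can be illustrated with a minimal sketch. It does not implement the SCI protocol; it assumes a unidirectional ring with nodes numbered 0 to ring_size - 1, and the function names are hypothetical.

```python
from typing import Tuple


def ring_intermediate_nodes(src: int, dst: int, ring_size: int) -> int:
    """Nodes that must buffer and forward a packet between src and dst.

    Assumes packets travel in one direction only, from node i to
    node (i + 1) % ring_size.
    """
    if src == dst:
        return 0
    return (dst - src) % ring_size - 1


def directly_reachable(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """In a 2-D lattice of rings, (row, column) nodes reach each other
    directly only if they share a row or a column."""
    return a[0] == b[0] or a[1] == b[1]
```

In an assumed ring of eight nodes, a packet from node 0 to node 7 passes through `ring_intermediate_nodes(0, 7, 8)`, that is 6, forwarding nodes, which shows how the delay grows with ring size. In the lattice, `directly_reachable((0, 0), (1, 1))` is `False`, which corresponds to the case that has to be handled by an intermediary.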
It is therefore the object of the invention to specify a central processing unit for a multi-computer system which avoids the above-mentioned disadvantages.