The present invention relates to a multi-node server system capable of forming a scale-up type server by tightly coupling a plurality of scale-out type server modules (also called nodes), and more particularly to a multi-node SMP (symmetric multiprocessor) structure.
Means for expanding the processing throughput of a conventional server system can be classified into two major types called “scale-out” and “scale-up”. A scale-out system, as represented by a blade server system, is an expansion architecture that improves total throughput by distributing transactions across a plurality of server systems, and is effective for handling a large number of transactions that have little correlation with one another. A scale-up system, as represented by an SMP system, is an expansion architecture that improves the throughput of a single server system by increasing the speed and number of processors and the capacity of memory, and is effective for handling a single, heavily loaded process. Since the blade server system and the SMP server system have different features, it is common to select the one suited to the application and line of business when structuring a system.
In practice, in an Internet data center (IDC), the blade server system suited to scale-out is used as a web server that executes a large number of relatively lightly loaded transactions such as web front-end transactions, and the SMP server system suited to scale-up is used as a data server that executes transactions, for example those of a large-scale database, requiring a large amount of memory. Such selective use appears to put the right system in the right place and to be very efficient, but because dedicated server systems are deployed for each purpose, management becomes complicated and the selective use can hardly be said to be efficient from the standpoint of running costs.
As a known measure for coping with rapid changes in system requirements in a quickly changing business environment, increasing the amount of hardware can be cited first. For example, in the scale-out blade server system this is accomplished by increasing the number of blade server modules, and in the scale-up SMP server system hardware resources such as processors and memories are increased in number or replaced with higher-performance ones. Either approach, however, is one factor that prevents reduction of the TCO (total cost of ownership).
To form a multi-node SMP structure in a server system composed of a plurality of nodes, memory addresses must be transmitted between nodes and cache coherency maintained so that data can be transferred in blocks the size of a cache line. The processor of each node has a cache memory that holds frequently used data blocks. Such a block is typically 32, 64 or 128 bytes in size and is called a cache line. When the necessary data is not in the cache (a cache miss), the processor requests the data from the other processors. If a modified copy of the requested block is held by neither any processor nor an input/output controller, the block is fetched from memory. To obtain permission to modify a block, a processor that has fetched the block from memory must become the owner of the block. When the processor that has obtained permission to modify becomes the new owner, all other devices invalidate the copies they hold, and the former owner transfers the requested data to the new owner. After this transfer, when a different processor wants to share a read-only copy of the data, the data is supplied by the device that owns it, not by memory. When the owning processor needs free space in its cache to write new data, it writes the cache block back to memory, and memory becomes the owner again. The process of locating the latest copy of a cache block is called “cache coherency”. System designers maintain a consistent view of memory across the individual processors principally by one of two methods: broadcast coherency and directory coherency.
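The ownership hand-off described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the mechanism of any particular system: the class name `BlockState`, its method names and the device labels such as "P0" are hypothetical, and "owner" here means the device currently in possession of the block.

```python
MEMORY = "memory"  # hypothetical label for main memory acting as the owner


class BlockState:
    """Tracks, for one cache block, the owner and the read-only sharers."""

    def __init__(self):
        self.owner = MEMORY    # memory owns the block until a processor claims it
        self.sharers = set()   # devices holding read-only copies

    def read_shared(self, requester):
        """Register a read-only copy and return the device that supplies the data."""
        supplier = self.owner  # the current owner supplies the data, not necessarily memory
        if requester != self.owner:
            self.sharers.add(requester)
        return supplier

    def request_modify(self, requester):
        """Grant permission to modify: invalidate other copies, transfer ownership."""
        supplier = self.owner
        invalidated = set(self.sharers)
        if supplier != MEMORY:
            invalidated.add(supplier)   # the former owner's copy is invalidated too
        invalidated.discard(requester)
        self.sharers.clear()
        self.owner = requester
        return supplier, invalidated

    def write_back(self, requester):
        """The owner evicts the block; memory becomes the owner again."""
        if requester != self.owner:
            raise ValueError("only the current owner can write the block back")
        self.owner = MEMORY


block = BlockState()
first_supplier = block.read_shared("P0")                  # supplied by memory
supplier, invalidated = block.request_modify("P1")        # P1 becomes the owner
later_supplier = block.read_shared("P2")                  # supplied by P1, not memory
block.write_back("P1")                                    # memory owns the block again
```

The sketch deliberately leaves out the data itself and any bus timing; it only follows who owns the block and who must invalidate a copy at each step.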
In broadcast coherency, every address is transmitted to all nodes. Each device examines the state of the requested cache line in its local cache (performs a snoop). Because the overall snoop result for the system is determined only a few cycles after each device has examined its local cache, broadcast coherency keeps the delay to a minimum.
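As a rough illustration of the broadcast approach, the following Python sketch sends one address to every device and combines the individual snoop results into a total result. The three line states ("M", "S", "I"), the function name `broadcast_snoop` and the node names are assumptions made for the example and do not come from the text above.

```python
# Assumed, simplified line states (not taken from the text):
# "M" = modified copy, "S" = shared read-only copy, "I" = invalid or not present.

def broadcast_snoop(address, local_caches):
    """Send one address to every device and combine the per-device snoop results.

    local_caches maps a device name to a dict of {address: state}.
    Returns (per_device_results, total_result).
    """
    per_device = {
        device: cache.get(address, "I")
        for device, cache in local_caches.items()
    }
    states = set(per_device.values())
    if "M" in states:
        total = "modified"   # some device holds the latest copy
    elif "S" in states:
        total = "shared"     # only read-only copies exist
    else:
        total = "miss"       # no cache holds the line; memory must supply it
    return per_device, total


caches = {
    "node0": {0x100: "S"},
    "node1": {0x100: "S", 0x140: "M"},
    "node2": {},
}
per_device, total = broadcast_snoop(0x140, caches)
```

Every device is consulted for every address, which is why the result is available quickly but the address traffic grows with the number of nodes.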
In directory coherency, in response to an access request from a processor, the address is transmitted only to the node (home node) responsible for the address of the particular cache block. Using a directory held in memory, in a dedicated RAM, or in a controller, the hardware tracks which nodes share or own which cache blocks. The “directory” is embedded in memory, and therefore the controller must, in principle, access memory on every access request to check the directory information; as a result the protocol becomes complicated, and the delay is both longer and highly variable.
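The directory approach can be sketched in the same spirit. In the Python example below, the mapping from an address to its home node (simple interleaving by block number), the 64-byte block size, and the names `home_of` and `HomeNode` are all hypothetical choices made for illustration; the text does not specify how the home node is selected.

```python
BLOCK_SIZE = 64  # bytes; an assumed cache line size


def home_of(address, num_nodes):
    """Assumed mapping: blocks are interleaved across nodes by block number."""
    return (address // BLOCK_SIZE) % num_nodes


class HomeNode:
    """Holds the directory entries for the blocks this node is home to."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.directory = {}        # block number -> {"owner": ..., "sharers": set()}
        self.directory_reads = 0   # counts directory lookups, one per request

    def handle_read(self, address, requester):
        """Look up the directory and return the node that supplies the data."""
        self.directory_reads += 1  # every request pays for a directory access
        block = address // BLOCK_SIZE
        entry = self.directory.setdefault(block, {"owner": None, "sharers": set()})
        supplier = entry["owner"] if entry["owner"] is not None else self.node_id
        entry["sharers"].add(requester)
        return supplier


nodes = [HomeNode(i) for i in range(4)]
address = 0x1040
home = nodes[home_of(address, len(nodes))]       # only this node receives the address
supplier = home.handle_read(address, requester=3)
```

The `directory_reads` counter makes the cost visible: unlike the broadcast case, only one node is contacted, but that node must consult its directory before it can answer.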
To realize a multi-node SMP structure, a crossbar switch is generally used to control cache coherency among many nodes. However, every transaction must pass through the crossbar switch, so one additional device is inserted in the path of the transaction compared with a configuration without the crossbar switch, which degrades latency. Considering the round trip made up of a request transaction and a response transaction, the latency differs greatly between the case where a crossbar switch is used and the case where it is not.
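The effect on the round trip can be made concrete by listing the devices a request and its response pass through. The Python sketch below only counts device traversals and assigns no latency figures; the function name `round_trip_path` and the node labels are hypothetical.

```python
def round_trip_path(requester, responder, use_crossbar):
    """List the devices visited by a request and the matching response."""
    if use_crossbar:
        outbound = [requester, "crossbar", responder]
    else:
        outbound = [requester, responder]
    # The response retraces the outbound path in reverse, back to the requester.
    inbound = outbound[-2::-1]
    return outbound + inbound


direct = round_trip_path("node0", "node1", use_crossbar=False)
switched = round_trip_path("node0", "node1", use_crossbar=True)
extra_traversals = len(switched) - len(direct)
```

With the crossbar in the path, the switch is traversed once by the request and once by the response, so the round trip contains two more device traversals than the direct connection.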
Multi-node SMP structures without a crossbar switch are currently available, but they are generally directory-based, that is, of the directory coherency type, and system performance degrades as the coherency delay grows.
In addition, an example of a method of directly interconnecting nodes on a backplane is described in US2004/0022022A1. This reference discloses a method for direct interconnection between nodes but gives no clear description of how cache coherency is maintained or how transactions are processed.