1. Field of the Invention
The present invention relates to a computer system in which small multiprocessors operating with shared memories are employed to constitute a large-scale multiprocessor system operating with shared memory units. The invention also relates to a method of initializing such a system.
2. Description of Related Art
As the Internet has become popular, data centers on the Internet (hereinafter “iDCs”) and similar facilities have been built, in which a great number of servers are installed to provide iDC services. An iDC receives requests for processing from users over the Internet and returns the results of the processing to the users. In the iDC, a multiprocessor system operating with shared memories or a cluster system is generally used to carry out rapid processing. Such a system allocates a plurality of processors to a single request and executes the relevant transactions so as to shorten the response time.
Typical multiprocessor systems with shared memories are high-end servers such as Sun Enterprise 10000 and HP Superdome (which will be referred to as “first prior art” hereinafter). These systems are configured such that CPUs, memory units, and I/O subsystems are interconnected by a broadband network called a switch. To supply power to these components, redundant power supply units are generally used. The above systems are initialized by the following mechanism: when one of the systems boots, its power supply units are turned on to automatically begin supplying power to the whole system and to perform its initialization. In the first prior art, all CPUs share the memory units and I/O subsystems, so that a plurality of processors can easily execute a transaction requested by a user in a parallel-processing mode.
For high-end servers having a great number of CPUs, it is generally not the practice to operate all the CPUs within the system under one OS. Instead, the system is divided into a plurality of sectional computer subsystems and each sectional computer subsystem runs its own OS, so that the system can also be operated as a cluster system, which will be described later.
On the other hand, the cluster system (which will be referred to as “second prior art” hereinafter) is configured such that computers, each operating as a server under an independent OS and separately powered by its own power supply unit, are interconnected by a network. Therefore, the memory units and I/O subsystems in the system are not shared across the processors constituting the cluster system. With the support of middleware or applications, the cluster system allows a plurality of processors to execute a transaction requested by a user so as to shorten the response time.
In the foregoing first prior art, it is theoretically possible to combine sectional computer subsystems, each having any number of CPUs, and the resulting number of sectional computer subsystems may be very large. However, it is not possible to prepare as many power supply units as there are sectional computer subsystems. Consequently, it is impossible to set up more sectional computer subsystems than there are power supply units. Unless the power supply units are shared across the sectional computer subsystems, they must be provided in excess. Because the power supply units are therefore shared, it becomes impossible to control power on/off for each sectional computer subsystem. This poses a problem: even if a sectional computer subsystem is out of service with its OS shut down, its power cannot be turned off while another sectional computer subsystem that shares the power supply with it is operating (which will be referred to as “problem 1” hereinafter).
In the foregoing second prior art, because each server has its own power supply unit, the above problem 1 does not arise. However, when booting (initializing) and shutting down the system, it is necessary to turn the power on and to shut down the power supply at every computer in the system, which significantly burdens power on/off management. To eliminate this burden, the following method is used: a server for managing the cluster system is set up and middleware for power on/off management is installed on that server, so that the management is carried out via a network. However, this requires additional middleware and incurs additional cost (which will be referred to as “problem 2” hereinafter).
A further problem with the second prior art (which will be referred to as “problem 3” hereinafter) is as follows. Because the memory units and I/O subsystems are not shared across the processors, it is necessary to provide the application running on each processor with a parallel processing function or to use middleware for such a function. This implementation of parallel processing is slow and inefficient.
It is an object of the present invention to provide a computer system in which, when the computer system is divided into a plurality of sectional computer subsystems, the power can be turned on for each sectional computer subsystem separately.
Another object of the invention is to reduce the management cost required for booting (initializing) and shutting down a multiprocessor system operating with shared memory units.
According to a typical implementation manner of the present invention, a computer system comprises a plurality of elemental servers and a router for interconnecting these elemental servers, each server comprising one or more CPUs, a memory, a BIOS, an I/O subsystem, and a power supply unit so as to function as a single server unit. In this computer system, each elemental server is provided with an initialization procedure in which the following are implemented. Each elemental server, at the instant its power switch is turned on, issues power-on inquiry packets to the other elemental servers in order to organize and boot a multiprocessor system operating with shared memory units by checking whether the power has been turned on at each of the other elemental servers. If reply packets to the inquiry packets have been returned from all elemental servers of the multiprocessor system operating with shared memory units, each elemental server issues a packet for gaining a representative right in the system. The representative server that gained the representative right issues packets for checking the available memory space and the I/O volumes on the other elemental servers. Based on the available memory space and I/O volumes on the individual elemental servers, as reported in the reply packets from the other elemental servers, the representative server determines a schedule of interleaving the memory and I/O components. The representative server notifies the other elemental servers of the interleave schedule thus determined.
The above initialization procedure has two phases. In the first phase, every elemental server issues the power-on inquiry packets and sends replies to the inquiry packets it receives. In the second phase, the single elemental server that gained the representative right issues the packets for checking available memory space and I/O volumes, determines an interleave schedule, and notifies the other servers of the interleave schedule. An arbitration means for determining the representative right is provided on the router of the invention. Specifically, having ascertained the power-on of all elemental servers, each elemental server issues a packet for gaining a representative right to the router. The arbitration means grants the representative right to the server whose packet for gaining the representative right arrived first.
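The two phases and the first-arrival arbitration described above can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions: the class and function names (`ElementalServer`, `Router`, `initialize`), the field names, and the `arrival_order` parameter that models packet arrival at the router are all hypothetical, and the "schedule" here is a placeholder list of reported resources, since the text does not specify how the interleave schedule is computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ElementalServer:
    # Field names are illustrative assumptions, not taken from the text.
    server_id: int
    memory_mb: int
    io_volumes: int
    powered_on: bool = False
    schedule: Optional[List[Tuple[int, Tuple[int, int]]]] = None

    def reply_to_inquiry(self) -> bool:
        # Only a powered-on server returns a reply to a power-on inquiry packet.
        return self.powered_on


class Router:
    """Arbitration: the first request to arrive gains the representative right."""

    def __init__(self) -> None:
        self.representative: Optional[int] = None

    def request_representative_right(self, server_id: int) -> bool:
        if self.representative is None:
            self.representative = server_id
            return True
        return False


def initialize(servers, router, arrival_order):
    """Return the representative's id, or None if phase 1 cannot complete."""
    # Phase 1: each server inquires of all others and needs a reply from each.
    for s in servers:
        others = [o for o in servers if o is not s]
        if not s.powered_on or not all(o.reply_to_inquiry() for o in others):
            return None
    # Each server then requests the representative right; arrival_order is the
    # (assumed) order in which those packets reach the router.
    for server_id in arrival_order:
        router.request_representative_right(server_id)
    # Phase 2: the representative collects memory and I/O information and
    # notifies every server of the schedule (placeholder: sorted resource list).
    schedule = sorted((s.server_id, (s.memory_mb, s.io_volumes)) for s in servers)
    for s in servers:
        s.schedule = schedule
    return router.representative
```

For example, with three powered-on servers and `arrival_order=[2, 0, 1]`, server 2 gains the representative right because its packet reaches the router first; if any server is still powered off, the sketch returns `None` and no schedule is distributed.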
The above initialization method according to the typical implementation manner is pre-programmed on the assumption that operators turn on the power switches on all elemental servers of the multiprocessor system operating with shared memory units. In other words, the above procedure is an automatic initialization procedure following the power-on of the individual elemental servers. This procedure can be modified so that, when power is turned on at only some of the elemental servers of the multiprocessor system operating with shared memory units, at least one server on which the power has been turned on requests the remaining elemental servers to turn their power on. Specifically, at least one elemental server, at the instant its power is turned on, issues power-on request packets instead of the power-on inquiry packets to the other elemental servers. An elemental server that receives the power-on request packet turns its power on, then generates and sends a reply packet to the source server that sent the power-on request packet. In this manner, the burden of power on/off management for individual servers is alleviated significantly.
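The modified procedure, in which power-on request packets replace the inquiry packets, might be sketched as follows. The names `Server`, `handle_power_on_request`, and `power_on_all`, as well as the dictionary used to represent a reply packet, are assumptions made purely for illustration; the text does not define a packet format.

```python
class Server:
    def __init__(self, server_id, powered_on=False):
        self.server_id = server_id
        self.powered_on = powered_on

    def handle_power_on_request(self, source_id):
        # On receiving a power-on request, turn the power on, then build a
        # reply packet addressed to the server that sent the request.
        self.powered_on = True
        return {"type": "reply", "from": self.server_id, "to": source_id}


def power_on_all(initiator, others):
    # The initiator is the server whose power switch was turned on by hand.
    initiator.powered_on = True
    # It sends a power-on request to every other server and collects replies.
    return [o.handle_power_on_request(initiator.server_id) for o in others]
```

In this sketch an operator only needs to power on one server; once `power_on_all` returns a reply from every other server, the initiator knows that all elemental servers are powered on, just as it would after the inquiry phase.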
Because each of the elemental servers has its power supply unit, any number of elemental servers can be set to run in a sectional computer subsystem of the multiprocessor system, and turning power on/off can be performed separately per sectional computer subsystem.
Moreover, by setting a schedule of interleaving the memory units and I/O subsystems across a plurality of elemental servers, the plurality of elemental servers can be operated as a multiprocessor system with shared memory units in a multiplex manner. Because the memory units and the I/O subsystems are shared, parallel processing can be implemented relatively easily.
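To illustrate what memory interleaving across elemental servers can mean in practice, the following sketch maps a global address to a server and a local offset. It assumes a simple round-robin scheme in which every server contributes the same amount of memory and addresses are distributed in fixed-size granules; the function name `interleave_map`, the 64-byte granule, and the scheme itself are assumptions, and the text does not state that the invention uses this particular mapping.

```python
def interleave_map(address: int, num_servers: int, granule: int = 64):
    """Map a global address to (server index, local offset).

    Assumes round-robin interleaving with a fixed granule size and equal
    memory contribution from every server (illustrative assumptions only).
    """
    block = address // granule          # which granule the address falls in
    server = block % num_servers        # granules rotate across the servers
    local = (block // num_servers) * granule + address % granule
    return server, local
```

With four servers and the assumed 64-byte granule, consecutive granules land on servers 0, 1, 2, 3 and then wrap around to server 0, so a sequential access stream is spread evenly across the memory units of all elemental servers.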