Computer arrangements of this type are known, inter alia, under the term server farm. Server farms usually comprise a large number of identical servers, which are also referred to below as computers, on which the same or different services or applications run. Since all servers execute the services and applications independently of one another, a fault in one server has no direct influence on the rest of the server farm. User inquiries are distributed to the members of the server farm in accordance with defined rules; this is the task of the monitoring computer. Load distribution mechanisms ensure that the utilization of each server corresponds to its processing capacity.
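By way of illustration only, the capacity-proportional distribution of inquiries described above can be sketched as follows. The text does not specify the rules actually applied by the monitoring computer; the Server class, the dispatch function, the capacity values and the rule "choose the server with the lowest ratio of load to capacity" are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    capacity: float    # relative processing capacity (hypothetical unit)
    load: float = 0.0  # work currently assigned to this server


def dispatch(servers, cost=1.0):
    """Assign one inquiry to the server with the lowest relative utilization.

    Assumed rule, not taken from the text: utilization = load / capacity.
    Ties are resolved in favor of the server listed first.
    """
    target = min(servers, key=lambda s: s.load / s.capacity)
    target.load += cost
    return target


# Hypothetical farm: server "b" has twice the capacity of "a" and "c".
farm = [Server("a", 1.0), Server("b", 2.0), Server("c", 1.0)]
for _ in range(8):
    dispatch(farm)

loads = [s.load for s in farm]  # "b" ends up with twice the load of "a" and "c"
```

Under this assumed rule, eight equal inquiries leave all three servers at the same relative utilization, which is one possible reading of use that corresponds to the respective processing capacity.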
The failure of a single server is relatively unproblematic since only a few services and applications are affected. Since the individual computers are relatively small and thus inexpensive devices, keeping one or more standby computers ready is also unproblematic in terms of cost. After a computer has failed, the affected software units, for example services and applications, are transferred to a standby computer in order to restore normal operation.
However, server farms often comprise several hundred computers. In the case of so-called blade servers, the complexity of external wiring is avoided since the computers are accommodated and connected in the form of plug-in cards. The problem with this design, however, is that if a power supply unit, for example, fails, a plurality of computers are affected and thus fail simultaneously. For economic reasons, it is not possible to provide, for every case, as many standby computers as such a failure would actually require. Satisfactory operation of the computer arrangement is thus not ensured in every case.
When a software unit fails on account of the failure of a computer, it is known practice to transfer the software unit to a standby computer, that is to say to restart it there. If a plurality of computers fail, a plurality of software units are affected. An attempt is then made, separately for each software unit, to find a standby computer, or a sufficiently large amount of free capacity in a standby computer, on which the software unit can be restarted. The software units thus compete with one another for the same standby capacity, which jeopardizes fault-free operation.
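The competition described above can be illustrated by a minimal sketch in which each affected software unit looks for free standby capacity on its own, without regard to the other units. The function restart_independently, the unit and standby names and the capacity figures are hypothetical, and the sequential first-fit search is merely an assumed stand-in for the uncoordinated attempts; it is not presented as the known practice itself.

```python
def restart_independently(units, standby_free):
    """Place each unit on the first standby computer with enough free capacity.

    units: list of (name, required_capacity) in the order the attempts occur
    standby_free: dict mapping standby name -> free capacity (updated in place)
    Returns (placements, unplaced). No unit takes the others into account.
    """
    placements, unplaced = {}, []
    for name, need in units:
        for host, free in standby_free.items():
            if free >= need:
                standby_free[host] = free - need  # capacity is now taken
                placements[name] = host
                break
        else:
            # No standby computer had enough free capacity left.
            unplaced.append(name)
    return placements, unplaced


# Hypothetical case: two failed units, two standby computers.
placed, lost = restart_independently(
    [("u1", 2), ("u2", 4)],
    {"s1": 4, "s2": 2},
)
```

In this hypothetical case, unit u1 occupies half of standby computer s1, after which no standby computer can accommodate u2, although placing u1 on s2 and u2 on s1 would have restarted both units. The outcome thus depends on which attempt happens to come first.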