This invention relates to computer systems, and especially to computer systems that are employed as servers.
The systems may, for example, be employed as servers in local area networks (LANs) or wide area networks (WANs), in telecommunications systems, or in other operations such as database management, or as internet servers. Such servers may be used in so-called “horizontally scaled” applications in which tens or hundreds of corresponding servers are employed as part of a distributed system.
A typical computer employed for such purposes will comprise two or more processors mounted on a motherboard, together with power supply units (PSUs), and other components such as hard disc drives (HDDs), fans, digital video disc (DVD) players, memory modules, ethernet ports etc. One or more of the processors, the host processor(s), provides the main functions of the server, and may communicate with a number of peripheral components, including communication ports, optionally via peripheral component interconnect (PCI) bridges, in order to provide server operation. One of those peripheral components, called the “South Bridge”, further allows the host processors to communicate with internal devices via serial interfaces, one of which transports the console interface of the processors.
In addition to the host processor(s), the system may include another processor, called the service processor or the remote management controller (RMC), which provides management functions for the system assembly. Such functions may include environmental monitoring, temperature monitoring of the enclosure, fan speed control, data logging and the like. The service processor may communicate with the host processor or with one of the host processors, and may also have one or more external communication ports so that a user or network administrator can communicate with the service processor, or can communicate with the host processor(s) via the service processor. For example, the service processor may have its own ethernet network port for direct communication to the network administrator.
Fan speed is controlled by the service processor in order to minimise the amount of vibration and noise in the neighbourhood of the equipment, and, more importantly, in order to increase the life of the fans. With proper fan speed control, it is possible to extend the life of the fans by an order of magnitude or more, so that the fan lifetime is generally equivalent to that of the computer system. This is advantageous in the case of those systems in which it may not be possible to change the fans without shutting the system down, since any change of fans will be associated with downtime of the system.
However, intelligent devices such as service processors are prone to function failures, and in order to reduce the amount of downtime of the system, it is desirable to enable the system to continue to function in the event of a failure of the service processor for whatever reason. Such a failure may be a hardware failure of the service processor or of any lines that are controlled by it, or may be a software failure, for example due to interference from other equipment, errors in memory modules or errors in packets that are received from the network, any of which may corrupt programs in the service processor.
The service processor may be provided with an internal “watchdog” which requires initialising at periodic intervals in order to check that the service processor is functioning correctly, and, if it is not initialised, will reset the service processor. However, such an arrangement may not, on its own, protect the system from some forms of malfunction. For example, with some forms of malfunction, the service processor may fail to send any commands to peripheral components but will still re-initialise the internal watchdog when required. With other forms of malfunction, a failure of the management bus or other line may prevent commands or data that are sent by the service processor being received by the relevant peripheral device. In such cases, for example, the system might continue to operate without fan speeds being adjusted to take into account temperature changes in the system enclosure.
According to one aspect of the present invention, there is provided a computer system which comprises:
(i) a service processor for providing system management functions for the system;
(ii) at least one peripheral component that communicates with, and/or is controlled by, the service processor via a communication or control line; and
(iii) a timer that is separate from the service processor and which will set the peripheral component into a different state unless the timer is initialised by the service processor at a predetermined rate;
wherein the service processor sends initialisation signals to the timer along the communication or control line.
The system according to this aspect of the invention has the advantage that the control signals that are sent to initialise the timer must be sent from the service processor outputs along the relevant internal lines of the system, and so any failure of either the service processor or of the internal lines will cause the timer to expire and set the peripheral component to its different state.
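By way of illustration only, the behaviour described above may be modelled in a few lines of Python. This is a minimal sketch and not the claimed implementation: the class names (`Peripheral`, `ExternalTimer`, `ControlLine`), the tick-based time model, the default timeout of three ticks and the `simulate` helper are all assumptions introduced for the example. The point it shows is that the initialisation signal reaches the timer only by travelling along the same line that serves the peripheral component, so a broken line is detected in the same way as a failed service processor.

```python
class Peripheral:
    """Hypothetical peripheral with a normal state and a different (fail-safe) state."""

    def __init__(self):
        self.state = "normal"

    def enter_failsafe(self):
        self.state = "failsafe"


class ExternalTimer:
    """Timer that is separate from the service processor (names are assumptions)."""

    def __init__(self, timeout_ticks, peripheral):
        self.timeout_ticks = timeout_ticks
        self.peripheral = peripheral
        self.remaining = timeout_ticks

    def initialise(self):
        # Runs only when an initialisation signal actually arrives on the line.
        self.remaining = self.timeout_ticks

    def tick(self):
        # Advance time by one tick; on expiry, force the peripheral's state.
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.peripheral.enter_failsafe()


class ControlLine:
    """Internal line carrying the service processor's signals to the timer."""

    def __init__(self, timer):
        self.timer = timer
        self.broken = False

    def send_init(self):
        # A broken line silently drops the signal, so the timer is not initialised.
        if not self.broken:
            self.timer.initialise()


def simulate(total_ticks, line_breaks_at=None, timeout_ticks=3):
    """Run the model and return the final state of the peripheral."""
    peripheral = Peripheral()
    timer = ExternalTimer(timeout_ticks, peripheral)
    line = ControlLine(timer)
    for t in range(total_ticks):
        if line_breaks_at is not None and t >= line_breaks_at:
            line.broken = True
        line.send_init()  # the service processor signals once per tick
        timer.tick()
    return peripheral.state
```

In this model, `simulate(10)` leaves the peripheral in its normal state, whereas `simulate(10, line_breaks_at=4)` ends with the peripheral in its fail-safe state, even though the modelled service processor never stops sending.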
A separate timer may be employed, or a timer may be integrated into one or more of the peripheral components.
The different state of the peripheral component will depend on the type and purpose of the particular component. Normally, the state will be one in which the peripheral component is quiesced or will operate independently of the service processor. For example, in one aspect of the invention, the peripheral component is a fan controller which receives fan speed signals from the service processor and provides a driving signal for the fans. In this case, if the timer expires, the fan controller may be arranged to increase the fan speed to a constant value, for example maximum speed, to ensure that the system enclosure is adequately cooled in the absence of intelligent fan control from the service processor.
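The fan controller case may be sketched as follows, purely as an illustration under stated assumptions. The `FanController` class, the percentage duty-cycle scale, the choice to treat every speed command as an initialisation signal, and the decision to resume normal control when a command arrives after expiry are all hypothetical; the text above specifies only that the fans go to a constant speed, for example maximum, when the timer expires.

```python
MAX_DUTY = 100  # assumed scale: fan drive expressed as a percentage


class FanController:
    """Hypothetical fan controller with an integrated timer."""

    def __init__(self, timeout_ticks=5):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.duty = MAX_DUTY  # assumption: run at full speed until first command
        self.expired = False

    def set_speed(self, duty):
        """Speed command from the service processor.

        Assumption: each command received also initialises the timer.
        """
        self.duty = max(0, min(MAX_DUTY, duty))
        self.remaining = self.timeout_ticks
        self.expired = False

    def tick(self):
        """Advance time by one tick; on expiry, drive the fans at maximum."""
        if self.expired:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            self.duty = MAX_DUTY
```

With a timeout of three ticks, a controller commanded to 40% holds that value for two ticks of silence and is driven to 100% on the third.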
In another aspect of the invention, for example where the computer system forms a server in a network, the peripheral component may be a physical interface for a system management ethernet port that enables the system to communicate with a network administrator. If the service processor malfunctions, interference from other components in the system, in particular from any other processors that may be operating, may be picked up by lines connecting the service processor to the physical interface. This interference would then be treated as a signal by the physical interface, and coded and sent along the external lines to the network. In this case, the timer will quiesce the physical interface so that no such signals can be sent.
In yet another aspect, the peripheral component may form part of a serial port that can communicate with both the host processor and with the service processor. In one such system, the port is connected to the host processor and the service processor by means of a multiplexer, so that signals from the serial port are routed to the host processor via the service processor. If the service processor malfunctions, the timer may cause the multiplexer to route the signals directly between the serial port and the host processor, thereby bypassing the service processor.
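The routing behaviour of the multiplexer may likewise be sketched as below. The `SerialMux` class, its route labels and the tick-based timeout are assumptions made for the example, as is the choice to restore the normal route when the timer is next initialised; the sketch shows only that expiry of the timer switches the path so that the service processor is bypassed.

```python
class SerialMux:
    """Hypothetical multiplexer between a serial port, the service processor
    and the host processor, with a timer that selects the route."""

    VIA_SERVICE_PROCESSOR = "via_service_processor"
    DIRECT_TO_HOST = "direct_to_host"

    def __init__(self, timeout_ticks=4):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.route = self.VIA_SERVICE_PROCESSOR

    def initialise(self):
        # Initialisation signal received from the service processor.
        # Assumption: this also restores the normal route.
        self.remaining = self.timeout_ticks
        self.route = self.VIA_SERVICE_PROCESSOR

    def tick(self):
        # Advance time by one tick; on expiry, bypass the service processor.
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.route = self.DIRECT_TO_HOST

    def path(self):
        """Return the hops that serial traffic currently traverses."""
        if self.route == self.VIA_SERVICE_PROCESSOR:
            return ["serial_port", "service_processor", "host_processor"]
        return ["serial_port", "host_processor"]
```

While the timer is being initialised, `path()` includes the service processor; once the timeout elapses without an initialisation signal, it returns the direct path between the serial port and the host processor.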
According to another aspect of the invention, there is provided a method of operating a computer system which comprises:
(i) sending signals from a service processor to a peripheral component and to a timer along an internal communication or control line at a predetermined minimum rate;
(ii) initialising the timer on receipt of the signals by the timer; and
(iii) setting the peripheral component into a different state if the timer has not been initialised within a predetermined time period.