Modern computer systems often comprise many components interacting with one another in a highly complex fashion. For example, a server installation may include multiple processors, configured either within their own individual (uniprocessor) machines, or combined into one or more multiprocessor machines. These systems operate in conjunction with associated memory and disk drives for storage, video terminals and keyboards for input/output, plus interface facilities for data communications over one or more networks. The skilled person will appreciate that many additional components may also be present.
The ongoing maintenance of such complex systems can be an extremely demanding task. Typically various hardware and software components need to be upgraded and/or replaced, and general system administration tasks must also be performed, for example to accommodate new uses or users of the system. There is also a need to be able to detect and diagnose faulty behaviour, which may arise from either software or hardware problems.
One known mechanism for simplifying the system management burden is to provide a single point of control from which the majority of control tasks can be performed. This control point is usually provided with a video monitor and/or printer, to which diagnostic and other information can be directed, and also a keyboard or other input device to allow the operator to enter desired commands into the system.
It will be appreciated that such a centralised approach generally provides a simpler management task than a situation where the operator has to individually interact with all the different processors or machines in the installation. In particular, the operator typically only needs to monitor diagnostic information at one output in order to confirm whether or not the overall system is operating properly, rather than having to individually check the status of each particular component.
However, although a single control terminal makes matters easier from the perspective of a system manager, the same is not necessarily true from the perspective of a system designer. In particular, the diagnostic or error information must be passed from the location where it is generated, presumably close to the source of the error, out to the single service terminal.
One known mechanism for collating diagnostic and other related system information is through the use of a service bus. This bus is terminated at one end by a service processor, which can be used to perform control and maintenance tasks for the installation. Downstream of the service processor, the service bus connects to all the different parts of the installation from which diagnostics and other information have to be collected.
(As a rough analogy, one can consider the service processor as the brain, and the service bus as the nervous system extending out to all parts of the body to monitor and report back on local conditions. However, the analogy should not be pushed too far, since the service bus is limited in functionality to diagnostic purposes; it does not form part of the mainstream processing apparatus of the installation.)
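The arrangement described above, with a service processor terminating a bus that reaches every part of the installation, can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class names `ServiceBus`, `ServiceProcessor` and `Diagnostic`, their fields, and the example component names and messages are all hypothetical and do not come from any particular service bus implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnostic:
    """One report travelling upstream (field names are illustrative)."""
    source: str   # component that raised the report
    message: str  # human-readable description of the condition


@dataclass
class ServiceBus:
    """Hypothetical shared path from every component to the service processor."""
    pending: list = field(default_factory=list)

    def report(self, source: str, message: str) -> None:
        # Any component attached to the bus can post a diagnostic upstream.
        self.pending.append(Diagnostic(source, message))


@dataclass
class ServiceProcessor:
    """Terminates the bus and collates reports at a single point of control."""
    bus: ServiceBus
    log: list = field(default_factory=list)

    def collect(self) -> list:
        # Drain the bus so that all reports can be inspected in one place.
        self.log.extend(self.bus.pending)
        self.bus.pending.clear()
        return self.log


# Assumed example components and messages, for illustration only.
bus = ServiceBus()
sp = ServiceProcessor(bus)
bus.report("cpu0", "cache parity error")
bus.report("disk3", "read timeout")
collected = sp.collect()
for d in collected:
    print(f"{d.source}: {d.message}")
```

The point of the sketch is simply that components never talk to the operator directly; they report onto the bus, and the service processor is the only place where the reports are gathered.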
In designing the architecture of the service bus, there are various trade-offs that have to be made. Some of these are common to communications devices in general, such as the (normally conflicting) requirements for speed, simplicity, scalability, high bandwidth or information capacity, and low cost. However, there is also a specialised design consideration for the service bus, in that it is particularly likely to be utilised when there is some malfunction in the system. Accordingly, it is important for the service bus to be as reliable and robust as possible, which in turn suggests a generally low-level implementation.
One particular problem is that a single fault in a complex system will frequently lead to a sort of avalanche effect, with multiple errors being experienced throughout the system. There is a danger that in trying to report these errors, the service bus may be swamped or overloaded, hindering rapid and effective diagnosis of the fault.
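The swamping danger can be illustrated with a toy model. The sketch below assumes, purely for illustration, that the bus can hold only a fixed number of reports in one burst and that any excess is lost; the capacity value, the function name `deliver`, and the example error messages are hypothetical. Real service buses may instead stall or delay reports rather than drop them.

```python
from collections import deque

BUS_CAPACITY = 4  # assumed number of reports the bus can carry in one burst


def deliver(reports, capacity=BUS_CAPACITY):
    """Return (delivered, dropped) for one burst of reports on a bounded bus."""
    queue = deque()
    dropped = []
    for r in reports:
        if len(queue) < capacity:
            queue.append(r)    # room left on the bus
        else:
            dropped.append(r)  # bus is swamped; this report is lost
    return list(queue), dropped


# A single underlying fault (assumed here to be a power supply problem)
# triggers many knock-on errors elsewhere in the system.
root_cause = ["psu0: voltage out of range"]
knock_on = [f"board{i}: power fault" for i in range(6)]

# In this scenario the knock-on errors happen to reach the bus first.
delivered, dropped = deliver(knock_on + root_cause)
print("delivered:", delivered)
print("dropped:", dropped)
```

In this scenario the four delivered reports are all secondary errors, while the report that identifies the underlying fault is among those lost, which is the kind of situation that hinders rapid diagnosis.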