The complexity of computer systems, and of networks of computer systems, has increased inexorably, so that such systems are now typically characterized by large numbers of system entities interacting to provide a variety of system services. This in turn has placed a major strain on the system management resources required to maintain continuous availability of those services, for example in detecting faults and in identifying and correcting their causes.
One contribution to the resolution of this problem has been disclosed in WO 94/09427, which describes a system management method and apparatus in which a respective declarative model is provided for each system service. This model specifies, in terms of the entities required and their inter-relationships, the requirements or goals that must be met for that service to be available, independently of any particular task to be performed in relation to that service. A respective task program is provided for each task, such as installation, monitoring or fault diagnosis; it controls performance of that task independently of any particular model, in terms of general inferencing operations that can be applied to any such model. A task is performed in relation to a service (e.g. fault-finding for an inoperative print-spooling service) by carrying out inferencing operations on the declarative model for that service under the control of the task program for that task.
In one implementation of the invention disclosed in WO 94/09427, information on the system is made available by reference to a fact base, which stores facts about the system and which can be updated through interaction with the system to provide desired information, either directly, by queries that elicit specific items of information, or indirectly, by inferencing from those items. An inferencing engine checks whether the system is meeting a requirement or goal associated with a service by performing inferencing operations on the relevant service model with reference to the fact base and, where the fact base holds insufficient facts, by initiating interaction with the system to elicit further facts.
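The separation between a declarative service model, generic inferencing operations and a fact base that is filled in on demand can be illustrated with a minimal sketch. It assumes, purely for illustration, that a service's requirements are expressed as an AND/OR tree over named primitive facts, and that the managed system can be queried through a single callback. The names `FactBase`, `probe`, `goal_met`, `diagnose` and the fact names in the print-spooling model are hypothetical; this is not the implementation of the arrangement described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical goal representation: a primitive fact name, or
# ("all", [subgoals]) / ("any", [subgoals]) combining other goals.
Goal = Union[str, Tuple[str, List["Goal"]]]


@dataclass
class FactBase:
    """Hypothetical fact base: caches facts, eliciting missing ones from the system."""
    probe: Callable[[str], bool]  # assumed query into the managed system
    facts: Dict[str, bool] = field(default_factory=dict)
    queries: int = 0  # number of interactions with the system so far

    def lookup(self, fact: str) -> bool:
        if fact not in self.facts:  # insufficient facts: interact with the system
            self.queries += 1
            self.facts[fact] = self.probe(fact)
        return self.facts[fact]


def goal_met(goal: Goal, fb: FactBase) -> bool:
    """Generic inference over any model; contains no service-specific logic."""
    if isinstance(goal, str):
        return fb.lookup(goal)
    op, subgoals = goal
    if op == "all":
        return all(goal_met(g, fb) for g in subgoals)
    if op == "any":
        return any(goal_met(g, fb) for g in subgoals)
    raise ValueError(f"unknown operator {op!r}")


def diagnose(goal: Goal, fb: FactBase) -> List[str]:
    """A fault-diagnosis 'task program': lists unmet primitive facts under a goal."""
    if isinstance(goal, str):
        return [] if fb.lookup(goal) else [goal]
    op, subgoals = goal
    if op not in ("all", "any"):
        raise ValueError(f"unknown operator {op!r}")
    failures = [diagnose(g, fb) for g in subgoals]
    if op == "any" and any(not f for f in failures):
        return []  # at least one alternative is satisfied
    return [fact for f in failures for fact in f]


# Hypothetical declarative model for a print-spooling service.
spooler_model: Goal = ("all", [
    "spooler_host_up",
    "spool_daemon_running",
    ("any", ["printer_a_online", "printer_b_online"]),
])

# Simulated system state standing in for real queries.
system_state = {"spooler_host_up": True, "spool_daemon_running": False,
                "printer_a_online": False, "printer_b_online": True}

fb = FactBase(probe=lambda fact: system_state.get(fact, False))
print(goal_met(spooler_model, fb))  # False
print(diagnose(spooler_model, fb))  # ['spool_daemon_running']
print(fb.queries)                   # 4
```

Both `goal_met` and `diagnose` work on any goal tree, so a new service needs only a new model, not new task code; facts already elicited are reused, which is why the diagnosis adds only the two printer queries.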
In the method and apparatus described in WO 94/09427, operation of the inferencing engine is triggered by the need to perform a management task, typically at the request of a user who wishes to modify the system's services or to identify the cause of a service failure.
It is one object of this invention to provide a method and apparatus which enables management tasks to be initiated automatically, for example in response to detection of events indicating a possible change in the system status.
There is a practical limit to the size of system (i.e. the overall number of system entities such as terminals, workstations and peripherals) which a single apparatus of the kind previously described can effectively manage. This limit, of the order of a few hundred entities, is imposed by various constraints, including the memory available to store the fact base, the processing capacity needed to support inferencing engine operations for many different entities in an acceptably short time, the consumption of network communications bandwidth, and the range of different management policies needed for different entities (e.g. in different locations or workgroups). It is in principle possible to partition a system into multiple sub-systems, each containing fewer than the maximum manageable number of entities and each having its own system management apparatus. However, the management of each sub-system is then independent of all the others; furthermore, in practice it is not always simple, or even feasible, to assign a given entity exclusively to one sub-system or another, as it may provide services to several entities which are, in other respects, better treated as belonging to different sub-systems.
It is another object of this invention to provide a method and apparatus which exploits automatic initiation of management tasks to facilitate the management of large networks containing, for example, several thousand connected devices.