1. Field
This invention relates to a method and apparatus for managing network and information technology (IT) resource configurations. In particular, this invention relates to a method and apparatus for temporally classifying and visually representing network and IT infrastructure with planned or completed configuration activities to check for policy compliance.
2. Description of the Related Art
Managing network and IT resources typically requires: deployment; configuration; ongoing maintenance; and determination as to whether such resources meet operational and regulatory requirements (i.e., policy compliance checks). The requirements are often expressed as policies against known aspects of the resources. Such network and IT resources include, but are not limited to: routers; switches; printers; hosts; firewalls; servers; operating systems; software applications; and virtual machines.
In such deployments, resource management software applications are used to manage hardware and software assets in a number of inter-related areas including but not limited to: fault management; performance management; configuration management; business service management; and security management.
Fault management typically focuses on managing the operational state of a given resource such that, in the event of a fault, operators can quickly determine the cause, symptoms and activities required to rectify a fault. For instance, a network port failure would typically result in a number of alarms being presented to the operator who then may schedule expedient work to move services from the failed port on to a spare port on the same device.
Performance management typically focuses on managing the historical, current or predicted ability of a resource to perform its role for a number of consumers. Operators typically make use of charts and graphs to view metrics such as network port throughput or central processing unit (CPU) utilization and frequently combine related metrics into dashboards. Operators typically also wish to generate ‘problem’ or ‘resolution’ alarms when a specific metric exceeds or falls below a particular threshold, for example when CPU utilization reaches 90%.
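The threshold behaviour described above can be illustrated with a minimal Python sketch. It is not taken from any particular management product: the `Alarm` type, the `threshold_alarms` function and the metric name are hypothetical, and the 90% default simply mirrors the CPU utilization example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Alarm:
    kind: str      # "problem" or "resolution"
    metric: str    # hypothetical metric name, e.g. "cpu_utilization"
    value: float   # sample that triggered the alarm


def threshold_alarms(metric: str, samples: Iterable[float],
                     threshold: float = 90.0) -> List[Alarm]:
    """Raise a 'problem' alarm when a sample reaches the threshold and a
    'resolution' alarm when a later sample falls back below it."""
    alarms: List[Alarm] = []
    in_problem = False
    for value in samples:
        if not in_problem and value >= threshold:
            alarms.append(Alarm("problem", metric, value))
            in_problem = True
        elif in_problem and value < threshold:
            alarms.append(Alarm("resolution", metric, value))
            in_problem = False
    return alarms
```

Tracking the current state means that a metric which stays above the threshold raises one ‘problem’ alarm rather than one alarm per sample.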
Configuration management focuses on managing the configuration artifacts associated with deployed hardware and software resources in the network or IT environment. Activities typically include, but are not limited to: making bulk configuration changes to large numbers of devices, such as changing a network password; making granular changes, such as those forming part of targeted service provisioning activities; deploying software patches; and rolling back to a previously known good configuration. Configuration management operators frequently exploit fault and performance data in conjunction with resource configuration data to understand how a resource is currently configured, the potential impact of configuration changes in the environment and the policy compliance of a specific configuration. Related to configuration management is policy management, which concerns whether the network or IT environment adheres to a previously defined set of requirements expressed as policies, for example that encrypted passwords are used or that routing protocol configuration meets best practices.
Business service management focuses on managing a set of hardware and software network and IT resources with a view to understanding whether a specific business service is deployed and operating as expected. This discipline typically does not require the low-level detail required in the previous three disciplines, as it provides a higher-level view of the service landscape than that of, say, fault management. For instance, business service managers typically ask questions such as ‘Are my services operating effectively?’ and ‘Are my customers getting the quality of service they are paying for?’.
Security management focuses on managing security-related aspects of resources in the managed environment and is closely related to the other disciplines, especially configuration and fault management. This discipline typically covers areas such as ensuring that the appropriate user accounts and role/group memberships have been configured, but it also overlaps heavily with configuration management given that security is an artifact of configuration. For example, a configuration compliance policy relating to an Access Control List (ACL) on a network router is a security-centric policy.
To manage these inter-related areas, the following common provisions are needed in any resource management software used:
    1. Provision for discovery and construction of a model of the resources and relationships deployed in the environment to be managed.
    2. Provision of a mechanism for collecting event and alarm information, either solicited or unsolicited, and associating such events and alarms with the constructed model.
    3. Provision of a mechanism for interacting with resources in the managed environment, such as to retrieve or apply configuration artifacts on demand.
    4. Provision of a mechanism for visualizing and reporting on a variety of known characteristics about the managed environment including, but not limited to, topological displays, service models, alarm lists, charts and graphs and textual reports.
    5. Provision of a mechanism for creating and managing ‘trouble tickets’ corresponding to artifacts from each management discipline. For instance, in the event of a network port failure, the operator would typically raise a trouble ticket to track progress with the problem in conjunction with the customer using the port. Similarly, if planned maintenance work is required, the operator typically raises a ticket to schedule and track the planned maintenance ahead of time whilst notifying users dependent on the resource to be maintained.
    6. Provision of a mechanism for determining whether known characteristics about the managed environment are deemed compliant or non-compliant with operational and regulatory policy, based on some set of defined characteristics. For instance, ‘Are all of my machines running the correct operating system version?’, ‘Are all of my sessions adhering to configuration best practices?’, ‘Am I seeing any of my devices being configured by operators without the appropriate level of permissions?’.
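As an illustration of the sixth provision, the following minimal Python sketch evaluates a set of policies against known configuration properties of each resource. It rests on assumptions: the property names (`os_version`, `password_encryption`), the required version string and the policy names are hypothetical and do not come from any specific product.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]            # known configuration properties of one resource
Policy = Callable[[Config], bool]  # returns True when the resource is compliant

# Hypothetical policies, loosely modelled on the questions in the text.
POLICIES: Dict[str, Policy] = {
    "os-version": lambda cfg: cfg.get("os_version") == "15.2",
    "encrypted-passwords": lambda cfg: cfg.get("password_encryption") == "enabled",
}


def violations(config: Config, policies: Dict[str, Policy] = POLICIES) -> List[str]:
    """Return the names of the policies that this configuration violates."""
    return [name for name, check in policies.items() if not check(config)]


def compliance_report(inventory: Dict[str, Config]) -> Dict[str, List[str]]:
    """Map each resource name to its violated policies (empty list = compliant)."""
    return {resource: violations(cfg) for resource, cfg in inventory.items()}
```

Expressing each policy as a predicate over known properties keeps the check independent of how those properties were discovered or collected.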
The areas described above typically work together to provide effective control over the managed environment. However, the size and complexity of modern network and IT infrastructure, and the number of human operators required, pose challenges for operators with respect to gaining concise, accurate and timely information about recent or planned configuration activities, changes and policy compliance. This is particularly important in network and IT management, as understanding whether recent configuration-related activities adversely affect a business service, in conjunction with other management disciplines (such as fault, security and performance), can significantly expedite the resolution of faults in the environment. Similarly, understanding which resources and relationships can be affected by planned configuration activities, and when, is vitally important in effectively managing the network or IT resources.
Typically, operators exploit static reports and alarm lists to understand a variety of configuration-related characteristics in the managed environment, for instance:
    1. Tabular reports can provide insight into the configuration activities within a specified timeframe, either historical or planned, when wishing to understand what resources have been or will be changed and by whom.
    2. ‘Real time’ dashboards can provide an at-a-glance view of the currently configured state with respect to policy compliance across the managed environment, typically via a single score and a graph of that score over time.
    3. Alarm/event lists can support the above characteristics by associating events/alarms with either current state or historical state. For instance, a configuration change may have resulted in a specific network port being administratively shut down, in which case an alarm will typically be raised that can be associated with the configuration change and the associated trouble ticket.
US patent publication 2010/0080129 ‘Network troubleshooting using path topology’ looks at some of the problems described. It discloses, in a networking environment, a method for categorizing resources, analyzing for time-related data, monitoring and comparing time-related data with a time checkpoint. The system includes a network node manager and health report generator. The network node manager generates and displays a path topology. The health report generator is coupled to the network node manager and receives identities of each of the network elements, determines summary information for each of the network elements, and displays the summary performance information.
However, the above approach has a markedly different focus: it is predominantly fault oriented and exploits the inherently ordered nature of a path through a network topology. It does not consider recent or planned configuration activities, or policy compliance/violation, with respect to historical or planned temporal classification, that is, ‘time windows’ based on an observable configuration-related property of a network or IT resource or relationship. Such a property may be, for instance, a network model property, or an event received by the management or trouble ticketing systems relating to a configuration activity or policy compliance/violation for a given resource, relationship or set thereof. Resources are related to one another, and whilst prior art configuration management systems provide a means to view configuration artifact activity over time, they do not look for configuration or policy compliance/violation artifact characteristics in historical and future contexts.
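The notion of ‘time windows’ can be made concrete with a minimal Python sketch that classifies a configuration activity by the offset of its timestamp from a reference time. This is only one possible reading, under stated assumptions: the window labels and boundaries, and the `classify` function itself, are hypothetical and are not defined in the text above.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# (label, start offset, end offset) relative to the reference time.
# Negative offsets are historical, positive offsets are planned.
Window = Tuple[str, timedelta, timedelta]

# Hypothetical windows chosen purely for illustration.
WINDOWS: List[Window] = [
    ("changed in the last hour", timedelta(hours=-1), timedelta(0)),
    ("changed in the last day", timedelta(days=-1), timedelta(hours=-1)),
    ("planned in the next day", timedelta(0), timedelta(days=1)),
]


def classify(activity_time: datetime, now: datetime,
             windows: List[Window] = WINDOWS) -> Optional[str]:
    """Return the label of the window containing the activity, or None.

    Each window covers start < offset <= end, so an activity timestamped
    exactly at the reference time counts as historical.
    """
    offset = activity_time - now
    for label, start, end in windows:
        if start < offset <= end:
            return label
    return None
```

Because historical and planned activities are handled by the same offset comparison, one list of windows can cover both contexts.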