1. Field of the Invention
The present invention relates generally to data processing systems, and more particularly, to the automatic discovery of relationships between components of distributed networks, systems, and applications.
2. Description of Background Art
A wide variety of computer-implemented services are available to consumers, manufacturers and others. For example, an investor can buy and sell stocks or other financial instruments over the web; travelers can check in for flights at airport kiosks; products can be configured for shipment; and the like.
Often, a user of a computer-implemented service, or a computer program itself, wishes to perform a task and cannot because a necessary component is unavailable. For example, the needed component might be already in use and locked, corrupted, or missing altogether. Alternatively, the necessary components might be available, but the overall performance of the service is poor. For example, a stock purchasing service might be functional but slow, so that trades that the service provider guarantees to process within 5 seconds take longer, thereby violating an agreement between the stock trading service provider and its customer.
When service-affecting problems occur, software engineers typically seek to resolve them quickly and, where possible, to detect them before they affect users of the service. One conventional method for resolving such problems involves proactively testing the end-to-end availability and performance of the IT system delivering the service. A robot can be used to programmatically test the service while monitoring service performance and availability. For example, products such as Keynote Transaction Perspective, by Keynote Systems, Inc., Gomez Performance Network, by Gomez, Inc., and Mercury Business Process Monitor, by Mercury Interactive Corporation, test services to determine when service failures and degradation occur.
Another method for resolving such problems involves monitoring use of the service to detect service-affecting problems. Solutions such as Timestock CTQ monitor actual users of the service and detect when service-affecting problems exist for these users. Both approaches, testing a service and monitoring a service, along with other approaches, provide awareness of service-affecting problems, including performance and availability problems.
When a performance, availability, or other service-affecting problem is detected, an alert can be sent to an operator. The operator may then manually interrogate the individual components of the service to ascertain where a fault exists. Because services may be composed of a large number of components, it may be difficult for the operator to identify those components upon which the service depends.
Services may depend on the performance and availability of many components, such as network routers and switches and the software executing on them; server hardware and the software executing on it, such as web server software, application server software, and database software; and mainframe computers and the software executing on them, among other components. To determine dependencies between service components, conventional methods include application of periodic discovery and mapping techniques. These techniques create a map of dependency relationships periodically, generally on a scheduled basis. Micromuse Netcool for Business Service Management, by Micromuse Inc., and Mercury Application Mapping, by Mercury Interactive Corporation, are examples of products that attempt to map these service dependencies.
With the advent of technology such as web services, where relationships between components tend to be ephemeral, mappings can quickly become inaccurate, making it even harder to determine where the cause of a service-affecting problem lies. In addition, services often depend upon components that are shared amongst multiple services. External effects from other services sharing a component may have an effect on the service being tested, monitored, and/or mapped. For example, a service depending upon a certain amount of available bandwidth on a network circuit may be adversely (and temporarily) affected by another service sharing the use of that network circuit. Such interdependencies may not be reflected in a service dependency map, and may be temporary and fleeting.
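By way of illustration only, the limitation of a periodically generated dependency map can be sketched as follows. The sketch assumes a map represented as a simple dictionary from service name to component names; the service and component names, and the functions shared_components and stale_entries, are hypothetical and do not describe any of the products named above.

```python
from collections import defaultdict

# Hypothetical snapshot produced by a scheduled discovery run:
# service name -> components the service was observed to depend on.
dependency_map = {
    "stock_trading": {"web_server_1", "app_server_1", "database_1", "network_circuit_a"},
    "flight_check_in": {"web_server_2", "app_server_2", "network_circuit_a"},
}


def shared_components(dep_map):
    """Return {component: services} for components used by more than one service."""
    users = defaultdict(set)
    for service, components in dep_map.items():
        for component in components:
            users[component].add(service)
    return {c: s for c, s in users.items() if len(s) > 1}


def stale_entries(snapshot, observed):
    """Compare a mapped dependency set with the currently observed one.

    Returns (missing, unmapped): dependencies that are in the snapshot but no
    longer observed, and dependencies that are observed but absent from the snapshot.
    """
    return snapshot - observed, observed - snapshot
```

In this sketch, shared_components identifies the network circuit used by both services, while stale_entries shows how a dependency formed after the last scheduled discovery run would be absent from the map until the next run.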
While manual interrogation of individual components by an operator may eventually resolve the problem, it is time-consuming and in many cases relies on prior knowledge, such as a previously generated dependency map, that has become inaccurate due to frequent changes in the IT infrastructure environment.
Accordingly, there is a need for a system and method for automatically discovering relationships between components involved in providing a service and for discovering the relationship of shared components to other services at or near the time of a service-affecting problem.