Today's enterprise environments typically involve stacked middleware layers (business logic, application servers, database servers, storage servers, etc.) providing services to a number of business applications. Each middleware layer is a complex distributed system, often partitioned over multiple IT resources for performance and availability. As can be seen in the example shown in FIG. 2, a distributed J2EE application server (e.g., WebSphere) 202 and a database management system (e.g., DB2) 204 can be partitioned over a large pool of servers and shared by two applications 206, 208. In such environments, the resources used to serve a given application are typically a small subset of the overall pool.
The ability to accurately account for the IT infrastructure (servers, storage controllers, etc.) used by each business application enables a variety of important functions, such as:
1. Optimal alignment of the IT infrastructure to the business needs of the enterprise;
2. Ability to accurately predict which business application is expected to be impacted by a server or other IT infrastructure failure;
3. Ability to accurately estimate capacity requirements when planning migration of an application to a new infrastructure (e.g., during a technology refresh).
However, the accurate mapping between business applications and the underlying IT infrastructure is obscured by intermediate virtualization and middleware layers, which interpose their services between the high-level (business) and low-level (servers, storage) tiers of the IT architecture.
Existing IT infrastructure discovery systems cannot offer a sufficient solution to the above problem as they typically discover and report only coarse-grain mappings of applications to the IT infrastructure. For example, consider an application A 206 that depends on application and database middleware services 202 and 204, as shown in FIG. 2. While it is possible that existing IT infrastructure discovery systems can narrow down application A's dependency to a specific cluster 210 of application servers, they lack the ability to continue drilling through a stack of subsequent middleware services (e.g., 204), maintaining the context of the specific application, and discovering the specific resources used by that application through all these tiers. As such, they typically assume that application A depends on the total IT infrastructure used to support 204.
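The difference between a coarse-grain and a fine-grain mapping can be illustrated with a minimal Python sketch. It is not the disclosed method: the server names, the second cluster, and the per-application database placement are all hypothetical values loosely modeled on FIG. 2, and the fine-grain mapping is simply assumed to be known rather than discovered.

```python
# Hypothetical topology loosely modeled on FIG. 2; all server names are invented.
APP_SERVER_CLUSTERS = {
    "cluster_210": {"as1", "as2"},
    "cluster_other": {"as3", "as4"},
}
DB_SERVERS = {"db1", "db2", "db3", "db4"}  # whole pool backing database service 204

# Coarse-grain knowledge: each application is narrowed down to one cluster only.
APP_TO_CLUSTER = {"A": "cluster_210", "B": "cluster_other"}

# Fine-grain knowledge (assumed given here): the database servers each
# application's data actually resides on.
APP_TO_DB_SERVERS = {"A": {"db1"}, "B": {"db2", "db3"}}


def coarse_footprint(app):
    """Application cluster plus the entire database pool."""
    return APP_SERVER_CLUSTERS[APP_TO_CLUSTER[app]] | DB_SERVERS


def fine_footprint(app):
    """Application cluster plus only the database servers the application uses."""
    return APP_SERVER_CLUSTERS[APP_TO_CLUSTER[app]] | APP_TO_DB_SERVERS[app]


def impacted_apps(server, footprint):
    """Applications whose footprint (under the given mapping) contains the server."""
    return sorted(app for app in APP_TO_CLUSTER if server in footprint(app))
```

Under these assumed values, a failure of db4 would be reported as affecting both applications by the coarse-grain mapping, whereas the fine-grain mapping shows that neither application uses that server.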
Certain experimental research prototypes may provide finer-grain dependency information but are often based on active (e.g., fault injection) techniques and are thus disruptive to the IT environment. No existing infrastructure discovery system currently known to the inventors has the ability to accurately and non-disruptively drill inside each data service X and discover which fraction of X's infrastructure is actually used to support A. The methodology and system disclosed in the present disclosure offer a novel solution to this problem.
Basic infrastructure information about the target IT infrastructure (e.g., installed software and hardware components) is typically represented in the form of a System Configuration model, which is a standard representation compliant with a System Configuration meta-model such as the Common Information Model (CIM) or Service Modeling Language (SML).
In general, a meta-model is a precise definition of the constructs and rules needed for creating semantic models of particular entities. Another way to think about meta-models is as collections of “concepts” (e.g., things, terms, etc.) that make up a vocabulary with which one can talk about a certain domain. It is a similar concept to a “schema” as used in databases or XML, or to the definition of a class in object-oriented languages.
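The class analogy can be made concrete with a short Python sketch. The two classes below play the role of meta-model concepts, and the objects built from them play the role of a model. The class names, fields, and values are hypothetical illustrations and are not taken from the CIM or SML specifications.

```python
from dataclasses import dataclass


# Meta-model level: definitions of the concepts that may appear in a model.
# These class and field names are hypothetical, not actual CIM or SML classes.
@dataclass(frozen=True)
class ComputerSystem:
    name: str
    os: str


@dataclass(frozen=True)
class SoftwareInstallation:
    product: str
    hosted_on: ComputerSystem  # relationship permitted by the meta-model


# Model level: concrete instances describing one (invented) environment.
host = ComputerSystem(name="server-01", os="Linux")
database = SoftwareInstallation(product="DB2", hosted_on=host)
```

In this analogy, a discovery system populating a System Configuration model corresponds to creating such instances, while the meta-model constrains which kinds of instances and relationships can be expressed.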
Standard System Configuration meta-models such as CIM or SML are vendor-independent and thus intentionally not very detailed. The Common Modeling Language (CML) is an effort to define interoperable, vendor-agreed System Configuration meta-models; however, the CML effort is still in its infancy and its future is unclear. System Configuration models are commonly populated either by IT infrastructure discovery systems or manually.
Previous research projects have focused on methods for discovering end-to-end relationships in distributed systems, either by statistically analyzing system behavior, based on live activity or traces, or by using system support (e.g., passing tokens or other metadata over communication between layers). In addition, several commercial tools focus on discovery of infrastructure assets by scanning a range of IP addresses and querying the systems that respond. Additional refinement of asset discovery has been achieved through template-driven discovery of applications. Network communication relationships among applications are discoverable by capturing network packets and analyzing their headers. However, these systems either cannot discover accurate end-to-end associations between business applications and the server infrastructure, or can do so only in an intrusive way.
Various systems have investigated building distributed system dependency graphs using passive (e.g., trace collection and offline analysis) or active (e.g., fault injection) methods. Some of the uses of a dependency graph include problem determination, performance analysis, and visualization. Other systems trace the provenance of data to discover origin or data history. However, the provenance concept is evolving and distributed multi-tiered systems are beyond the scope of present provenance prototypes.