Today's enterprise networks and modern data centers host heterogeneous applications (e.g., e-commerce, content delivery) and services (e.g., DNS, active directory, email, and authentication) that are interleaved with each other in complicated ways. Specifically, each service may support multiple applications, and a particular application may rely on many different services. Moreover, each application usually has multiple components, some of which may be shared with other applications. For example, a multi-tier system can include different applications, such as an auction application and an e-commerce application, that share common components such as an application server. Identifying the components of each application and understanding their inter-dependencies is critical for a wide spectrum of system management tasks, such as anomaly detection and failure diagnosis, system upgrading and patching, and application isolation and migration.
Existing approaches to this problem employ a variety of techniques, ranging from active server instrumentation to lightweight middleware deployment to non-intrusive network traffic monitoring. Application dependencies are then inferred by performing correlation analysis on the collected network and/or system traces. These approaches rely on pairwise flow analysis, which has certain limitations in practice. Specifically, multi-hop (i.e., more than three) dependencies, which are common in large-scale transaction systems, are difficult to infer from pairwise dependency information. This is especially true for overlapping applications, in which a single component is shared by multiple applications, and the accuracy of pairwise analysis degrades further when such overlapping applications also involve multi-hop dependencies. Moreover, flow pair correlation is conducted on a per-host basis within a given time interval. The length of this interval is critical to both performance and accuracy, and a poorly chosen interval makes the analysis prone to false positives, because unrelated flows that happen to fall within the same interval can appear correlated.
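To make the interval problem concrete, the following is a minimal Python sketch of a generic per-host pairwise correlation heuristic: at one host, an inbound flow and an outbound flow are counted as a candidate dependency pair whenever the outbound flow starts within `window` seconds after the inbound one. The `Flow` record, the `pairwise_dependencies` function, the `window` and `min_count` parameters, and the toy trace are all hypothetical illustrations; this is not the algorithm of any particular prior system.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Flow(NamedTuple):
    """Hypothetical flow record: source host, destination host, start time (s)."""
    src: str
    dst: str
    start: float


def pairwise_dependencies(
    flows: List[Flow], host: str, window: float, min_count: int
) -> Dict[Tuple[str, str], int]:
    """Count (upstream, downstream) pairs observed at `host`.

    A pair is counted each time an outbound flow from `host` starts within
    `window` seconds after an inbound flow to `host`. Pairs seen fewer than
    `min_count` times are discarded.
    """
    inbound = [f for f in flows if f.dst == host]
    outbound = [f for f in flows if f.src == host]
    counts: Counter = Counter()
    for f_in in inbound:
        for f_out in outbound:
            if 0.0 <= f_out.start - f_in.start <= window:
                counts[(f_in.src, f_out.dst)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def build_example_trace() -> List[Flow]:
    """Toy trace: each web request to 'app' triggers an 'app' -> 'db' call,
    while an unrelated periodic job sends 'app' -> 'backup' 4 s later."""
    trace: List[Flow] = []
    for t in (0.0, 10.0, 20.0, 30.0):
        trace.append(Flow("web", "app", t))           # request arrives
        trace.append(Flow("app", "db", t + 0.1))      # caused by the request
        trace.append(Flow("app", "backup", t + 4.0))  # unrelated traffic
    return trace


trace = build_example_trace()

if __name__ == "__main__":
    # A short window keeps only the true dependency.
    print(pairwise_dependencies(trace, "app", window=1.0, min_count=3))
    # -> {('web', 'db'): 4}
    # A longer window also reports the unrelated backup traffic.
    print(pairwise_dependencies(trace, "app", window=5.0, min_count=3))
    # -> {('web', 'db'): 4, ('web', 'backup'): 4}
```

In this toy trace, widening the window from 1 s to 5 s turns a periodic, unrelated job into a spurious "dependency" that is indistinguishable by count from the real one. Note also that the output is only a set of pairs at a single host; recovering a multi-hop path such as web → app → db → … would require stitching such pairs across hosts, which is where shared components make the result ambiguous.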
Accordingly, there is a need for an automated application dependency discovery system and method to support the daily management and operation of enterprise networks and data centers, which are experiencing large-scale growth in applications and increasingly complicated interactions among service components.