Modern business information systems are typically structured as multi-tiered distributed systems comprising Web services, application services, databases, enterprise information systems, file systems, and other storage systems. In such environments, data is stored at multiple tiers, with each tier associated with a different level of data abstraction. All data entities that map to an information entity owned and used by an application are logically associated across tiers and are related to that application. Discovering such relationships in a distributed system is a challenging problem that requires understanding how data is used and transformed. For example, discovering which logical storage volume(s) a business application uses, and thus depends on, requires first discovering, at a higher level, which data sources the application uses and how these data sources map to databases; it then requires discovering how database tables map to file system files and/or logical storage volumes, and so on.
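The chain of lookups in the example above can be pictured as a walk over per-tier mappings. The following is only a minimal sketch under stated assumptions: the mapping table, the entity names (such as `app:OrderEntry` and `volume:vol1`), and the helper functions are all hypothetical and serve purely as illustration; how the mappings would actually be obtained is not addressed here.

```python
from collections import deque

# Hypothetical cross-tier mappings (assumed for illustration only).
# Each key is an entity at one tier; each value lists the entities it maps to
# at the next lower tier.
MAPPINGS = {
    "app:OrderEntry": ["datasource:jdbc/orders"],
    "datasource:jdbc/orders": ["database:ORDERS_DB"],
    "database:ORDERS_DB": ["table:ORDERS", "table:CUSTOMERS"],
    "table:ORDERS": ["file:/db/orders.dat"],
    "table:CUSTOMERS": ["file:/db/customers.dat"],
    "file:/db/orders.dat": ["volume:vol1"],
    "file:/db/customers.dat": ["volume:vol2"],
}


def reachable_entities(start, mappings):
    """Breadth-first walk returning every entity reachable from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in mappings.get(node, []):
            if child not in seen:  # guard against revisiting shared entities
                seen.add(child)
                queue.append(child)
    return seen


def volumes_for(app, mappings):
    """Return the sorted logical volumes an application transitively depends on."""
    return sorted(
        e for e in reachable_entities(app, mappings) if e.startswith("volume:")
    )


if __name__ == "__main__":
    print(volumes_for("app:OrderEntry", MAPPINGS))
```

The traversal itself is straightforward once the mappings exist; the difficulty described in this section lies in discovering each tier's mappings in the first place.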
Discovery of such relationships is complicated by at least two trends in system design today. First, the widespread adoption of virtualization technologies enforces a separation between distributed system tiers. Second, the traditional tendency, from a systems management perspective, to view the “server domain” independently from the “storage domain” contributes further to the resulting information gap.
Manual discovery of application-data associations is a difficult and error-prone task. A known technique discovers application-data relationships using online system monitoring and training heuristics for applications and data residing in a single computer system. However, this prior art technique has several drawbacks, including: (a) because it is based purely on heuristic rules, it cannot eliminate the possibility of overlooking some application-data relationships (“false negatives”); and (b) it does not relate applications running on one computer to applications and/or data on another computer.
Another prior art technique builds distributed system dependency graphs using active (e.g., fault injection) or passive (e.g., trace collection and offline analysis) methods. The dependency graphs show how applications on one computer system communicate with applications on another computer system. Antivirus programs, access control systems, disaster recovery management systems, and information lifecycle management systems are other potential consumers of application-data association information. Accordingly, what is desired is an improved system and method for automatic discovery of application-data relationships spanning multiple tiers.