1. Technical Field of the Invention
The present invention relates to complex information technology (IT) systems and, in particular, to a system and method for retrieving, storing and propagating data indicative of system component states.
2. Background and Objects of the Invention
With the increasing use of computers, particularly personal computers (PCs), both in business and in the home, computers have become an integral tool of most information technology (IT) workers in a wide variety of fields. Another technological development facilitating the pervasiveness of computers in society is the growing interlinkage of computers, either locally in a Local Area Network (LAN) and/or remotely, such as through dedicated lines or via the World-Wide Web or Internet.
As networks of computers and peripheral devices grow in size and the interlinkages multiply, however, inter-component incompatibilities arise, manifesting themselves in a variety of ways, both obvious and subtle. To analyze the behavior of the various interacting resources within an IT system, large amounts of monitoring data are needed to duplicate or model system state conditions. Armed with sufficient monitoring data, a system administrator can model particular system state behaviors to analyze component failures and the reasons therefor, as described in assignee's co-pending patent applications entitled "System and Method for Generating Performance Models of Complex Information Technology Systems", U.S. Ser. No. 09/036,393 (U.S. Pat. No. 6,311,175), filed Mar. 6, 1998, and "System and Method for Model Mining Complex Information Technology Systems", U.S. Ser. No. 09/036,394 (pending), filed Mar. 6, 1998, both of which are incorporated herein by reference.
As set forth in more detail in the aforedescribed co-pending patent applications, various system characteristics may be examined and performance data generated therefrom. For example, in a large network of computers and peripherals, some key characteristics include network bandwidth, processor speed, memory, database query speed, etc. Since large amounts of such data measurements are necessary to adequately model system (or subsystem) performance, one problem with existing monitoring or measuring techniques is that the data collection mechanism itself consumes significant resources, interfering with system operation and, therefore, skewing the data measurements.
It is, accordingly, an object of the present invention that the acts of data measurement consume minor amounts of resources and, therefore, influence the system monitoring as little as possible.
It is also an object of the present invention to provide an adaptable monitoring and data collection system and method so that new components may be more easily integrated into (and old ones removed from) the existing system.
It is another object of the present invention to provide a robust monitoring and data collection system and method which continues to operate despite failures in various subsystems.
It is a further object of the present invention that the system and method collect time-related system state data in a computer network from a variety of different types of system components.
It is a still further object of the present invention that the system and method store the same aforementioned monitoring data in a variety of storage facilities, preferably storing the data in a consistent record structure.
It is another object of the present invention that the system and method record and store computer network component time-related state data in a flexible manner, optimizing the collection and storage of such data.
It is yet another object of the present invention that the system and method record and store the aforementioned computer network component time-related state data across components in a modular manner, such that loss of specific components does not impair the functionality of other components holding the data.
It is a still further object of the present invention that the system and method record and store the computer network component time-related state data in a manner allowing scalability, so that overall system effectiveness is not governed by the size or complexity of the underlying system infrastructure.
It is another object of the present invention that the system and method record and store the aforementioned computer network component time-related state data such that effective querying of the stored state data is not conditioned on knowledge of the manner in which the queried data is collected or stored.
The present invention is directed to a system and method for automatically and adaptively capturing, recording, and retrieving large amounts of complex Information Technology (IT) system component state data in a distributed, hierarchical manner. Monitored components include virtually any element in an IT system, including hardware, e.g., routers, hard drives, etc., and software, e.g., databases, operating system kernels, etc. In a preferred embodiment, collection and storage elements, or objects, are logically arranged in a hierarchical manner such that data collected may be propagated up in the hierarchy. Similarly, querying of such data is performed in a hierarchical manner, e.g., queries are propagated down and results propagated up. Propagation of collected data through the storage system is performed in a manner to optimize system performance. Uniformity in the collection and storage scheme allows easy expansion of the collection and storage system, and thus the underlying IT system infrastructure.
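By way of illustration only, the hierarchical arrangement summarized above may be sketched in Python as follows. The sketch is not the claimed implementation; the names StateRecord, StorageNode, collect, propagate_up and query are hypothetical and do not appear in this specification. It assumes a simple tree of storage objects in which collected records are buffered locally and pushed one level up on request, and in which a query issued at any object is passed down to its children with the results returned upward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateRecord:
    """One time-related state sample (hypothetical record structure)."""
    component: str    # monitored element, e.g. "router-1"
    timestamp: float  # time at which the state was observed
    state: str        # observed state, e.g. "up" or "down"


class StorageNode:
    """Hypothetical collection/storage object in the hierarchy."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.records = []   # records held at this node
        self.pending = []   # records not yet propagated upward

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def collect(self, record):
        """Store a record locally and queue it for upward propagation."""
        self.records.append(record)
        self.pending.append(record)

    def propagate_up(self):
        """Push queued records one level up; return how many were sent."""
        if self.parent is None:
            return 0  # the top of the hierarchy has nowhere to send data
        sent = len(self.pending)
        for record in self.pending:
            self.parent.collect(record)
        self.pending.clear()
        return sent

    def query(self, component):
        """Pass the query down the hierarchy and merge results on the way up."""
        found = {r for r in self.records if r.component == component}
        for child in self.children:
            found.update(child.query(component))
        # The set removes copies already propagated upward.
        return sorted(found, key=lambda r: r.timestamp)
```

In this sketch a caller queries only the top-level object and need not know which lower-level object collected or holds a given record. Deferring propagate_up until a convenient time is one assumed way of limiting the resources consumed by collection.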