1. Technical Field of the Invention
The present invention relates to complex information technology (IT) systems and, in particular, to a system and method for discovering relations between components in a complex IT system, and more particularly, to techniques for iteratively determining IT system component associations.
2. Background and Object of the Invention
With the exponential growth of computing and the computer industry, information technology (IT) systems have become increasingly complex and difficult to manage. A typical IT system in even a small company may contain dozens of computers, printers, servers, databases, etc., each component connected in some way to the others. A simplified example of an interconnected IT system is shown in FIG. 1, described in more detail hereinafter.
Although interconnected systems, such as the one shown in FIG. 1, offer many advantages to their users, e.g., resource sharing, as such systems grow and the number of component interlinkages increases, the behavior of these complex systems becomes more difficult to predict. Further, system performance begins to lag or becomes inconsistent, even chaotic. The addition or removal of one component, even a seemingly minor one, can have dramatic consequences on the performance of the whole system. Even an upgrade of one component can adversely affect a distant, seemingly unrelated component. The system and method of the present invention are directed to techniques to better predict the behavior of complex IT systems, offering system administrators the opportunity to identify problem areas such as performance bottlenecks and to correct them before a system or component failure occurs.
Conventional approaches to system performance monitoring are inadequate for determining the nature of a performance problem in a complex IT system, since the data collected during monitoring is generally of little use in ascertaining the true nature of the performance difficulty. The system and method of the present invention, however, provide a mechanism whereby system monitoring data is made easily accessible and usable for analyzing current performance and predicting future performance. The present invention facilitates this analysis through the use of data mining principles, discussed further hereinafter.
In general, data mining is the analysis of data in a database using tools that determine trends or patterns of event occurrences without knowledge of the meaning of the analyzed data. Such analysis may reveal strategic information that is hidden in vast amounts of data stored in a database. Typically, data mining is used when the quantity of information being analyzed is very large, when variables of interest are influenced by complicated relations to other variables, when the importance of a variable varies with its own value, or when the importance of variables varies with time. In situations such as these, traditional statistical analysis techniques or common database management systems may fail or become unduly cumbersome.
Every year, companies compile large volumes of information in databases, further straining the capabilities of traditional data analysis techniques. These ever-growing databases contain valuable information on many facets of the companies' business operations, including trend information that may only be gleaned by a critical analysis of key data interspersed across the database(s). Unfortunately, because of the sheer volume and/or complexity of the available information, such trend information is typically lost, as it cannot be recovered by manual interpretation methods or traditional information management systems. The principles of data mining, however, may be employed as a tool to discover this hidden trend information buried within the total information available.
Such data mining techniques are increasingly used in a number of fields, including banking, marketing, biomedical applications and a number of other industries. Insurance companies and banks have used data mining for risk analysis, for example, by investigating their own claims databases for relations between client characteristics and corresponding claims. Insurance companies have an obvious interest in the characteristics of their policy holders, particularly those exhibiting risky or otherwise inappropriate activities or behaviors adverse to the companies' interests, and with such analyses they are able to determine risk profiles and adjust premiums commensurate with the determined risk.
Data mining has also found great success in direct marketing. Direct marketing firms are able to determine relationships between personal attributes, such as age, gender, locality and income, and the likelihood that a person will respond to, for instance, a particular direct mailing. These relationships may then be used to direct mailings towards the persons with the greatest probability of responding, thus enhancing the companies' prospects and potential profits. For example, a company mails X direct marketing sales proposals, and a percentage Y of the recipients reply. Data mining techniques are then applied to a database containing biographical information on all persons to whom mailings were sent, and the factors that distinguish those who did and did not respond are determined. The result is a subgroup of the original database containing mailing targets that have demonstrated a greater probability of responding. This subgroup could be, for example, middle-aged, dual-income families with one child. Future mailings could be directed towards families fitting this biographical profile, and responses from these groups could then be mined together with the original data to refine the analysis. Such a process could be repeated indefinitely, with changes in the behavior of the targeted groups captured over time through the growing amount of analyzed data and the repeated analysis. In this sense, the data mining analysis 'learns' from each repeated result. In this example, data mining is used to predict the behavior of customers based on a historical analysis of their behavior.
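The selection step described above can be illustrated with a deliberately simplified sketch: it compares the response rate for each value of a single attribute against the overall reply rate Y and keeps the values that do better. Real data mining tools search combinations of attributes; the record layout, attribute names, and the `response_rate_by` and `promising_values` helpers below are hypothetical.

```python
from collections import defaultdict

# Hypothetical mailing results: a few biographical attributes per recipient
# and whether that recipient replied to the mailing.
MAILINGS = [
    {"age_group": "middle", "income": "dual", "replied": True},
    {"age_group": "middle", "income": "dual", "replied": True},
    {"age_group": "middle", "income": "single", "replied": False},
    {"age_group": "young", "income": "single", "replied": False},
    {"age_group": "young", "income": "dual", "replied": False},
    {"age_group": "senior", "income": "single", "replied": True},
]


def response_rate_by(records, attribute):
    """Return {attribute value: fraction of recipients with that value who replied}."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for record in records:
        value = record[attribute]
        sent[value] += 1
        replied[value] += int(record["replied"])
    return {value: replied[value] / sent[value] for value in sent}


def promising_values(records, attribute):
    """Attribute values whose response rate beats the overall response rate Y."""
    overall = sum(int(r["replied"]) for r in records) / len(records)
    rates = response_rate_by(records, attribute)
    return sorted(value for value, rate in rates.items() if rate > overall)


if __name__ == "__main__":
    for attribute in ("age_group", "income"):
        print(attribute, promising_values(MAILINGS, attribute))
```

With this sample data the overall reply rate is 50%, so `promising_values(MAILINGS, "income")` returns `["dual"]`; in the iterative process described above, the next mailing would target that subgroup and its replies would be added to the records before the analysis is repeated.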
In the same manner, data mining may also be employed to predict the behavior of the components of a complex information technology (IT) system. Similar approaches, with appropriate modifications, can be used to determine how interconnected components influence each other and to uncover complex relations that exist throughout the IT system.
Typically, multiple applications are operated within a common IT infrastructure, such as the one shown in FIG. 1, and these applications often use some of the same resources. Sharing IT infrastructure resources among different applications may cause unexpected interactions or system behavior, and such interactions, being non-synergistic, are often undesirable. Consider, for example, multiple business applications sharing a router within an IT system. A particular application, e.g., an E-mail service, may burden the router in such a way that the other applications do not function well. Because numerous applications can be expected to share the router at times, traditional systems management techniques may have difficulty determining which specific application is causing the loss of system performance. This example explains why there is a need to find hidden relationships among IT system components and the applications running in such environments. To solve the problem in this example, it may be necessary to reroute E-mail traffic through another router to obtain adequate performance for the other applications.
Traditional IT system management is now generally defined as including all the tasks that must be performed to ensure that the IT infrastructure of an organization is capable of meeting user requirements. A traditional IT systems management model, generally designated by the reference numeral 200, is shown in FIG. 2. Essentially, there are groups of system administrators 210 having knowledge of the IT infrastructure they are managing, such as the one shown in FIG. 1, generally designated herein by the reference numeral 220. Typically, knowledge of the infrastructure 220 is scattered among the various personnel comprising the system administrator group 210. The total of this knowledge is limited to the sum of the individual administrators' knowledge, in which there is invariably a great deal of redundancy. This redundancy may be considered an inefficiency of the overall knowledge base. In other words, the theoretical maximum knowledge of the infrastructure 220 would be realized only if each administrator of the administration group 210 had knowledge unique to that administrator. While this may appear to be an abstract analysis of the group's effectiveness, it has real consequences for the company that must finance a group of administrators. Furthermore, this knowledge is typically not stored in an easily retrievable electronic form.
When system monitoring is included in the aforementioned traditional management system, it is usually limited to real-time data, such as the current system load and the like. An administrator may observe such real-time reports and, if the monitored loads or events match patterns that the administrator recognizes as precursors of system malfunction or loss of performance, may redirect part of the load through alternative subsystems of the IT infrastructure.
Such real-time data reporting is often used in conjunction with a model of the IT system from which the data is collected and reported. The model usually consists of a computer algorithm encoding the relations among the various system devices. A problem with such models, however, is that the relations used in modeling the system account only for expected interactions among components and subsystems. The model is, therefore, merely an idealized model of the actual system, and hidden or unexpected relations between components are not accounted for. Furthermore, as the infrastructure 220 is modified, the model algorithm must be manually altered to include new relations that account for the changes made.
An improvement over this traditional management system is realized in the so-called expert system. An expert system is a form of artificial intelligence consisting of a computer program that contains a database, frequently referred to as a knowledge base, and a number of algorithms used to infer facts from the programmed knowledge and from new data input into the system. The knowledge base is a compilation of human expertise used to aid in solving problems, e.g., in medical diagnosis. The utility of an expert system is, however, limited by the quality of the data and algorithms input into the system by the human expert.
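As a minimal sketch of how such a program may derive new facts from its knowledge base and newly input data, the following assumes a knowledge base of if-then rules and uses forward chaining, one common inference strategy chosen here for illustration (the text does not specify one). The rule and fact names are hypothetical.

```python
# Hypothetical knowledge base: each rule pairs a set of premises with the
# conclusion an expert would draw when all of those premises hold.
RULES = [
    (frozenset({"router_load_high", "email_traffic_high"}), "email_saturates_router"),
    (frozenset({"email_saturates_router"}), "reroute_email_traffic"),
    (frozenset({"disk_queue_long"}), "database_bottleneck"),
]


def forward_chain(observed_facts, rules):
    """Fire every rule whose premises are all known, until no new fact can be derived."""
    known = set(observed_facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires only when every premise is already an established fact.
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


if __name__ == "__main__":
    print(sorted(forward_chain({"router_load_high", "email_traffic_high"}, RULES)))
```

Observing a high router load together with heavy E-mail traffic chains through two rules to a rerouting recommendation. A symptom the knowledge base does not cover, such as `printer_offline`, yields no new conclusions at all, which is why the usefulness of such a system depends on what the experts encoded.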
Typically, expert systems are developed so that knowledge may be accumulated from a person or persons skilled in a specific area of technology and stored in an easily retrievable medium. In this way, persons less skilled than the experts whose knowledge was accumulated within the expert system have access to that expert information. A company may thus save human and financial resources by having less skilled personnel consult such expert systems instead of requiring the expert to handle every situation that calls for a certain level of knowledge.
The use of such expert systems also allows less skilled persons to analyze IT system behavior. These systems may be used to aid in troubleshooting faults in an IT system or, together with system performance monitors, to assist in predicting such faults. For example, a person with access to an expert system applied to a particular IT system may study system load parameters or the like through appropriate monitors and, using the expert system, estimate potential faults due to system bottlenecks or the like.
A significant drawback of expert systems, however, is that they are poorly equipped to handle newly encountered problems or situations. Expert systems are therefore limited in their ability to resolve novel issues, because they require a complete model of all the events or failures that can occur in the system being modeled.
The present invention is a further step towards the realization of a fully automated IT management system. In a manner similar to the way data mining techniques are applied to predict the behavior of, for instance, the customers in the direct marketing example, such techniques may be applied to complex IT systems to determine causal relations between IT system components. When implemented, the system and method of the present invention determine how the interlinked components influence each other in terms of performance, potentially uncovering unexpected relations among different components of an IT system and automatically creating or updating causal association models of such systems. This is accomplished through the use of association rule induction methods in conjunction with other data mining techniques applied to historical data sets of system state data.
With today's increasingly interconnected and complex IT infrastructures and the corresponding increases in the maintenance costs of such systems, a system and method for discovering causal relationships between the various subsystems and elements of such complex networks in a substantially automated manner would be a valuable tool.
It is also an object of the present invention to provide an automated means of accumulating the assortment of data that may be analyzed by an appropriate data mining technique, such that performance models of complex IT systems based on periodic measurements of predefined performance levels may be generated or updated. Additional description of the collection of monitoring data and the application of data mining techniques may be found in Applicants' co-pending patent application, U.S. patent application Ser. No. 09/036,393, entitled "A System and Method for Generating Performance Models of Complex Information Technology Systems", filed concurrently herewith, which is incorporated herein by reference.
Another desirable feature of an IT system, such as one incorporating the improvements of the present invention, is a reduced need for human intervention when the system adapts to dynamic system changes. This is preferably accomplished through automation.
It is further desired that the system and method of the present invention analyze system performance using Boolean attributes, i.e., attributes that are either true or false.
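One way to obtain such Boolean attributes, sketched here under assumed names and limits, is to compare each numeric measurement from a monitoring interval against a threshold, so that the attribute is true when the reading exceeds its limit. The metric names, event names, and limits in `THRESHOLDS`, and the `to_boolean` helper, are hypothetical.

```python
# Hypothetical thresholds: monitored metric -> (Boolean event name, limit).
THRESHOLDS = {
    "router1.utilization_pct": ("router1_overloaded", 80.0),
    "mail_server.queue_len": ("mail_queue_long", 500),
    "db1.response_ms": ("db1_slow", 250.0),
}


def to_boolean(sample, thresholds):
    """Convert one interval's numeric readings into Boolean event attributes.

    An event is True when its metric exceeds the limit. Metrics missing from
    the sample are skipped rather than guessed.
    """
    return {
        event: sample[metric] > limit
        for metric, (event, limit) in thresholds.items()
        if metric in sample
    }


if __name__ == "__main__":
    reading = {
        "router1.utilization_pct": 93.5,
        "mail_server.queue_len": 120,
        "db1.response_ms": 310.0,
    }
    print(to_boolean(reading, THRESHOLDS))
```

Reducing every reading to a yes/no event discards magnitude, but it turns each monitoring interval into a record of true and false attributes, which is the form association rule methods operate on.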
The present invention is directed to a system and method for automatically creating causal association models of complex information technology (IT) systems by use of association rule induction methods, preferably in conjunction with other data mining techniques. System state information is periodically recorded by system monitors placed throughout the IT system and is stored in a database together with the system model.
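A minimal sketch of the recording step, assuming a single SQLite table of timestamped readings. The table layout and the `open_store`, `record_snapshot`, and `history` helpers are hypothetical; in practice a scheduler or the monitors themselves would call `record_snapshot` once per monitoring interval.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a store with one row per component metric per interval."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state ("
        " ts REAL NOT NULL, component TEXT NOT NULL,"
        " metric TEXT NOT NULL, value REAL NOT NULL)"
    )
    return conn


def record_snapshot(conn, readings, ts=None):
    """Store one monitoring interval; readings maps (component, metric) -> value."""
    ts = time.time() if ts is None else ts
    conn.executemany(
        "INSERT INTO state (ts, component, metric, value) VALUES (?, ?, ?, ?)",
        [(ts, comp, metric, float(value)) for (comp, metric), value in readings.items()],
    )
    conn.commit()


def history(conn, component, metric):
    """Return the recorded (ts, value) series for one monitored quantity, oldest first."""
    return conn.execute(
        "SELECT ts, value FROM state WHERE component = ? AND metric = ? ORDER BY ts",
        (component, metric),
    ).fetchall()


if __name__ == "__main__":
    conn = open_store()
    record_snapshot(conn, {("router1", "utilization_pct"): 72, ("db1", "response_ms"): 180}, ts=0.0)
    record_snapshot(conn, {("router1", "utilization_pct"): 91, ("db1", "response_ms"): 320}, ts=60.0)
    print(history(conn, "router1", "utilization_pct"))
```

Keeping raw numeric readings, rather than storing only threshold events, leaves the historical data set available for later reanalysis if the thresholds are changed.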
A model of the IT system environment is developed in terms of system components and relations between them. This model may be defined with any level of detail and does not necessarily have to be complete or consistent.
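Because the model may be partial, a simple directed graph is one convenient representation, sketched here: an edge from A to B records that the model expects A to influence B, and `implies` follows edges transitively. The `SystemModel` class and its method names are hypothetical.

```python
from collections import defaultdict


class SystemModel:
    """Partial model of an IT environment: components plus directed influence relations.

    The model need not be complete or consistent; components it does not
    mention are simply absent, and cycles are tolerated.
    """

    def __init__(self):
        # source component -> components the model says it may influence
        self.relations = defaultdict(set)

    def add_relation(self, source, target):
        self.relations[source].add(target)

    def implies(self, source, target):
        """True if the model states, directly or transitively, that source influences target."""
        seen, stack = {source}, [source]
        while stack:
            node = stack.pop()
            # .get avoids inserting empty entries into the defaultdict.
            for nxt in self.relations.get(node, ()):
                if nxt == target:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False


if __name__ == "__main__":
    model = SystemModel()
    model.add_relation("mail_server", "router1")
    model.add_relation("router1", "web_app")
    print(model.implies("mail_server", "web_app"), model.implies("db1", "web_app"))
```

A relation the model does not contain is not treated as false, only as unstated, which matches the requirement that the model need not be complete.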
Thresholds are defined in terms of monitoring events. These thresholds are used to convert the monitored state information from its numeric format to Boolean values. Target components are then selected, and an association rules algorithm searches for associations with other components based on the Boolean values obtained by comparing the monitored state information with the associated thresholds. The probability of a causal relation between components is indicated by sets of association rules. Causal relations implied by the model may then be confirmed or refuted, while discovered causal relations that the model does not imply may indicate that the model is incomplete. In this manner, the causal relations of the model may be refined to model the system environment more accurately.
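The rule search and the comparison with the model can be sketched as follows, assuming the monitoring records have already been converted to Boolean events. This is a brute-force enumeration of small antecedent sets scored by the usual support and confidence measures, not an efficient algorithm such as Apriori. The event names, the `min_support` and `min_confidence` values, and the representation of the model as a set of (cause, effect) pairs are all hypothetical. A model relation without a supporting rule is reported only as a candidate for refutation, since the data may simply not have exercised it.

```python
from itertools import combinations

# Hypothetical Boolean monitoring records, one per interval; the target is web_app_slow.
RECORDS = [
    {"router1_overloaded": True, "mail_queue_long": True, "db1_slow": False, "web_app_slow": True},
    {"router1_overloaded": True, "mail_queue_long": True, "db1_slow": False, "web_app_slow": True},
    {"router1_overloaded": False, "mail_queue_long": False, "db1_slow": True, "web_app_slow": False},
    {"router1_overloaded": True, "mail_queue_long": False, "db1_slow": False, "web_app_slow": True},
    {"router1_overloaded": False, "mail_queue_long": False, "db1_slow": False, "web_app_slow": False},
]

# Hypothetical model relations: (cause event, effect event).
MODEL_RELATIONS = {("router1_overloaded", "web_app_slow"), ("db1_slow", "web_app_slow")}


def mine_rules(records, target, max_len=2, min_support=0.2, min_confidence=0.7):
    """Find rules {events} -> target among Boolean records.

    support    = fraction of records in which the events and the target are all True
    confidence = fraction of records with all events True in which the target is True
    """
    events = sorted({e for r in records for e in r if e != target})
    n = len(records)
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(events, size):
            covered = [r for r in records if all(r.get(e, False) for e in antecedent)]
            if not covered:
                continue
            hits = sum(1 for r in covered if r.get(target, False))
            support, confidence = hits / n, hits / len(covered)
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    return rules


def compare_with_model(rules, target, model_pairs):
    """Classify single-event rules against the model's stated relations.

    Returns (confirmed, not_in_model, unsupported_by_data).
    """
    found = {a[0] for a, _, _ in rules if len(a) == 1}
    confirmed = sorted(e for e in found if (e, target) in model_pairs)
    not_in_model = sorted(e for e in found if (e, target) not in model_pairs)
    unsupported = sorted(c for c, eff in model_pairs if eff == target and c not in found)
    return confirmed, not_in_model, unsupported


if __name__ == "__main__":
    rules = mine_rules(RECORDS, "web_app_slow")
    for antecedent, support, confidence in rules:
        print(antecedent, "-> web_app_slow", f"support={support:.2f}", f"confidence={confidence:.2f}")
    print(compare_with_model(rules, "web_app_slow", MODEL_RELATIONS))
```

On this sample data the router rule is confirmed by the model, the mail-queue rule is an association the model does not state (suggesting the model is incomplete), and the model's database relation finds no support in the recorded intervals.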