Information Technology (IT) systems, methods and computer program products, including, for example, computer networks, have grown increasingly complex with the use of distributed client/server applications, heterogeneous platforms and multiple protocols all on a single physical backbone. The control of traffic on networks is likewise moving from centralized information systems departments to distributed work groups. The growing utilization of computer networks is not only causing a move to new, high-speed technologies, but is, at the same time, making the operation of computer networks more critical to day-to-day business operations. Furthermore, as computer systems become more distributed and, thereby, more interrelated, the number of different components of a system that may cause problems increases. For example, application integration, including integration across heterogeneous systems, has increased the complexity and interdependence of systems while also increasing reliance on such systems for mission-critical applications.
This increase in the complexity of systems may make problem determination and/or resolution more difficult. In conventional systems, components, such as applications, middleware, hardware devices and the like, generate data that represents the status of the component. This component status data is typically consumed by a management function that monitors the system and/or performs problem analysis and resolution. The management function may, for example, be a user reading a log file, or it may be a management application that consumes the data for analysis and/or display. In conventional systems, components and component owners are responsible for determining what data is provided, in terms of the format, completeness and/or order of the data, as well as the meaning of the data.
Such an ad hoc approach to component status information may be convenient for the component developer; however, it may increase the complexity of the management function. For example, the management function may need some context for a status message from the component. In particular, the management function will typically need to know what a data message from a component represents, the format of the data, the meaning of the data and what data is available. For example, the management function may need to know that a particular message (e.g., message “123”) from a particular component (e.g., component “ABC”) has a certain number of fields (e.g., three fields) and what data is in each of the fields (e.g., a first field is a timestamp, a second field is a queue name and a third field is a value for the queue name). Typically, the management system cannot derive any data beyond that provided by the component. This approach also assumes that the consumer of the data knows not only the significance of the data fields but also the format of the fields (e.g., that the timestamp is in mm/dd/yy format).
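To make the implicit knowledge described above concrete, the following is a minimal Python sketch, offered only as an illustration, of a management function interpreting message “123” from component “ABC”. The “|”-delimited wire format, the field names, the table MESSAGE_LAYOUTS and the function parse_status_message are hypothetical assumptions. The point is that the field layout and the mm/dd/yy timestamp format must be hard-coded by the consumer, because nothing in the message itself describes them.

```python
from datetime import datetime

# Hypothetical out-of-band knowledge the management function must hold:
# for each (component, message id), the ordered field names.
# The "|" delimiter and the field names are assumptions for illustration.
MESSAGE_LAYOUTS = {
    ("ABC", "123"): ("timestamp", "queue_name", "queue_value"),
}

# Hypothetical per-field format knowledge: the timestamp is mm/dd/yy.
TIMESTAMP_FORMAT = "%m/%d/%y"


def parse_status_message(component: str, raw: str) -> dict:
    """Interpret a raw status message using hard-coded layout knowledge."""
    message_id, *fields = raw.split("|")
    layout = MESSAGE_LAYOUTS.get((component, message_id))
    if layout is None:
        # Without prior knowledge of this message, its fields are opaque.
        raise KeyError(f"no layout known for {component} message {message_id}")
    if len(fields) != len(layout):
        raise ValueError(f"expected {len(layout)} fields, got {len(fields)}")
    record = dict(zip(layout, fields))
    if "timestamp" in record:
        # The consumer must also know the timestamp's textual format.
        record["timestamp"] = datetime.strptime(record["timestamp"], TIMESTAMP_FORMAT)
    return record
```

For instance, parse_status_message("ABC", "123|01/15/24|ORDERS.IN|42") yields a timestamp of January 15, 2024, a queue name of ORDERS.IN and a value of "42"; a message not already listed in the table cannot be interpreted at all.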
Furthermore, a problem may be reported in an error message from a component other than the component that actually has the problem. Thus, a management function may need to know not only of the existence of the components but also the relationships between the managed components. Without such knowledge, the management function may not recognize that the source of the problem is not the component reporting the error.
One difficulty that may arise from the use of differing component status formats is in the analysis of problems for differing components or for different versions of a component. Knowledge bases have conventionally been used to map component status data reported by components, such as error log messages, to symptoms and, eventually, to fixes for problems. For example, International Business Machines Corporation, Armonk, N.Y., utilizes symptom databases that map WebSphere error log messages to symptoms and fixes. These databases typically work on the assumption that if a specified error message (e.g., message “123”) is seen from a specified component (e.g., component “XYZ”), then a particular symptom is occurring (e.g., performance is slow) and a predefined remedy (e.g., increasing the parameter “buffsize” to 10) will likely fix the problem.
One problem with mapping error messages, or combinations of error messages, to symptoms and fixes is that such a mapping typically associates specific, component-dependent error messages with specific, component-dependent fixes. Thus, for example, when a new release of a product is issued, the symptom database may need to be rewritten or modified to account for all of the new messages. This approach does not lend itself to creating cross-product or cross-component symptom databases, as each message for each product must typically be known in order to create the symptom database.
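The component dependence of such a mapping can be illustrated with a minimal Python sketch. It is not a description of any actual IBM symptom database; SYMPTOM_DB, Diagnosis, diagnose and the replacement message “456” are hypothetical. Each entry is keyed on one specific message from one specific component, so a lookup succeeds only for messages that were known when the database was written.

```python
from typing import NamedTuple, Optional


class Diagnosis(NamedTuple):
    symptom: str
    fix: str


# Hypothetical component-dependent symptom database: each entry keys on
# one specific message from one specific component (and, implicitly,
# one specific release of that component).
SYMPTOM_DB = {
    ("XYZ", "123"): Diagnosis(
        symptom="performance is slow",
        fix='increase the parameter "buffsize" to 10',
    ),
}


def diagnose(component: str, message_id: str) -> Optional[Diagnosis]:
    """Return the recorded diagnosis, or None if the message is unknown."""
    return SYMPTOM_DB.get((component, message_id))
```

Here diagnose("XYZ", "123") returns the slow-performance diagnosis, but if a hypothetical new release of component “XYZ” reports the same condition as message “456”, or another component reports it under its own message identifier, the lookup returns None until a new entry is added by hand.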
The above problems may be exacerbated when attempting to resolve problems in a business process that uses application programs running on an IT infrastructure that includes a plurality of IT components. In particular, the application programs generally run a business application and/or process. It may be difficult to detect and resolve problems in the IT infrastructure and/or the application programs that may cause the business process run by the application programs to fail. More specifically, a business application/process administrator may not have in-depth knowledge of the IT infrastructure components or of how to detect and resolve problems in them. Moreover, changes in the IT infrastructure components and their characteristics may make it difficult even to find the relevant IT infrastructure components.
For example, a business process or process step may encounter a problem, such as a failure to retrieve an account balance. Conventionally, the business application administrator may need to work with a database administrator, a network administrator, a host/operating system administrator and others to determine the cause of the problem. This may make problem identification and resolution difficult, time-consuming and/or costly.