1. Field of the Invention
The present invention relates generally to an improved monitoring application and, in particular, to a method, system, and computer program product for a monitoring application. Still more particularly, the present invention relates to a computer implemented method, a system, and a computer program product for displaying components assigned to events produced by resources that are analyzed by a monitoring application and executed within a computing environment. It also relates to a monitoring application for analyzing resources executed within a computing environment.
2. Description of the Background
Managing a large-scale, possibly far-flung computer network, such as the extensive distributed systems now commonplace in large organizations, is the job of a person or team. That person or team is typically responsible for everything from the organization's power supplies to its business software applications. The organization's business management, naturally, may not wish to concern itself with the technical details, but it does demand that problems, when they occur, be dealt with according to the seriousness of their effects on the normal operations of the business. For example, management will want the greatest attention paid to the problems that affect the highest revenue generators among the various parts of the business organization.
This is a difficult demand to meet. For many network operations managers, simply managing the network (identifying, diagnosing, and correcting problems as they occur) is hard enough. Prioritizing among a set of problems occurring during the same time period, so as to differentiate the levels of service provided to different parts of the business organization, has thus far been beyond contemplation.
The phenomenal complexity of a large distributed network of interrelated components is reflected in the distribution of the costs involved in managing such a system. On average, it takes three times as long to identify a problem as it does to solve it, because the parts of the distributed system (hardware and software) and their interrelationships are nearly impenetrable to the operators.
At present, operators cannot tell how a given problem affects the various users in the business organization, and therefore cannot know where to direct enhanced or reduced service efforts, until the problem has been correctly identified. As a result, with 75% of operations time spent identifying problems, operations managers have only the remaining 25%, the problem-resolution portion, from which to carve out all service differentiation.
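Assuming, as a simplification, that operations time spent on a problem consists only of identification time $t_{\mathrm{id}}$ and resolution time $t_r$ (symbols introduced here for illustration), the three-to-one ratio yields the 75%/25% split directly:

```latex
% t_r: mean time to resolve a problem; identification takes three times as long
t_{\mathrm{id}} = 3\,t_r, \qquad
\frac{t_{\mathrm{id}}}{t_{\mathrm{id}} + t_r} = \frac{3\,t_r}{4\,t_r} = \frac{3}{4} = 75\%, \qquad
\frac{t_r}{t_{\mathrm{id}} + t_r} = \frac{1}{4} = 25\%
```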
Further, identifying the problem does not necessarily lead to its successful resolution. For example, suppose that the operator has correctly identified the root cause of a given problem as a bad card in an IP (“Internet Protocol”) router. Do any critical business systems depend on that router? Perhaps, or perhaps not. Before the operator can direct problem-resolution efforts to a specific part of the business organization, therefore, he or she needs to understand the systemic impact of the problem. Impact is sensitive to a wide system context, and even to the conditions of the moment. The operations manager can attempt to deliver differentiated levels of service only when he or she knows whether and how this particular fault has affected particular groups of users under the conditions of the network at the time of the failure. As a result, management information about all of the existing components must be collected.
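The structural part of the question "do any critical business systems depend on that router?" can be framed as reachability in a dependency graph. The following is a minimal sketch under that assumption; the component names, the `DEPENDS_ON` map, and the `impacted_by` helper are all hypothetical, and the sketch ignores the "conditions of the moment" that the text notes also affect impact.

```python
from collections import deque

# Hypothetical dependency map: each key depends directly on the listed components.
DEPENDS_ON = {
    "order-entry": ["app-server-1", "db-cluster"],
    "payroll": ["app-server-2"],
    "app-server-1": ["ip-router-7"],
    "app-server-2": ["ip-router-3"],
    "db-cluster": ["ip-router-7", "san-array"],
}

# Hypothetical set of components that count as critical business systems.
BUSINESS_SYSTEMS = {"order-entry", "payroll"}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from the failed component to its dependents.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)
    # Breadth-first search over the inverted graph.
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


affected = impacted_by("ip-router-7", DEPENDS_ON)
print(sorted(affected & BUSINESS_SYSTEMS))  # ['order-entry']
```

A failure of `ip-router-7` reaches `order-entry` through two paths but never reaches `payroll`, which is exactly the kind of answer the operator needs before deciding where to direct effort.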
System management information about the components is correlated and analyzed to assess the performance and availability of a particular service being provided within the organization. System management information is the information needed to monitor and manage a specific component in a network data processing system. A description (a tag) can be assigned to the different components and/or applications by an individual, usually the system manager. Web 2.0 tag cloud techniques can be used to optimize the visualization of the tags that correspond to a collection of system management information. The tag cloud is built from all the assigned tags, and tag frequency is visualized through different font sizes and/or colors. A drill-down can also be used to view all elements that have a specific tag. However, such a technique does not avoid the awkward requirement to collect management information about all of the existing components.
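The tag cloud and drill-down described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the `TAGS` assignments, the helper names, and the font-size range are hypothetical, and a linear mapping from frequency to font size is assumed (logarithmic scaling is a common alternative).

```python
from collections import Counter

# Hypothetical tag assignments: component -> tags given by the system manager.
TAGS = {
    "web-frontend": ["billing", "customer-facing"],
    "payment-db": ["billing", "critical"],
    "batch-scheduler": ["internal"],
    "api-gateway": ["customer-facing", "critical", "billing"],
}


def tag_cloud(tags, min_size=10, max_size=32):
    """Map each tag to a font size that grows linearly with how often it was assigned."""
    counts = Counter(tag for assigned in tags.values() for tag in assigned)
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        tag: min_size + (count - lo) * (max_size - min_size) // span
        for tag, count in counts.items()
    }


def drill_down(tags, tag):
    """Return, in sorted order, all components carrying the given tag."""
    return sorted(component for component, assigned in tags.items() if tag in assigned)


print(tag_cloud(TAGS))  # {'billing': 32, 'customer-facing': 21, 'critical': 21, 'internal': 10}
print(drill_down(TAGS, "critical"))  # ['api-gateway', 'payment-db']
```

Note that every size is computed from tags already attached to components, which illustrates the limitation the text points out: the cloud is only as complete as the management information collected for all components.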
FIG. 1 shows system monitoring as known from the prior art. Two different applications (Application A 101 and Application B 102) make use of various resources (Res1 104, Res2 105, and Res3 106), which may or may not be shared between the applications 101-102. The applications 101-102 are instrumented and can produce “events”, which are processed by a system monitor 106. Different applications create different events, and those events are associated with further information (data). Examples of events are: “operation X completed successfully”, “the schedule Y could not be executed because resource 1 could not be found”, and “the throughput for operation Z is 2 TB/h (terabytes per hour)”.
The nature of the events depends heavily on the specifics of the application. A system monitor 106 receives those events 107-108, converts their data into measurable quantities such as “average response time” and “system available”, and sends an alert 109. The system monitor 106 does not correlate information received from different applications, since this would require a high degree of manual customization. The main reason is that events typically do not contain enough information to correlate events produced by different applications automatically. For example, an application might report a service failure without specifying that the failure was caused by a resource that is temporarily unavailable. The system monitor 106 would then not know that two applications were failing because the same resource was unavailable. Thus, there exists a need to overcome at least one of the preceding deficiencies and limitations of the related art.
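The prior-art monitor of FIG. 1 can be sketched as a function that turns each application's events into measurable quantities and alerts, evaluating every application in isolation. This is a minimal sketch under stated assumptions: the `Event` type, the `monitor` function, the `response_ms` and `status` data keys, and the 500 ms threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Event:
    """An event emitted by an instrumented application, with free-form data."""
    app: str
    message: str
    data: dict = field(default_factory=dict)


def monitor(events, max_avg_ms=500.0):
    """Turn raw events into per-application metrics and alerts.

    Each application is evaluated on its own: nothing here relates a failure
    in one application to a failure in another.
    """
    metrics, alerts = {}, []
    for app in sorted({e.app for e in events}):
        own = [e for e in events if e.app == app]
        times = [e.data["response_ms"] for e in own if "response_ms" in e.data]
        failed = any(e.data.get("status") == "failed" for e in own)
        avg = mean(times) if times else None
        metrics[app] = {"average_response_ms": avg, "system_available": not failed}
        if failed:
            alerts.append(f"{app}: system unavailable")
        elif avg is not None and avg > max_avg_ms:
            alerts.append(f"{app}: average response time {avg:.0f} ms")
    return metrics, alerts


# Suppose A and B both fail because Res1 is down, but neither event names it.
events = [
    Event("A", "operation X completed successfully", {"response_ms": 120}),
    Event("A", "service failed", {"status": "failed"}),
    Event("B", "schedule Y could not be executed", {"status": "failed"}),
    Event("B", "operation Z completed", {"response_ms": 900}),
]
metrics, alerts = monitor(events)
print(alerts)  # ['A: system unavailable', 'B: system unavailable']
```

The monitor raises two unrelated alerts; because the events carry no resource identifier, there is nothing in the data that would let it report a single shared cause.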