This invention relates to systems and methods for determining the root cause of a failure in a mixed hardware and software system. The mixed hardware and software systems include computer-related hardware components, software applications, and processes, any of which is susceptible to failing and causing varying levels of loss in functionality of the system.
With the continuing push to automate existing business processes, dependency on these new automated processes increases. Many companies and users rely on the continued operation of these automated systems; thus, when an outage occurs, it becomes critically important to quickly diagnose where the fault is in the automated system and what impact that fault has on the various components and subcomponents of the automated system. Typically, an automated system comprises many components and subcomponents, each of which may be designed, installed, and maintained by a different supplier. Because the system components and subcomponents are interdependent, it becomes increasingly difficult to determine which supplier is responsible for a failure and what degree of damage the failure has caused.
Conventional diagnostic methods and systems are capable of reporting overall application failures and specific subsystem failures. However, such conventional diagnostic systems are unable to report the impact a particular application or subsystem failure has on other applications and subsystems within the overall system. Generally, suppliers of the subsystems and applications enter into agreements with the companies that own the entire system to maintain their respective components and subcomponents. These agreements specify that, when a failure occurs, the supplier of the subsystem or application will restore the subsystem or application to operation within a specified time. Without a tool to track a system outage and link the outage to a specific subsystem or application failure, it becomes exceedingly difficult to hold suppliers accountable for the system outage. Thus, the company is left to shoulder the entire financial burden caused by the system outage.
Therefore, a need exists for a system and method which is capable of linking a hardware and software system outage to a specific process, application, and/or hardware subsystem failure. The system and method should also be capable of categorizing the impact on the overall system. For example, the system and method should report whether the hardware and software system is undergoing a complete outage, a partial outage, or a functional system degradation.
Accordingly, it is an object of the present invention to provide a system and method for linking a system performance to one of a subsystem and software failure.
In accordance with the above object and other objects of the present invention, a method for linking a performance of a mixed hardware and software system to a system failure is provided. The method includes identifying a plurality of physical hardware components in the mixed system; determining a plurality of software applications in the mixed system; classifying a plurality of software processes in the mixed system; establishing a plurality of collection points for monitoring the operation of each of the physical hardware components, software applications, and software processes; creating a physical grid, wherein the physical grid indicates a relationship between the subsystem and the software; creating a logical grid, wherein the logical grid indicates a relationship between the software applications and the software processes; combining the physical and logical grids to obtain the relationship between the subsystem and the processes; and utilizing the combined grid to link a loss in functionality to one of a subsystem failure and a process failure.
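By way of illustration only, the grid-combining steps recited above can be sketched in Python. The sketch assumes that each grid may be represented as a mapping from one kind of element to the set of elements that depend on it; that representation, together with every subsystem, application, and process name and both function names, is hypothetical and is not taken from the description.

```python
# Illustrative sketch only. All names and data below are hypothetical.

# Physical grid (assumed form): hardware subsystem -> software applications
# that run on or depend on that subsystem.
PHYSICAL_GRID = {
    "server_a": {"billing_app", "orders_app"},
    "server_b": {"orders_app"},
    "router_1": {"billing_app"},
}

# Logical grid (assumed form): software application -> software processes
# that the application supports.
LOGICAL_GRID = {
    "billing_app": {"invoice_customers"},
    "orders_app": {"take_orders", "ship_orders"},
}


def combine_grids(physical, logical):
    """Compose the two grids to relate each subsystem to its processes."""
    combined = {}
    for subsystem, applications in physical.items():
        processes = set()
        for application in applications:
            # An application missing from the logical grid contributes nothing.
            processes |= logical.get(application, set())
        combined[subsystem] = processes
    return combined


def candidate_causes(combined, failed_subsystems, degraded_process):
    """Return the failed subsystems on which the degraded process depends.

    failed_subsystems stands in for the subsystems whose collection
    points currently report a failure.
    """
    return sorted(
        subsystem
        for subsystem in failed_subsystems
        if degraded_process in combined.get(subsystem, set())
    )


if __name__ == "__main__":
    combined = combine_grids(PHYSICAL_GRID, LOGICAL_GRID)
    # Two collection points report failures; only one is linked to the
    # process that has lost functionality.
    print(candidate_causes(combined, {"server_b", "router_1"}, "invoice_customers"))
```

In this sketch, a loss in functionality of a process is linked to a subsystem failure by looking up the process in the combined grid and keeping only those subsystems that are both reported as failed and related to the process. Other representations, such as a matrix indexed by subsystem and process, would serve the same purpose.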
In accordance with another aspect of the present invention, a system for linking a performance of a mixed subsystem and software system to a system failure is provided. The system includes a plurality of subsystems in the mixed system; a plurality of software applications in the mixed system; a plurality of software processes in the mixed system; a plurality of collection points for monitoring the operation of each of the physical subsystems, software applications, and software processes; a physical grid, wherein the physical grid indicates a relationship between the subsystem and the software; a logical grid, wherein the logical grid indicates a relationship between the software applications and the software processes; and a combined grid created by combining the physical and logical grids to obtain the relationship between the subsystem and the processes, wherein the combined grid is used to link a loss in functionality to one of a subsystem failure and a process failure.
The above object and other objects, features, and advantages of the present invention are readily apparent from the following detailed description of the best mode for carrying out the invention when taken in connection with the accompanying drawings.