The present invention relates to outage management and, more particularly, to an outage management portal that leverages back-end resources to create a role- and user-tailored front-end interface for coordinating outage responses.
Many companies conduct business operations using a network of managed resources. This network of managed resources can include any type of business resource that is subject to outages, such as a set of one or more information technology (IT) resources, a set of supply chain resources, components of a delivery system, a set of needed public utilities, personnel resources, and the like. One difficulty in controlling and maintaining these managed resources is detecting, diagnosing, and remedying integrity problems among the members of the network. Common sources of trouble include adverse weather effects on physical components of a network of managed resources, equipment failures, user errors, software incompatibilities, and the like. Many of these conditions are extremely difficult to detect through traditional means, which often involve engaging various support teams, each of which checks its individual area of responsibility to determine whether “their” equipment/resources caused the problem.
In a distributed computing example focused upon IT resources, network maintenance people will often “ping” a remotely located server to ensure that packets are able to pass between two end-points. Local hardware maintenance people will check statistics on the machines for which they are responsible, to make sure individual servers are operational. Software maintenance personnel will often check logs to determine whether a given software system has behaved or is behaving as intended. Testing discrete components in this fashion is often time-consuming, which results in increased downtime. Further, this divide-and-conquer approach often overlooks the combined effects of interacting components. Additionally, this troubleshooting environment can become mired in administratively induced hurdles, as managers having limited areas of responsibility and/or one or more outsourced service companies can focus on establishing that their equipment is fine and that the problems are not their responsibility, rather than everyone involved cooperating to resolve the problem as quickly as possible. An analogous situation occurs for other (non-IT) types of managed resources.
An unfortunate reality of present systems for managing a network of resources is that back-end operations data is often “trapped” within subsystem boundaries. In the computing example, server information is often localized to a given server, intranet metrics are only exposed to an intranet management team, and the like. These subsystem boundaries often exist as barriers imposed to protect the safety and integrity of the systems. The totality of existent, but often unavailable, back-end data can provide a relatively clear “picture” of a distributed system, which can be invaluable for successfully handling outages in a timely fashion. Additional information concerning factors outside the back-end data (e.g., weather patterns, Internet communication surges, and location-specific emergencies such as fires, floods, and hurricanes), when properly organized, can further assist troubleshooting and resolution efforts. At present, troubleshooters are not provided with all of the relevant existing information for handling outages, whether those outages involve networks of a computing or a non-computing nature (e.g., public utilities, supply chains, or any transportation/delivery network).