As is known in the art, cloud computing systems, even in pre-architected and pre-qualified environments, contain a relatively large number of hardware devices and components and software applications, modules, and components. In the presence of a fault, alert, or other condition needing attention, it can be difficult to identify the source of the fault or alert since there are many complex components that may be provided by multiple vendors which may make it difficult to correlate information in an efficient manner.
For example, in a cloud computing environment, alerts and events from various event sources in platforms normally contain limited information that may not be meaningful and may seem unrelated to the environment from which they originate. It is challenging for IT personnel to extract executable data from the alerts and take appropriate action.
With large volumes of alerts/events constantly coming from various sources, it is time consuming to troubleshoot all of them all any prioritization. It is challenging to prioritize the alerts/events and take appropriate actions without correlating them and knowing which of the alerts or events are root causes and which are just symptoms. In addition, many of the IT resources are managed in silos by IT personnel specialized in certain technology domains. For example, when a blade in the Cisco Unified Computing System (UCS) fails or has performance issues its impact propagates to the ESX server deployed on the blade, to the virtual machines deployed on the ESX server, to the applications or critical services running on those virtual machines, to the critical business that relies on those services. It may take hours or even days to sort through those alerts or events, which may result in significant detrimental impact on an enterprise.