Many current efforts ongoing within the information technology community include considerable interest in the concept of “intelligent workload management.” In particular, much of the recent development in the information technology community has focused on providing better techniques to intelligently manage “cloud” computing environments, which generally include dynamically scalable virtualized resources that typically provide network services. For example, cloud computing environments often use virtualization as the preferred paradigm to host workloads on underlying physical hardware resources. For various reasons, computing models built around cloud or virtualized data centers have become increasingly viable, including because cloud infrastructures can permit information technology resources to be treated as utilities that can be automatically provisioned on demand. Moreover, cloud infrastructures can limit the computational and financial cost that any particular service has to the actual resources that the service consumes, while further providing users or other resource consumers with the ability to leverage technologies that could otherwise be unavailable. Thus, as cloud computing and storage environments become more pervasive, many information technology organizations will likely find that moving resources currently hosted in physical data centers to cloud and virtualized data centers can yield economies of scale, among other advantages.
Nonetheless, although many efforts in the information technology community relate to moving towards cloud and virtualized computing environments, existing systems tend to fall short in providing adequate solutions that can manage or control such environments. For example, cloud computing environments are generally designed to support generic business practices, meaning that individuals and organizations typically lack the ability to change many aspects of the platform. Moreover, concerns regarding performance, latency, reliability, and security can present significant challenges because outages and downtime often lead to lost business opportunities and decreased productivity, while the generic platform may present governance, risk, and compliance concerns. In other words, once organizations deploy workloads beyond data center boundaries, the lack of visibility into the computing environment that hosts the workloads may result in significant management problems. In this context, the most difficult problem with managing a data center relates to troubleshooting, especially because client devices tend to lack the visibility into virtualized and cloud data centers that may be needed to identify particular machines delivering content to the client devices, while servers lack the visibility needed to identify the content being delivered to client devices without implementing custom logging techniques for every delivering application.
Moreover, the interaction between various workloads typically extends beyond the servers or other systems that exercise the workloads because a management infrastructure needs to have knowledge relating to every aspect in the managed environment, including what experiences are occurring on every level within the managed environment. For example, many business service management products that currently attempt to diagnose or troubleshoot problems in an information technology infrastructure tend to monitor a few levels within the managed environment and audit or track certain actions in the managed environment where the problems may be occurring. As such, because business service management technology currently used in the information technology industry primarily works only on problem causes, business service management technology currently in use typically ignores or rarely combines causes with potential effects to make intelligent management decisions. In particular, because the current technology tends to limit visibility into the managed environment to certain monitored levels and audited actions, the existing approaches to diagnosing or troubleshooting problems in an information technology infrastructure usually experience substantial problems when the infrastructure does not work as expected.
For example, a common problem that may occur in an information technology infrastructure relates to a user experiencing degraded performance for a particular service (e.g., slow e-mail response time through a web-based client). Historically, the user would contact help desk personnel, who then spend several minutes gathering information to create a trouble ticket to resolve the problem. In many cases, the ticket will not even be looked at until some time later (if at all), and any diagnostic efforts that then occur would focus on checking certain monitors and glaring problems in the system that may be contributing to the reported problem. Accordingly, due to the limited information considered in the diagnostic efforts, many reported problems will have little or no useful information returned, with the problem ticket repeatedly bouncing back and forth between the user, help desk personnel, and other entities, and with the ticket often eventually being closed because the problem could not be diagnosed. In other words, unless the help desk personnel are able to see that a substantial issue may be contributing to the problem (e.g., the e-mail server went down), the lack of visibility that existing systems have into the infrastructure can result in nothing being resolved because existing systems lack the knowledge needed to correlate the problem with other potentially contributing problems. Accordingly, although existing systems have attempted to provide solutions that can troubleshoot and gather management data to diagnose issues in an information technology infrastructure, the solutions that have been proposed tend to fall short in providing techniques that can suitably capture and combine all potential causes and effects with instant feedback and other information sources to obtain more details around what may be happening in the infrastructure.