Cloud computing is now ubiquitous in both enterprise and consumer settings. In cloud computing, data and applications are accessed over the Internet rather than from local storage and compute resources. Instead of owning all the hardware where the data resides and the software applications execute, an enterprise or a consumer (the “client” or “tenant”) utilizes hardware and software resources supplied by a cloud computing provider to store the data and run the applications. By sharing resources among numerous clients, the cloud computing infrastructure (sometimes referred to as infrastructure as a service (IaaS)) accommodates elastic demand spikes and achieves economies of scale, and has thus become popular in various industries. In an IaaS model, computing resources are often offered to a requesting client as a number of virtual machines, and a hypervisor manages the offered virtual machines.
For media content processing, a video streaming system may utilize the cloud computing infrastructure offered by cloud computing providers to provide services to a client. The operator of the video streaming system is often not itself a cloud computing provider. Thus, a client may enter into a service level agreement (SLA) with the operator of the video streaming system, and the operator leases computing resources within the cloud computing infrastructure to process media content from the client. Preferably, the operator of the video streaming system can detect issues that arise while the cloud computing infrastructure is used to process the media content, and can mitigate them in real time so that the end user's experience is not affected.
Traditional network management and monitoring tools do not work well in this kind of application. For example, the simple network management protocol (SNMP) may provide management information to a management system, which determines whether the system is operating in a normal condition. However, SNMP and other generic network management protocols are not tailored to video processing, which typically requires a high service level (e.g., 99.9% availability or higher), so the management and monitoring need to detect and mitigate issues quickly. Such generic protocols also lack sufficient knowledge of video processing for this kind of application in a cloud environment.
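To illustrate why detection and mitigation must be fast, the availability figure above can be converted into a downtime budget. The following is a minimal sketch only: the function name `downtime_budget_seconds` and the 30-day measurement window are assumptions chosen for illustration and are not taken from any particular SLA.

```python
SECONDS_PER_DAY = 24 * 60 * 60
# Assumed measurement window for illustration; real SLAs define their own.
THIRTY_DAYS = 30 * SECONDS_PER_DAY


def downtime_budget_seconds(availability_percent: float, period_seconds: float) -> float:
    """Return the maximum downtime allowed within a period at a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    if period_seconds < 0:
        raise ValueError("period_seconds must be non-negative")
    # The unavailable fraction of the period is (100% - availability).
    return period_seconds * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for level in (99.9, 99.99):
        budget = downtime_budget_seconds(level, THIRTY_DAYS)
        print(f"{level}% over 30 days: {budget / 60:.1f} minutes of downtime allowed")
```

Under these assumptions, 99.9% availability leaves roughly 43.2 minutes of total downtime in a 30-day window, so a monitoring approach that takes many minutes to notice a single fault may consume much of the budget on its own.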