Cloud data centers rely on virtualization to enable cloud service portability. Services hosted in the cloud are typically deployed as sets of applications/processes running on one or more Virtual Machines (VMs). In this regard, the services hosted in the cloud that are running on VMs may be referred to as virtualized cloud services.
Using live migration technology, a cluster of VMs running multiple applications associated with a single service can be moved together as one unit from one subnet to another, thereby ensuring co-location of the VMs hosting a service and minimizing latency between VMs. This live migration technique is used in data centers to perform various tasks including IT maintenance, load balancing, power management, and development-to-operations support.
For example, the use of live migration as a mechanism for improving the resilience and availability of cloud services is described in U.S. patent application Ser. No. 15/604,552 to Diallo et al. (“552 application”), also owned by Applicant, the content of which is fully incorporated by reference herein. That application assumed a cloud infrastructure as a service (IaaS) model in which cloud service providers manage virtual machine instances and offer them as a service to customers. When an anomaly in a cloud infrastructure results in disruption of the cloud (i.e., the cloud servers are no longer functional), VMs must be migrated to preserve the availability of the services they provide. To migrate cloud services efficiently, however, it must be determined when to migrate cloud services, which virtual machines to select for migration, and where to migrate the selected virtual machines. A virtual machine selection algorithm that maximizes the availability of high priority services during migration under time and network bandwidth constraints addresses which virtual machines to select and where to migrate them.
The success of live migration of virtual machines for preserving the availability of cloud services still ultimately depends on efficient mechanisms for detecting anomalies in the underlying virtual machines running the cloud services, so as to determine when to migrate cloud services. Anomaly detection is the problem of identifying outliers in a dataset. The outliers are identified relative to a baseline of normal or expected data. To generate a baseline of normal data, historical data is collected and stored for future reference by the anomaly detection algorithm. Anomaly detection algorithms fall into one of three broad categories: unsupervised, supervised, and semi-supervised. Unsupervised anomaly detection algorithms assume that most data in a given dataset are normal, and only the data points that deviate the most from the rest of the dataset are identified as anomalous. Supervised anomaly detection algorithms require a dataset with data points labeled as normal or abnormal in order to train a classifier to differentiate between the two classes. Semi-supervised anomaly detection algorithms take a normal dataset as input to build a model representing normal behavior in the system, and then output the probability that a given data point could be generated by the model that was built.
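As a non-limiting illustration of the semi-supervised category described above, the following Python sketch builds a model of normal behavior from a dataset assumed to contain only normal samples and then scores a new data point against that model. The single-variable Gaussian model, the function names `fit_normal_model` and `density_under_model`, and the sample CPU-utilization values are all assumptions made for illustration; they are not taken from the referenced application.

```python
import math
import statistics


def fit_normal_model(normal_samples):
    """Build a model of normal behavior: here, just a mean and a standard deviation."""
    mean = statistics.mean(normal_samples)
    std = statistics.stdev(normal_samples)  # sample standard deviation
    return mean, std


def density_under_model(x, model):
    """Gaussian probability density of x under the fitted model.

    A low value suggests the model is unlikely to have generated x.
    """
    mean, std = model
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical CPU utilization readings (percent) observed during normal operation.
model = fit_normal_model([48.0, 50.0, 52.0, 49.0, 51.0])
typical_score = density_under_model(50.0, model)
unusual_score = density_under_model(80.0, model)
```

In this sketch a reading near the historical mean scores far higher than a reading far from it; how low a score must fall before a data point is treated as anomalous is a separate design decision.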
One technique used to detect anomalies in VMs in a cloud environment is the Kalman Filter, which is a typical time series forecasting algorithm. The Kalman Filter is a set of equations that implement a linear predictor-corrector estimator in the time domain that minimizes the estimated error covariance. The Kalman Filter enables forecasting of time series data, and can be used to identify data points that vary significantly from the forecast.
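By way of a hedged example, the predictor-corrector structure described above can be sketched in Python for a single scalar metric. The random-walk state model, the fixed noise variances `q` and `r`, the initial covariance, the three-sigma `threshold`, and the function name `flag_forecast_outliers` are assumptions chosen for illustration only and do not represent any particular prior implementation.

```python
import math


def flag_forecast_outliers(measurements, q=1e-3, r=0.25, threshold=3.0):
    """Return indices whose measurement deviates strongly from the forecast.

    Assumed model (illustrative only): scalar random walk
    x_k = x_{k-1} + w_k,  z_k = x_k + v_k, with Var(w) = q and Var(v) = r.
    """
    x = measurements[0]  # state estimate, seeded with the first sample
    p = 1.0              # estimated error covariance (assumed initial value)
    flagged = []
    for k in range(1, len(measurements)):
        z = measurements[k]
        # Predict: under a random walk the forecast is the previous estimate.
        x_pred = x
        p_pred = p + q
        # Innovation (forecast error) and its variance.
        innovation = z - x_pred
        s = p_pred + r
        if abs(innovation) > threshold * math.sqrt(s):
            flagged.append(k)
        # Correct: blend forecast and measurement using the Kalman gain.
        gain = p_pred / s
        x = x_pred + gain * innovation
        p = (1.0 - gain) * p_pred
    return flagged


# Hypothetical metric that is steady at 10.0 with a single spike at index 20.
readings = [10.0] * 20 + [25.0] + [10.0] * 5
spikes = flag_forecast_outliers(readings)
```

With these assumed parameters the spike at index 20 is flagged. Note that `q` and `r` are fixed by hand in this sketch, so its sensitivity depends entirely on how well those values match the monitored signal.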
A need exists for an approach that extends the basic Kalman Filter and recursively adapts the input gains, the noise, and measurement covariances to achieve ongoing, automated operation, as well as for an approach that applies a moving average filter to the log likelihoods of past measurements to produce a more robust anomaly indicator.
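For illustration only, the moving average of log likelihoods mentioned above might be sketched in Python as follows. The sketch assumes that each past measurement has already been reduced to a forecast error (innovation) and an innovation variance, and that the errors are Gaussian; the function names, the window length, and the sample values are hypothetical. It does not attempt the recursive adaptation of gains and covariances.

```python
import math
from collections import deque


def gaussian_log_likelihood(innovation, variance):
    """Log likelihood of a forecast error under a zero-mean Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + innovation ** 2 / variance)


def moving_average_log_likelihood(innovations, variances, window=5):
    """Average the log likelihoods of the most recent `window` measurements."""
    recent = deque(maxlen=window)  # oldest value is dropped automatically
    averaged = []
    for nu, s in zip(innovations, variances):
        recent.append(gaussian_log_likelihood(nu, s))
        averaged.append(sum(recent) / len(recent))
    return averaged


# Hypothetical forecast errors: ten well-predicted samples, then a sustained deviation.
errors = [0.0] * 10 + [5.0] * 5
indicator = moving_average_log_likelihood(errors, [1.0] * 15, window=5)
```

Because the indicator averages over several measurements, a single poorly predicted sample lowers it only partially, whereas a sustained deviation drives it steadily lower. A threshold on this averaged value would then serve as the anomaly indicator, at the cost of some detection delay that grows with the window length.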