This application relates to anomaly detection in storage devices.
Enterprise computer systems, datacenters, and other critical systems are generally expected to be online and accessible at any time. Unplanned outages, data loss, and other failures often result in significant operational costs, negative publicity, and revenue losses. To increase uptime and reduce the probability of system failure, administrators often monitor the data and signals processed by system components for anomalies and attempt to address those anomalies before they become critical.
However, as the data processing and data storage capabilities of such systems increase year over year, the volume of data they process has made current techniques less effective at monitoring, predicting, and addressing unusual behavior. For example, large internet companies operate banks of servers (such as e-mail servers, cloud storage servers, etc.) that are continuously monitored. Many measurements of server performance are collected every hour for each of thousands of servers. It is particularly challenging to examine this volume of data for unusual behavior and then isolate that behavior to a particular computing system or component.
Monitored signals are often analyzed as time series, which can change dynamically over time. Existing solutions attempt to flag independent time series as anomalous in various domains. For instance, some solutions attempt to detect anomalous units, each represented by a single time series. These existing solutions use various techniques (such as feature extraction, feature-space dimensionality reduction, and unit anomaly scoring) to identify and process anomalous time series. However, the features extracted using these time series techniques are typically difficult to interpret and to associate with a given system component, such as a storage device.
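For illustration only, the following minimal sketch shows the general shape of such a pipeline: each unit's time series is reduced to a few summary features, and each unit is then scored by how far its features deviate from those of its peers. The function names, the chosen features, the sample readings, and the use of a median-based robust z-score are all assumptions made for this example and are not drawn from any particular existing solution. The dimensionality reduction step is omitted for brevity.

```python
from statistics import mean, median, pstdev

def extract_features(series):
    """Summarize one time series as (mean, std dev, largest step change).
    The feature set is hypothetical and chosen only for illustration."""
    steps = [abs(b - a) for a, b in zip(series, series[1:])]
    return (mean(series), pstdev(series), max(steps) if steps else 0.0)

def robust_z(values):
    """Distance of each value from the median, scaled by the median
    absolute deviation (MAD). Falls back to a scale of 1.0 when MAD is 0."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = mad if mad > 0 else 1.0
    return [abs(v - med) / scale for v in values]

def score_units(series_by_unit):
    """Return one anomaly score per unit: the largest robust z-score
    across that unit's features."""
    names = list(series_by_unit)
    features = [extract_features(series_by_unit[n]) for n in names]
    # Transpose so that each column holds one feature across all units.
    per_feature = [robust_z(list(col)) for col in zip(*features)]
    scores = [max(col[i] for col in per_feature) for i in range(len(names))]
    return dict(zip(names, scores))

# Hypothetical hourly readings for four storage devices.
readings = {
    "disk_a": [10, 11, 10, 12, 11, 10],
    "disk_b": [11, 10, 11, 11, 12, 10],
    "disk_c": [10, 12, 11, 10, 11, 11],
    "disk_d": [10, 11, 40, 42, 41, 43],
}
scores = score_units(readings)
flagged = max(scores, key=scores.get)
```

In this sketch the resulting score indicates only that one unit differs from its peers. It does not indicate which signal changed or in which time window the change occurred, which is the interpretability limitation described above.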
As a result, these existing solutions are generally unable to automatically identify the components that require attention based on potential signal imbalances or unusual values in given time windows of interest. They are likewise unable to provide reliable insight into, or reasoning behind, any anomalous behavior.