When online services are used via networked computing environments, interactions with the online services generate large amounts of data that indicate various characteristics regarding the use of these online services. For example, various electronic interactions via online services (e.g., page views, website visits, webpage reloads) automatically generate data describing these actions (e.g., numbers of page views or website visits for each day of a given time period). Analysis of this data can identify issues that impact the ability of the online service to provide end-user experiences of sufficient quality, reliability, or both.
One example of analysis that may be performed on datasets generated by online services is anomaly detection. An example of an anomaly is an outlier or group of outliers in a dataset that has a statistically significant deviation from a majority distribution. Anomaly detection involves finding trends in data that do not conform to expected or normal trends. Anomaly detection may be performed on machine-generated event log data (e.g., network logs) to detect, for example, changes in effectiveness for a given online service (e.g., network disruptions), responsiveness of end users to certain online content, indications of malware or other suspicious activity, or any other metric indicating a performance level associated with an online service.
Anomaly detection typically involves analyzing large amounts of data for possible anomalies. The amount of data required for detecting anomalies prevents (or makes impractical) reliance on human monitoring of data, and therefore requires executing automated algorithms to perform at least some of the data processing required for anomaly detection. For example, metrics data collected over a period of time may be analyzed with a time-series anomaly detection algorithm that uses certain metrics, dimensions, and filters.
One example of an anomaly-detection algorithm involves identifying point anomalies, where an individual value or a set of values for an individual point in time is determined to be anomalous with respect to the rest of the metrics dataset. If the value or set of values (at any point in time) differs significantly from a predicted value, then the corresponding point in time within the metrics dataset is marked as anomalous. Another example of an anomaly-detection algorithm involves identifying contextual anomalies, where certain combinations of trends in a metrics dataset deviate from an expected combination of trends. For example, if a first metric (e.g., “website visits”) and a second metric (e.g., “impressions”) are expected to be highly correlated over a certain time period, but exhibit a low correlation, then one of the metrics is determined to be behaving anomalously with respect to the other metric.
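The two kinds of anomaly described above can be illustrated with a minimal sketch. The sketch rests on assumptions that are not taken from any particular anomaly-detection system: the "predicted value" is assumed to be the mean of a trailing window, "differs significantly" is assumed to mean more than a chosen number of standard deviations, and "low correlation" is assumed to mean a Pearson coefficient below a chosen cutoff. The function names, the window size, and both thresholds are hypothetical.

```python
from statistics import mean, stdev


def point_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value differs significantly from a prediction.

    Assumption: the prediction is the mean of the preceding `window` values,
    and "significantly" means more than `threshold` standard deviations.
    """
    anomalous = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        predicted = mean(history)
        spread = stdev(history)
        if spread == 0:
            # A flat history makes any change a deviation.
            if values[i] != predicted:
                anomalous.append(i)
        elif abs(values[i] - predicted) > threshold * spread:
            anomalous.append(i)
    return anomalous


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def contextual_anomaly(xs, ys, min_correlation=0.8):
    """Flag two metrics that are expected to move together but do not.

    Assumption: `min_correlation` is an illustrative cutoff.
    """
    return pearson(xs, ys) < min_correlation


# Hypothetical daily "website visits" with one spike at index 8.
visits = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]
spike_days = point_anomalies(visits)

# Hypothetical "website visits" and "impressions" over five days.
visits_week = [1, 2, 3, 4, 5]
impressions_tracking = [10, 20, 30, 40, 50]
impressions_diverging = [50, 10, 40, 20, 30]
```

In this sketch the point check flags only the day of the spike, and the contextual check flags only the pair of series that fail to move together. Note that the contextual check indicates that the pair is behaving unexpectedly; it does not by itself indicate which of the two metrics is the anomalous one.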
Current solutions for performing anomaly detection in datasets may present disadvantages. In particular, both point anomalies and contextual anomalies are determined with respect to an entire metrics dataset. For example, if a metric is “website visits,” website visits resulting from any source (e.g., clicking links on search results) may be analyzed to identify a point anomaly or contextual anomaly. When considered as a whole, the metrics dataset may not exhibit anomalies. For example, if a first segment of the metrics dataset (e.g., website visits originating from a first search engine) exhibits an anomalously large number of website visits and a second segment of the metrics dataset (e.g., website visits originating from a second search engine) exhibits an anomalously small number of website visits, the large number of visits may offset the small number of visits when all data is aggregated together. Thus, an anomaly-detection algorithm may not identify any anomaly even if two anomalies are present in the dataset.
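The offsetting effect described above can be reproduced with a small numerical sketch. All figures, segment names, and the three-standard-deviation rule below are hypothetical and chosen only to make the masking visible; they are not drawn from any actual dataset or detection system.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Assumed rule: flag `current` if it lies more than `threshold`
    standard deviations from the mean of `history`."""
    return abs(current - mean(history)) > threshold * stdev(history)


# Hypothetical daily website visits per referring search engine.
history = {
    "search_engine_a": [500, 510, 490, 505, 495],
    "search_engine_b": [500, 495, 505, 490, 510],
}
# Today: one segment surges while the other collapses by the same amount.
today = {"search_engine_a": 800, "search_engine_b": 200}

# Aggregate view: sum the segments day by day.
aggregate_history = [a + b for a, b in zip(*history.values())]
aggregate_today = sum(today.values())

aggregate_flag = is_anomalous(aggregate_history, aggregate_today)
segment_flags = {
    name: is_anomalous(history[name], today[name]) for name in history
}

print("aggregate anomalous:", aggregate_flag)
print("per-segment anomalous:", segment_flags)
```

Under these assumed numbers, the aggregate total for the current day equals its historical mean, so the aggregate check reports nothing, while the same check applied to each segment separately flags both.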