When online services are used via networked computing environments, interactions with the online services generate large amounts of data that indicate various characteristics regarding the use of these online services. For example, various electronic interactions via online services (e.g., page views, website visits, webpage reloads) automatically generate data describing these actions (e.g., numbers of page views or website visits for each day of a given time period). Analysis of this data can identify issues that impact the ability of the online service to provide end-user experiences of sufficient quality, reliability, or both.
One example of analysis that may be performed on datasets regarding online services is anomaly detection. An example of an anomaly is an outlier in a dataset that has a statistically significant deviation from a majority distribution. Anomaly detection may be performed on machine-generated event log data (e.g., network logs) to detect, for example, changes in effectiveness for a given online service (e.g., network disruptions), responsiveness of end users to certain online content, indications of malware or other suspicious activity, or any other metric indicating a performance level associated with an online service.
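To make the notion of an outlier with a statistically significant deviation concrete, the following is a minimal sketch that flags unusual values in a daily metric. It assumes a modified z-score based on the median absolute deviation, which is one common outlier test and is not a method described in this document; the function name `find_anomalies`, the threshold of 3.5, and the sample page-view counts are all hypothetical.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    Assumption: the modified z-score (median and median absolute deviation)
    stands in for "statistically significant deviation"; other tests could
    be substituted.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half of the values are identical; the score is undefined,
        # so this sketch reports no anomalies rather than dividing by zero.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical page views per day for one week.
daily_page_views = [1020, 980, 1005, 990, 1010, 4000, 995]
print(find_anomalies(daily_page_views))  # prints [5]
```

The median-based score is used here because a single extreme value inflates the mean and standard deviation, which can mask the very outlier being sought in a short series.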
Current solutions for performing anomaly detection and other analysis of datasets may present disadvantages. Certain existing anomaly-detection algorithms analyze all of the metrics data generated by a given reporting tool of an online service. In some cases, this analysis involves large datasets requiring extensive processing resources. For example, metrics such as webpage visits, page views, reloads, and other metrics data may describe thousands or millions of interactions with an online service. Furthermore, the metrics data (e.g., website visits over a given time period) may be divided into additional geographic dimensions (e.g., website visits over a given time period for a first country, website visits over a given time period for a second country, etc.) or other dimensions. Increasing the number of dimensions analyzed by an anomaly-detection algorithm increases the complexity, and the required processing resources, for identifying anomalies in these metrics datasets.
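The growth in workload can be illustrated with a short sketch that counts how many separate time series would need to be analyzed once each metric is split by every combination of dimension values. The metric names follow the examples above; the `device` dimension, the specific country codes, and the helper `enumerate_series` are hypothetical and chosen only for illustration.

```python
from itertools import product

def enumerate_series(metrics, dimensions):
    """List every (metric, dimension-values) pair that forms its own series."""
    names = list(dimensions)
    # One combination per choice of a value for each dimension.
    combos = list(product(*(dimensions[name] for name in names)))
    return [
        (metric, dict(zip(names, combo)))
        for metric in metrics
        for combo in combos
    ]

metrics = ["page_views", "website_visits", "reloads"]

# Hypothetical dimensions; real reporting tools may expose many more.
by_country = {"country": ["US", "DE", "JP"]}
by_country_and_device = {
    "country": ["US", "DE", "JP"],
    "device": ["desktop", "mobile"],
}

print(len(enumerate_series(metrics, {})))                     # prints 3
print(len(enumerate_series(metrics, by_country)))             # prints 9
print(len(enumerate_series(metrics, by_country_and_device)))  # prints 18
```

Under these assumptions the number of series is the number of metrics multiplied by the product of the dimension sizes, so each added dimension multiplies, rather than adds to, the amount of data an exhaustive algorithm must examine.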
With respect to these and other considerations, improvements are desirable for efficiently performing anomaly detection or other analytical algorithms over large datasets.