When online services are used via networked computing environments, interactions with the online services generate large amounts of data that indicate various characteristics regarding the use of these online services. For example, various electronic interactions via online services (e.g., page views, website visits, webpage reloads) automatically generate data describing these actions (e.g., numbers of page views or website visits for each day of a given time period). Analysis of this data can identify issues that impact the ability of the online service to provide end-user experiences of sufficient quality, reliability, or both.
Examples of analysis that may be performed on datasets generated by online services include anomaly detection and predictive modeling, such as forecasting of future metric values. An example of an anomaly is an outlier or group of outliers in a dataset that deviates from the majority distribution by a statistically significant amount. Anomaly detection involves finding patterns in data that do not conform to expected or normal trends. Anomaly detection may be performed on machine-generated event log data (e.g., network logs) to detect, for example, changes in effectiveness for a given online service (e.g., network disruptions), responsiveness of end users to certain online content, indications of malware or other suspicious activity, or any other metric indicating a performance level associated with an online service.
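As a rough illustration of an outlier that deviates significantly from the majority distribution, the following minimal sketch flags values by a modified z-score based on the median and the median absolute deviation (MAD). The function name `find_outliers`, the 3.5 threshold, and the sample page-view counts are assumptions made for this example and do not come from the text above.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    Hypothetical helper: the median/MAD score is one common
    convention for outlier detection, chosen here for illustration.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread in the majority of the data; nothing can be scored.
        return []
    # 0.6745 scales the MAD so the score is comparable to a z-score
    # under a normal distribution.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]

# Made-up daily page-view counts with one unusually busy day.
daily_page_views = [1020, 980, 1005, 995, 1010, 2400, 990, 1000]
print(find_outliers(daily_page_views))  # → [5]
```

A median-based score is used rather than a mean and standard deviation because a single extreme value inflates the standard deviation and can mask itself.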
Both anomaly detection and predictive modeling involve analyzing large amounts of time-series data. Time-series data that is generated by interactions with online services often includes metrics data resulting from a complicated mix of various latent components. Examples of latent components include seasonal variations, anomalous spikes in interactions within certain time intervals, and sudden changes in the average level of data traffic with respect to the online service. The amount of data available for analysis prevents (or makes impractical) reliance on human monitoring of data, and therefore requires executing automated algorithms to perform at least some of the data processing required for anomaly detection, data forecasting, or both. To identify anomalies accurately or provide accurate data forecasts, however, these automated algorithms must account for the seasonal patterns, spikes, and level changes present in the data.
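To make the idea of latent components concrete, the sketch below builds a synthetic metric as the sum of a seasonal variation, a single spike, and a level change. The additive form, the function name `synthetic_metric`, and all numeric values are assumptions chosen for illustration; real metrics data may combine these components differently.

```python
import math

def synthetic_metric(n_days=28, period=7, spike_day=10, shift_day=20):
    """Build a toy time series from three hypothetical latent components."""
    # Seasonal variation: a smooth weekly cycle.
    seasonal = [100 * math.sin(2 * math.pi * t / period) for t in range(n_days)]
    # Anomalous spike: a one-day burst of interactions.
    spikes = [500.0 if t == spike_day else 0.0 for t in range(n_days)]
    # Level change: the average level jumps and stays at the new value.
    level = [1000.0 if t < shift_day else 1300.0 for t in range(n_days)]
    # The observed metric is what an analyst actually sees: the mix.
    observed = [s + p + l for s, p, l in zip(seasonal, spikes, level)]
    return seasonal, spikes, level, observed

seasonal, spikes, level, observed = synthetic_metric()
for day, value in enumerate(observed):
    print(day, round(value, 1))
```

Only `observed` would be available in practice; the individual components are latent and would have to be estimated from it.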
Current solutions for accounting for these latent components may present disadvantages. In one example, anomaly-detection models or data-forecasting models, which are applied to time-series data, are configured using an assumption that certain latent components (e.g., spikes, level changes, etc.) are non-existent or involve a negligible contribution to the value of a metric under consideration. But these assumptions may prevent the detection of anomalies or reduce the accuracy of forecasts if the ignored latent component accounts for a significant portion of the time series of metrics data. In another example, an analyst may manually configure an anomaly-detection model or a data-forecasting model with data identifying seasonal patterns or other latent components. But this reliance on the analyst's prior knowledge of the relevant latent components results in imprecise or inaccurate models if the analyst's knowledge is incorrect or incomplete.
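The first disadvantage can be illustrated with a small sketch that compares a detector that ignores seasonality with one that first subtracts an estimated weekly pattern. Everything here is an assumption made for the example: the data is invented, the per-weekday median is just one simple way to estimate a seasonal pattern, and the helper names `modified_z_outliers` and `remove_weekly_pattern` are hypothetical.

```python
from statistics import median

def modified_z_outliers(values, threshold=3.5):
    """Indices whose median/MAD-based score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]

def remove_weekly_pattern(values, period=7):
    """Subtract the median of each weekday position (a crude seasonal estimate)."""
    profile = [median(values[p::period]) for p in range(period)]
    return [v - profile[i % period] for i, v in enumerate(values)]

# Four invented weeks (Mon..Sun): weekdays near 1000, weekends near 400.
# The third Saturday (index 19) shows weekday-level traffic.
visits = [
    1000, 1010,  990, 1005,  995,  400,  410,
    1005, 1000, 1000,  995, 1005,  405,  400,
     995, 1005, 1010, 1000,  990, 1000,  405,
    1010,  995, 1005, 1010, 1000,  395,  415,
]

ignoring_seasonality = modified_z_outliers(visits)
with_seasonality = modified_z_outliers(remove_weekly_pattern(visits))

print(ignoring_seasonality)  # → [5, 6, 12, 13, 20, 26, 27]
print(with_seasonality)      # → [19]
```

In this toy data, the detector that assumes no seasonal component flags every ordinary weekend and misses the unusual Saturday, while the detector that accounts for the weekly pattern flags only that Saturday.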