1. Technical Field
One or more embodiments relate generally to systems and methods for synthetic data generation. More specifically, one or more embodiments relate to systems and methods of generating synthetic data with trend and/or seasonal information.
2. Background and Relevant Art
Some conventional tracking software monitors user interactions with media (e.g., hits to a website, application downloads, software error reporting). Analytics reports often detail the user interactions by showing the history of user interactions, including trends and notable events. In web applications, website developers and marketing personnel can use the analytics to predict future user traffic based on the number of hits or views a particular webpage has received. Predicting future traffic can play an important role in making development and marketing decisions for web applications and backend support.
Producing sufficiently accurate predictions of website traffic and other user interactions may require a large amount of data. Some methods of obtaining that data include collecting previously sampled data. For example, a system can pull a large amount of actual analytics data sampled for a particular application and use the actual data to generate a prediction. Pulling a large amount of analytics data, however, can use a large amount of processing power and/or processing time, which can make this approach impractical.
Additionally, a suitable set of analytics data may not be available, or the available set may not contain enough data points to produce an accurate predictive analysis. The pool of available data may also include sensitive or confidential information. Using a limited pool of data for generating a prediction can result in inaccurate or otherwise unsatisfactory predictions.
To generate or expand the pool of data for use in predictive analysis, many methods use a set of actual analytics data to generate synthetic data. For example, some methods randomly resample collected data to increase the amount of data available for generating predictions. Conventional random resampling can provide good predictive results for data that has no time-dependent characteristics. Autoregressive models that generate the synthetic data or aid in the predictions, however, do not retain trend information, seasonal characteristics, or other time-dependent relationships between data points. Additionally, autoregressive models typically apply only to stationary processes or to non-stationary processes whose successive differences are stationary. Thus, using autoregressive algorithms to predict time-dependent events can lead to unsatisfactory results.
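As an illustration of why random resampling is suited only to data without time-dependent characteristics, the following minimal sketch resamples a series with replacement and compares how strongly each series tracks the time index. The series itself (a hypothetical daily hit count with an upward trend, a weekend bump, and noise), the helper names, and the use of a Pearson correlation as a rough trend measure are all assumptions made for illustration and are not taken from any particular conventional system.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def resample_with_replacement(series, rng):
    """Draw len(series) values uniformly at random, ignoring their order."""
    return [rng.choice(series) for _ in series]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_days = 364
time_index = list(range(n_days))

# Hypothetical daily hits: linear trend + weekend bump + Gaussian noise.
hits = [
    100 + 0.5 * t + (20 if t % 7 in (5, 6) else 0) + rng.gauss(0, 5)
    for t in time_index
]

synthetic = resample_with_replacement(hits, rng)

# Correlation with the time index serves as a rough measure of trend.
trend_original = pearson(time_index, hits)
trend_resampled = pearson(time_index, synthetic)

print(f"trend correlation, original:  {trend_original:.3f}")
print(f"trend correlation, resampled: {trend_resampled:.3f}")
```

In this sketch the resampled values all come from the original series, so their overall distribution is similar, but their positions in time are random. The correlation with the time index is therefore expected to fall close to zero, which reflects the loss of trend and seasonal ordering.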
These and other disadvantages may exist with respect to conventional data prediction techniques.