1. Field
The subject matter disclosed herein relates to offline data generation for online system analysis.
2. Information
Online systems process real-time web traffic with low latency and high throughput. For example, some online systems may handle up to 30 billion events every day. Prior to the full deployment of such online systems, the systems are frequently analyzed against the target capacity in a staging or test environment using either a portion of the real-time traffic or, alternatively, offline data.
Real-time traffic is often not made available to systems that are not in production, and even when it is, budgeting the traffic so that it accurately exercises the online systems is complicated. The real-time traffic must be of the right quantity, so that the performance numbers are truly representative, and of the right quality, so that all possible execution scenarios are covered. In contrast, offline data are reliably archived in data warehouses and are easily accessible. Furthermore, the offline data are pre-processed and well-formed, and may be conveniently tailored to satisfy different quantity and quality requirements.
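By way of illustration only, tailoring offline data to a quantity requirement and a quality requirement might be sketched as follows. The sketch assumes that each archived event is a record carrying a hypothetical "scenario" label and that the quality requirement means every scenario present in the archive appears at least once. The function name, the field name, and the selection strategy are assumptions made for this example and are not taken from the disclosure.

```python
import random
from collections import defaultdict


def tailor_offline_data(events, target_count, scenario_key="scenario", seed=0):
    """Select target_count archived events that cover every scenario present.

    `scenario_key` is a hypothetical field name used only for this sketch.
    """
    if target_count > len(events):
        raise ValueError("target_count exceeds the number of archived events")
    rng = random.Random(seed)  # fixed seed so the selection is repeatable

    # Group event positions by their scenario label.
    by_scenario = defaultdict(list)
    for index, event in enumerate(events):
        by_scenario[event[scenario_key]].append(index)
    if target_count < len(by_scenario):
        raise ValueError("target_count is too small to cover every scenario")

    # Quality: pick one event from each scenario so none is left out.
    chosen = {rng.choice(indices) for indices in by_scenario.values()}

    # Quantity: fill the remaining slots uniformly at random from unused events.
    remaining = [i for i in range(len(events)) if i not in chosen]
    chosen.update(rng.sample(remaining, target_count - len(chosen)))

    # Return the selection in its original archive order.
    return [events[i] for i in sorted(chosen)]
```

For example, calling tailor_offline_data(archived_events, 50) on an archive of 1,000 events would return 50 distinct events that include at least one event from each scenario. Other selection strategies, such as sampling in proportion to the observed scenario frequencies, may equally be used.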