The accuracy and reliability of systems and processes that use historical data may increase when a sufficiently large collection of data, captured at frequent intervals, is available. However, storing large volumes of data for long periods may result in costs that are prohibitive. When data is discarded, whether under a retention policy, by mistake, or as a result of a disaster, the absence of that data may adversely affect other systems that consume historical data.
For example, a major website operator may have tens of millions of visitors view a website each year. Metrics of various types related to website access may be captured and stored so that the information may be used by other systems and processes. Such metric data may be sampled at certain time intervals, for example at one-minute, five-minute, one-hour, and one-day intervals. Each unit of data may contain information about some metric, e.g., web server load or purchase orders placed, at the time the unit of data is captured.
The amount of storage used to store a day's worth of one-minute data is significantly greater than the amount used to store a day's worth of one-hour data (1,440 samples versus 24 samples per metric). Because of the load that storing one-minute data places on resources, retention policies may be put in place specifying that data captured at frequent intervals, such as one-minute data, be discarded over time in favor of data sampled at longer intervals that may use less storage, such as one-hour data. The one-minute data may then be purged from the data store and may no longer be available to systems and processes that would benefit from the existence of the data.
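The trade-off described above may be illustrated with a minimal sketch. The names `Sample`, `samples_per_day`, `roll_up`, and `apply_retention` are hypothetical and are introduced here only for illustration. The sketch also assumes that the coarser data is derived by averaging the finer samples in each bucket, which is only one possible approach; a system may instead sample the coarser data independently.

```python
from dataclasses import dataclass
from statistics import mean

SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_seconds: int) -> int:
    """Number of samples one metric produces per day at a given interval."""
    return SECONDS_PER_DAY // interval_seconds


@dataclass(frozen=True)
class Sample:
    timestamp: int  # seconds since some epoch, marking the sample time
    value: float    # the metric value, e.g. web server load


def roll_up(samples, interval_seconds):
    """Average fine-grained samples into coarser buckets (an assumed policy)."""
    buckets = {}
    for s in samples:
        # Align each sample to the start of the coarse interval it falls in.
        bucket_start = s.timestamp - s.timestamp % interval_seconds
        buckets.setdefault(bucket_start, []).append(s.value)
    return [Sample(t, mean(v)) for t, v in sorted(buckets.items())]


def apply_retention(samples, now, max_age_seconds, coarse_interval_seconds):
    """Keep recent samples unchanged; replace older ones with their roll-up."""
    cutoff = now - max_age_seconds
    old = [s for s in samples if s.timestamp < cutoff]
    recent = [s for s in samples if s.timestamp >= cutoff]
    # The fine-grained values in `old` are not recoverable after this step.
    return roll_up(old, coarse_interval_seconds) + recent


# One day of one-minute samples for a single metric.
minute_data = [Sample(t * 60, float(t % 60)) for t in range(samples_per_day(60))]
retained = apply_retention(minute_data, now=SECONDS_PER_DAY,
                           max_age_seconds=3600, coarse_interval_seconds=3600)
print(len(minute_data), len(retained))
```

In this sketch, only the most recent hour keeps its one-minute resolution. Each older hour is reduced to a single averaged value, so a consumer that later needs the original one-minute values for those hours cannot obtain them from the data store.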