A working data set is typically stored in one monolithic database. Updates applied to this data set present an updated view of the data with little to no history of the previous state: each update can overwrite the previous value, yielding a data set focused solely on current values. This is acceptable only when the most up-to-date information is desired and history is of little concern.
When prior states and values are to be retained, they are typically stored in the same physical location. This can add further dimensions to the data storage, time or version being the most commonly desired, in exchange for a significant increase in fields and records. As the data collection period grows, the monolithic data store can become problematic to scale and to host efficiently. Over time the data set may grow so large that it poses feasibility problems for the hardware and software host systems, and even where those systems remain reliable, performance may deteriorate under ever-increasing volumes of data.
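The contrast between overwrite-in-place storage and storage that adds version as a second data dimension can be illustrated with a minimal sketch. The class below is hypothetical (its name and methods do not come from any particular system): it retains every prior value of a key rather than overwriting it, so both the current value and any past version remain retrievable, at the cost of ever-growing record counts.

```python
from collections import defaultdict

class VersionedStore:
    """Sketch of a key-value store that keeps version as an extra
    data dimension instead of overwriting values in place."""

    def __init__(self):
        # key -> list of (version, value), oldest first
        self._history = defaultdict(list)

    def put(self, key, value):
        """Append a new value; prior values are retained, not overwritten."""
        history = self._history[key]
        version = len(history) + 1
        history.append((version, value))
        return version

    def get(self, key, version=None):
        """Return the current value, or the value as of a past version."""
        history = self._history[key]
        if not history:
            raise KeyError(key)
        if version is None:
            return history[-1][1]        # current value only
        return history[version - 1][1]   # historical value
```

A monolithic store keeping only current values would implement `put` as a plain overwrite; the trade-off described above is visible here in the unbounded growth of each key's history list.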
Storing data over time may also pose a challenge for clients accessing that data, because the format and contents of a data set often change. A normalization can be applied to the data to provide a consistent client view despite source changes, but as the source format deviates further from the original, converting new updates becomes more difficult and resource intensive. Converting new updates to a previous format is usually destructive, so the value inherent in the difference between formats may be lost. This cycle can continue to deteriorate until a wholesale conversion of client expectations and data formats is performed. Such a conversion of client expectations and interfaces, required whenever data content or format changes, can be costly or impossible, and coordinating it in a high-availability environment further complicates the process.
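A minimal sketch of such a normalization layer follows. The record fields are hypothetical, chosen only to illustrate the point: a newer source format is converted back to the original client-facing format, and the conversion is destructive in the sense described above, since the structure introduced by the newer format is collapsed and lost.

```python
def normalize(record):
    """Present a consistent client view of a record whose source
    format has evolved. Field names here are illustrative only."""
    if "name" in record:
        # Original source format: already matches the client view.
        return {"name": record["name"]}
    if "first_name" in record and "last_name" in record:
        # Newer source format: convert down to the original view.
        # Destructive: the first/last name split cannot be recovered.
        return {"name": record["first_name"] + " " + record["last_name"]}
    raise ValueError("unrecognized source format")
```

Each further deviation of the source format adds another conversion branch, which is one way the resource cost and information loss described above accumulate.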
Storing the entire history of a data set may be extremely expensive, as every byte stored has a direct cost, and the associated structures for searching the data add further space and cost. Various methods exist for reducing the resources required to store a data set, but most such optimizations are generic, designed to reduce storage space at the expense of performance.
What is needed in the art is a method for organizing multiple data sets on-line so that new content may be added over time without affecting system reliability or performance. What is also needed is a way to efficiently normalize each of the data sets individually so that a consistent client view may be provided. What is also needed is a flexible system that can accommodate many different data formats.