Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Embodiments relate to handling large data volumes, and in particular, to a vault archive implemented in a big data platform.
With the evolution in sophistication and complexity of databases, stored data has become available for visualization and analysis in increasingly large volumes. Such “big data” may comprise millions or even billions of different records.
Examples of big data can include unstructured postings and shared documents available from social media. However, structured data can also be stored in large volumes, including rapidly increasing volumes of financial data for processing by business management systems.
Even though data of many kinds (e.g., unstructured and structured) is growing exponentially, it may be desired to retain that data for many years. This desire to archive data may be attributable to business value considerations and/or legal reasons.
Inexpensive long-term storage of historical data calls for the ability to use those data assets, for example to maintain the information that the data represents and to allow for flexible data analysis (reporting). This ability is desired even across classical data silos.
In one example, it may be necessary to store a communication history together with the closing of a deal. In another example, it may be necessary to relate sensor data to a maintenance request.
Conventionally, storing such large volumes of data can be expensive. With such large data volumes at issue, difficulties can arise in preserving the data in a manner that allows cross-querying when the data is stored in different, unrelated silos. It can also be a challenge to keep track of the historical state of the data, given changes in the system environment over time as well as evolution of the data structures themselves.