The present invention relates generally to the field of data organization, and more specifically, to the versioning of data in data warehouses.
Enterprises are building increasingly large data warehouses to enable analytics. Analytic techniques may provide a way to unlock the power of information and improve business performance. For instance, it is known to employ data warehousing and analytics solutions to identify the reputation of a business product by collecting people's opinions on the Web. In some applications, Web data may be constantly collected, ingested, and processed into the data warehouses to enable analytics. Throughout such a data flow, Web pages may be frequently updated, for example, when new content is added or existing content is revised or deleted, while other Web pages may simply be inserted as newly collected Web pages.
In the field of data warehouse management, it is known to manually update changed records via low-level relational database management system (RDBMS) operations. This approach may be practical when the number of users is relatively low and updates are occasional and few in number.
It is also known to update data records in a data warehouse by replacing the previous version of the data record with an updated file. The previous version is then removed or deleted from the system. Thus, the ability to track changes to a data record, or to perform data analysis on the data in that record over time, may be lost.
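By way of illustration only, the replace-and-delete behavior described above might be sketched as follows. The sketch assumes a single hypothetical `pages` table keyed by URL in an in-memory SQLite database; the table name, column names, and the helper functions `overwrite_record` and `stored_versions` are illustrative assumptions and are not taken from any particular data warehouse product.

```python
import sqlite3


def overwrite_record(conn, url, content):
    """Store a page by replacing any existing row that has the same URL.

    The earlier content is discarded, mirroring the known approach in which
    the previous version of a record is removed when an update arrives.
    """
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, content) VALUES (?, ?)",
        (url, content),
    )
    conn.commit()


def stored_versions(conn, url):
    """Return every stored copy of the page with the given URL."""
    rows = conn.execute("SELECT content FROM pages WHERE url = ?", (url,))
    return [row[0] for row in rows]


# Hypothetical table: one row per URL, with no version information.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, content TEXT)")

# The same page is collected twice, with revised content the second time.
overwrite_record(conn, "http://example.com/review", "Great product")
overwrite_record(conn, "http://example.com/review", "Product has defects")

history = stored_versions(conn, "http://example.com/review")
print(history)
```

In this sketch only the most recent content remains queryable after the second call, so the earlier opinion is no longer available for analysis over time.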
Therefore, there is a need for a method and system that provides a versioning scheme for data warehouse management.