The present invention generally relates to verifying data consistency, and more particularly to verifying data consistency between update-in-place data structures and append-only data structures.
Emerging processing solutions provide a platform for curation and analysis of massive amounts of live (continuously updated) data by integrating big data processing with recovery log capture technology associated with update-in-place data structures, such as a relational database management system (RDBMS). Big data refers to a massive amount of structured and/or unstructured data that is too large to process with traditional database techniques, e.g., a query-in-serial. Big data platforms may use a distributed storage architecture (e.g., a distributed file system) and a distributed processing architecture. To support queries over a temporally complete, continuously updated history of RDBMS change data, processing solutions may continuously append the change history to big data targets, such as an append-only data structure (e.g., a log file/table stored in a distributed file system associated with a big data platform). However, unintended data changes caused by faulty replication processes, data corruption, operator errors, and the like may occur on either the side of the update-in-place data structure or the side of the append-only data structure, which can leave the two data structures inconsistent with each other.
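By way of illustration only, the relationship between the two kinds of data structure may be sketched as follows. This is a minimal, single-process sketch and not the claimed method: the update-in-place data structure is modeled as an in-memory dictionary, the append-only data structure as a list of change records, and the names `Change`, `record_change`, `replay`, and `find_mismatches` are hypothetical names introduced for this example. The sketch replays the appended change history and reports keys whose replayed value differs from the current update-in-place value.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record: (sequence number, key, new value).
# A value of None stands for a delete. This layout is an assumption
# made for illustration, not a format taken from the source text.
Change = Tuple[int, str, Optional[str]]


def apply_change(table: Dict[str, str], key: str, value: Optional[str]) -> None:
    """Apply one change to an update-in-place table (overwrite or delete)."""
    if value is None:
        table.pop(key, None)
    else:
        table[key] = value


def record_change(table: Dict[str, str], log: List[Change],
                  key: str, value: Optional[str]) -> None:
    """Update the table in place and append the same change to the log."""
    apply_change(table, key, value)
    log.append((len(log) + 1, key, value))


def replay(log: List[Change]) -> Dict[str, str]:
    """Rebuild the latest state by replaying the log in sequence order."""
    state: Dict[str, str] = {}
    for _seq, key, value in sorted(log, key=lambda change: change[0]):
        apply_change(state, key, value)
    return state


def find_mismatches(table: Dict[str, str], log: List[Change]) -> List[str]:
    """Return the sorted keys whose table value differs from the replayed log."""
    replayed = replay(log)
    all_keys = table.keys() | replayed.keys()
    return sorted(k for k in all_keys if table.get(k) != replayed.get(k))


if __name__ == "__main__":
    table: Dict[str, str] = {}
    log: List[Change] = []
    record_change(table, log, "a", "1")
    record_change(table, log, "b", "2")
    record_change(table, log, "a", "3")
    print(find_mismatches(table, log))  # both sides agree at this point

    table["a"] = "corrupted"            # simulated operator error on the table side
    print(find_mismatches(table, log))  # key "a" is now reported
```

A full replay of this kind is practical only for small examples; it is shown here to make concrete how a change on one side alone produces a detectable disagreement between the two structures.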