Field
The disclosed embodiments relate to data auditing. More specifically, the disclosed embodiments relate to techniques for tracking data replication and discrepancies in incremental data audits.
Related Art
Analytics may be used to discover trends, patterns, relationships, and/or other attributes related to large sets of complex, interconnected, and/or multidimensional data. In turn, the discovered information may be used to gain insights and/or guide decisions and/or actions related to the data. For example, business analytics may be used to assess past performance, guide business planning, and/or identify actions that may improve future performance.
On the other hand, significant increases in the size of data sets have resulted in difficulties associated with collecting, storing, managing, transferring, sharing, analyzing, and/or visualizing the data in a timely manner. For example, data used within an organization may be replicated across multiple data centers in different locations. To detect failures or issues with the replication, replicated copies of the data may periodically be retrieved from the data centers and compared. However, conventional data audit mechanisms are unable to scale with large data sets because bulk queries for retrieving the data sets may consume significant resources on the databases in which the data sets are stored. Moreover, subsequent comparison of the retrieved data may identify only discrepancies between entire data sets and may fail to indicate where and when the discrepancies occur.
Consequently, management and replication of large data sets may be facilitated by improving the efficiency and granularity of data audit mechanisms.
In the figures, like reference numerals refer to the same figure elements.