1. Field
This description relates to a method, a system, and a computer-readable medium for detecting data integrity and data inconsistency issues when replicating data in data storage and data processing systems.
2. Related Art
Many different kinds of replication tools are used to move data from a source application (e.g., enterprise resource planning (ERP)) running on, for example, a database system (e.g., Oracle™, MS-SQL™, and the like) into a destination application running on another database system. For example, data may be moved from traditional applications running on traditional database systems into an in-memory database (e.g., high-performance analytic appliance (HANA)).
HANA (e.g., SAP™ HANA) may be a data warehouse appliance for processing high volumes of operational and transactional data in real time. HANA may use in-memory analytics, an approach that queries data stored in random access memory (RAM) instead of on hard disk or flash storage. A common problem when moving (e.g., replicating) data using a replication tool is a lack of data integrity and data consistency.
For example, replication tools are reactive in nature. Certain actions that happen on the source device or application cause a corresponding reaction from the replication tool. Typically, the action is an insert into a database, which is detected via database log files, database triggers, or polling scans. Once a change is identified, the reaction is typically to copy all the relevant data and replicate the data into the destination system.
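The reactive behavior described above can be illustrated with a minimal sketch of the polling variant. The class name `PollingReplicator`, the tuple layout of the rows, and the use of an increasing row identifier to detect new inserts are assumptions made for illustration only; they are not taken from any particular replication tool.

```python
from dataclasses import dataclass, field


@dataclass
class PollingReplicator:
    """Hypothetical replicator that reacts to source inserts by polling."""

    source: list                                   # source table: (row_id, payload) tuples
    destination: list = field(default_factory=list)
    last_seen_id: int = 0                          # highest row_id already replicated

    def poll(self) -> int:
        """Copy rows inserted since the last poll; return how many were copied."""
        new_rows = [row for row in self.source if row[0] > self.last_seen_id]
        for row in new_rows:
            self.destination.append(row)
            self.last_seen_id = max(self.last_seen_id, row[0])
        return len(new_rows)


source_table = [(1, "order A")]
replicator = PollingReplicator(source=source_table)

first = replicator.poll()            # picks up the row that already exists
source_table.append((2, "order B"))  # action on the source
source_table.append((3, "order C"))
second = replicator.poll()           # reaction: both new rows are copied
third = replicator.poll()            # nothing changed, nothing copied
```

In this sketch the replicator does nothing until a poll observes a change, which is the reactive pattern described above. A log-based or trigger-based tool would differ only in how the change is detected.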
Replication tools may perform this task in a sequential, generic manner. For example, when a change is detected in the source device or application, the replication tool queues the changes and replicates the changes in the destination device or application. Replication may not consider transactional integrity. Because the replication tool does not consider transactional integrity, a logical unit of work in an application that includes data across four or five different tables may be replicated into the destination device or application in an arbitrary order. For example, in one scenario an ideal sequence for data replication may be Header, Line1, Line2, SubLine1.1, SubLine1.2 (as created in the source system). However, during replication the data is created on the destination device or application in the sequence Line1, Header, SubLine1.1, SubLine1.2, Line2. As a result, data integrity issues may be present at some point in time during replication.
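The effect of the two sequences above can be made concrete with a short sketch. It assumes, for illustration only, that Line1 and Line2 belong to Header and that SubLine1.1 and SubLine1.2 belong to Line1; the function `orphan_snapshots` is a hypothetical helper that reports, after each arrival, which rows are present on the destination without their parent row.

```python
# Assumed parent of each row in the logical unit of work (None = top level).
PARENT = {
    "Header": None,
    "Line1": "Header",
    "Line2": "Header",
    "SubLine1.1": "Line1",
    "SubLine1.2": "Line1",
}


def orphan_snapshots(arrival_order):
    """After each arrival, list the rows present whose parent has not arrived."""
    present = set()
    snapshots = []
    for name in arrival_order:
        present.add(name)
        orphans = sorted(
            row for row in present
            if PARENT[row] is not None and PARENT[row] not in present
        )
        snapshots.append(orphans)
    return snapshots


source_order = ["Header", "Line1", "Line2", "SubLine1.1", "SubLine1.2"]
replicated_order = ["Line1", "Header", "SubLine1.1", "SubLine1.2", "Line2"]

source_snaps = orphan_snapshots(source_order)          # no orphan at any step
replicated_snaps = orphan_snapshots(replicated_order)  # Line1 is orphaned at step 1
```

With the source order, no row is ever present without its parent. With the replicated order, Line1 exists on the destination before Header does, so a program reading the destination at that moment would see a line item that belongs to no header.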
In addition, data is typically replicated continuously from the source device or application to the destination device or application. If a program is being executed on the destination device or application and updates are made to a table used by the program, these changes are not reflected in the executing program, leading to incorrect results. As a result, data inconsistency issues may be present during program execution.
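A minimal sketch of this inconsistency follows. It assumes a hypothetical program that reads the destination table once when it starts and then computes a total from that private copy; the table contents and the function `start_report` are illustrative only.

```python
# Destination table as it looks when the program starts (illustrative values).
destination_table = {"order-1": 100, "order-2": 250}


def start_report(table):
    """Read the table once at program start, as a long-running program might."""
    return dict(table)  # private working copy of the rows


working_copy = start_report(destination_table)

# Replication keeps applying source changes while the program is still running.
destination_table["order-2"] = 300   # update replicated from the source
destination_table["order-3"] = 50    # insert replicated from the source

report_total = sum(working_copy.values())        # based on the stale copy
current_total = sum(destination_table.values())  # what the table now holds
```

Here the program reports a total of 350 while the replicated table already sums to 450, so the result no longer matches the data on the destination.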
Further, table locking may typically be necessary during replication. To guarantee data consistency and integrity, applications may use a lock mechanism. Traditional “SELECT FOR UPDATE” or “SELECT . . . LOCK” statements are effective for the single-server case (both on-line transactional processing (OLTP) and on-line analytical processing (OLAP) on the same server). However, in the case of multiple servers (e.g., OLTP on enterprise core component (ECC) and OLAP on HANA), the traditional lock mechanism may be ineffective because all the data is replicated from the source server to the destination or target server by a data replication tool (e.g., SAP landscape transformation (SLT)). The replication tool is unable to run analysis on the destination or target server while locking the corresponding data rows on the source server to prevent changes.