The present invention relates to extract, transform, load (ETL) processes in databases, and more specifically, to systems and methods for concurrency control for multiple ETL processes implementing ETL-level locking and rollback for process failures.
Currently, business enterprises are building sizable databases to enable analytics that improve business performance. ETL is a process for loading data into a database. Often, users run multiple ETL processes on one database, either accidentally or intentionally. For example, two users may both load data into one database at the same time without knowing of each other's actions. If a user knows that two ETL processes touch two disjoint parts of a database, these two processes should be able to execute simultaneously. However, if multiple ETL processes run simultaneously on one database, data inconsistency may result: the loading result may differ from the result obtained when the ETL processes are executed one by one. Database systems currently support transaction-level locks to guarantee the data consistency of concurrent transactions, but because an ETL process usually consists of multiple database transactions, transaction-level concurrency control in the database cannot guarantee ETL-level consistency. Therefore, it is important that the loading tool guarantee data consistency in the presence of concurrent ETL processes. The effect (i.e., the loading result) of concurrent ETL processes should be the same as if those processes were executed one by one, which is called the serialization of ETL processes.
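The gap between transaction-level and ETL-level consistency can be illustrated with a small simulation. The sketch below is not part of the described system; it assumes a hypothetical in-memory database holding a single value `total` and two hypothetical ETL processes, `A` and `B`, each made of two transactions (an extract followed by a load). Every transaction runs atomically, yet the interleaved schedule produces a result that matches neither serial order.

```python
from itertools import permutations


def make_etl(update):
    """Build a hypothetical two-transaction ETL process.

    Transaction 0 extracts the current value into private staging;
    transaction 1 loads a value derived from the staged one.
    """
    staged = {}

    def extract(db):
        staged["value"] = db["total"]

    def load(db):
        db["total"] = update(staged["value"])

    return [extract, load]


def run(schedule, updates, initial=100):
    """Run transactions in the given order; each one executes atomically."""
    db = {"total": initial}
    procs = {name: make_etl(fn) for name, fn in updates.items()}
    next_step = {name: 0 for name in procs}
    for name in schedule:
        procs[name][next_step[name]](db)  # one whole transaction at a time
        next_step[name] += 1
    return db["total"]


# Illustrative transformations (assumptions): A adds 10, B doubles the value.
UPDATES = {"A": lambda v: v + 10, "B": lambda v: v * 2}

# Results of the two serial orders: A then B, and B then A.
serial_results = {run([p, p, q, q], UPDATES) for p, q in permutations(UPDATES)}

# Interleaving at transaction granularity: A extracts, B extracts, A loads, B loads.
interleaved_result = run(["A", "B", "A", "B"], UPDATES)

print(sorted(serial_results))  # [210, 220]
print(interleaved_result)      # 200
```

Because 200 matches neither serial result, the interleaved schedule is not serializable at the ETL level, even though no individual transaction was interrupted.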