Field of the Invention
The presently claimed invention generally relates to copying data from a primary data storage location to a secondary data storage location. More specifically, the presently claimed invention relates to optimizing the performance of updating data in a replicated data store.
Description of the Related Art
Computer systems that manipulate large data sets must occasionally copy data from one location to another. Conventionally, data from a first location (e.g., a first database) is copied (replicated) into a second location (e.g., a second database). In order to maintain coherency between the first database and the second database, either the entire first database must be re-copied, or changes in the data contained in the first database must be merged into the second database. In certain instances, the process of merging changed data into the second database is inefficient. For example, when the second database uses a write-once file system, the data residing at the second database must be read, the changes must then be merged with that data, and the combined data must be re-written. When the databases contain large amounts of data, this process may be inefficient and slow.
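The read, merge, and re-write sequence described above may be illustrated by the following minimal sketch. It is offered only as an illustration under stated assumptions and does not describe any particular product: the `WriteOnceStore` class, the `merge_changes` function, and the file names are hypothetical, and an in-memory dictionary stands in for a write-once file system.

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# Hypothetical change record: (primary key, new row); a row of None marks a delete.
Change = Tuple[int, Optional[Row]]


class WriteOnceStore:
    """Toy stand-in for a write-once file system (an assumption for illustration).

    A file may be created and read, but never modified in place.
    """

    def __init__(self) -> None:
        self._files: Dict[str, Dict[int, Row]] = {}
        self.rows_written = 0  # counts every row ever written, to expose the cost

    def create(self, name: str, rows: Dict[int, Row]) -> None:
        if name in self._files:
            raise FileExistsError(f"{name} already exists and cannot be rewritten")
        self._files[name] = dict(rows)
        self.rows_written += len(rows)

    def read(self, name: str) -> Dict[int, Row]:
        # Return a copy so callers cannot mutate the stored file.
        return dict(self._files[name])


def merge_changes(store: WriteOnceStore, old_name: str, new_name: str,
                  changes: Iterable[Change]) -> None:
    """Apply changes by reading all existing data, merging, and re-writing it all."""
    current = store.read(old_name)       # 1. read the data at the second database
    for key, row in changes:             # 2. merge the changes into that data
        if row is None:
            current.pop(key, None)
        else:
            current[key] = row
    store.create(new_name, current)      # 3. re-write the combined data


store = WriteOnceStore()
store.create("orders.v1", {i: {"qty": 1} for i in range(1000)})
merge_changes(store, "orders.v1", "orders.v2", [(7, {"qty": 5}), (8, None)])
print(store.rows_written)  # 1999: two changed rows forced 999 rows to be re-written
```

In this sketch, the cost of applying an update is proportional to the size of the existing data rather than to the number of changes, which is the inefficiency noted above.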
An exemplary instance where this issue arises is when data managed by a relational database management system (RDBMS), such as an Oracle database, is copied into Hadoop. Hadoop is an open-source software framework that manages large data sets using the Hadoop distributed file system (HDFS), which is a write-once file system. Data from an RDBMS is commonly copied into Hadoop when performing functions such as archiving data, warehousing data, or parsing data to gather intelligence from the data. Applications that replicate data from an RDBMS to Hadoop include SharePlex by Dell, GoldenGate by Oracle, and Tungsten by Continuent.
What is needed is a system and a method that allow databases to be rapidly replicated and updated without incurring the delays associated with conventional data replication approaches.