1. Technical Field
Embodiments relate generally to data processing environments and, more particularly, to a system providing data replication using a partitioning scheme.
2. Background Art
Computers are powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a business may have a database of employees. The database of employees may have a record for each employee, where each record includes fields designating specific properties or information about that employee, such as, but not limited to, the employee's name, contact information, and salary.
Between the actual physical database (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, or retrieved from or updated in such files, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art.
Certain tables of a database may participate in join operations. During a join operation, it is often necessary to move or copy tables or intermediate results to other hosts of the same database instance; this is called a remote join. Remote joins significantly degrade performance. It is therefore reasonable to keep local replicas on all relevant servers. However, replication increases memory consumption and affects overall system performance.
When a row is inserted into a table, a record is written into a delta log. At the same time, an entry is also written into a recovery (redo) log. If a user performs a recovery operation of the database, not only the backup files but also the recovery log is taken into consideration. After the backup files are restored, the recovery log is read and, in this way, the delta log is restored at file level. Upon table access, the delta log is read and the data becomes available for processing.
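The insert and recovery flow described above can be illustrated with a minimal in-memory sketch. It is not the implementation of any particular DBMS: the class `Database`, the function `recover`, and the idea of recording how many redo entries a backup already covers are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    # All names here are hypothetical; real systems persist these structures on disk.
    main_rows: list = field(default_factory=list)  # rows covered by the last backup
    delta_log: list = field(default_factory=list)  # rows inserted since then
    redo_log: list = field(default_factory=list)   # recovery (redo) log entries

    def insert(self, row):
        # An insert writes a delta log record and a redo log entry together.
        self.delta_log.append(row)
        self.redo_log.append(("insert", row))

    def backup(self):
        # Snapshot all rows and note how many redo entries the snapshot covers.
        return list(self.main_rows + self.delta_log), len(self.redo_log)

    def rows(self):
        # Table access reads the delta log on top of the backed-up rows.
        return self.main_rows + self.delta_log


def recover(snapshot, covered, redo_log):
    # Restore the backup first, then replay the redo entries written after it
    # to rebuild the delta log.
    db = Database(main_rows=list(snapshot), redo_log=list(redo_log))
    for op, row in redo_log[covered:]:
        if op == "insert":
            db.delta_log.append(row)
    return db


db = Database()
db.insert({"id": 1})
snapshot, covered = db.backup()
db.insert({"id": 2})  # written after the backup, so only the redo log has it

restored = recover(snapshot, covered, db.redo_log)
print(restored.rows())  # [{'id': 1}, {'id': 2}]
```

The row inserted after the backup survives only because the redo log is replayed, which is why anything that must exist after recovery has to be reflected in that log in some form.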
A DBMS offers the ability to initially create replicas of a non-replicated table. The non-replicated table is copied n times, once for each required replica. The problem arises when a recovery operation is performed: the copy functionality has to work with the backup and recovery functionality in such a way that, after recovery, all data is replicated again. The simple approach to achieve this is to write all copies of the original table into the recovery log of the database. This file is read during recovery and, based on its contents, all replicas are restored. This approach has a negative effect, however, as the tables that are subject to replication are usually very large, and writing them n times for n copies into the recovery log produces extremely large files.
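The growth of the recovery log under the simple approach can be made concrete with a short calculation. The function name `naive_redo_bytes` and the figures used (a 50 GiB table and 8 replicas) are hypothetical and chosen only to show the linear scaling with n.

```python
def naive_redo_bytes(table_bytes: int, replicas: int) -> int:
    """Redo log volume when each of the n replicas is logged as a full table copy."""
    return table_bytes * replicas


GIB = 1024 ** 3
table_bytes = 50 * GIB  # hypothetical size of the table to be replicated
replicas = 8            # hypothetical number of copies (n)

print(naive_redo_bytes(table_bytes, replicas) // GIB, "GiB")  # 400 GiB
```

Under these assumed numbers, creating the replicas alone adds 400 GiB to the recovery log, all of which must be read again during recovery.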
Therefore, what is needed is a replication mechanism that is substantially transparent to components of the database system. Specifically, what is needed is a replication mechanism that uses existing database engine infrastructure (such as a partitioning feature), in which the components “think” of a replicated table as a partitioned table with just a single partition, that single partition being the local replica.
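The idea of presenting a local replica as a single-partition table can be sketched as follows. This is only a conceptual illustration under stated assumptions: `PartitionedTable`, `as_local_replica`, and `count_rows` are hypothetical names, and the sketch does not describe how any actual database engine implements partitioning or replication.

```python
class PartitionedTable:
    """Hypothetical partitioned-table interface used by other components."""

    def __init__(self, name, partitions):
        self.name = name
        self.partitions = partitions  # dict: partition id -> list of rows

    def scan(self):
        # Iterate over all rows, partition by partition.
        for part_id in sorted(self.partitions):
            yield from self.partitions[part_id]


def as_local_replica(name, rows):
    # Expose the local replica as a table with exactly one partition (id 0).
    return PartitionedTable(name, {0: list(rows)})


def count_rows(table):
    # A component written only against the partitioned-table interface;
    # it does not need to know whether it is looking at a replica.
    return sum(1 for _ in table.scan())


replica = as_local_replica("employees", [{"id": 1}, {"id": 2}])
print(len(replica.partitions), count_rows(replica))  # 1 2
```

Because `count_rows` depends only on the partitioned-table interface, it works unchanged on the replica, which is the kind of transparency the paragraph above calls for.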