In a relational database, index is the most common way to speed-up a query to the database. The indexes are associated with tables. Once an index is associated with a table, the operation on the index (either a read or a write) is implicit. There is no need for the user to have the cognizance of an index.
For recovery and replication purpose, generally RDBMS generates a log which describes each operation. This log is known as a redo log or a transactional log or a simply log. In many cases this log is physical in nature. Whenever a write operation happens on the database, the log records this write operation. This is accomplished by the “log writer process”. These logs include: the updates on the tables (the data segment) and the updates on the indexes associated with the tables (the index segments).
FIG. 1 is a schematic diagram showing an example of standard operation inside a RDBMS storage engine. As shown in FIG. 1, a simple table with two indexes is considered. When a record is inserted into the table, the database process will write to the data segment and the two index segments, while logs for each operation are generated.
On the other hand, for requirements of high availability (HA), load balancing databases use replication as a standard methodology. The most common form of replication implementation is “log shipping”.
FIG. 2 is a schematic diagram showing an example of log shipping between a master database node and a subscriber database node. As shown in FIG. 2, in this solution, the designate master database node continuously ships its logs (such as redo logs) to the subscriber database node. The subscriber database node applies these logs on the database to synchronize with the master database node. The shipped logs include updates on the data segments and the index segments.
For purposes of seamless switch-over (or fail-over), it is desirable that the schemas in the master database node and the subscriber database node are kept the same. Most of HA systems in communication domain and enterprise domains use a master database node which is Read-Write and one or more subscriber database nodes which are Read-Only.
Nowadays, log shipping from the master database node to the subscriber database node is a normal feature for all database replications. Specialized replication solutions like GoldenGate (such as in Oracle) can synchronize databases in heterogeneous topologies. A similar solution is offered by using a translation middleware in [reference 1]. These solutions use logical replication based on SQL statements which are quite slow compared to the synchronization based on log shipping.
[Reference 2] has proposed an invention that: the data, which is shipped from the master database node, is re-interpreted in the subscriber database nodes semantically, and then is applied to the databases. This solution potentially can reduce the size of log which is being shipped from the master database node.
[Reference 3] has proposed an invention that: the master database node ships a log to a standby database node. The standby database node has a mapped table to the master DB's table (known as the federated table mapping). The change is applied to the federated table.
However, all the above said solutions have the following two weaknesses: (1) the re-interpretation is done in the subscriber database node and the master database node has no cognizance of this fact, which does not help to reduce the master's load; (2) as the two nodes are semantically not equal, switch-over is going to be very costly.
[Reference 4] offers an invention which can reduce the load of the master database node and the subscriber database nodes. This invention can also reduce the memory consumption on the master database node. In this innovation, the subscriber database node has a set of its own indexes which are generated by the subscriber database node on receiving the logs from the master database node. On switch-over these indexes are dropped by new master database node. This solution has the advantage of easy switch-over and memory optimization on the master database node.
However this solution has 3 issues: (1) it does not have a mechanism to reduce redundant indexes on the subscriber database nodes; (2) it requires the subscriber database node to identify which indexes need to be populated additionally, and which cannot be typically accomplished in the storage engine layer alone; (3) it puts the onus index creation on various nodes based on roles to the administrator.
[reference 1] “Middleware-based database replication: the gaps between theory and practice”, Proceedings of the 2008 ACM SIGMOD international conference on Management of data, Pages 739-752.
[reference 2] WIPO Patent Application WO/2007/037984 (also U.S. Pat. No. 8,290,910), titled “SEMANTIC REPLICATION”.
[reference 3] WIPO Patent Application WO/2010/073110, titled “SELECTIVE DATABASE REPLICATION”.
[reference 4] US Patent Application 20110320404 titled “STANDBY INDEX IN PHYSICAL DATA REPLICATION”.