A typical computer network comprises a plurality of computers linked by a communications network. Applications running on these computers need to share information. One solution is to host a database on one of the computers, and have applications, regardless of their location, read and update this database. This client-server solution works well in many situations, but is less effective when network bandwidth or availability is marginal, when the server cannot accommodate the applications' database access workload, or when the server is an unacceptably risky single point of failure.
An alternative that addresses these issues is to replicate the database on some or all of the computers. On each computer, applications read and update the local database by sending it requests, and the database sends a response to each request once it is completed. The challenge in this configuration is to keep the database replicas consistent with each other. One way this can be accomplished is through eager replication, in which each update is propagated and applied to every database before completion is confirmed to the requesting application. Furthermore, updates are applied in the same order to all databases to ensure that the databases remain consistent with each other. Compared to the client-server approach, eager replication provides improved query performance and query availability in the face of network and node failure. However, update performance is lower, and updates cannot be performed during network or node outages because communication links between nodes are broken. These issues can be addressed using a quorum-based approach, at the cost of poorer query performance and availability.
Another way to keep the database replicas consistent is through lazy replication. With lazy replication, each application sends requests to its local database, and receives a confirming response as soon as the request is processed locally. Subsequently, a replication function propagates the local updates to other computers and applies them to those computers' databases. If communications connectivity is temporarily broken, update propagation is delayed but local database request processing continues.
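The difference between the two confirmation policies can be illustrated with a minimal sketch. It assumes a toy in-memory key-value replica; the class `Replica`, its `connected` flag, and the methods `update_eager`, `update_lazy`, and `propagate` are hypothetical names introduced only for illustration, and update ordering and conflict resolution are deliberately left out.

```python
from collections import deque


class Replica:
    """Toy model of one computer holding a local database replica."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []          # other Replica objects
        self.outbox = deque()    # local updates not yet propagated
        self.connected = True    # False simulates a broken communications link

    def update_eager(self, key, value):
        """Eager policy: confirm only after every replica has applied the update."""
        if not self.connected:
            raise ConnectionError("eager update requires all replicas to be reachable")
        self.data[key] = value
        for peer in self.peers:
            peer.data[key] = value
        return "ok"

    def update_lazy(self, key, value):
        """Lazy policy: apply locally, queue for later propagation, confirm at once."""
        self.data[key] = value
        self.outbox.append((key, value))
        return "ok"

    def propagate(self):
        """Replication function: push queued updates to peers when connected."""
        if not self.connected:
            return 0  # propagation is delayed; local processing is unaffected
        sent = 0
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value
            sent += 1
        return sent


a, b = Replica("a"), Replica("b")
a.peers = [b]
a.connected = False              # simulate an outage
a.update_lazy("x", 1)            # still confirmed locally
a.propagate()                    # nothing is sent while disconnected
a.connected = True
a.propagate()                    # the queued update now reaches replica b
```

In this sketch, the lazy path keeps accepting updates during the simulated outage and the replicas converge once `propagate` runs after connectivity returns, whereas `update_eager` refuses the update outright.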
One problem with the state of the art in database replication is that neither eager replication nor existing lazy replication methods adequately reduce communication network resource costs while still maintaining reasonable levels of database synchronization.
Commercial database management products such as Oracle implement lazy replication. Two approaches commonly used by these products are data-level replication and procedure-level replication. With data-level replication, the requests propagated are inserts, updates, and deletes. Request reordering and compression are used. With procedure-level replication, the requests are arbitrary procedures but request reordering and compression are not used. Neither approach attempts to reduce communication network resource costs while still ensuring that information in database replicas meets reasonable synchronization requirements for the defined mission of the database.
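The distinction between the two kinds of propagated request can be sketched as follows. This is an illustration only and does not reflect the interface of any commercial product; the dictionary-based database, the `raise_prices` procedure, and the helper names are all hypothetical.

```python
def raise_prices(db, percent):
    """Stand-in for a stored procedure that touches every row."""
    for item in db:
        db[item] = round(db[item] * (1 + percent / 100), 2)


# Hypothetical registry of procedures known to every replica.
PROCEDURES = {"raise_prices": raise_prices}


def data_level_log(before, after):
    """Row-level requests (insert, update, delete) that turn `before` into `after`."""
    log = []
    for key in after:
        if key not in before:
            log.append(("insert", key, after[key]))
        elif before[key] != after[key]:
            log.append(("update", key, after[key]))
    for key in before:
        if key not in after:
            log.append(("delete", key, None))
    return log


def apply_data_level(db, log):
    """Replay row-level requests on a remote replica."""
    for op, key, value in log:
        if op == "delete":
            db.pop(key, None)
        else:
            db[key] = value


def apply_procedure_level(db, request):
    """Re-execute a propagated procedure call on a remote replica."""
    name, args = request
    PROCEDURES[name](db, *args)


source = {"a": 10.0, "b": 20.0}
replica_1 = dict(source)
replica_2 = dict(source)

before = dict(source)
raise_prices(source, 10)

row_requests = data_level_log(before, source)   # one request per changed row
proc_request = ("raise_prices", (10,))          # a single request

apply_data_level(replica_1, row_requests)
apply_procedure_level(replica_2, proc_request)
```

Both replicas end in the same state as the source. The data-level path sends one request per affected row, while the procedure-level path sends a single call that must be re-executed at each replica.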
One form of database compression, used typically to reduce data storage costs, is found in process data historians. Process data historians use a compression technique called the “swinging door algorithm” that eliminates samples that can be reconstructed with acceptable accuracy through interpolation from surrounding samples. However, interpolation is not defined for arbitrarily complex state spaces, so the technique is limited in application to simple scalar data such as that found in process control systems.
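As an illustration of how such a technique discards reconstructible samples, the following is a minimal sketch of one common formulation of swinging door compression for scalar time series. It is not taken from any particular historian product; the function names `swinging_door` and `interpolate`, and the `deviation` parameter (the acceptable reconstruction error), are assumptions made for this example, and timestamps are assumed to be strictly increasing.

```python
def swinging_door(samples, deviation):
    """Return the subset of (time, value) samples to archive.

    A sample is dropped while one straight line from the last archived
    sample can pass within `deviation` of every sample seen since.
    """
    if len(samples) <= 2:
        return list(samples)

    kept = [samples[0]]
    anchor_t, anchor_v = samples[0]
    upper = float("inf")    # smallest slope to (value + deviation) so far
    lower = float("-inf")   # largest slope to (value - deviation) so far
    prev = samples[0]

    for t, v in samples[1:]:
        dt = t - anchor_t
        upper = min(upper, (v + deviation - anchor_v) / dt)
        lower = max(lower, (v - deviation - anchor_v) / dt)
        if lower > upper:
            # The doors have swung past parallel: archive the previous
            # sample and restart the doors from it.
            kept.append(prev)
            anchor_t, anchor_v = prev
            dt = t - anchor_t
            upper = (v + deviation - anchor_v) / dt
            lower = (v - deviation - anchor_v) / dt
        prev = (t, v)

    kept.append(samples[-1])  # always archive the final sample
    return kept


def interpolate(kept, t):
    """Reconstruct a value at time t by linear interpolation between archived samples."""
    for (t0, v0), (t1, v1) in zip(kept, kept[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the archived time range")


ramp = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 1.0), (4, 0.0)]
archived = swinging_door(ramp, deviation=0.1)
# archived == [(0, 0.0), (2, 2.0), (4, 0.0)]
```

The sketch relies on slopes and linear interpolation between scalar values, which is why the approach does not carry over directly to data for which no interpolation is defined.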
For the reasons stated above and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the specification, there is a need in the art for database replication which reduces communication network resource costs while providing reasonably reliable data.