Transaction processing systems, particularly those employed in financial institutions, receive and process thousands of transactions a day. Each of these transactions may require operations to be performed on large amounts of data. As such, data management between communicating systems must be highly reliable. If the processing of a transaction fails due to an outage (e.g., a power loss or server failure), then it may be necessary to return a system to a known state of operation. The process of returning to a known state may be referred to as reconciliation.
It is desirable for transaction processing systems to be provided with mechanisms for protecting against, as well as for recovering from, loss of data due to unexpected outages. Common mechanisms employed for data protection include, but are not limited to: (1) backup of data to electronic storage media at regular intervals; (2) replication of data to an off-site location, which overcomes the need to restore the data (the corresponding systems then need only be restored or synchronized); and (3) high availability systems configured to keep both the data and the system replicated off-site, enabling continuous access to systems and data.
In a disaster recovery context, replication of data may also be referred to as data mirroring. Depending on the technologies used, data mirroring may be performed synchronously, asynchronously, semi-synchronously, or on a point-in-time basis. As used herein, the term “asynchronous process” refers to a process that executes in the background and occurs as soon as it can. As used herein, the term “synchronous process” refers to a process that executes directly in line with other processes and does not allow other processes to continue until one or more executable steps (e.g., a put or write) are completed.
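The distinction between the two kinds of process can be illustrated with a minimal Python sketch. All names here (`Mirror`, `mirror_synchronously`, `AsyncMirrorer`) are hypothetical and introduced only for illustration; the in-memory `Mirror.write` stands in for a put or write to a remote site and is not taken from any particular mirroring product.

```python
import queue
import threading


class Mirror:
    """Hypothetical remote copy; write() stands in for a remote put or write."""

    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)


def mirror_synchronously(primary, mirror, record):
    """Synchronous process: the caller cannot continue until the mirror write is done."""
    primary.append(record)
    mirror.write(record)  # executes in line; control returns only after completion


class AsyncMirrorer:
    """Asynchronous process: mirror writes happen in the background as soon as possible."""

    def __init__(self, mirror):
        self.mirror = mirror
        self._pending = queue.Queue()
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, primary, record):
        primary.append(record)
        self._pending.put(record)  # returns immediately; the mirror may lag behind

    def _drain(self):
        while True:
            record = self._pending.get()
            self.mirror.write(record)
            self._pending.task_done()

    def wait_until_idle(self):
        """Block until every queued record has reached the mirror."""
        self._pending.join()
```

In the synchronous case the mirror is current whenever the call returns. In the asynchronous case the mirror is only guaranteed to be current after `wait_until_idle()` returns, which is the gap that gives rise to a non-zero RPO.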
Prior art data mirroring executed synchronously (i.e., using one or more synchronous processes) achieves a recovery point objective (RPO) of zero lost data, but may require an unacceptably long execution time of a few minutes to perhaps several hours. Prior art data mirroring executed asynchronously (i.e., using one or more asynchronous processes) may achieve an RPO of just a few seconds, but does not guarantee zero data loss.
In high volume transaction processing systems, such as those used by financial institutions, an RPO of even just a few seconds is not acceptable, and may result in the loss of millions of dollars to clients and/or the transaction system provider. In addition, any remedial steps taken, depending upon the volume of data being received, should not add more than about 50 to 100 milliseconds of additional delay to the completion of a single message process. A person skilled in the art will recognize that a business can tolerate this increase in the complete message cycle time, since the time to transfer a single message is on the order of 250 milliseconds, mostly as a result of long distances between client and server. Also, an additional delay of 50 to 100 milliseconds will not have any noticeable effect unless new messages arrive while the current message is still being processed on a specific channel.
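The delay budget described above can be worked through with a short Python sketch. The 250 millisecond baseline and the 50 to 100 millisecond overhead come from the text; the function names are hypothetical, and `next_message_waits` rests on the simplifying assumption that a channel is occupied for the entire message cycle.

```python
BASE_CYCLE_MS = 250  # approximate single-message transfer time stated in the text


def cycle_with_safeguard(extra_ms):
    """Total message cycle time once the remedial delay is added."""
    return BASE_CYCLE_MS + extra_ms


def relative_increase(extra_ms):
    """Added delay as a fraction of the baseline cycle time."""
    return extra_ms / BASE_CYCLE_MS


def next_message_waits(arrival_gap_ms, extra_ms):
    """True if the next message on the same channel arrives before the current
    cycle ends (assumption: the channel is busy for the whole cycle)."""
    return arrival_gap_ms < cycle_with_safeguard(extra_ms)
```

Under these assumptions the cycle grows to between 300 and 350 milliseconds, an increase of 20 to 40 percent, and the added delay only becomes visible on a channel whose messages arrive less than one full cycle apart.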
Because asynchronous mirroring methods introduce smaller delays, they are more frequently implemented. Unfortunately, prior art disaster recovery systems that employ asynchronous mirroring methods over long distances run the risk of data loss in the event of an outage. A disaster recovery declaration will result in a systems recovery to a point-in-time preceding the actual outage event. This results in a potential loss of data, which can span several seconds or minutes and account for a plurality of transactions. In such circumstances, a receiving transaction processing system may complete a number of transactions and acknowledge their completion back to a requesting system before a disaster recovery system has safe stored all of the transactions. As used herein, the term “safe storing” refers to receiving a transaction message and storing it in its original state prior to its being processed.
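The exposure described above, in which transactions are acknowledged before the recovery site has stored them, can be sketched in Python as follows. This is an illustrative model only: the function names and the `mirror_lag` parameter (the number of transactions by which the recovery site trails the primary at the moment of the outage) are assumptions made for the example.

```python
def unmirrored_transactions(acknowledged, mirrored):
    """Return acknowledged transaction IDs that the recovery site never stored."""
    stored = set(mirrored)
    return [tx for tx in acknowledged if tx not in stored]


def simulate_outage(transactions, mirror_lag):
    """Model an outage in which the primary has acknowledged every transaction
    while the asynchronous mirror trails by `mirror_lag` transactions."""
    acknowledged = list(transactions)
    cutoff = max(len(acknowledged) - mirror_lag, 0)
    mirrored = acknowledged[:cutoff]  # only the oldest transactions reached the mirror
    return unmirrored_transactions(acknowledged, mirrored)
```

For example, with four acknowledged transactions and a lag of two, the two most recent transactions are known to the requesting system as complete but are absent from the recovery site.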
Solutions are needed to account for and reconcile lost transaction messages, as well as to retrieve and process the same. Unfortunately, the widespread use of MQ network messaging technology, with its “destructive” read of message traffic, creates an environment in which lost data can neither be re-sent by the sending systems nor be retrieved from message queues associated with the transaction processing system. Thus, a disaster recovery system may have no record of the most recent messages processed by the transaction processing system, thereby necessitating a difficult reconciliation process. This presents an unacceptable financial risk to businesses and requires a solution.
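The effect of a destructive read can be illustrated with a minimal Python sketch. The standard-library `queue.Queue` is used purely as a stand-in for a message queue and is not the API of any MQ product; `destructive_read` and `process_with_outage` are hypothetical names introduced for the example.

```python
import queue


def destructive_read(channel):
    """Remove and return the next message; no copy remains on the queue."""
    return channel.get_nowait()


def process_with_outage(channel):
    """Read a message destructively, then fail before anything is persisted."""
    message = destructive_read(channel)
    # Simulated outage: the message has left the queue but was never stored.
    raise RuntimeError(f"outage while processing {message!r}")
```

After `process_with_outage` fails, the queue is empty and the message exists nowhere else, so neither the queue nor the sender (which already handed the message off) can supply it again.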
Accordingly, there exists a need for a method and system for safe storing transaction messages, data, and acknowledgements over long distances that permits minimal or no loss of data in a disaster recovery scenario.