Determining the end-of-log is necessary to correctly perform recovery of a database. Normally, a database management system preserves state in checkpoints that establish the point in the transaction log from which recovery processing must begin. The end of the log, however, is determined by reading the log itself. Traditional methods for finding the end-of-log are based on two premises. First, most database management systems are able to detect that a log record is not well-formed. Detecting the first record that is not well-formed identifies the last complete log record, and that last well-formed log record is deemed the end-of-log. Second, there is a maximum size of I/O operation that the database can expect. So, by formatting the tail of the log over this maximum I/O size in a manner that invalidates the log buffers, the system is assured of being able to find an invalid log buffer should a new crash occur and the need to find the end-of-log arise.
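The traditional scan described above can be illustrated with a minimal sketch. The record layout used here (a 4-byte payload length followed by a 4-byte CRC32 of the payload) and the names `encode_record` and `find_end_of_log` are assumptions made only for illustration; an actual database management system defines its own log record format and well-formedness test.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Serialize one log record in the assumed layout."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def find_end_of_log(log: bytes, start: int = 0) -> int:
    """Return the byte offset just past the last well-formed record.

    Scanning begins at `start` (for example, the checkpoint position) and
    stops at the first record that is not well-formed.
    """
    offset = start
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        body_start = offset + HEADER.size
        body_end = body_start + length
        # A zero length or a length running past the log is not well-formed.
        if length == 0 or body_end > len(log):
            break
        # A checksum mismatch marks a torn or invalidated record.
        if zlib.crc32(log[body_start:body_end]) != crc:
            break
        offset = body_end
    return offset
```

In this sketch, a tail that has been formatted with a fill pattern such as `0xFF` bytes fails the length check, which is one way the formatted tail can guarantee that the scan terminates.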
In mirroring systems, the mirror is processing the log being sent to it by the principal. In addition, the principal and the mirror establish and update the “end of log of interest”. For example, the mirror can tell the principal how far in the log it has committed to disk and vice versa. Given the intercommunication technologies available and the processing of shipping buffers containing log records, it is possible for the mirror to be ahead of the principal in committing the log.
When transactional consistency is desired between the principal and the mirror, there is additional coordination of log hardening (saving to disk) that occurs. In particular, before declaring that a transaction has committed, the principal has to receive confirmation from the mirror that it has hardened the log through the records for the transaction being committed. Long-running transactions produce many log records before they commit. Likewise, the processing of transactional statements that manage bulk data produces large volumes of log records before the transaction commits. Thus, the mirror can hold an arbitrarily large amount of log records beyond what would be useful in case of a crash and failover. So, in the case of mirroring, when the system fails over to the mirror, the log may contain an arbitrary number of log records that are not of interest.
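The hardening coordination described above can be sketched as a simple rule on two LSN watermarks. The class and method names below are hypothetical and chosen only for illustration; the sketch assumes LSNs are integers that increase with log position and that an acknowledgment from the mirror carries the highest LSN it has hardened.

```python
from dataclasses import dataclass


@dataclass
class PrincipalCommitState:
    """Tracks how far each side has hardened the log (hypothetical model)."""

    local_hardened_lsn: int = 0
    mirror_hardened_lsn: int = 0

    def on_local_flush(self, lsn: int) -> None:
        # The principal has saved its own log to disk through `lsn`.
        self.local_hardened_lsn = max(self.local_hardened_lsn, lsn)

    def on_mirror_ack(self, lsn: int) -> None:
        # The mirror reports it has hardened through `lsn`; a late or
        # out-of-order acknowledgment never moves the watermark backwards.
        self.mirror_hardened_lsn = max(self.mirror_hardened_lsn, lsn)

    def can_declare_committed(self, commit_lsn: int) -> bool:
        # Both sides must have hardened the transaction's commit record.
        hardened = min(self.local_hardened_lsn, self.mirror_hardened_lsn)
        return hardened >= commit_lsn
```

Taking the minimum of the two watermarks also covers the case in which the mirror has hardened further than the principal.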
FIG. 1 depicts a principal log 10 and a mirror log 20 at a time of failover. The principal log 10 contains committed data 30 and transactional log data 40. The mirror log 20 naturally contains the same committed data 30a and some uncommitted transactional log data sent to the mirror from the principal log 10. The mirror steadily commits data as it is sent from the principal as described above. At a failover from the principal log 10 to the mirror log 20, a last committed transaction 55 in the mirror becomes the point at which the mirror can accept new transactions. However, there exist uncommitted transactions 40 in the principal that need to be discarded in an organized manner if the principal is to act as a backup for the mirror database.
For purposes of example, assume a mirroring database scenario with a principal sending log records to the mirror. Here, the principal is allowed to operate in a mode where the mirror trails on a best-effort basis in accepting transactions and committing them. In this mode there is no guarantee that the mirror is processing the log at a point, identified by a logical sequence number (LSN), which is close to where the principal is processing. In other words, the LSN of the log record being processed by the mirror can be much smaller than the LSN of the last log record being produced by the principal. In one example, the mirror may be offline for a period of time and then reconnect. Thus the distance between the log records processed by the principal and those processed by the mirror can vary widely. Throughout this period the principal continues operating undisturbed. It should be clear that in this circumstance the log at the principal may be substantially longer than the log at the mirror. Even when full transactional consistency is required between the mirror and the server, a long-running transaction may provoke this same circumstance.
In one embodiment, a “forced failover” may be initiated. In such a forced failover the principal fails and the mirror database becomes the principal, even though the mirror can be far behind in processing the transaction log. When such a failover occurs while the principal (system P) has a log arbitrarily larger than the log present in the mirror (system M), data loss may occur. But, from the perspective of the database, the only challenge is to provide the internal consistency that is expected. At the mirror database, system M can apply the traditional method to find the end-of-log, complete recovery, and accept new transactions.
Some time later, system P reconnects to system M. System P starts acting as the mirror and must detect how far ahead of the old mirror it was, undo the changes falling in that range, and then begin syncing the log to catch up with the old mirror, now the new principal, system M. System M will have started serving the database as of some point specified by an LSN, LSN-Fail. System P will know that it was in step with the mirror up to some LSN, LSN-PLow, up to which it can trust its log as being identical to system M's log. This LSN will be less than or equal to LSN-Fail. System P will also detect the end of its log by traditional scanning, the end of log being some LSN, LSN-PHigh. The log range between LSN-PLow and LSN-PHigh is subject to the aforementioned undo and must subsequently be discarded. This discard is where problems with the traditional method arise.
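The relationship among LSN-PLow, LSN-PHigh, and LSN-Fail can be made concrete with a small sketch. The names `ResyncPlan` and `plan_resync` are hypothetical, and treating the undo range as starting just after LSN-PLow and running through LSN-PHigh is an assumption about boundary handling that the description above leaves open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResyncPlan:
    undo_after: int    # LSN-PLow: last LSN trusted to match system M
    undo_through: int  # LSN-PHigh: end of system P's log found by scanning
    fetch_after: int   # system P requests system M's log beyond this LSN

    @property
    def has_stale_tail(self) -> bool:
        # True when system P holds log records that system M never adopted.
        return self.undo_through > self.undo_after


def plan_resync(lsn_plow: int, lsn_phigh: int, lsn_fail: int) -> ResyncPlan:
    """Derive the undo/discard range for the old principal (system P)."""
    if lsn_plow > lsn_fail:
        raise ValueError("LSN-PLow must be less than or equal to LSN-Fail")
    if lsn_phigh < lsn_plow:
        raise ValueError("LSN-PHigh cannot precede LSN-PLow")
    return ResyncPlan(undo_after=lsn_plow,
                      undo_through=lsn_phigh,
                      fetch_after=lsn_plow)
```

For example, `plan_resync(100, 250, 120)` yields a plan that undoes and discards everything after LSN 100 through LSN 250 and then fetches system M's log from just past LSN 100.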
If system P were to begin receiving log blocks from system M and simply overlay them on the existing, discarded log, the log would not be scannable via the traditional method, because the end would not be detectable. In one instance, the log would potentially look corrupt; to complicate things further, through lucky block alignment, the log might be scannable yet not internally consistent. Thus, a problem exists in properly dealing with unusable log records.
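The "lucky block alignment" case can be reproduced with a toy log. The record layout (4-byte length, 4-byte CRC32, then payload), the helper names, and the payload strings are all assumptions made for illustration. The sketch overlays one new record from system M onto system P's old log at an aligned offset and then applies a traditional well-formedness scan.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def scan(log: bytes) -> list:
    """Traditional scan: collect payloads until a record is not well-formed."""
    payloads, offset = [], 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        body = log[offset + HEADER.size: offset + HEADER.size + length]
        if length == 0 or len(body) < length or zlib.crc32(body) != crc:
            break
        payloads.append(bytes(body))
        offset += HEADER.size + length
    return payloads


def overlay(log: bytes, offset: int, data: bytes) -> bytes:
    """Write `data` over `log` at `offset` without invalidating what follows."""
    patched = bytearray(log)
    patched[offset:offset + len(data)] = data
    return bytes(patched)


# System P's old log: three records shared with M, three that M never saw.
shared = [b"rec-1", b"rec-2", b"rec-3"]
p_only = [b"old-4", b"old-5", b"old-6"]
p_log = b"".join(encode_record(r) for r in shared + p_only)

# System M ships one new record of the same size, so blocks stay aligned.
shared_len = sum(len(encode_record(r)) for r in shared)
after_overlay = overlay(p_log, shared_len, encode_record(b"new-4"))

# The scan runs past the true end (new-4) into stale records old-5 and old-6.
result = scan(after_overlay)
```

Here `result` contains `rec-1`, `rec-2`, `rec-3`, `new-4`, `old-5`, and `old-6`: every record passes the well-formedness test, yet the last two belong to the discarded range, so the scanned log is not internally consistent.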