This invention relates to the processing of messages relating to computer databases, and, more particularly, to protecting database processing messages during a termination and restart.
Availability is a key element of many computer systems: the systems must be kept operating as much as possible, with as little downtime as possible. In many businesses, system unavailability can have an immediate and lasting adverse impact on the business. In most computer systems, should a fatal error develop, the system must be stopped and rebooted. All in-process activities are lost, together with the in-process data. The surviving data is typically in the state it was in when it was last saved, before the stopped processes began, but there may be no assurance of its exact state. Thus, to meet mission-critical availability requirements, the system must be stable and keep running without loss of data.
One example of a computer system that must be kept operating as much as possible is a server that supports a large number of clients in a mission-critical area, providing their access to the data they require to perform their jobs. The data is often maintained on disk drives, such as RAID systems, for short-term access, and in data storage libraries for longer-term access. For such systems to operate efficiently, both the disk drives and the data storage libraries must be kept operating continuously.
RAID systems safeguard data by effectively duplicating it, so that one copy remains available even if one of the disk drives fails. The incorporated Day III et al. application discloses data storage library systems in which redundant copies of data volumes are stored in two (or more) similar libraries. To save time, the data is first stored in one library and then copied to the other. When the data is recalled, the most current copy must be accessed. The most current copy is identified by means of synchronization tokens that are directly associated with each data volume in each of the libraries. Thus, when a data volume is to be accessed, its synchronization tokens in the libraries are read and compared to determine which is most current, and the data volume is then accessed from the library directly associated with the most current token.
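The token comparison can be sketched in Python. This is a hypothetical illustration, not the method of the incorporated application: it assumes each synchronization token is an integer update counter that grows whenever the volume is updated, and the function name select_most_current and the library names are invented for the example.

```python
from typing import Dict


def select_most_current(tokens: Dict[str, int]) -> str:
    """Return the library holding the most current copy of a data volume.

    Hypothetical assumption: each synchronization token is an integer update
    counter, so the library with the largest token holds the newest copy.
    """
    if not tokens:
        raise ValueError("no synchronization tokens supplied")
    # Rank the libraries by their token value and pick the highest.
    return max(tokens, key=tokens.__getitem__)


# Tokens for one data volume as held by two libraries (hypothetical values).
volume_tokens = {"library_A": 17, "library_B": 16}
print(select_most_current(volume_tokens))  # library_A
```

When the tokens are equal, both copies are equally current and max simply returns the first library listed, so either copy may be accessed.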
Should an error occur in synchronization token handling, the library controller that manages, updates, and compares the synchronization tokens must be stopped and rebooted, halting the entire data handling process. The messages received for processing the tokens may then be lost, and inventories of both libraries may be required to recover the prior update status of each data volume so that the most recent messages can be reviewed. Further, the lost messages would have to be reconstituted from their original source, which may not be possible. Simply saving all of the token messages for a period of time, so that the tokens can be reconstituted, may prove risky: if one of the messages caused the fatal error, processing that message again may cause the fatal error again, resulting in an infinite retry loop.
The same problem exists in processing any highly critical database in which a processing action may result in a fatal error.
An object of the present invention is to protect database processing messages so as to allow a termination and restart without requiring a system-wide reboot.
Disclosed are a computer message processor and a method which may be implemented in a computer program product for protecting database processing messages during a termination and restart, and a data storage library implementing the invention to protect synchronization token processing.
An in-process message queue is coupled to a message reader and receives a copy of each input message that is read. A completion response detector monitors for a valid completion response message to each read input message and, upon detecting it, deletes the copied input message from the in-process message queue. Upon a termination and restart, a restart processor operates the message reader to read the input messages saved in the in-process message queue and deletes each copied input message from the queue. This deletion prevents the copied input message from being accessed again upon a second restart, thereby preventing a loop.
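A minimal Python sketch of the in-process message queue follows. It assumes a single-threaded processor in which a termination is simulated by an exception raised from the message handler, and a valid completion response is modelled as the handler returning True. The class ProtectedMessageProcessor, its methods, and the handler are hypothetical names, not part of the disclosure.

```python
from typing import Callable, Dict, List


class ProtectedMessageProcessor:
    """Hypothetical sketch of the in-process message queue scheme."""

    def __init__(self, handler: Callable[[str, str], bool]) -> None:
        # handler(msg_id, body) returns True when processing yields a valid
        # completion response; raising an exception simulates a fatal error.
        self.handler = handler
        # In-process message queue: copies of read messages awaiting completion.
        # Python dicts preserve insertion order, so this behaves as a FIFO.
        self.in_process: Dict[str, str] = {}

    def read(self, msg_id: str, body: str) -> None:
        """Message reader: save a copy of the input message, then process it."""
        self.in_process[msg_id] = body
        self._process(msg_id, body)

    def _process(self, msg_id: str, body: str) -> None:
        if self.handler(msg_id, body):
            # Completion response detector: a valid completion deletes the copy.
            self.in_process.pop(msg_id, None)

    def restart(self) -> List[str]:
        """Restart processor: re-read each saved message, deleting its copy first."""
        replayed: List[str] = []
        for msg_id in list(self.in_process):
            # Deleting before re-processing means a message that terminates the
            # process again is not re-read on a second restart (no retry loop).
            body = self.in_process.pop(msg_id)
            replayed.append(msg_id)
            self._process(msg_id, body)
        return replayed


def handler(msg_id: str, body: str) -> bool:
    # Hypothetical handler: a "poison" message always causes a fatal error.
    if body == "poison":
        raise RuntimeError(f"fatal error while processing {msg_id}")
    return True


proc = ProtectedMessageProcessor(handler)
proc.read("m1", "update token")
try:
    proc.read("m2", "poison")  # terminates part-way through processing
except RuntimeError:
    pass
print(list(proc.in_process))  # ['m2']: the copy survives the termination
try:
    proc.restart()  # m2's copy is deleted before it is re-read
except RuntimeError:
    pass
print(list(proc.in_process))  # []: a second restart has nothing to loop on
```

Deleting each saved copy before it is re-read, rather than after it completes, gives a message that caused the termination one further attempt; if it fails again it is discarded instead of being retried indefinitely.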
The ease of termination and restart allows a special restart of only the affected database process, rather than a system-wide termination and reboot.
Upon restart, the restart processor first operates the message reader to read all of the saved input messages from the in-process message queue and, only after all of them have been read, operates it to read new input messages.
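The read ordering on restart can be sketched as a Python generator. It assumes, hypothetically, that both the in-process message queue and the input message queue are FIFO deques of (message id, body) pairs; the names restart_reader and Message are illustrative, and copying new messages into the in-process queue and detecting their completion responses are omitted for brevity.

```python
from collections import deque
from typing import Deque, Iterator, Tuple

# Hypothetical message representation: (message id, message body).
Message = Tuple[str, str]


def restart_reader(in_process: Deque[Message],
                   input_queue: Deque[Message]) -> Iterator[Message]:
    """Yield messages in restart order: all saved messages, then new input."""
    # Read every saved message first, removing each copy as it is read.
    while in_process:
        yield in_process.popleft()
    # Only after the in-process queue is empty are new input messages read.
    while input_queue:
        yield input_queue.popleft()


saved = deque([("m2", "saved update")])
incoming = deque([("m3", "new update"), ("m4", "new update")])
for msg_id, _ in restart_reader(saved, incoming):
    print(msg_id)  # prints m2, then m3, then m4
```

Placing the ordering rule in a single generator means a consumer cannot obtain a new input message until every saved message has been drained.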
For a fuller understanding of the present invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings.