The prior art teaches methods and means for continuing program execution after a processor fails while executing certain types of instructions. Other prior techniques have enabled recovery from some instruction execution failures. Such prior techniques have always had the limitation of not being able to handle all types of processor failures under all circumstances. Shared data resources are found in all business activity, and business is predicated on maintaining the integrity of business data in a consistent manner. For example, what if the stock market were to accidentally start a trading day using a version of some stock prices other than the closing version of the prior day? Or what if a version of the expense records of a business other than the last version at the end of the year were used for calculating income taxes? The dire consequences of failure to maintain data integrity are endless.
As far as is known, one type of problem which has never been adequately addressed in the prior art is the catastrophic consequences which can occur when shared data loses integrity because a processor fails while it is changing the shared data. Although processor failure is not a common occurrence today, it should be apparent that a failure when a processor has only partly changed data in a shared computer resource may leave the resource in an unknown data state, which could render the data unreliable and result in dire consequences. This problem does not appear to have been adequately addressed in computer design in the past, perhaps because the circumstances in which data contamination occurs, and how to prevent it, recover from it, and generally maintain computer operation in a way that avoids it, have not been adequately understood.
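The partial-change hazard described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in chosen for illustration and is not drawn from the prior art or from the subject invention: the `transfer` function, the `accounts` dictionary, and the `ProcessorFailure` exception, which merely simulates a processor stopping partway through a two-step change to shared data.

```python
class ProcessorFailure(Exception):
    """Hypothetical stand-in for a processor failing in the middle of an update."""


def transfer(accounts, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as two separate changes to shared data."""
    accounts[src] -= amount          # first half of the change
    if fail_midway:
        # Simulated failure: the second half of the change never happens.
        raise ProcessorFailure("failed after debit, before credit")
    accounts[dst] += amount          # second half of the change


def demo_partial_update():
    """Return the total held before and after a transfer that fails midway."""
    accounts = {"a": 100, "b": 50}   # assumed shared resource
    total_before = sum(accounts.values())
    try:
        transfer(accounts, "a", "b", 30, fail_midway=True)
    except ProcessorFailure:
        pass                         # no recovery is attempted in this sketch
    return total_before, sum(accounts.values())
```

In this sketch the total should be unchanged by any completed transfer, but the simulated failure leaves the debit applied without the credit, so a later reader of `accounts` sees a state that no complete operation ever produced.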
The subject invention is concerned with maintaining the integrity of shared data under failing processor circumstances. Maintaining the integrity of non-shared data under failing processor circumstances is better known, and is a much less complex subject.
A general method of instruction recovery from a failed processor is taught in U.S. Pat. No. 5,214,652 to A. Sutton entitled "Alternate Processor Continuation of Task of Failed Processor". Non-shared data can clearly be handled by this patent's alternate processor task continuation method. This patent teaches how a service processor of a computer system may request an alternate processor to continue execution of a program which was being executed by a processor which failed. Before the method in that patent assigns an alternate processor to continue program execution, a service processor of the system processes a signal from the failing processor indicating the type of error condition that occurred for the instruction in execution during the processor failure. This signal enables the service processor to determine whether execution of the program can be continued by an alternate processor. For the program to continue, the instruction in execution during the processor failure had to be a retryable instruction. If it was not a retryable instruction, the alternate processor method could not be used, and the program processing was ended. Prior serializing instructions generally could not be retried because they generally required locking a resource, and either the locked state of the resource might be unknown to the processor, or the locking process did not allow an alternate processor to change the state of the lock on a resource held by a failed processor.
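The decision described above, in which the service processor continues the program on an alternate processor only when the failing instruction was retryable, can be sketched as follows. This is a simplified illustration under stated assumptions, not the interface of the cited patent: `FailureSignal`, `Outcome`, and `service_processor_decide` are hypothetical names, and the single `retryable` flag stands in for whatever error-type information the failing processor actually reports.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    CONTINUE_ON_ALTERNATE = auto()   # alternate processor resumes the program
    END_PROGRAM = auto()             # program processing is ended


@dataclass
class FailureSignal:
    """Hypothetical summary of what the failing processor reports."""
    instruction: str                 # instruction in execution at failure
    retryable: bool                  # assumed flag derived from the error type


def service_processor_decide(signal: FailureSignal) -> Outcome:
    """Continue on an alternate processor only for a retryable instruction."""
    if signal.retryable:
        return Outcome.CONTINUE_ON_ALTERNATE
    return Outcome.END_PROGRAM
```

Under this sketch, a failure during an ordinary retryable instruction would yield `Outcome.CONTINUE_ON_ALTERNATE`, while a failure during a serializing instruction that holds a lock, modeled here simply as `retryable=False`, would yield `Outcome.END_PROGRAM`.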
The subject invention herein deals with maintaining integrity of shared data resources during processor failure where the failure occurs while the shared resources are being changed with instructions of a type which could not be handled by the alternate processor method taught in the known prior art.
No prior art is known to use blocking symbols for serializing access to shared resources, and therefore no art is known for maintaining data integrity through the eventuality of processor failure during execution of instructions using blocking symbols to serialize access to the shared resources. Known methods of recovery from processor failure may not operate correctly to perform recovery from failure of locked instructions using blocking symbols.