In a data processing system, data and system control structures may be shared between several programs running on a single central processing complex (CPC), or shared between several CPC's using a shared facility.
Commands are communicated over a link to the shared facility through channel apparatus. The channel expects a response to the request from the shared facility resulting from execution of the command. If a response is not received within some predetermined time, or the channel detects signal errors on the link, it will notify the program of the error condition. At this point the program must recover the failed command and free resources that are held for the command. If the command is still in execution at the shared facility after the error is presented, the program is faced with significant difficulties in completing the recovery action.
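The error-detection rule described above can be illustrated with a minimal sketch. All names here (`ChannelStatus`, `check_command`, and their parameters) are hypothetical and not part of any real channel interface; the sketch only assumes that the channel tracks elapsed time, a link-error flag, and whether a response has arrived.

```python
from enum import Enum
from typing import Optional


class ChannelStatus(Enum):
    """Hypothetical statuses a channel could report to the program."""
    WAITING = "waiting for response"
    RESPONSE_RECEIVED = "response received"
    TIMEOUT = "no response within the predetermined time"
    LINK_ERROR = "signal error detected on the link"


def check_command(elapsed_s: float, timeout_s: float,
                  link_error: bool, response: Optional[bytes]) -> ChannelStatus:
    """Classify the state of one outstanding command (illustrative only)."""
    if link_error:
        # A signal error is reported regardless of how long the command has run.
        return ChannelStatus.LINK_ERROR
    if response is not None:
        return ChannelStatus.RESPONSE_RECEIVED
    if elapsed_s >= timeout_s:
        # No response within the predetermined time.
        return ChannelStatus.TIMEOUT
    return ChannelStatus.WAITING


def command_may_still_be_executing(status: ChannelStatus) -> bool:
    """Only a received response tells the program the command has finished."""
    return status is not ChannelStatus.RESPONSE_RECEIVED
```

Note that in both error cases the channel knows only that it has no usable response; it cannot tell the program whether the shared facility has stopped executing the command, which is the source of the recovery difficulty.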
The shared facility provides a program-controlled command execution processor which accesses the shared control and data structures. The shared storage comprises system storage for system-wide or global control structures, and storage for data and list structures created by CPC programs. All of these structures can be shared among programs in one CPC, or among plural CPC's. Commands are received over a plurality of links. Link buffers are provided to receive commands and/or data, and to store responses for transfer over the link to a CPC and/or program. When the shared facility interconnects a plurality of CPC's, a system complex (Sysplex) is created to form a single system image from all of the autonomous CPC's.
Consider the situation where a program has obtained a lock to serialize a shared data item X. After the serialization has been obtained, the program attempts to update the contents of X in the shared facility by issuing a command that writes X to the shared storage, storing new values for X in its existing location. However, an error is presented to the program while the command is still executing. Recovery for the command releases the serialization, making the data available to other programs.
A second program running on a different CPC obtains the serialization for X. Once serialization is obtained, the program assumes that it will have a consistent and unchanging view of the data item X. The program may wish to read X, update X, or even delete X. In each case, the continuing execution of the previously failed command may cause problems. For instance, two successive reads of X may see different values if a store occurs between the read operations. The program would see this as an error since it owns the serialization for the data. Another problem would occur if the program attempted an update of X by reading X, updating X in main storage, and then writing X back to the shared facility. A subsequent store by the previous command could cause the update to be lost. Finally, if the second program chose to delete X, continued execution of the failed command may restore an old version of X after the delete had occurred. In each case, correct actions by the second program would be construed as errors.
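The lost-update case can be reproduced with a small deterministic simulation. This is a toy model under stated assumptions, not the shared facility's actual design: `SharedFacility`, its `in_flight` queue, and the program names "A" and "B" are hypothetical, and the late store is applied at a fixed point to make the interleaving repeatable.

```python
class SharedFacility:
    """Toy stand-in for the shared facility: one data item X and one lock."""

    def __init__(self, value):
        self.x = value
        self.lock_owner = None
        self.in_flight = []  # stores accepted but not yet applied (assumption)

    def obtain_lock(self, program):
        if self.lock_owner is not None:
            raise RuntimeError("serialization already held")
        self.lock_owner = program

    def release_lock(self, program):
        if self.lock_owner != program:
            raise RuntimeError("not the holder of the serialization")
        self.lock_owner = None

    def submit_store(self, value):
        # The command has been received but has not finished executing.
        self.in_flight.append(value)

    def finish_in_flight(self):
        # The still-executing command completes, ignoring who holds the lock.
        while self.in_flight:
            self.x = self.in_flight.pop(0)


def lost_update_scenario():
    facility = SharedFacility(value=10)

    # Program A serializes X and issues a store of 11.
    facility.obtain_lock("A")
    facility.submit_store(11)
    # An error is presented; recovery releases the serialization
    # while the store is still in flight.
    facility.release_lock("A")

    # Program B obtains the serialization and updates X.
    facility.obtain_lock("B")
    first_read = facility.x            # B reads 10
    written_by_b = first_read + 100    # B updates X in main storage
    facility.x = written_by_b          # B writes 110 back
    facility.finish_in_flight()        # A's stale store lands afterwards
    second_read = facility.x           # B's update has been overwritten
    facility.release_lock("B")
    return first_read, written_by_b, second_read
```

Calling `lost_update_scenario()` returns `(10, 110, 11)`: program B held the serialization throughout, yet the value it wrote was replaced by the store from the previously failed command.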
It is therefore very important that a function and system be provided in the shared facility that maintains consistency of data or control structures. A program that initiates an action in the shared facility must be able to determine whether a command was received, received and completed, or received but aborted. The program must eventually receive the results of the action, or determine that the action must be requested again.
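One way to picture this requirement is as a small decision table on the program side. The sketch below is illustrative only: the enum members mirror the three outcomes named above, while `RecoveryAction`, `next_action`, and `may_release_serialization` are hypothetical names, and the mapping from outcome to action is an assumption rather than a mechanism stated in the text.

```python
from enum import Enum


class CommandOutcome(Enum):
    """The three outcomes the program must be able to distinguish."""
    RECEIVED = "received, execution not yet finished"
    COMPLETED = "received and completed"
    ABORTED = "received but aborted"


class RecoveryAction(Enum):
    """Hypothetical program-side actions (assumed for illustration)."""
    AWAIT_RESULTS = "keep resources held and wait for the results"
    USE_RESULTS = "accept the results of the action"
    REQUEST_AGAIN = "request the action again"


def next_action(outcome: CommandOutcome) -> RecoveryAction:
    """Map a determined outcome to the program's next step (assumed mapping)."""
    if outcome is CommandOutcome.COMPLETED:
        return RecoveryAction.USE_RESULTS
    if outcome is CommandOutcome.ABORTED:
        return RecoveryAction.REQUEST_AGAIN
    return RecoveryAction.AWAIT_RESULTS


def may_release_serialization(outcome: CommandOutcome) -> bool:
    """Assumption: resources are safe to free only once execution has ended."""
    return outcome in (CommandOutcome.COMPLETED, CommandOutcome.ABORTED)
```

Under this assumed mapping, the serialization is never released while a command could still store into the shared structures, which is the condition that led to the inconsistencies described earlier.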