1. Field of the Invention
The invention relates to a technique, specifically a method, an apparatus, and an article of manufacture that implements the method, for using write-to-operator-with-reply to process an error when a write fails in a ported application.
2. Description of the Related Art
In a computer system, data is typically stored in files in a persistent storage medium such as a hard disk drive. Typically, a file is allocated a predetermined amount of file space for storing data on the hard disk drive. An operating system, and more particularly, a file system within the operating system, typically manages the file space for the files. When new data is to be written to a file, the operating system determines whether the file has sufficient allocated file space to store the new data in that file. If there is sufficient file space, the operating system writes the new data to the file. If the operating system determines that insufficient file space is available for storing the new data in the file, the operating system generates an out-of-space error.
An application program (also referred to as an application) executes on a computer system to perform certain tasks. Applications are typically programmed to execute on a particular operating system. To write data to a file, an application issues a file write, which invokes the operating system.
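The out-of-space case described above can be illustrated from the application's side. The following is a minimal sketch in Python, assuming a POSIX-style platform on which the operating system reports the condition with the `ENOSPC` error code. The helper name `try_write` and the status strings it returns are hypothetical and are chosen only for illustration.

```python
import errno
import os
import tempfile


def try_write(write_fn, data):
    """Issue a file write through write_fn and classify the outcome.

    write_fn is any callable that writes bytes and raises OSError on
    failure, for example a wrapper around os.write. (Hypothetical helper.)
    """
    try:
        write_fn(data)
        return "ok"
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            # The operating system reported insufficient file space.
            return "out-of-space"
        raise  # Other write errors are not handled in this sketch.


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        status = try_write(lambda d: os.write(f.fileno(), d), b"new data")
        print(status)
```

Passing the write operation in as a callable keeps the error classification separate from the file handling, so the out-of-space path can be exercised without actually filling a disk.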
For example, in one mainframe operating system environment, a z/OS® (Registered trademark of International Business Machines Corporation) environment, when an out-of-space error occurs, one application, the IBM® DB2® (IBM and DB2 are registered trademarks of International Business Machines Corporation) database management system, generates an error message and uses Checkpoint/Restart to control recovery processing. Checkpoint/Restart saves the state of an executing application in persistent storage, so that the database management system can be restarted from the point at which it was saved.
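The general idea of saving state to persistent storage and resuming from it can be sketched as follows. This is a generic illustration only and does not represent the z/OS Checkpoint/Restart facility or the way DB2 uses it. The function names `save_checkpoint` and `restart_from_checkpoint`, the JSON file format, and the `next_record` field are all assumptions made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Persist state so that a failure never leaves a partial checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # force the data to persistent storage
    # Replace any earlier checkpoint with the new one in a single step.
    os.replace(tmp_path, path)


def restart_from_checkpoint(path, initial_state):
    """Return the last saved state, or initial_state if none was saved."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return initial_state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    state = restart_from_checkpoint(ckpt, {"next_record": 0})
    state["next_record"] += 1          # pretend one record was processed
    save_checkpoint(ckpt, state)
    print(restart_from_checkpoint(ckpt, {"next_record": 0}))
```

Even in this simplified form, the application must decide what constitutes its state and where it is safe to save it, which hints at why retrofitting such a mechanism into an existing application can require architectural changes.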
In another operating system environment, an AIX® (Registered trademark of International Business Machines Corporation) environment, if the file write fails, the operating system generates an error. If the error is an out-of-space error, an application that executes in the AIX environment, for example, another database management system, and more particularly, the IBM DB2 Online Analytical Processing (OLAP) Server, may terminate. Since the application is a database management system, the termination may leave the data in an inconsistent state. To restore the data to a consistent state that precedes the out-of-space error, a system administrator allocates additional file space for the file associated with the out-of-space error. The system administrator then reloads the data to restore the database's file(s) to the point where the out-of-space error occurred. Restoring files can result in a long outage during which the database management system cannot be used. Outages are undesirable because a business may not be able to process transactions using data that is stored in the database management system.
The term “platform” refers to the operating system and/or the type of computer system on which an application executes. Examples of platforms include, but are not limited to, the WINDOWS® (Registered Trademark of Microsoft Corporation) operating system, the AIX operating system, the UNIX® (UNIX is a Registered Trademark of The Open Group in the United States and other countries) operating system, and the z/OS operating system. The term “port” refers to transferring a computer program, such as an application, from one platform to another.
An application may be ported to an operating system that executes on a mainframe computer to take advantage of the mainframe's increased processing power. Because an application executing on a mainframe may be able to process even more transactions than the same application executing on the computer from which it was ported, application termination and the associated outage from an out-of-space error are even more undesirable.
One solution may be to add Checkpoint/Restart to the ported application to control recovery processing for out-of-space conditions. However, adding Checkpoint/Restart to the ported application may result in substantial architectural changes to the ported application, and implementation could be time-consuming and error-prone.
Therefore, there is a need for a method, apparatus, and article of manufacture implementing the method, that provide a technique for improved error processing when a write fails in an application that has been ported. In addition, the technique should avoid substantial architectural changes to the ported application. In particular, the technique should improve error processing for an out-of-space error in the ported application.