In a computer cluster environment, it may be desirable to start an application package on one computer and, if the start encounters an error, then attempt to start the application package on a second computer in the cluster. Attempting to start the application package on a second computer may result in success where the first attempt failed. However, if the start fails on the first computer, traces of the start attempt need to be removed from the first computer before attempting a start on the second computer. Otherwise, the traces of the start attempt left on the first computer may be in contention with a completed start on the second computer. This is an undesirable condition.
FIG. 1 depicts an example of a prior art computer cluster environment in which an application package startup procedure (startup) may be attempted on one computer and, if the startup procedure fails, the application package may be moved to another computer. Cluster environment 100 includes computer 1 104 and computer 2 106. Two computers are shown for exemplary purposes only. The cluster could have more computers, for example, sixteen computers. The computer 104 and the computer 106 are interconnected via switches 108 and 110. Therefore, the computers 104 and 106 see the same resources. In this example, the switches 108 and 110 allow the computers 104 and 106 to see shared storage 112 and 114. Application clients 102 are connected to the computers 104 and 106 by a Local Area Network (LAN).
In an exemplary scenario, a customer would like to start an application package. The application package could be, for example, an ORACLE® database with processing software. The customer would like to start the package on the computer 104 and, if the package does not start, to stop the package on the computer 104 and attempt to start it on the computer 106. To accomplish this, traces of the startup attempt on the computer 104 need to be removed. Otherwise, the traces of the startup attempt left on the computer 104 may be in contention with a successful startup on the computer 106.
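The start-then-fail-over sequence described above can be sketched as follows. This is only a minimal illustration, not the mechanism used by any particular cluster product: the names `StartupError`, `start_with_failover`, `start`, and `cleanup` are hypothetical, and the computers are represented by plain strings.

```python
from typing import Callable, Optional, Sequence


class StartupError(Exception):
    """Hypothetical error raised when the package fails to start on a computer."""


def start_with_failover(
    computers: Sequence[str],
    start: Callable[[str], None],
    cleanup: Callable[[str], None],
) -> Optional[str]:
    """Try each computer in order; return the one that started the package.

    After a failed attempt, cleanup() is called on that computer BEFORE the
    next computer is tried, so that leftover traces of the attempt cannot
    contend with a later successful start elsewhere.
    """
    for computer in computers:
        try:
            start(computer)
            return computer
        except StartupError:
            cleanup(computer)  # remove traces of the failed attempt
    return None  # the package could not be started anywhere
```

In this sketch the hard part is hidden inside `cleanup`: it must be able to remove whatever the failed attempt left behind, regardless of how far the startup had progressed.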
FIGS. 2 and 3 depict an exemplary prior art process for undoing a failed startup attempt. FIG. 2 depicts an exemplary startup procedure 200 including, for example, over two hundred steps. In the example, the startup procedure 200 is implemented as a script. The script may, for example, call various functions at various points in time, such as functions 202, 204, 206, and 208. Parts of the script may be recursive, for example. In the example, the script exists as one large unbroken startup procedure 200. To facilitate discussion, assume that an error is encountered, for example at step 210, as shown in FIG. 2. In response to the error, the startup procedure 200 stops, and an error number, in this example error #37, is generated.
In FIG. 3, the error #37 is fed into a stop script 300 to attempt to undo the startup procedure 200. Because the startup procedure 200 is a large, continuous series of steps, some of which may call external functions and others of which may be recursive, the stop script 300 needs to take these complications into account in resolving the error, e.g., resolving error #37. Accordingly, creating a stop script 300 that can handle all errors generated when the startup procedure 200 is executed in the prior art manner is a difficult and complex task. Yet, such a stop script is required in the prior art if a failed start attempt is to be properly handled.
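The prior art arrangement of FIGS. 2 and 3 can be illustrated with a simplified sketch. It is offered only under stated assumptions: the names `run_startup`, `run_stop`, and `undo_table` are hypothetical, each step is assumed to return 0 on success or a nonzero error number on failure, and the real scripts are far larger and may involve external functions and recursion.

```python
from typing import Callable, Dict, List

# Hypothetical convention: a step returns 0 on success, else an error number.
Step = Callable[[], int]


def run_startup(steps: List[Step]) -> int:
    """Run one unbroken series of steps; stop at the first failure.

    Returns 0 if every step succeeded, otherwise the error number of the
    step that failed. Later steps are not executed.
    """
    for step in steps:
        error_number = step()
        if error_number != 0:
            return error_number
    return 0


def run_stop(error_number: int, undo_table: Dict[int, Callable[[], None]]) -> bool:
    """Try to undo a failed startup using only the error number.

    Returns True if an undo routine existed and was run, False if the stop
    logic has no entry for this error and traces are therefore left behind.
    """
    undo = undo_table.get(error_number)
    if undo is None:
        return False
    undo()
    return True
```

Even in this reduced form, the difficulty is visible: the only input to the stop logic is the error number, so `undo_table` must contain a correct undo routine for every error the startup can produce, and each routine must know everything that was done before that error occurred.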