1. Field of the Invention
The present invention relates to computer file systems, and more particularly to methods and systems for providing transactional semantics to file system operations in parallel processing systems.
2. Background
Computational speeds of single-processor computers have advanced tremendously over the past three decades. However, many fields require computational capacity that exceeds that of even the fastest single-processor computer. One example is data warehousing, where data volumes are so large that even the simplest operations may take days to complete on the fastest available uniprocessor computer. Accordingly, a variety of "parallel processing" systems have been developed to handle such problems. For purposes of this discussion, parallel processing systems include any configuration of computer systems using multiple central processing units (CPUs), either local (e.g., multiprocessor systems such as SMP computers), or locally distributed (e.g., multiple processors coupled as clusters or MPPs), or remotely distributed (e.g., multiple processors coupled via LAN or WAN networks), or any combination thereof.
Complex data processing applications running on parallel processing systems typically make changes to multiple external collections of data (files, databases, etc.), usually by a combination of file manipulation operations (such as creating, deleting and renaming) and data manipulation operations (such as reading and writing). Such applications do this by running one or more programs either concurrently or sequentially. If a failure occurs, partial changes may have been made to the external collections of data, rendering that data unusable by the current application or other applications until corrected. Typically, this will happen if a file has been modified, deleted, or created in the course of a computation. In parallel processing systems, the problem is intensified because the collection of data will often be spread over many different nodes and storage units (e.g., magnetic disks), so that the work required to "roll back" the state of the data increases proportionately with the number of storage units.
To recover from such failures, it is necessary to shut down the current (i.e., failed) application, and then either:
(1) undo all changes made by the application since its start (a "full rollback"), or
(2) restore the state of the system to an intermediate "checkpoint" and restart execution from that point (a "partial rollback").
One familiar method for dealing with such situations is to restore the prior state of the system manually, by removing junk files, for example. A shortcoming of such solutions is that they may require the participation of individuals familiar with the operation of the failed program, and that there is no guarantee that the proper corrective action will be taken. The problems are magnified in parallel processing systems, where a computation may be spread over multiple processors and the computation may have reached different points on different machines. In such cases, manual intervention may be required for every program participating in the computation and a single error may prevent the computation from being restarted successfully.
Another familiar solution to such problems is to build applications using databases and database management systems that allow operations to be grouped into transactions. All operations in a transaction may either be committed (that is, made permanent) as a group, or rolled back (that is, undone) as a group. In addition, an integrity preserving protocol, such as the well-known "two-phase commit" protocol, may be used to ensure that, in cases where multiple machines are used to run the application in a parallel processing system, transactions may be spread across multiple machines with the database management system ensuring that either all operations on all machines are committed or all operations on all machines are rolled back.
Unfortunately, many applications are not written using databases, and even those that are may still access and modify data stored in files using conventional file system operations. Accordingly, there is a need for a method by which operations on files may participate in an integrity preserving protocol, such that either all changes made to the file system by those applications are committed or all such changes are rolled back.
Referring to FIG. 1a through FIG. 1c, transaction processing in database systems is often implemented using some variant on the following method. First, a central transaction master, whose actions are shown to the left of the vertical separator, issues a start transaction command (Step 20). The transaction master then sends messages to various agents (Step 22), causing the recipient agents to perform actions (Step 24). Such messages normally include an identifier for the transaction, and a variety of methods are available for sending messages, such as remote procedure calls (RPCs) and inter-process communication mechanisms. Agents may, at various times, send messages back to the master (to confirm completion of an action, for example). Whenever an agent performs an action that modifies any permanently stored data (e.g., a disk file) (Step 26), the agent uses a reliable atomic procedure to append a log entry to a log, typically a log file (Step 28). The agent may also optionally save additional auxiliary information to supplement the log file entry, so that it is possible to undo the effects of the action (Step 30). The log entry serves as a note that, in the event of a failure, the action must be undone. The log entry typically contains the identifier of the transaction under which the operation was performed.
Eventually, one of the following will take place:
1. the transaction master requests that the transaction be committed (i.e., made permanent); or
2. the transaction master requests that the transaction be rolled back (i.e., undone); or
3. if the transaction master exits without performing either action (e.g., due to a software failure), the system generates a roll back request on behalf of the transaction master.
In cases 1 and 2, the decision is logged for use in the event of a failure of some kind.
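The agent-side logging step described above (Steps 26 through 30) can be illustrated with a minimal Python sketch. The function names, the JSON-lines log format, and the ".undo." backup naming scheme are assumptions made for illustration and do not come from the method described. The "reliable atomic procedure" is only approximated here by an append followed by a flush and fsync, and the sketch assumes the log entry is written before the file is touched.

```python
import json
import os
import shutil
import tempfile


def append_log_entry(log_path, entry):
    """Append one JSON line and force it to disk before returning."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
        log.flush()
        os.fsync(log.fileno())


def logged_delete(log_path, txn_id, target):
    """Delete `target` under transaction `txn_id`, keeping enough to undo it."""
    backup = target + ".undo." + txn_id  # hypothetical naming scheme
    # Log first (assumed ordering), so a crash after this point still
    # leaves a note that the action may need to be undone.
    append_log_entry(log_path, {"txn": txn_id, "op": "delete",
                                "target": target, "backup": backup})
    # The delete is carried out as a rename; the renamed file is the
    # auxiliary information that makes an undo possible.
    os.rename(target, backup)
    return backup


def read_log(log_path):
    """Return all log entries as a list of dictionaries."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]


def demo():
    """Delete a file under transaction T1 in a scratch directory."""
    workdir = tempfile.mkdtemp()
    try:
        log_path = os.path.join(workdir, "agent.log")
        target = os.path.join(workdir, "data.txt")
        with open(target, "w", encoding="utf-8") as f:
            f.write("payload")
        backup = logged_delete(log_path, "T1", target)
        return (os.path.exists(target), os.path.exists(backup),
                read_log(log_path)[0]["txn"])
    finally:
        shutil.rmtree(workdir)
```

After the call, the original name no longer exists, the backup copy does, and the log carries the transaction identifier, which is what a later finalize or undo pass would need.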
To commit a transaction (Step 32), the transaction master engages (Step 34) a commit protocol, such as the two-phase commit protocol, to be executed by the agents (Step 36). If this protocol succeeds, commit processing proceeds (Step 40); otherwise, a roll back command is generated and the transaction is rolled back (Step 46). If the commit proceeds, each agent steps through its log (Step 40), and for each action belonging to the transaction being committed, a finalize routine may be invoked to erase the auxiliary information that was saved to enable roll back processing (Step 42). When all log entries have been processed, the commit operation is complete (Step 44).
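As a rough illustration of this commit path, the following Python sketch models a simplified two-phase commit over in-memory agents. The Agent class, its method names, and the dictionary used as a decision log are hypothetical stand-ins; a real system would exchange messages between machines and keep its logs on stable storage.

```python
class Agent:
    """In-memory stand-in for an agent (hypothetical, for illustration)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.log = []        # (txn_id, action) entries, in order performed
        self.auxiliary = {}  # (txn_id, action) -> saved undo information

    def perform(self, txn_id, action, undo_info):
        """Record an action and the information needed to undo it."""
        self.log.append((txn_id, action))
        self.auxiliary[(txn_id, action)] = undo_info

    def prepare(self, txn_id):
        """Phase one: vote on whether this agent is able to commit."""
        return self.can_commit

    def finalize(self, txn_id):
        """Step through the log, erasing undo information for txn_id."""
        for entry in self.log:
            if entry[0] == txn_id:
                self.auxiliary.pop(entry, None)


def commit_transaction(txn_id, agents, decision_log):
    """Run a simplified two-phase commit; return True if committed."""
    votes = [agent.prepare(txn_id) for agent in agents]
    if not all(votes):
        # Any "no" vote turns the commit into a roll back request.
        decision_log[txn_id] = "rollback"
        return False
    # Record the decision before finalizing, so that recovery can
    # resume the commit if a failure interrupts the loop below.
    decision_log[txn_id] = "commit"
    for agent in agents:
        agent.finalize(txn_id)
    return True
```

The sketch stops at recording a roll back decision when a vote fails; undoing the logged actions is a separate pass over each agent's log.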
To roll a transaction back (Step 46), the transaction master broadcasts a roll back message (including the identifier for the transaction) to all agents that participated in the transaction (Step 48). Each agent steps through its log (Step 50), and for each operation belonging to the transaction being rolled back, an undo routine may be invoked to reverse the effects of the original operation (Step 52). When all log entries have been processed, the roll back operation is complete (Step 54).
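The roll back pass can be sketched in Python against a toy file system modeled as a dictionary from path to contents. The function names and the log entry layout are assumptions for illustration. The sketch also assumes the log is walked newest entry first, which the description above does not specify; reverse order is used so that several changes to the same file unwind correctly.

```python
def perform(fs, log, txn_id, op, path, data=None):
    """Apply `op` to the dict-based file system `fs`, logging how to undo it.

    `op` is "create", "write", or "delete". The prior contents of the path
    (None if it did not exist) are saved as the auxiliary undo information.
    """
    log.append({"txn": txn_id, "op": op, "path": path, "saved": fs.get(path)})
    if op == "delete":
        del fs[path]
    else:
        fs[path] = data


def roll_back(fs, log, txn_id):
    """Undo every logged operation of `txn_id`, newest first."""
    for entry in reversed(log):
        if entry["txn"] != txn_id:
            continue  # belongs to another transaction; leave it alone
        if entry["saved"] is None:
            fs.pop(entry["path"], None)         # path did not exist before
        else:
            fs[entry["path"]] = entry["saved"]  # restore prior contents
    # Drop the processed entries; entries of other transactions remain.
    log[:] = [e for e in log if e["txn"] != txn_id]
```

For example, if transaction T1 overwrites a.txt, creates b.txt, and then deletes a.txt, rolling T1 back restores the original a.txt and removes b.txt, while a file created under a different transaction is untouched.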
To deal with a system failure, a recovery utility may be used (Step 56). For every transaction active in the system (Step 58), the recovery utility examines bookkeeping information produced by the master-agent commit protocol (Step 62) to determine whether a decision had been made to commit the transaction (Step 64). If so, commit processing is begun or resumed, as appropriate, at Step 40, where each agent steps through its own log and any necessary finalize routines are executed. Otherwise, the roll back process at Step 46 is performed for the transaction, where each agent steps through its own log and any necessary undo routines are executed.
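The decision logic of such a recovery utility might look like the following Python sketch. The recover function, the dictionary standing in for the bookkeeping information, and the RecordingAgent class are all hypothetical; the agent merely records which routine was requested so that the control flow is visible.

```python
class RecordingAgent:
    """Hypothetical agent that records which routine was requested."""

    def __init__(self):
        self.calls = []

    def finalize(self, txn_id):
        self.calls.append(("finalize", txn_id))

    def undo(self, txn_id):
        self.calls.append(("undo", txn_id))


def recover(active_txns, decision_log, agents):
    """Resume commit for decided transactions; roll back all others."""
    outcome = {}
    for txn_id in active_txns:
        if decision_log.get(txn_id) == "commit":
            # A commit decision was recorded before the failure:
            # begin or resume finalize processing on every agent.
            for agent in agents:
                agent.finalize(txn_id)
            outcome[txn_id] = "committed"
        else:
            # No commit decision was recorded: undo the transaction.
            for agent in agents:
                agent.undo(txn_id)
            outcome[txn_id] = "rolled back"
    return outcome
```

Because commit processing may be resumed rather than started fresh, a sketch like this implicitly assumes that finalize and undo routines are safe to run more than once for the same log entry.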
A problem of the prior art is that no generalized method exists for applying the techniques of database transaction processing to non-database applications such that manual restoration of file state is substantially reduced or eliminated if a failure occurs. Accordingly, there is a need for a method and system that applies transaction techniques to file system operations in non-database applications executing on parallel processing systems. The present invention provides such a method and system.