The present invention relates to a system and method for a high level transaction logging mechanism in a fault tolerant, low latency shared system resource, such as a networked file server, and, in particular, to a cross-server high level mirrored transaction logging mechanism for use in a multiple server system resource.
A continuing problem in computer systems is in providing secure, fault tolerant resources, such as communications and data storage resources, such that communications between the computer system and clients or users of the computer system are maintained in the event of failure and such that data is not lost and can be recovered or reconstructed without loss in the event of a failure. This problem is particularly severe in networked systems wherein a shared resource, such as a system data storage facility, is typically comprised of one or more system resources, such as file servers, shared among a number of clients and accessed through the system network. A failure in a shared resource, such as in the data storage functions of a file server or in communications between clients of the file server and the client file systems supported by the file server, can result in failure of the entire system. This problem is particularly severe in that the volume of data and communications and the number of data transactions supported by a shared resource such as a file server are significantly greater than within a single client system, resulting in significantly increased complexity in the resource, in the data transactions and in the client/server communications. This increased complexity results in increased probability of failure and increased difficulty in recovering from failures. In addition, the problem is multidimensional in that a failure may occur in any of a number of resource components or related functions, such as in a disk drive, in a control processor, or in the network communications. Also, it is desirable that the shared resource communications and services continue to be available despite failures in one or more components, and that the operations of the resource be preserved and restored for both operations and transactions that have been completed and for operations and transactions that are being executed when a failure occurs.
Considering networked file server systems as a typical example of a shared system resource of the prior art, the file server systems of the prior art have adopted a number of methods for achieving fault tolerance in client/server communications and in the file transaction functions of the file server, and for data recovery or reconstruction. These methods are typically based upon redundancy, that is, the provision of duplicate system elements and the replacement of a failed element with a duplicate element or the creation of duplicate copies of information to be used in reconstructing lost information.
For example, many systems of the prior art incorporate industry standard RAID technology for the preservation and recovery of data and file transactions, wherein RAID technology is a family of methods for distributing redundant data and error correction information across a redundant array of disk drives. A failed disk drive may be replaced by a redundant drive, and the data in the failed disk may be reconstructed from the redundant data and error correction information. Other systems of the prior art employ multiple, duplicate parallel communications paths or multiple, duplicate parallel processing units, with appropriate switching to switch communications or file transactions from a failed communications path or file processor to an equivalent, parallel path or processor, to enhance the reliability and availability of client/file server communications and client/client file system communications. These methods, however, are costly in system resources, requiring the duplication of essential communication paths and processing paths, and the inclusion of complex administrative and synchronization mechanisms to manage the replacement of failed elements by functioning elements. Also, while these methods allow services and functions to be continued in the event of failures, and RAID methods, for example, allow the recovery or reconstruction of completed data transactions, that is, transactions that have been committed to stable storage on disk, these methods do not support the reconstruction or recovery of transactions lost due to failures during execution of the transactions.
As a consequence, yet other methods of the prior art utilize information redundancy to allow the recovery and reconstruction of transactions lost due to failures occurring during execution of the transactions. These methods include caching, transaction logging and mirroring, wherein caching is the temporary storage of data in memory in the data flow path to and from the stable storage until the data transaction is committed to stable storage by transfer of the data into stable storage, that is, a disk drive, or read from stable storage and transferred to a recipient. Transaction logging, or journaling, temporarily stores information describing a data transaction, that is, the requested file server operation, until the data transaction is committed to stable storage, that is, completed in the file server, and allows lost data transactions to be re-constructed or re-executed from the stored information. Mirroring, in turn, is often used in conjunction with caching or transaction logging and is essentially the storing of a copy of the contents of a cache or transaction log in, for example, the memory or stable storage space of a separate processor as the cache or transaction log entries are generated in the file processor.
Caching, transaction logging and mirroring, however, are often unsatisfactory because they are often costly in system resources and require complex administrative and synchronization operations and mechanisms to manage the caching, transaction logging and mirroring functions and subsequent transaction recovery operations, and significantly increase the file server latency, that is, the time required to complete a file transaction. It must also be noted that caching and transaction logging are vulnerable to failures in the processors in which the caching and logging mechanisms reside and that while mirroring is a solution to the problem of loss of the cache or transaction log contents, mirroring otherwise suffers from the same disadvantages as caching or transaction logging. These problems are compounded in that caching and, in particular, transaction logging and mirroring, require the storing of significant volumes of information, while transaction logging and the re-construction or re-execution of logged file transactions require the implementation and execution of complex algorithms to analyze, replay and roll back the transaction log to re-construct the file transactions. These problems are compounded still further in that these methods are typically implemented at the lower levels of file server functionality, where each data transaction is executed as a large number of detailed, complex file system operations. As a consequence, the volume of information to be extracted and stored and the number and complexity of operations required to extract and store the data or data transactions and to recover and reconstruct the data or data transaction operations are significantly increased.
Again, these methods are costly in system resources and require complex administrative and synchronization mechanisms to manage the methods and, because of the cost in system resources, the degree of redundancy that can be provided by these methods is limited, so that the systems often cannot deal with multiple sources of failure. For example, a system may provide duplicate parallel processor units or communications paths for certain functions, but the occurrence of failures in both processor units or communications paths will result in total loss of the system. In addition, these methods of the prior art for ensuring communications and data preservation and recovery typically operate in isolation from one another, and in separate levels or sub-systems. For this reason, the methods generally do not operate cooperatively or in combination, may operate in conflict with one another, and cannot deal with multiple failures or combinations of failures or failures requiring a combination of methods to overcome. Some systems of the prior art attempt to solve this problem, but this typically requires the use of a central, master coordination mechanism or sub-system and related complex administrative and synchronization mechanisms to achieve cooperative operation and to avoid conflict between the fault handling mechanisms, which is again costly in system resources and is in itself a source of failures.
The present invention provides a solution to these and other related problems of the prior art.
The present invention is directed to a high level transaction logging mechanism for use in a fault tolerant, low latency, shared system resource, such as a networked file server, and, in a preferred embodiment, a high level, cross server transaction mirror logging mechanism.
According to the present invention, a system resource includes a resource subsystem for performing low level system resource operations and a control/processing sub-system that includes a first blade processor. The first blade processor includes a first system resource processor performing high level system resource operations including transforming system resource requests from clients into corresponding low level system resource operations. A first transaction logging mechanism includes a first log generator for extracting high level system resource operation information relating to each system resource request directed to the first blade processor and a first transaction log for storing the high level system resource operation information. The first log generator is responsive to the restoration of operation of the system resource after a failure of system resource operations in the first blade processor for reading the high level system resource operation information relating to each system resource request directed to the first blade processor from the first transaction log and restoring the state of execution of system resource requests directed to the first blade processor.
In the presently preferred embodiment, the high level system resource operation information relating to each system resource request directed to the first blade processor is extracted before the corresponding system resource operation is completed by the first system resource processor, and a client system resource request is acknowledged as accepted by the system resource after the high level system resource operation information is stored in the first transaction log.
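The ordering described above (store the high level operation information, then acknowledge the request, then complete the operation, and on restoration re-execute what had not completed) may be illustrated by the following minimal sketch. It is not the implementation of the invention; the names BladeProcessor, LogEntry, accept, execute and recover, and the in-memory dictionary standing in for the low level resource subsystem, are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LogEntry:
    """High level description of one client request (hypothetical layout)."""
    request_id: int
    operation: str
    args: dict
    completed: bool = False


@dataclass
class BladeProcessor:
    """Hypothetical model of a blade with a high level transaction log."""
    low_level: Callable[[str, dict], None]
    log: Dict[int, LogEntry] = field(default_factory=dict)
    acks: List[int] = field(default_factory=list)

    def accept(self, request_id: int, operation: str, args: dict) -> None:
        # Store the high level operation information first ...
        self.log[request_id] = LogEntry(request_id, operation, args)
        # ... and only then acknowledge the request as accepted.
        self.acks.append(request_id)

    def execute(self, request_id: int) -> None:
        # Transform the logged request into low level operations.
        entry = self.log[request_id]
        self.low_level(entry.operation, entry.args)
        entry.completed = True

    def recover(self) -> List[int]:
        # After restoration, re-execute every logged request not yet completed.
        replayed = []
        for entry in list(self.log.values()):
            if not entry.completed:
                self.execute(entry.request_id)
                replayed.append(entry.request_id)
        return replayed


# Usage: a dictionary stands in for the resource subsystem (an assumption).
store: Dict[str, str] = {}


def apply(operation: str, args: dict) -> None:
    if operation == "write":
        store[args["path"]] = args["data"]


blade = BladeProcessor(low_level=apply)
blade.accept(1, "write", {"path": "/a", "data": "x"})
blade.accept(2, "write", {"path": "/b", "data": "y"})
blade.execute(1)
# A failure is assumed here, before request 2 has been executed.
replayed = blade.recover()
```

Because each entry records the request at the level at which the client issued it, the sketch replays one entry per request rather than the many low level operations the request expands into.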
In further embodiments of the present invention, the first transaction logging mechanism further includes a first transaction log mirroring mechanism located separately from the first blade processor and communicating with the first log generator for receiving and storing mirror copies of the high level system resource operation information relating to each system resource request directed to the first blade processor. The first transaction log mirroring mechanism is responsive to the restoration of operation of the system resource after a failure of system resource operations in the first blade processor for reading the high level system resource operation information relating to each system resource request directed to the first blade processor from the first transaction log mirroring mechanism and restoring the state of execution of system resource requests directed to the first blade processor.
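A mirroring arrangement of this kind may be sketched as follows, assuming for illustration that a failure of the blade loses the contents of its local log while the separately located mirror survives. The class names MirrorLog and LogGenerator and their methods are hypothetical and do not appear in the description above.

```python
from typing import Dict


class MirrorLog:
    """Hypothetical mirror located separately from the blade it protects."""

    def __init__(self) -> None:
        self.entries: Dict[int, dict] = {}

    def store(self, request_id: int, info: dict) -> None:
        # Keep an independent copy so the mirror does not share state
        # with the blade's local log.
        self.entries[request_id] = dict(info)


class LogGenerator:
    """Hypothetical log generator writing to a local log and to a mirror."""

    def __init__(self, mirror: MirrorLog) -> None:
        self.local: Dict[int, dict] = {}
        self.mirror = mirror

    def record(self, request_id: int, info: dict) -> None:
        self.local[request_id] = info
        self.mirror.store(request_id, info)

    def fail(self) -> None:
        # Assumption: a blade failure loses the local log contents.
        self.local.clear()

    def restore_from_mirror(self) -> Dict[int, dict]:
        # Rebuild the local log from the mirror copies.
        self.local = {rid: dict(info) for rid, info in self.mirror.entries.items()}
        return self.local


mirror = MirrorLog()
generator = LogGenerator(mirror)
generator.record(1, {"op": "create", "path": "/a"})
generator.record(2, {"op": "write", "path": "/a", "data": "x"})
generator.fail()
restored = generator.restore_from_mirror()
```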
In still further preferred embodiments of the present invention, the control/processing sub-system further includes a second blade processor operating in parallel with the first blade processor that includes a second system resource processor performing high level system resource operations including transforming system resource requests from clients to the second blade processor into corresponding low level system resource operations and a second transaction logging mechanism. The second transaction logging mechanism includes a second log generator for extracting high level system resource operation information relating to each system resource request directed to the second blade processor and a second transaction log for storing the high level system resource operation information relating to each system resource request directed to the second blade processor. The second log generator is responsive to the restoration of operation of the system resource after a failure of system resource operations in the second blade processor for reading the high level system resource operation information from the second transaction log and restoring the state of execution of system resource requests directed to the second blade processor and represented in the second transaction log. The preferred embodiment further includes a first transaction log mirroring mechanism residing in the second blade processor and communicating with the first log generator for receiving and storing copies of the high level system resource operation information relating to each system resource request directed to the first blade processor.
The first transaction log mirroring mechanism is responsive to the restoration of operation of the system resource after a failure of system resource operations in the first blade processor for reading the high level system resource operation information relating to each system resource request directed to the first blade processor from the first transaction log mirroring mechanism and restoring the state of execution of system resource requests directed to the first blade processor. The first transaction logging mechanism, in turn, further includes a second transaction log mirroring mechanism residing in the first blade processor and communicating with the second log generator for receiving and storing copies of the high level system resource operation information relating to each system resource request directed to the second blade processor. The second transaction log mirroring mechanism is responsive to the restoration of operation of the system resource after a failure of system resource operations in the second blade processor for reading the high level system resource operation information relating to each system resource request directed to the second blade processor from the second transaction log mirroring mechanism and restoring the state of execution of system resource requests directed to the second blade processor.
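The symmetric, cross-blade arrangement described above, in which each blade processor holds its own transaction log together with a mirror of the other blade's log, may be sketched as follows. The class Blade, its attributes and methods are hypothetical. The sketch assumes that a failed blade loses both its own log and the mirror it holds, and the step that refills that mirror from the peer's log on restoration is an assumption not stated in the description above.

```python
from typing import List, Optional


class Blade:
    """Hypothetical blade holding its own log and a mirror of its peer's log."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.own_log: List[dict] = []      # requests directed to this blade
        self.peer_mirror: List[dict] = []  # copies of the peer's log entries
        self.peer: Optional["Blade"] = None

    def log_request(self, info: dict) -> None:
        # Record locally, then send a mirror copy to the peer blade.
        self.own_log.append(info)
        if self.peer is not None:
            self.peer.peer_mirror.append(dict(info))

    def crash(self) -> None:
        # Assumption: a failure loses everything held in this blade.
        self.own_log = []
        self.peer_mirror = []

    def restore(self) -> None:
        assert self.peer is not None
        # Recover this blade's log from the mirror residing in the peer.
        self.own_log = [dict(e) for e in self.peer.peer_mirror]
        # Assumption: also refill this blade's mirror from the peer's log.
        self.peer_mirror = [dict(e) for e in self.peer.own_log]


first, second = Blade("first"), Blade("second")
first.peer, second.peer = second, first

first.log_request({"id": 1, "op": "write"})
second.log_request({"id": 2, "op": "read"})

first.crash()
first.restore()
```

In this sketch neither blade acts as a master; each simply serves as the mirror location for the other, so a failure of either blade can be recovered from the survivor.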