1. Field of the Invention
The present invention relates to computer networks, and more specifically to providing a memory management technique for implementing a high performance stable storage system in a computer network.
2. Background
Service availability and fault tolerance have become important features in building reliable communication systems, networking systems, and reliable distributed transactions. In such an environment, it is extremely important to provide continuous, non-interrupted service to end users. Further, in the event of a service process crash, it is essential that the crashed process be restarted as quickly as possible so that end users do not experience any disruption in service. Conventionally, in order to restart a crashed service process quickly, and reinitialize it correctly, certain critical application data relating to the service process is stored in a stable storage system which is part of the system or network on which the service process is running.
A stable storage system is an abstraction of a perfect storage system which is able to survive certain system failures such as, for example, communication failures, application failures, operating system failures, processor failures, etc. In order to ensure the integrity of key system data and to avoid data inconsistency (caused, for example, by a process crash which occurs in the middle of a system operation), client processes or applications store key data within a stable storage system so that the data can be rolled back to a previously consistent state if a failure occurs. Typically, a stable storage system provides atomic read and write operations to stored data, and keeps the data intact even when failures occur. For example, in a router system, network state information such as, for example, the Forwarding Information Base (FIB), reservation state, and multicast group information is stored in stable storage systems in order to restore packet forwarding processes quickly in the event of a process crash.
Traditional stable storage systems use either a file system-based storage system, or a reliable RAM disk. Examples of the traditional file system-based storage systems include the well known Andrew File System RVM and LibFT, from Lucent Technologies of Murray Hill, N.J. An example of a conventional file system-based stable storage system is shown in FIG. 1 of the drawings.
As shown in FIG. 1, stable storage system 104 comprises a block of "non-volatile" memory, such as, for example, a hard drive. Typically, the stable storage system 104 is configured to operate independently of the network operating system in order to preserve the data within the stable storage system in the event of an operating system or network crash. A plurality of clients 102, which represent various applications or processes running on the network, write and/or read essential data to and from stable storage system 104. The data is stored within a plurality of data files 110. An access manager 106 manages the data which is sent to and retrieved from data files 110. Additionally, the access manager manages a plurality of log or backup files 108 which are used for tracking data which is written to the data files 110.
While the file-system based approach to stable storage may be resilient to process and OS failures, this approach imposes high performance penalties because of multiple inter-process buffer copying and disk I/O latency. Further, the file system based stable storage system does not support fast, incremental updating of state fragment data such as, for example, FIB entries or TCP message sequence numbers. Rather, conventional disk storage systems support sequential access of data which is managed using sector allocations. To update data within a data file 110, the new data is typically appended to the end of the data file. In order to retrieve this new data appended to the end of the data file, the entire data file must be accessed sequentially. This slow method of accessing and updating data is undesirable, particularly in systems such as embedded systems (e.g., router systems) where fast recovery is essential to avoiding service disruption.
Alternatively, RAM disks may be used for implementing stable storage systems. However, the reliable RAM disk approach is undesirable for most commercial systems since it requires installation and support of additional hardware.
While conventional file-system based stable storage systems may be used for storage of essential application data, there exists a continual need to provide improved techniques for implementing stable storage in conventional computer systems or networks.
According to specific embodiments of the invention, a technique is provided for implementing a high performance stable storage system which provides stable and fast storage services to applications built on top of one or more operating system (OS) kernels in a computer network.
According to a specific embodiment of the invention, a unique high performance stable storage hierarchy is provided comprising two levels. A set of byte-addressable stable memory regions (SMRs) forms the first level stable storage. Each stable storage memory region (SMR) includes a data structure for storing desired or essential data related to one or more client processes. The SMR is configured to provide an access interface which supports atomic access to its data structure. The SMR is also configured to be resilient to application failures. For example, if a client dies or crashes, the data contained within the SMR will still be accessible to other applications or network components. Further, the SMR is configured to be byte addressable, and configured to share at least a portion of its address space with at least one client process. The term "byte-addressable" refers to the capability of a client to write data into an SMR data buffer directly using pointers (instead of appending logs to a file like in traditional stable storage systems).
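The contrast between byte-addressable updates and log appending can be illustrated with a minimal sketch. It assumes an anonymous memory mapping as a stand-in for an SMR data buffer and a hypothetical fixed-size record of two unsigned 32-bit integers (a prefix and a next hop); neither the record layout nor the names `write_entry` and `read_entry` come from the specification.

```python
import mmap
import struct

# Hypothetical record layout (an assumption for illustration):
# two little-endian unsigned 32-bit integers, (prefix, next_hop).
ENTRY = struct.Struct("<II")
NUM_ENTRIES = 4

# Anonymous memory mapping standing in for an SMR data buffer.
region = mmap.mmap(-1, ENTRY.size * NUM_ENTRIES)


def write_entry(index, prefix, next_hop):
    """Overwrite one entry in place at its byte offset; nothing is appended."""
    if not 0 <= index < NUM_ENTRIES:
        raise IndexError("entry index out of range")
    ENTRY.pack_into(region, index * ENTRY.size, prefix, next_hop)


def read_entry(index):
    """Read one entry directly from its byte offset."""
    if not 0 <= index < NUM_ENTRIES:
        raise IndexError("entry index out of range")
    return ENTRY.unpack_from(region, index * ENTRY.size)


write_entry(2, 0x0A000000, 7)
write_entry(2, 0x0A000000, 9)  # incremental update rewrites the same 8 bytes
```

Because each entry lives at a computable offset, an update touches only the bytes of that entry, and a reader does not have to scan a log to find the latest value.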
The second level of the high performance stable storage hierarchy includes any traditional file system based stable storage system which is configured to communicate with one or more SMRs. The data contained in an SMR can be flushed into (and loaded from) the second level stable storage device atomically upon request.
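The specification does not state how the flush to the second level is made atomic. One common way to obtain an all-or-nothing file update is to write a temporary file and rename it over the target, as in the following sketch. The technique and the names `flush_region` and `load_region` are assumptions for illustration, not the mechanism described here.

```python
import os
import tempfile


def flush_region(data: bytes, path: str) -> None:
    """Write data to a temporary file, then rename it over the target.

    A reader of `path` sees either the old contents or the new contents,
    never a partially written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to the storage device
        os.replace(tmp_path, path)  # atomic rename within one file system
    except BaseException:
        # Clean up the temporary file if the flush did not complete.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_region(path: str) -> bytes:
    """Load the last flushed contents back from second level storage."""
    with open(path, "rb") as f:
        return f.read()
```

A client would call `flush_region` with a snapshot of the SMR contents when it requests a flush, and `load_region` when the SMR needs to be repopulated.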
In accordance with a specific embodiment of the present invention, the plurality of SMRs form a high performance fault resilient "cache" layer to traditional file system based stable storage systems. On most platforms where processors and operating systems are much more reliable than applications, this layer itself can boost the system availability considerably without the typical performance penalties incurred by traditional stable storage systems. This performance gain is especially important for applications which perform fast incremental state updates for small transactions.
An alternate embodiment of the present invention provides a data storage system implemented in a computer system. The computer system includes an operating system and at least one CPU. The data storage system includes at least one SMR managed by the operating system. The SMR includes at least one first data structure for storing data related to a client process. Further, the SMR is configured or designed to support atomic access of data within the first data structure. The SMR is also configured or designed to support incremental updating of client process data within the first data structure. The incremental updating is implemented using a pointer-based data transfer mechanism. The SMR is further configured or designed to allow at least one other client process to access data within the SMR data structure. The SMR may also include a memory based semaphore for providing exclusive access to the SMR when desired (e.g. when writing data to the SMR data structure). The SMR may also include a reference counter for keeping track of the number of client processes accessing the SMR.
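The combination of a semaphore for exclusive access and a reference counter for attached clients can be sketched as follows. This is a simplified, single-process illustration: a `bytearray` stands in for the SMR data structure, a `threading.Semaphore` stands in for the memory based semaphore, and the class and method names are hypothetical.

```python
import threading


class StableMemoryRegion:
    """Toy model of an SMR with a binary semaphore and a reference counter."""

    def __init__(self, size):
        self._buf = bytearray(size)
        self._sem = threading.Semaphore(1)  # binary semaphore: one holder at a time
        self._ref_lock = threading.Lock()   # protects the reference counter
        self.ref_count = 0

    def attach(self):
        """Record that one more client is using the region."""
        with self._ref_lock:
            self.ref_count += 1

    def detach(self):
        """Record that a client has stopped using the region."""
        with self._ref_lock:
            self.ref_count -= 1

    def write(self, offset, data):
        """Update part of the buffer in place while holding the semaphore."""
        if offset < 0 or offset + len(data) > len(self._buf):
            raise ValueError("write outside region")
        with self._sem:
            self._buf[offset:offset + len(data)] = data

    def read(self, offset, length):
        """Return a copy of part of the buffer while holding the semaphore."""
        if offset < 0 or offset + length > len(self._buf):
            raise ValueError("read outside region")
        with self._sem:
            return bytes(self._buf[offset:offset + length])
```

In this model the region's contents outlive any one client: a client that calls `detach` (or simply disappears) lowers the count, while the data remains readable by the clients still attached.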
An additional aspect of the above-described data storage system provides a system manager configured or designed to monitor the client processes and to detect client failures. The data storage system may also include a recovery agent configured or designed to perform recovery of a failed client process by utilizing at least one SMR. The SMR may be further configured or designed to share at least a portion of memory address space with the operating system, the recovery agent, and/or at least one client process.
An alternative embodiment of the present invention provides a file system-based stable storage system in a computer system. The computer system includes an operating system and at least one CPU. The stable storage system includes one SMR having at least one first data structure for storing data related to a client process. The SMR is configured to support atomic access of data within the first data structure, and is configured to share at least a portion of memory address space with the client process and the operating system. The SMR may also be configured to support direct memory to memory transfer of data from the client heap to the SMR data structure or vice-versa. The storage system may also include non-volatile memory configured to communicate with the SMR. The non-volatile memory includes a second data structure which is configured or designed to support atomic access of data within the second data structure.
Alternate embodiments of the present invention provide a method and computer program product for storing data in a computer network. The network includes at least one operating system and at least one stable storage memory region (SMR) managed by the operating system. The SMR includes at least one first data structure. The network also includes non-volatile memory having at least one second data structure configured to communicate with the SMR. Client application data in the first data structure is accessed atomically. Once accessed, the client data is then atomically transferred between the first data structure and the second data structure. An additional aspect of this embodiment includes incrementally updating a portion of state fragment data within the SMR data structure by performing a direct memory to memory copy of state fragment data from a client heap to the SMR. Where the above-described aspect is implemented as a computer program product, the computer program product will include a computer readable media having computer readable code embodied therein for implementing the above-described process.
Further embodiments of the present invention provide a method and computer program product for implementing stable storage in a computer system. The computer system includes at least one operating system and includes non-volatile memory. The non-volatile memory has at least one second data structure for storing data related to a client application. At least one stable storage memory region (SMR) is provided within the main memory of the computer system for storing the client application data. The SMR is managed by the operating system and includes at least one first data structure. Further, the SMR is configured to share at least a portion of memory address space with the operating system and with the client process. Also, the SMR is configured to communicate with the non-volatile memory. The client application data within the SMR data structure is accessed atomically. The SMR may be configured to provide concurrent access of the client application data to at least one other client. Further, the SMR may be configured to be resilient to a client crash whereby the SMR and data therein are able to survive the client crash. The SMR may be used to recover client data in response to detection of a client crash. Further, consistent client data may be retrieved from the second data structure into the first data structure for use by a client application. Additionally, client application data may be flushed from the SMR to the non-volatile memory in response to a request or signal initiated by the client. Where this embodiment is implemented as a computer program product, the computer program product will include a computer readable media having computer readable code embodied therein for implementing the above-described process.
Additional embodiments of the present invention provide a method and computer program product for implementing a memory management system in a computer network. The network includes at least one operating system and at least one SMR managed by the operating system. The SMR includes at least one first data structure. The network includes non-volatile memory configured to communicate with the SMR, wherein the non-volatile memory has at least one second data structure. When a client crash is detected, an appropriate SMR which is utilized by the crashed client is located. Consistent client data is then atomically transferred from the second data structure to the first data structure to thereby allow a restarted client application to quickly reinitialize itself using the consistent application data retrieved into the SMR. Where this embodiment is implemented as a computer program product, the computer program product will include a computer readable media having computer readable code embodied therein for implementing the above-described process.
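The recovery sequence described above (detect a crash, locate the SMR used by the crashed client, restore consistent data into it) can be sketched in a few lines. The sketch assumes dictionaries keyed by a client identifier as stand-ins for the SMR lookup and for the second data structure in non-volatile memory; the `RecoveryAgent` class and its fields are hypothetical, and the in-process copy shown here does not by itself provide the atomicity that a real implementation would need.

```python
class RecoveryAgent:
    """Toy recovery agent: restores an SMR from its last consistent snapshot."""

    def __init__(self, smr_table, stable_store):
        # client id -> bytearray standing in for that client's SMR
        self.smr_table = smr_table
        # client id -> bytes standing in for the last consistent snapshot
        # held in second level (non-volatile) storage
        self.stable_store = stable_store

    def recover(self, client_id):
        """Locate the crashed client's SMR and reload consistent data into it."""
        smr = self.smr_table[client_id]           # locate the appropriate SMR
        snapshot = self.stable_store[client_id]   # last consistent data
        if len(snapshot) != len(smr):
            raise ValueError("snapshot size does not match SMR size")
        smr[:] = snapshot                         # overwrite the region in place
        return smr


# Example: the client crashed midway through an update, leaving mixed state.
smrs = {"fwd": bytearray(b"new-OLD!")}
store = {"fwd": b"old-old!"}
agent = RecoveryAgent(smrs, store)
restored = agent.recover("fwd")
```

After `recover` returns, a restarted client can reinitialize itself from the restored region rather than rebuilding its state from scratch.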
Additional features and advantages of the present invention will become apparent from the following description of its preferred embodiments, which descriptions should be taken in conjunction with the accompanying drawings.