Traditionally, computers have stored their data either in memory or on input/output (I/O) storage devices such as magnetic tape or disk. I/O storage devices can be attached to a system through an I/O bus such as PCI (originally named Peripheral Component Interconnect), or through a network such as Fibre Channel, InfiniBand, ServerNet, or Ethernet. I/O storage devices are typically slow, with access times of more than one millisecond. They utilize special I/O protocols such as the Small Computer System Interface (SCSI) protocol or the Transmission Control Protocol/Internet Protocol (TCP/IP), and they typically operate as block exchange devices (i.e., data is read or written in fixed-size blocks). A feature of these types of I/O storage devices is that they are persistent: when they lose power or are restarted, they retain the information previously stored on them. In addition, networked I/O storage devices can be accessed from multiple processors through shared I/O networks, even after some processors have failed.
System memory is generally connected to a processor through a system bus and is relatively fast, with guaranteed access times measured in tens of nanoseconds. Moreover, system memory can be directly accessed with byte-level granularity. System memory, however, is normally volatile, such that its contents are lost if power is lost or if the system embodying such memory is restarted. Also, system memory is usually within the same fault domain as a processor, such that if the processor fails, the attached memory also fails and can no longer be accessed.
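The difference between block exchange and byte-level access described above can be illustrated with a minimal sketch. It is a toy model only: the class `BlockDevice`, the helper functions, and the 512-byte block size are hypothetical names and values chosen for illustration and do not describe any particular device or protocol.

```python
BLOCK_SIZE = 512  # assumed block size, chosen only for illustration


class BlockDevice:
    """Toy block exchange device: data moves only in whole blocks."""

    def __init__(self, num_blocks):
        self._data = bytearray(num_blocks * BLOCK_SIZE)
        self.bytes_transferred = 0  # running total of bytes moved

    def read_block(self, index):
        start = index * BLOCK_SIZE
        self.bytes_transferred += BLOCK_SIZE
        return bytearray(self._data[start:start + BLOCK_SIZE])

    def write_block(self, index, block):
        if len(block) != BLOCK_SIZE:
            raise ValueError("only whole blocks can be written")
        start = index * BLOCK_SIZE
        self._data[start:start + BLOCK_SIZE] = block
        self.bytes_transferred += BLOCK_SIZE


def update_byte_on_block_device(dev, offset, value):
    """Changing one byte requires reading, modifying, and rewriting a block."""
    index, within = divmod(offset, BLOCK_SIZE)
    block = dev.read_block(index)
    block[within] = value
    dev.write_block(index, block)


def update_byte_in_memory(mem, offset, value):
    """Byte-addressable memory is updated in place; one byte is touched."""
    mem[offset] = value
    return 1


dev = BlockDevice(num_blocks=4)
update_byte_on_block_device(dev, 700, 0xAB)

mem = bytearray(4 * BLOCK_SIZE)
touched = update_byte_in_memory(mem, 700, 0xAB)

print(dev.bytes_transferred, touched)  # prints "1024 1"
```

In this model, a one-byte update on the block device moves two full blocks (one read and one write), whereas the byte-addressable update touches a single byte. The sketch counts only bytes moved and does not model the protocol or device-driver latency discussed in this section.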
Therefore, it is desirable to have an alternative to these technologies that provides the persistence and durability of storage I/O with the speed and byte-granular access of system memory. Further, it is desirable to have a remote direct memory access (RDMA) capable network that allows a plurality of client processes operating on multiple processors to share memory, and thereby provides the fault-tolerance characteristics of networked RDMA memory.
Prior art systems have used battery-backed dynamic random access memory (BBDRAM), solid-state disks, and network-attached volatile memory. Prior direct-attached BBDRAM, for example, may have some performance advantages over true persistent memory. However, it is not globally accessible, and it lies within the same fault domain as the attached CPU. Therefore, direct-attached BBDRAM will be rendered inaccessible in the event of a CPU failure or operating system crash. Accordingly, direct-attached BBDRAM is often used in situations where all system memory is persistent, so that the system may be restarted quickly after a power failure or reboot. BBDRAM is still volatile during long power outages, such that alternate means must be provided to store its contents before the batteries drain. RDMA attachment of BBDRAM is not known to exist. Importantly, this use of direct-attached BBDRAM is very restrictive and is not amenable to use in, for example, network-attached persistent memory applications.
Battery-backed solid-state disks (BBSSDs) have been proposed for other implementations. These BBSSDs provide persistent memory, but functionally they emulate a disk drive. An important disadvantage of this approach is the additional latency associated with accessing these devices through I/O adapters. This latency is inherent in the block-oriented and file-oriented storage models used by disks and, in turn, by BBSSDs. Accesses to BBSSDs traverse a sub-optimal data path in which the operating system is not bypassed. While it is possible to modify solid-state disks to eliminate some shortcomings, the inherent latency cannot be eliminated because performance is limited by the I/O protocols and their associated device drivers. As with direct-attached BBDRAM, additional technologies are required to deal with loss of power for extended periods of time.
It is therefore desirable to provide memory that is persistent (not volatile) either through extended periods of power loss or past an operating system crash. Moreover, it is desirable to locate all or part of such memory remotely (i.e., outside the fault domain of failing processors) so that it is robust to processor failures. It is further desirable to provide (remote) access to such persistent memory over a system area network (SAN) where it can be efficiently accessed by many processors, although not necessarily at the same time. With such persistent memory, improved computer systems can be implemented.