The present invention relates to a data storage system, and more particularly, to a block-based storage system with low latency.
Non-volatile or persistent solid state memory, such as NAND Flash, is progressively replacing hard disk drives (HDDs) as the data storage device for computer systems. Unlike a random access memory (RAM) in which each byte can be independently accessed, a NAND Flash memory is divided into a series of blocks, which are the smallest erasable units in a NAND Flash device.
The NAND Flash memory, in the form of a solid state drive (SSD), provides a drop-in replacement for slower HDDs by connecting to computer systems via a Serial ATA (SATA) or Serial Attached SCSI (SAS) interface originally designed for HDDs. As the performance of the NAND Flash memory improves, however, the high latency of the SATA and SAS interfaces becomes a bottleneck, prompting the data storage industry to adopt potentially faster interfaces, such as peripheral component interconnect express (PCIe).
To further reduce the latency and increase the bandwidth of the interface, a persistent solid state memory like NAND Flash may, in principle, be connected to one or more central processing units (CPUs) via a system memory bus. Such an implementation, however, faces several issues, as will be discussed later.
FIG. 1 shows a block diagram of a conventional computer subsystem 48 that includes multiple CPUs 50A and 50B and byte-addressable non-persistent memory (ByNPM) modules 56A/B/C-70A/B/C that reside in dual in-line memory module (DIMM) slots of a motherboard. Unlike persistent memory, volatile or non-persistent memory, such as dynamic random access memory (DRAM), loses data stored therein when power is interrupted. The computer subsystem 48 of FIG. 1 includes two CPUs 50A and 50B communicating with each other through a high speed link or interconnect 54. Each of the CPUs 50A and 50B includes a respective one of memory controllers 52A and 52B, through which the CPUs 50A and 50B communicate with and manage the ByNPM modules 56A/B/C-70A/B/C. The memory controller 52A of the CPU 50A communicates with 12 ByNPM modules 56A/B/C-62A/B/C via four memory channels 72-78. Each of the four channels 72-78 is connected to a respective three of the ByNPM modules 56A/B/C-62A/B/C. Similarly, the memory controller 52B of the CPU 50B communicates with 12 ByNPM modules 64A/B/C-70A/B/C via four memory channels 80-86. Each of the four channels 80-86 is connected to a respective three of the ByNPM modules 64A/B/C-70A/B/C.
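The module counts in the FIG. 1 arrangement can be tallied with a short sketch. The Python below is illustrative only: the constant and function names are hypothetical, and the counts (two CPUs, four memory channels per CPU, three ByNPM modules per channel) are taken from the description above.

```python
# Illustrative tally of the FIG. 1 topology; all names are hypothetical.
CPUS = ("50A", "50B")        # two CPUs linked by interconnect 54
CHANNELS_PER_CPU = 4         # four memory channels per memory controller
MODULES_PER_CHANNEL = 3      # three ByNPM modules on each channel


def modules_per_cpu(channels=CHANNELS_PER_CPU, per_channel=MODULES_PER_CHANNEL):
    """Number of ByNPM modules managed by one memory controller."""
    return channels * per_channel


def total_modules(cpus=CPUS):
    """Number of ByNPM modules in the whole subsystem."""
    return len(cpus) * modules_per_cpu()


print(modules_per_cpu())  # 12
print(total_modules())    # 24
```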
Most commercially available CPUs can support enough DIMMs for system memory but cannot support enough persistent memory via DIMM slots for data storage applications. For example, Xeon® E5-26XX processors manufactured by Intel Corporation, a family of CPUs commonly used in computer servers, can each support a maximum of only 12 DIMMs. Moreover, the memory speed drops from 2133 MHz to 1600 MHz if all 12 DIMM slots are populated. In addition, a Xeon® E5-26XX processor with 4 memory channels can support a maximum of only 8 logical ranks per channel. This means that only two quad-rank DIMMs may be used with each channel, which further limits the number of physical DIMMs that can be used with each CPU.
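The rank arithmetic behind this limit can be made explicit with a minimal sketch. The Python below uses only the figures cited above (4 channels, 8 logical ranks per channel, 12 DIMMs per CPU). The function name is hypothetical, and the figure of 3 slots per channel is an assumption, derived by dividing 12 DIMMs evenly across 4 channels.

```python
# Illustrative rank arithmetic; the function name is hypothetical.
def max_dimms_per_channel(max_ranks_per_channel, ranks_per_dimm, slots_per_channel):
    """DIMMs usable on one channel, limited by both rank budget and slot count."""
    by_ranks = max_ranks_per_channel // ranks_per_dimm
    return min(by_ranks, slots_per_channel)


CHANNELS = 4            # memory channels per CPU
MAX_RANKS = 8           # logical ranks supported per channel
SLOTS_PER_CHANNEL = 3   # assumption: 12 DIMMs spread evenly over 4 channels

quad_rank = max_dimms_per_channel(MAX_RANKS, 4, SLOTS_PER_CHANNEL)
dual_rank = max_dimms_per_channel(MAX_RANKS, 2, SLOTS_PER_CHANNEL)

print(quad_rank, quad_rank * CHANNELS)  # 2 8
print(dual_rank, dual_rank * CHANNELS)  # 3 12
```

Under these assumptions, quad-rank DIMMs are limited by the rank budget (8 DIMMs per CPU), whereas dual-rank DIMMs are limited by the slot count (12 DIMMs per CPU).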
Information relevant to attempts to address these problems can be found in U.S. Pat. Nos. 8,185,685 and 9,158,636 and U.S. Patent Application Publication Nos. 2013/0086311 and 2014/0101370. However, each one of these references suffers from one or more of the following disadvantages: inadequate storage capacity and high latency.
For the foregoing reasons, there is a need for a persistent memory module or system that can communicate with a CPU via the memory bus to reduce latency while having enough capacity for data storage.