Over the years, improvements in input/output (I/O) performance have not kept pace with improvements in processor or memory performance. As a result, computing systems are becoming largely I/O limited. Typical storage devices attached to computing nodes are hard disk drives (HDDs) with spinning disk media, attached to the computing nodes either through I/O channels (e.g., PCI Express) as local disk storage or through a storage area network (SAN) as shared disk storage. HDD-based storage systems have the following disadvantages:
(a) Long disk access latencies (e.g., on the order of milliseconds).
(b) Long I/O bus and interface latencies.
(c) High latency overhead relative to transfer time for small data accesses, which makes HDD storage unsuitable for random access of small data elements.
(d) Limited capacity of HDD-integrated DRAM/SRAM caches due to space and power constraints.
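Disadvantage (c) can be illustrated with a short calculation. The following is a minimal sketch, not a measurement: the 5 ms access latency and 100 MB/s sustained bandwidth are assumed, illustrative values for a spinning disk, and the function name `transfer_efficiency` is hypothetical. It computes the fraction of each access that is spent actually moving data rather than waiting on latency.

```python
# Illustrative sketch only: the latency and bandwidth figures below are
# assumptions chosen for the example, not values taken from the text.

ASSUMED_ACCESS_LATENCY_S = 5e-3        # assumed HDD access latency (milliseconds range)
ASSUMED_BANDWIDTH_BYTES_PER_S = 100e6  # assumed sustained transfer rate


def transfer_efficiency(size_bytes, access_latency_s, bandwidth_bytes_per_s):
    """Return the fraction of total access time spent transferring data."""
    transfer_time_s = size_bytes / bandwidth_bytes_per_s
    return transfer_time_s / (access_latency_s + transfer_time_s)


if __name__ == "__main__":
    # Compare a small random access with progressively larger transfers.
    for size_bytes in (4 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        efficiency = transfer_efficiency(
            size_bytes, ASSUMED_ACCESS_LATENCY_S, ASSUMED_BANDWIDTH_BYTES_PER_S
        )
        print(f"{size_bytes:>10d} bytes: {efficiency:.2%} of access time is data transfer")
```

Under these assumed figures, a 4 KiB access spends under one percent of its time transferring data, while a 64 MiB transfer spends nearly all of it doing so, which is why fixed latency penalizes small random accesses most.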
Paging is a method by which a computer operating system stores data in, and retrieves data from, secondary storage for use in main memory. In particular, the operating system retrieves data from the secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementation in most contemporary general-purpose operating systems, allowing them to use disk storage for data that does not fit into physical main memory. High performance computing (HPC) applications exhibit a wide range of memory access patterns, from sequential to completely random accesses. As a result, memory working sets also vary widely across applications. From a virtual memory paging perspective, the memory access patterns can be classified as:
(i) Sequential/random access patterns over a memory working set that does not exceed the capacity of the main memory (DRAM).
(ii) Random access patterns over a large working set (i.e., a large number of memory pages) that exceeds the capacity of the main memory (DRAM).
Application workloads of category (i) benefit from high-speed access to memory (e.g., to avoid long stalls) but do not require high-speed paging devices. Application workloads of category (ii) benefit from high-speed paging devices more than from fast memory. In practice, application workloads of category (ii) are usually executed using truncated datasets so that the random access pattern does not exceed the capacity of the main memory (DRAM), which avoids the undesirable result of paging to a long-latency storage device.
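The distinction between the two categories can be sketched with the standard effective-access-time model for demand paging. This is an illustration under stated assumptions, not a description of any particular system: the function names are hypothetical, and the latencies (100 ns for DRAM, 5 ms for an HDD paging device, 50 µs for a faster paging device) and the fault rate are assumed values.

```python
# Illustrative sketch only: all latencies and the fault rate are assumed
# values chosen for the example; the function names are hypothetical.

ASSUMED_DRAM_LATENCY_S = 100e-9       # assumed main-memory access time
ASSUMED_HDD_PAGING_LATENCY_S = 5e-3   # assumed HDD page-fault service time
ASSUMED_FAST_PAGING_LATENCY_S = 50e-6 # assumed faster paging device


def classify_workload(working_set_bytes, dram_bytes):
    """Return "i" if the working set fits in main memory, otherwise "ii"."""
    return "i" if working_set_bytes <= dram_bytes else "ii"


def effective_access_time(fault_rate, memory_latency_s, paging_latency_s):
    """Average access time when a fraction `fault_rate` of accesses page-fault."""
    if not 0.0 <= fault_rate <= 1.0:
        raise ValueError("fault_rate must be between 0 and 1")
    return (1.0 - fault_rate) * memory_latency_s + fault_rate * paging_latency_s


if __name__ == "__main__":
    gib = 1024 ** 3
    print("category:", classify_workload(256 * gib, 64 * gib))

    fault_rate = 0.001  # assumed: one access in a thousand misses main memory
    for label, paging_latency_s in (
        ("HDD paging", ASSUMED_HDD_PAGING_LATENCY_S),
        ("fast paging", ASSUMED_FAST_PAGING_LATENCY_S),
    ):
        eat_s = effective_access_time(fault_rate, ASSUMED_DRAM_LATENCY_S, paging_latency_s)
        print(f"{label}: {eat_s * 1e9:.1f} ns average access time")
```

With a fault rate of zero (category (i)), the paging latency drops out of the result entirely, so only memory speed matters. Once the fault rate is nonzero (category (ii)), the paging term dominates even at small fault rates, so the paging device's latency has far more influence than the memory latency.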
Flash memory is a non-volatile computer memory that can be electrically erased and rewritten in large blocks. HyperTransport is a processor interconnection technology with bidirectional serial/parallel high-bandwidth, low-latency point-to-point links and is promoted and developed by the HyperTransport Consortium. The technology is used by various vendors, for example in the form of HyperTransport® (HyperTransport® is a registered trademark of Advanced Micro Devices, Sunnyvale, Calif.). The Intel® QuickPath Interconnect (QPI) (Intel® is a registered trademark of Intel Corporation, Santa Clara, Calif.) is a point-to-point processor interconnect developed by Intel to compete with HyperTransport. Prior to the announcement of the name, Intel referred to it as Common System Interface (CSI). Earlier incarnations were known as YAP (Yet Another Protocol) and YAP+.