1. Field of the Invention
The present invention provides improvements to the memory structure and utilization of computer systems. “Computer memory” as used herein holds data temporarily during information processing for fast data access. The term “memory” generally refers to data held on integrated circuits (ICs) or memory chips, whereas the word “storage” generally refers to data that exists non-dynamically on tapes or disks.
2. Discussion of the Related Art
Most modern computer memories are hierarchical and comprise multiple levels of cache, built from fast but more expensive memory, and a slower main memory (DRAM). The computer then uses a disk for long-term, secondary storage. Roughly speaking, disk access is 100 times slower than main memory access, and main memory access is 10 to 50 times slower than cache access with current technology.
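As a rough illustration of how these ratios combine, the following sketch estimates the average access time of a three-level hierarchy. It takes one cache access as the unit of time and uses the ratios stated above (main memory up to 50 times slower than cache, disk 100 times slower than main memory). The function name, the parameter names, and the hit rates are illustrative assumptions and are not part of the disclosure.

```python
def effective_access_time(cache_hit_rate, memory_hit_rate,
                          cache_time=1.0, memory_ratio=50.0, disk_ratio=100.0):
    """Estimate the average access time of a cache / DRAM / disk hierarchy.

    Times are expressed in units of one cache access. The default ratios
    follow the rough figures in the text; the hit rates are hypothetical
    inputs chosen by the caller.
    """
    memory_time = cache_time * memory_ratio   # DRAM is memory_ratio x slower than cache
    disk_time = memory_time * disk_ratio      # disk is disk_ratio x slower than DRAM
    cache_miss_rate = 1.0 - cache_hit_rate
    return (cache_hit_rate * cache_time
            + cache_miss_rate * memory_hit_rate * memory_time
            + cache_miss_rate * (1.0 - memory_hit_rate) * disk_time)


if __name__ == "__main__":
    # Assumed workload: 90% of accesses hit the cache, the rest hit DRAM.
    print(effective_access_time(cache_hit_rate=0.9, memory_hit_rate=1.0))
```

Under these assumed figures, a 10% cache miss rate alone raises the average access time to roughly six cache-access units, which suggests why even modest miss rates dominate execution time.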
Processor and network technologies are evolving rapidly. The growth in the number of transistors in each processor is rapidly increasing processor capabilities. Innovations in utilizing the theoretically unlimited bandwidth of fiber optics can help reduce network latency.
However, the mismatch between processor and memory performance is widening the gap between the peak performance and the sustained performance of computers. Moreover, power consumption increases faster than processor and memory performance. Increased power consumption has become a major obstacle for modern computing, from high-end supercomputers to small electronic devices.
In recent years, memory bandwidth has become a major bottleneck to full utilization of processor and network capabilities. Following the so-called Moore's law, processor speed continues to double roughly every 18 months. Network interconnect speeds are also increasing to hundreds of Gbps, and latency is being reduced to a few nanoseconds. In contrast, the speed and bandwidth of main memory (DRAM) have not increased enough to keep up with processor speeds. This performance gap has been widening for the last 20 years and increasingly limits overall system performance.
Advanced hierarchical memories that include cache memories at various levels have been developed to bridge this gap. A cache memory works on the principle of spatial and temporal locality: a program tends to access data located near recently accessed addresses and to reuse recently accessed data. However, many applications exhibit little locality in their memory accesses. These applications spend a major fraction of their execution time waiting for data accesses.
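The effect of locality on a cache can be sketched with a small simulation. The example below models a fully associative cache with least-recently-used (LRU) replacement and compares access patterns with and without locality. The cache organization, its size, the line size, and the access patterns are all illustrative assumptions chosen for simplicity; they do not describe any particular processor.

```python
from collections import OrderedDict


def hit_rate(addresses, num_lines=64, line_size=8):
    """Return the hit rate of a simple fully associative LRU cache.

    Each cache line covers line_size consecutive addresses, and the cache
    holds at most num_lines lines. Both sizes are hypothetical.
    """
    cache = OrderedDict()  # line tag -> None, ordered from least to most recently used
    hits = 0
    total = 0
    for addr in addresses:
        total += 1
        tag = addr // line_size
        if tag in cache:
            hits += 1
            cache.move_to_end(tag)         # mark the line as most recently used
        else:
            cache[tag] = None              # load the line on a miss
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0


if __name__ == "__main__":
    sequential = range(4096)                  # spatial locality: neighbouring addresses
    reuse = list(range(64)) * 10              # temporal locality: a small working set reused
    strided = [i * 8 for i in range(4096)]    # no locality: every access touches a new line
    print(hit_rate(sequential))  # 0.875
    print(hit_rate(reuse))       # 0.9875
    print(hit_rate(strided))     # 0.0
```

In this model, the strided pattern never reuses a cache line, so every access pays the full main memory latency, which corresponds to the applications described above that spend most of their time waiting for data.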
The power requirements of computing devices are also increasing with computing power and functionality. Current improvements in chip performance depend on a rising number of transistors. On the positive side, this enables larger caches, more levels of cache, and larger translation lookaside buffers (TLBs). However, it also rapidly increases the power demands of the numerous chips in a computing device. The increasing power requirements can be illustrated by comparing the power consumption of the Intel PENTIUM 4 processor (75 watts) with that of the Intel ITANIUM processor (130 watts).
There are writings known in the art which use the term “memory server” in various contexts. All writings known to the applicant merely attempt to improve data access time by substituting remote memory access for local disk access, without removing data management burdens from the main CPU. The focus of these so-called “servers” is to provide space for data, and not to fetch data for other processing elements.