Increased processor and bandwidth demands continue to drive advances in central processing units (CPUs) and associated memory devices. A CPU typically includes a die, or microchip, which contains multiple processing units, communications hardware, and a local networking or communications bus. The core logic chipsets (cores) are the components that make up the processor die. The cores comprise the central processing logic of a computing system. A system's core logic typically includes a controller for handling memory functions, a cache for storing instructions, the logic for bus interfaces, and the functions of data paths. A single die can contain hundreds of processor cores. As the number of cores increases, computer performance also increases, as does the need for more memory. For efficiency considerations, the memory-to-processor core ratio must stay relatively constant. That is, as more processors are added, memory must be proportionally added.
The need for higher memory-to-processor core ratios is further driven by advances in virtualization. Virtualization makes it possible to run multiple operating systems and multiple applications on the same computer at the same time, increasing the utilization and flexibility of hardware. In one respect, virtualization allows the transformation of hardware into software, including the CPU, RAM, hard disk and network controller, to create a fully functional virtual machine that can run its own operating system and applications just like a physical computer. Virtualization is advantageous because it allows for server consolidation and increased processor accessibility. Thus, virtualization is driving the need for even higher memory-to-processor core ratios and higher memory capacity on servers.
The increased processing afforded by virtualization requires the addition of memory to maintain the required ratio. For speed considerations, the preferred way to add memory is to attach main memory directly to the processor. Performance is increased with data being stored directly in main memory, as opposed to slower, remote memory, e.g., memory on a disk. However, attaching memory directly to the processor typically imposes a limitation on the total amount of available bandwidth and memory. Attached memory may be inadequate for applications requiring larger bandwidth and/or memory capacities. Bandwidth is the amount of data that can be carried from one point to another in a given time period.
Memory compression is sometimes used to optimize available memory. Using compression, data may be encoded (represented as symbols) to take up less space. Memory compression effectively expands memory capacity up to two or more times for some applications without increasing actual physical memory and associated expenses. Despite its benefits, however, memory compression typically requires compression logic, as well as more memory bandwidth than is available in conventional attached memory.
Memory compression is often measured in terms of its associated compression ratio. The compression ratio is the quotient of the memory space required by uncompressed data divided by the smaller amount of memory space required by the same data when compressed. As data changes in main memory, the compression ratio can also change. When the compression ratio decreases, more physical memory is required. As a result, some needed physical memory must be vacated to accommodate changing data having a small compression ratio. This practice requires interaction with the operating system, taxing overall system processes. It can prove difficult to obtain the needed support from the operating systems to efficiently accomplish memory compression.
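By way of illustration only, the relationship between compression ratio and physical memory demand may be sketched as follows. The sketch assumes Python's standard zlib module as a software stand-in for the compression logic, and the function names compression_ratio and physical_bytes_needed are hypothetical names introduced for this example.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by compressed size (zlib is an assumed stand-in)."""
    compressed = zlib.compress(data)
    return len(data) / len(compressed)


def physical_bytes_needed(logical_bytes: int, ratio: float) -> float:
    """Physical memory required to hold logical_bytes of data at a given ratio."""
    return logical_bytes / ratio


# Highly repetitive data compresses well; random data hardly compresses at all.
repetitive_page = bytes(4096)        # 4 KiB of zero bytes
random_page = os.urandom(4096)       # 4 KiB of random bytes

ratio_repetitive = compression_ratio(repetitive_page)
ratio_random = compression_ratio(random_page)

# As the ratio falls, the same logical data occupies more physical memory.
demand_repetitive = physical_bytes_needed(4096, ratio_repetitive)
demand_random = physical_bytes_needed(4096, ratio_random)
```

In this sketch, overwriting the repetitive page with the random page lowers the compression ratio, so the same 4 KiB of logical data occupies more physical memory afterward. This corresponds to the situation described above in which physical memory must be vacated.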
Caching is another common technique used to speed memory processes. A cache memory is smaller, faster and typically more expensive than main memory. When a CPU requests data that resides in main memory, the processing system transmits the requested data to the processor, and also may store the data in a cache memory. When the processor issues a subsequent request for the same data, the processing system first checks cache memory. If the requested data resides in the cache, the system gets a cache “hit” and delivers the data to the processor from the cache. If the data is not resident in the cache, a cache “miss” occurs, and the system retrieves the data from main memory. Frequently utilized data thus is retrieved more rapidly than less frequently requested data, and overall data access latency, i.e., the time between a request for data and delivery of the data, is reduced.
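The hit and miss behavior described above may be sketched, purely as an illustration, in the following manner. The class name CachedMemory, its capacity parameter, and the least-recently-used eviction policy are assumptions made for this example; the description above does not specify a replacement policy.

```python
from collections import OrderedDict


class CachedMemory:
    """Toy model: a small cache placed in front of a larger main memory."""

    def __init__(self, main_memory: dict, capacity: int):
        self.main_memory = main_memory   # address -> data
        self.capacity = capacity         # maximum number of cached entries
        self.cache = OrderedDict()       # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # The cache is checked first on every request.
        if address in self.cache:
            self.hits += 1
            self.cache.move_to_end(address)   # mark as most recently used
            return self.cache[address]

        # Miss: retrieve from main memory and keep a copy in the cache.
        self.misses += 1
        data = self.main_memory[address]
        self.cache[address] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used (assumed policy)
        return data


memory = CachedMemory({0: "a", 1: "b", 2: "c"}, capacity=2)
for address in (0, 0, 1, 2, 0):
    memory.read(address)
```

In this access sequence, only the second read of address 0 is served from the cache. The final read of address 0 misses because that entry was evicted to make room for address 2.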
In the context of memory compression, caching becomes more complex and requires additional bandwidth. Memory compression may require additional bandwidth to support the overhead of accessing an optional cache mapped into main memory that holds uncompressed data, and for accommodating a cache miss. Additional bandwidth may also be needed for fetching information from a compressed data table entry that points to where compressed data is stored, as well as for accessing the actual data that is to be uncompressed. Likewise, on a write, a large amount of bandwidth may be needed to store data into the cache or to main memory.
In associative mapping, instead of hard-allocating cache lines to particular memory locations, it is possible to design the cache so that any line can store the contents of any memory location. A cache line is the smallest unit of memory that can be transferred between the main memory and the cache. Associativity improves performance by, in part, enabling multiple concurrent accesses to portions of memory.
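The difference between hard-allocated cache lines and associative mapping may be illustrated by the following minimal sketch. The 64-byte line size, the four-line cache, the first-in first-out replacement in the associative case, and all class and function names are assumptions chosen for this example only.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_LINES = 4    # lines in each toy cache (assumed)


class DirectMappedCache:
    """Each memory block may occupy exactly one predetermined cache line."""

    def __init__(self):
        self.lines = [None] * NUM_LINES

    def access(self, address: int) -> bool:
        block = address // LINE_SIZE
        slot = block % NUM_LINES          # hard-allocated line for this block
        hit = self.lines[slot] == block
        self.lines[slot] = block          # on a miss, the old occupant is replaced
        return hit


class FullyAssociativeCache:
    """Any cache line may hold any memory block."""

    def __init__(self):
        self.lines = []                   # oldest block first

    def access(self, address: int) -> bool:
        block = address // LINE_SIZE
        if block in self.lines:
            return True
        if len(self.lines) == NUM_LINES:
            self.lines.pop(0)             # first-in first-out eviction (assumed)
        self.lines.append(block)
        return False


def count_hits(cache, addresses) -> int:
    return sum(cache.access(a) for a in addresses)


# Addresses 0 and 256 fall in blocks 0 and 4, which share line 0 when direct mapped.
pattern = [0, 256, 0, 256]
direct_hits = count_hits(DirectMappedCache(), pattern)
associative_hits = count_hits(FullyAssociativeCache(), pattern)
```

In the direct-mapped case the two blocks repeatedly displace one another, so no access hits. In the associative case both blocks reside in the cache at once and the repeated accesses hit.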
Relatively large amounts of bandwidth are needed to support associativity, however. On some processor memory architectures, for instance, the x86, there is not enough memory bandwidth to support cache operations, memory compression and associativity. Moreover, server consolidation, transaction databases, and engineering design automation efforts require larger memory capacities and bandwidth than can be conventionally supported by systems having processors with directly attached memory. The inability to support these and other memory optimizing processes relegates manufacturers to using other, less efficient forms of memory access and lower performance.
Consequently, what is needed is an improved manner of managing memory in a system comprising a processor with directly attached memory.