Increased processor demands continue to drive advances in central processing units (CPUs), bandwidth and associated memory devices. A CPU typically includes a die, or microchip, that contains multiple processing units, communications hardware, and a local networking or communications bus. The core logic chipsets (cores) are the components that make up the processor die. The cores comprise the central processing logic of a computing system. A system's core logic typically includes a controller for handling memory functions, a cache for storing instructions, logic for bus interfaces, and data path functions. A single die can contain hundreds of processor cores. As the number of cores increases, computer performance also increases, as does the need for more memory. For efficiency, the memory-to-processor-core ratio must stay relatively constant. That is, as more processors are added, memory must be added in proportion.
The need for higher memory-to-processor-core ratios is further driven by advances in virtualization. Virtualization makes it possible to run multiple operating systems and multiple applications on the same computer at the same time, increasing the utilization and flexibility of hardware. In one respect, virtualization allows hardware, including the CPU, RAM, hard disk and network controller, to be transformed into software to create a fully functional virtual machine that can run its own operating system and applications just like a physical computer. Virtualization is advantageous because it allows for server consolidation and increased processor accessibility. Thus, virtualization is driving the need for even higher memory-to-processor-core ratios and higher memory capacity on servers.
The increased processing afforded by virtualization requires the addition of memory to maintain the required ratio. For speed, the preferred way to add memory is to attach main memory directly to the CPU. Performance increases when data is stored directly in main memory rather than in slower, remote memory, e.g., memory on a disk. However, attaching memory directly to the CPU typically imposes a limitation on the total amount of available memory. Attached memory may be inadequate for applications requiring larger memory capacities.
Caching is commonly used to speed up memory access. A cache memory is smaller, faster and typically more expensive than main memory. When a CPU requests data that resides in main memory, the processing system transmits the requested data to the processor and also may store the data in a cache memory. When the processor issues a subsequent request for the same data, the processing system first checks cache memory. If the requested data resides in the cache, the system gets a cache “hit” and delivers the data to the processor from the cache. If the data is not resident in the cache, a cache “miss” occurs, and the system retrieves the data from main memory. Frequently utilized data thus is retrieved more rapidly than less frequently requested data, and overall data access latency, i.e., the time between a request for data and delivery of the data, is reduced.
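By way of illustration only, the hit and miss behavior described above may be sketched in Python as follows. The class name `CachedMemory`, its fields, and the first-in, first-out eviction rule are hypothetical choices made for this sketch; the description above does not specify a replacement policy or any particular implementation.

```python
class CachedMemory:
    """Toy model of a small cache placed in front of a larger main memory.

    All names here are illustrative assumptions, not a described design.
    """

    def __init__(self, main_memory, capacity):
        self.main_memory = main_memory  # dict mapping address -> data
        self.capacity = capacity        # maximum number of cached entries
        self.cache = {}                 # dicts preserve insertion order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Check the cache first: a "hit" is served without touching main memory.
        if address in self.cache:
            self.hits += 1
            return self.cache[address]

        # A "miss": retrieve the data from main memory and keep a copy in the cache.
        self.misses += 1
        data = self.main_memory[address]
        if len(self.cache) >= self.capacity:
            # Assumption: evict the oldest inserted entry (FIFO) to make room.
            oldest = next(iter(self.cache))
            del self.cache[oldest]
        self.cache[address] = data
        return data


if __name__ == "__main__":
    main_memory = {addr: addr * 10 for addr in range(8)}
    mem = CachedMemory(main_memory, capacity=2)
    for addr in (3, 3, 5, 3):
        mem.read(addr)
    print(mem.hits, mem.misses)  # prints "2 2"
```

In this sketch, the first read of each address is a miss and each repeated read of address 3 is a hit, so repeated requests are served from the cache rather than from main memory.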
In associative mapping, cache lines are not hard-allocated to particular memory locations; instead, the cache is designed so that any line can store the contents of any memory location. A cache line is the smallest unit of memory that can be transferred between the main memory and the cache. Associativity improves performance by, in part, enabling multiple concurrent accesses to portions of memory.
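As a non-limiting sketch, the difference between hard-allocating cache lines and allowing any line to hold any memory location may be illustrated in Python as follows. The function names, the modulo index used for the hard-allocated (direct-mapped) case, and the least-recently-used replacement rule are assumptions made for this sketch only. Addresses are treated as cache-line-sized block numbers. The sketch illustrates placement flexibility only and does not model concurrent accesses.

```python
from collections import OrderedDict


def direct_mapped_misses(addresses, num_lines):
    """Count misses when each address is hard-allocated to line (addr % num_lines)."""
    lines = [None] * num_lines
    misses = 0
    for addr in addresses:
        index = addr % num_lines  # assumed fixed mapping from address to line
        if lines[index] != addr:
            misses += 1
            lines[index] = addr   # overwrite whatever occupied that line
    return misses


def fully_associative_misses(addresses, num_lines):
    """Count misses when any line may hold any address (LRU replacement assumed)."""
    lines = OrderedDict()  # keys are cached addresses, ordered oldest use first
    misses = 0
    for addr in addresses:
        if addr in lines:
            lines.move_to_end(addr)        # mark as most recently used
        else:
            misses += 1
            if len(lines) >= num_lines:
                lines.popitem(last=False)  # evict the least recently used address
            lines[addr] = True
    return misses


if __name__ == "__main__":
    trace = [0, 4, 0, 4, 0, 4]  # addresses 0 and 4 share a line when num_lines is 4
    print(direct_mapped_misses(trace, 4))      # prints 6
    print(fully_associative_misses(trace, 4))  # prints 2
```

With four lines, addresses 0 and 4 compete for the same line under the hard-allocated mapping, so every access misses, whereas the associative arrangement keeps both resident after their first use.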
Relatively large amounts of bandwidth are needed to support associativity, however. On some processor memory architectures, for instance the x86, there is not enough memory bandwidth (the amount of data that can be carried from one point to another in a given time period) to support a cache with associativity. The inability to support cache access with associativity forces manufacturers to use other, less efficient forms of memory access, resulting in lower performance.
Consequently, what is needed is an improved manner of managing memory in a system comprising a processor with directly attached memory.