Many challenges exist in developing new and improved solutions to address the ever-increasing processing demands of current and future server/computer usage models. One of these challenges is the perceived "memory wall," where memory bandwidth is unable to keep up with the rising compute bandwidth. For highly parallel applications, it is relatively simple to increase compute bandwidth and efficiency by increasing the number of cores used and/or offloading certain tasks to highly specialized cores. However, these approaches do not scale well when it comes to increasing memory bandwidth, because designing a coherent interconnect and memory hierarchy that can keep up with the ever-increasing compute bandwidth poses several challenges.
First, increasing the number of computing cores puts pressure on overall die size. Many current high-performance computing (HPC) and integrated-graphics central processing unit (CPU) designs are either already at the reticle limit or unable to grow the die due to cost constraints. This leaves very little on-die physical space available for implementing the coherent interconnect buffers and queues necessary to support an increase in memory bandwidth. Second, many current CPU designs are already significantly power-constrained. Coupled with a strong desire to allocate power to the compute elements rather than to the fabric, it is evident that the key to increasing memory bandwidth lies in smarter and more efficient use of the existing memory resources in current architecture designs rather than in trying to create more of them.