In von Neumann machines, a central processor (a CPU, i.e., central processing unit, or a GPU, i.e., graphics processing unit) may employ several mechanisms to overcome the so-called “Memory Wall”, a term denoting the growing performance gap between ever faster processors and comparatively slower memory technologies. These mechanisms focus in particular on tolerating longer access latencies of the main memory system in order to minimize the time that the processor's execution units are stalled or, in other words, to maximize the utilization of the execution unit(s).
One of the most important of these mechanisms is the use of a memory hierarchy comprising multiple levels of fast caches. Other mechanisms include support for out-of-order execution of instructions and multi-threading, both of which allow processing to continue with different instructions and/or threads when certain instructions or threads have been stalled while waiting for data to arrive from the memory system.
Another example of a mechanism to reduce the (average) access latency is the prefetching of data from the memory system, that is, requesting data before the processor actually needs it.
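As an illustration of prefetching only, the following minimal sketch shows a software-issued prefetch hint in C. It assumes a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`; the function name `sum_with_prefetch` and the prefetch distance of 16 elements are hypothetical choices made for this example and are not taken from the text above.

```c
#include <stddef.h>

/*
 * Sums an array while hinting that elements a fixed distance ahead
 * will be read soon. The distance of 16 elements is an assumed value;
 * a suitable distance depends on the memory latency of the machine.
 */
long sum_with_prefetch(const long *data, size_t n)
{
    const size_t distance = 16; /* hypothetical prefetch distance */
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + distance < n) {
            /* Second argument 0 = prefetch for reading,
               third argument 1 = low temporal locality. */
            __builtin_prefetch(&data[i + distance], 0, 1);
        }
        total += data[i];
    }
    return total;
}
```

The hint does not change the result of the computation; it may only affect when the data arrives in the cache. For a simple sequential access pattern such as this one, a hardware prefetcher may already achieve the same effect without any hint.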
In recent years, Field Programmable Gate Array (FPGA) technology has continued to grow in importance as one of multiple programmable off-the-shelf accelerator technologies that can be used to improve performance and optimize power consumption for selected application domains.