Modern general-purpose processors often access main memory (typically implemented as dynamic random access memory, or “DRAM”) through a hierarchy of one or more caches (e.g., L1 and L2 caches). Caches, typically built from static random access memory (“SRAM”), return data more quickly than main memory but use more area and power per bit. Memory accesses by general-purpose processors usually display high temporal and spatial locality. Caches capitalize on this locality by fetching data from main memory in larger chunks than requested (spatial locality) and by retaining the data for a period of time even after the processor has used it (temporal locality). As a result, many requests can be served rapidly from the cache rather than more slowly from DRAM. Caches can also generally sustain a much higher read/write load (i.e., higher throughput) than main memory, so current accesses are less likely to be queued behind, and slowed by, earlier accesses.
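The locality effects described above can be illustrated with a toy model of a direct-mapped cache that counts hits and misses for a sequential scan. This is a minimal sketch under assumed parameters (64 lines of 64 bytes each, tags only, no data storage or write handling); the names `DirectMappedCache` and `run` are hypothetical and are not part of any system described here.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: tracks tags only, to count hits and misses."""

    def __init__(self, num_lines: int = 64, line_size: int = 64) -> None:
        self.num_lines = num_lines    # assumed number of cache lines
        self.line_size = line_size    # assumed bytes fetched from "DRAM" per miss
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        block = addr // self.line_size   # line-sized chunk containing addr
        index = block % self.num_lines   # the single slot that chunk maps to
        tag = block // self.num_lines    # distinguishes chunks sharing a slot
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: "fetch" the whole line and keep it for later reuse.
        self.tags[index] = tag
        self.misses += 1
        return False


def run(addresses, num_lines: int = 64, line_size: int = 64):
    """Replay addresses through a fresh cache and return (hits, misses)."""
    cache = DirectMappedCache(num_lines, line_size)
    for addr in addresses:
        cache.access(addr)
    return cache.hits, cache.misses


if __name__ == "__main__":
    scan = list(range(0, 1024, 4))      # 256 sequential 4-byte reads of a 1 KiB buffer
    print(run(scan))                    # (240, 16): spatial locality only
    print(run(scan * 2))                # (496, 16): second pass reuses cached lines
    print(run(scan, line_size=4))       # (0, 256): fetching only what was requested
```

With 64-byte lines, a single pass misses only once per line, and a second pass hits entirely because the 1 KiB buffer fits in the modeled cache. Shrinking the line to the 4-byte request size removes the spatial-locality benefit, so every access in the pass misses.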
Computational workloads such as networking and graphics often run better on special-purpose processors designed specifically for the given workload. Examples of such special-purpose processors include network processors and graphics accelerators. In general, these special-purpose processors are placed outside the general-purpose processor's caching hierarchy, often on a Peripheral Component Interconnect (PCI) bus or an Accelerated Graphics Port (AGP).
Memory accesses by the special-purpose processor therefore involve only main memory, not the general-purpose processor's cache. Moving data between the general-purpose processor and the special-purpose processor often requires both a main memory write (by the producing processor) and a main memory read (by the consuming processor), so such a transfer can proceed only at DRAM speeds.