This invention is applicable to data processing systems with multi-level memory where the second level (L2) memory is used both as a unified (instructions and data) level two cache and as flat (L2 SRAM) memory that holds critical data and instructions. The second level memory (L2) thus serves multiple purposes, including unified instruction and data level two cache and directly addressable SRAM memory used to hold critical data and code accessible by both external and internal direct memory access (DMA) units.
When the level one data cache controller is granted access to the level one data cache, this access could force an existing line to be evicted. The CPU can also force the level one data cache to evict lines through the block writeback operation. At the same time, the level two cache could be receiving a DMA access to the same line. This situation could break coherency if the DMA data were committed incorrectly. One way this could occur is by writing the DMA data to the level two memory and then overwriting that data with the level one cache victim. Another is by sending the DMA data as a snoop write to the level one data cache. This forces the level one data cache to write the DMA data to its cache after the victim has been evicted, which effectively drops the DMA write. Thus when a victim is in progress, a DMA write sent as a snoop could miss the victim.
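The two failure orderings described above can be illustrated with a toy model. This is only a sketch under stated assumptions: a single cache line is modeled as a dictionary entry, and the function names (`dma_then_victim`, `snoop_after_eviction`, `victim_then_dma`) are hypothetical. The third function shows one ordering that happens to preserve the DMA data in this model; it is an assumption for contrast and is not taken from the text above.

```python
# Toy model of one cache line. L1 and L2 are plain dictionaries keyed by "line".
# All names here are hypothetical and exist only for illustration.

def dma_then_victim(l1_dirty, dma_data):
    """Hazard 1: DMA data reaches L2 first, then the L1 victim overwrites it."""
    l2 = {}
    l2["line"] = dma_data   # DMA write commits to L2
    l2["line"] = l1_dirty   # older victim data arrives afterwards and wins
    return l2["line"]

def snoop_after_eviction(l1_dirty, dma_data):
    """Hazard 2: DMA write is sent to L1 as a snoop, but the line already left L1."""
    l1 = {"line": l1_dirty}
    l2 = {}
    victim = l1.pop("line")     # eviction in progress: line is no longer in L1
    if "line" in l1:            # snoop write looks up L1 and misses the victim
        l1["line"] = dma_data
    l2["line"] = victim         # victim lands in L2; the DMA data is never committed
    return l2["line"]

def victim_then_dma(l1_dirty, dma_data):
    """Assumed contrast case: victim commits to L2 first, then the DMA write."""
    l1 = {"line": l1_dirty}
    l2 = {}
    l2["line"] = l1.pop("line")  # victim writeback completes
    l2["line"] = dma_data        # newer DMA data is applied on top
    return l2["line"]
```

In this model, both hazard orderings leave the stale victim data in the level two memory, while the contrast ordering leaves the DMA data there.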
The current trend in SoC design is always more: more cores and more memory. As processors are added, more low-latency, high-bandwidth memory is needed to keep the processors fed with data. Providing this low-latency, high-bandwidth memory and an efficient infrastructure to access it is key to a well-performing device.
Several issues arise. Large, efficient memories typically have only a single read/write port. The memory controller needs an arbiter to grant access quickly and fairly to ensure smooth operation of the whole system. In many cases the processors are not working on the same data set. Providing a way for different cores to access different sections of memory in parallel leads to large performance gains. Most modern processors implement local caches to increase performance and power efficiency. When the processors are working on the same data set or nearby data sets, some handshaking is necessary to avoid corruption and ensure a consistent, coherent view of memory for all processors. At deep sub-micron geometries, memory cells are very susceptible to soft errors, which could lead to incorrect calculations or crashes. Minimizing the fatal failure rate in these memories is very important. Memory protection and virtualization of hardware resources are becoming extremely important for software reuse and security-sensitive applications. Many modern processors also implement some form of speculative fetching or execution, which can increase the per-core bandwidth requirement. If one core is highly speculative and floods the system with these accesses, it could stall other cores' real accesses.
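The arbitration requirement above, quick and fair grants to a single-ported memory, can be illustrated with a round-robin policy. The text does not name a specific arbitration scheme, so round-robin is an assumption chosen for illustration, and the class name `RoundRobinArbiter` and its `grant` method are hypothetical.

```python
class RoundRobinArbiter:
    """Grants one requester per cycle, rotating priority after each grant.

    Hypothetical sketch: round-robin is an assumed policy, not one named in the text.
    """

    def __init__(self, num_requesters):
        self.n = num_requesters
        # Start so that requester 0 has highest priority on the first cycle.
        self.last = num_requesters - 1

    def grant(self, requests):
        """requests: list of booleans, one per requester.

        Returns the index of the granted requester, or None if nobody requests.
        """
        # Search starting just after the most recently granted requester.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None
```

With three cores all requesting every cycle, successive calls to `grant` rotate through requesters 0, 1, 2 and back to 0, so no single core, including a highly speculative one, can hold the port on consecutive cycles while others are waiting.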