High-performance processors often access the same data in multiple ways, such as through a cache and through direct memory access (DMA). A cache is typically used for irregular control, such as deciding how to encode the next frame of video. DMA is typically used for high-performance computation, such as encoding the next frame.
Accessing the same memory in multiple ways imposes synchronization requirements. For example, any data stored in the cache must be flushed to memory before it can be accessed with DMA. Likewise, any DMA operations, which execute asynchronously from the control thread for performance reasons, must complete before the data can be accessed through the cache.
Existing solutions typically depend on explicit function calls to flush the cache, wait for a DMA call to complete, or perform other synchronization tasks. Unfortunately, an application will still compile and run if the synchronization calls are omitted, and the missing calls result in hard-to-diagnose bugs. For example, a programmer might write code that initializes an array using a cached pointer and then loads part of the array using DMA, while forgetting the intervening call to flush the cache. The code would compile and run. However, the DMA access would not see the changes still held in the cache, producing incorrect results from otherwise correct code.
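The missing-flush bug described above can be illustrated with a minimal sketch. This is a toy software model, not a real hardware interface: the `ToySystem` class, its method names, and the `init_then_dma` helper are hypothetical names introduced here for illustration, and the model assumes a write-back cache in which stores stay in the cache until an explicit flush.

```python
class ToySystem:
    """Toy model (an assumption for illustration): main memory behind a
    write-back cache, plus a DMA read path that bypasses the cache."""

    def __init__(self, size):
        self.memory = [0] * size   # backing store: the only thing DMA can see
        self.cache = {}            # dirty entries not yet written back

    def cached_write(self, index, value):
        # The store lands in the cache only; main memory is stale until a flush.
        self.cache[index] = value

    def cached_read(self, index):
        # A cached read sees the newest value, flushed or not.
        return self.cache.get(index, self.memory[index])

    def flush(self):
        # Write every dirty entry back to main memory.
        for index, value in self.cache.items():
            self.memory[index] = value
        self.cache.clear()

    def dma_read(self, start, count):
        # DMA copies straight from main memory and never consults the cache.
        return self.memory[start:start + count]


def init_then_dma(flush_first):
    system = ToySystem(8)
    for i in range(8):
        system.cached_write(i, i + 1)   # initialize through the cached path
    if flush_first:
        system.flush()                  # the call that is easy to forget
    return system.dma_read(0, 4)        # load part of the array with DMA


stale = init_then_dma(flush_first=False)
fresh = init_then_dma(flush_first=True)
print(stale)
print(fresh)
```

In this model both variants run without any error; only the one that flushes first lets the DMA read observe the initialized values, which is what makes the omission hard to diagnose.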
Further, synchronization with respect to asynchronous DMA is hard to express. It may be oversimplified in one of two ways: either each DMA function call blocks (does not return until the DMA is done), or a later barrier is required (which waits for all DMA calls, not just the needed call). Either simplification diminishes performance. Alternatively, the synchronization may be expressed in a complicated manner in which the user has to track and indicate which DMA operations to wait for.
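The three styles of DMA synchronization contrasted above can be compared with a small deterministic sketch. It is a toy model under stated assumptions: `ToyDmaEngine`, its methods, and `stalls_before_first_use` are hypothetical names, transfers are assumed to complete in issue order, and "performance" is approximated by counting how many transfers the control thread stalls on before it can use the first result.

```python
from collections import deque


class ToyDmaEngine:
    """Toy in-order DMA engine (an assumption for illustration);
    a transfer is retired only when the control thread waits on it."""

    def __init__(self):
        self.pending = deque()   # tokens of transfers issued but not finished
        self.next_token = 0
        self.waited_on = 0       # transfers the control thread stalled for

    def start(self):
        # Issue a transfer and return immediately with a token naming it.
        token = self.next_token
        self.next_token += 1
        self.pending.append(token)
        return token

    def wait(self, token):
        # Stall only until the named transfer is done.
        while self.pending and self.pending[0] <= token:
            self.pending.popleft()
            self.waited_on += 1

    def barrier(self):
        # Stall until every outstanding transfer is done.
        while self.pending:
            self.pending.popleft()
            self.waited_on += 1

    def start_blocking(self):
        # Blocking style: issue the transfer and wait before returning.
        self.wait(self.start())


def stalls_before_first_use(style):
    """Issue three transfers, then count stalls before using the first one."""
    engine = ToyDmaEngine()
    if style == "blocking":
        for _ in range(3):
            engine.start_blocking()
    elif style == "barrier":
        for _ in range(3):
            engine.start()
        engine.barrier()
    elif style == "token":
        tokens = [engine.start() for _ in range(3)]  # caller must keep these
        engine.wait(tokens[0])
    else:
        raise ValueError(f"unknown style: {style}")
    return engine.waited_on


for style in ("blocking", "barrier", "token"):
    print(style, stalls_before_first_use(style))
```

In this sketch the blocking and barrier styles stall on all three transfers, while the token style stalls on only the one that is needed, at the cost of making the caller hold on to the tokens and pass the right one to `wait`.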