High-performance processors often access the same data in multiple ways, such as through a cache and through direct memory access (DMA). Cached access is typically used for irregular control code, such as deciding how to encode the next frame of video. DMA is typically used for high-performance computation, such as encoding that frame.
Accessing the same memory in multiple ways creates synchronization problems. For example, any modified data held in the cache must be flushed to memory before it can be accessed with DMA. Likewise, DMA operations, which for performance reasons execute asynchronously with respect to the control thread, must complete before their data can be accessed through the cache.
Previous approaches to synchronization have depended on explicitly calling functions to flush the cache, waiting for a DMA call to complete, or performing other synchronization steps. Unfortunately, with all of these approaches, an application will compile and run even if the synchronization calls are omitted, and the omissions result in hard-to-diagnose bugs. For example, a programmer might write code that initializes an array through a cached pointer and then loads part of the array using DMA. This code would compile and run even if the programmer forgot the intervening call to flush the cache. However, the DMA would not see the changes still held in the cache, so otherwise correct code would produce incorrect results.
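The two ordering rules and the forgotten-flush scenario can be made concrete with a minimal sketch under stated assumptions. This is a toy Python simulation, not real hardware or any actual API: `WriteBackCache`, `DmaEngine`, and `copy_through_dma` are hypothetical names, main memory is a Python list, and the DMA engine is assumed (as a simplification) to read its source when the transfer is issued and to write its destination only when the transfer is waited on.

```python
# Toy model for illustration only; all names are hypothetical.
# Main memory is a list, the cache holds dirty values not yet written back,
# and the DMA engine reads and writes main memory directly, bypassing the cache.

class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory
        self.dirty = {}  # address -> value held only in the cache

    def write(self, addr, value):
        self.dirty[addr] = value  # the store does not reach memory yet

    def read(self, addr):
        return self.dirty.get(addr, self.memory[addr])

    def flush(self):
        # Write every dirty value back to main memory.
        for addr, value in self.dirty.items():
            self.memory[addr] = value
        self.dirty.clear()


class DmaEngine:
    def __init__(self, memory):
        self.memory = memory
        self.pending = []

    def start(self, src, dst, n):
        # Assumed model: the source is read from memory when the transfer is
        # issued; the destination is written only when the transfer completes.
        self.pending.append((dst, self.memory[src:src + n]))

    def wait_all(self):
        for dst, data in self.pending:
            self.memory[dst:dst + len(data)] = data
        self.pending.clear()


def copy_through_dma(flush_before_dma, wait_before_read):
    memory = [0] * 8
    cache, dma = WriteBackCache(memory), DmaEngine(memory)
    for i in range(4):
        cache.write(i, i + 1)        # initialize memory[0:4] via the cache
    if flush_before_dma:
        cache.flush()                # rule 1: flush before DMA reads the data
    dma.start(src=0, dst=4, n=4)
    if wait_before_read:
        dma.wait_all()               # rule 2: finish DMA before cached reads
    result = [cache.read(a) for a in range(4, 8)]
    dma.wait_all()                   # drain any transfer still outstanding
    return result


print(copy_through_dma(True, True))    # [1, 2, 3, 4]  both rules followed
print(copy_through_dma(False, True))   # [0, 0, 0, 0]  flush forgotten
print(copy_through_dma(True, False))   # [0, 0, 0, 0]  read before DMA completes
```

Only the first call returns the initialized values. Both faulty variants run without raising any error, which mirrors the point above: nothing in the program signals that a synchronization step was skipped.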
Further, synchronization with asynchronous DMA is hard to express. It may be oversimplified, either by making each DMA function call block (not return until the DMA is done) or by providing a barrier (which waits for all outstanding DMA calls, not just the one that is needed). Either simplification reduces performance, because the control thread waits longer than necessary instead of doing useful work while transfers proceed. Alternatively, the synchronization may be expressed in a complicated manner in which the user has to track and indicate which DMA operations to wait for.
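The cost of the blocking and barrier simplifications, and the bookkeeping required by the precise alternative, can be sketched with a small timing model. This is an illustrative Python simulation under assumed conditions: `DmaQueue` and its methods are hypothetical, transfers run one at a time on a single engine in issue order, and the durations (30 time units per transfer, 25 units of computation per block) are made up. In each strategy the control thread processes two blocks, A and B, in order.

```python
# Hypothetical timing model; names and durations are illustrative assumptions.
TRANSFER, WORK = 30, 25


class DmaQueue:
    """Transfers run one at a time on a single engine, in issue order."""

    def __init__(self):
        self.now = 0             # control thread's clock
        self.engine_free_at = 0  # when the engine finishes its current queue
        self.done_at = {}        # tag -> completion time of that transfer
        self.next_tag = 0

    def start(self, duration):
        # Issue an asynchronous transfer and return a tag identifying it.
        begin = max(self.now, self.engine_free_at)
        self.engine_free_at = begin + duration
        tag = self.next_tag
        self.next_tag += 1
        self.done_at[tag] = self.engine_free_at
        return tag

    def wait(self, tag):
        # Stall until one specific transfer has completed.
        done = self.done_at.pop(tag, self.now)
        self.now = max(self.now, done)

    def barrier(self):
        # Stall until every outstanding transfer has completed.
        for tag in list(self.done_at):
            self.wait(tag)

    def compute(self, units):
        self.now += units


def blocking():
    q = DmaQueue()
    for _ in range(2):
        q.wait(q.start(TRANSFER))  # each transfer effectively blocks
        q.compute(WORK)
    return q.now


def barrier():
    q = DmaQueue()
    q.start(TRANSFER)
    q.start(TRANSFER)
    q.barrier()                    # also waits for B, though only A is needed
    q.compute(WORK)
    q.compute(WORK)
    return q.now


def per_transfer():
    q = DmaQueue()
    a = q.start(TRANSFER)
    b = q.start(TRANSFER)
    q.wait(a)                      # the caller must track which tag is which
    q.compute(WORK)                # B keeps transferring during this work
    q.wait(b)
    q.compute(WORK)
    return q.now


print(blocking(), barrier(), per_transfer())  # 110 110 85
```

Under these assumptions the blocking and barrier versions both finish at time 110, while waiting on the specific transfer finishes at 85, because B's transfer overlaps the computation on A. The price is visible in `per_transfer`: the caller has to keep the tags `a` and `b` and pass the right one to each `wait`, which is the tracking burden described above.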