Caching schemes have been employed by hardware designers to reduce the time a Central Processing Unit (CPU) needs to access main memory and, hence, to increase system performance. In many computing systems, main memory consists of a large array of memory devices which are slow relative to the processor. During accesses to main memory, the processor is forced to insert additional wait states to accommodate the slower memory devices. System performance during memory accesses can be enhanced with a cache. Smaller than main memory and significantly faster, the cache provides fast local storage for data and instruction code which are frequently used by the processor. In computing systems with caches, memory operations by the processor are first transacted with the cache. The slower main memory is only accessed by the processor if the memory operation cannot be completed with the cache. In general, the processor has a high probability of fulfilling a majority of its memory operations with the cache. Consequently, in computing systems which employ a cache, the effective memory access time between a processor and relatively slow main memory can be reduced.
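The reduction in effective access time can be illustrated with a common simplified model in which every access either hits the cache or goes to main memory. The following is a minimal sketch only; the function name `effective_access_time` and the hit rate and latency figures are assumptions chosen for illustration and are not taken from any particular system.

```python
def effective_access_time(hit_rate, cache_time, memory_time):
    """Average access time when a fraction hit_rate of accesses is served by the cache.

    Simplified model: a hit costs cache_time, a miss costs memory_time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0 and 1")
    return hit_rate * cache_time + (1.0 - hit_rate) * memory_time


# Hypothetical figures: 95 % hit rate, 2-cycle cache, 50-cycle main memory.
with_cache = effective_access_time(0.95, 2, 50)
without_cache = effective_access_time(0.0, 2, 50)
print(with_cache, without_cache)
```

Under these assumed figures the average access takes roughly 4.4 cycles instead of 50, which is why a high hit probability matters more than the size of the cache itself.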
Caches can be highly optimized according to a number of different features. One important feature which affects cache performance and design complexity is the handling of writes by the processor or an alternate bus master. Because two copies of a particular piece of data or instruction code can exist, one in main memory and a duplicate copy in the cache, writes to either main memory or the cache can result in an incoherence between the two storage systems.
For example, assume specific data is stored at a predetermined address in both the cache and main memory. During a processor read to the predetermined address, the processor first checks the contents of the cache for the data. Finding the data in the cache, the processor proceeds to read the data in the cache at the predetermined address. In systems with an alternate bus master, Direct Memory Access (DMA) writes to main memory by the alternate bus master modify data in main memory but not in the cache. The cache and main memory may then be incoherent, and a subsequent processor read to the predetermined address returns the outdated copy held in the cache rather than the data written by the alternate bus master.
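The scenario described above can be reproduced with a toy software model. This is a sketch under stated assumptions, not a description of real hardware: the class `ToySystem`, its methods, the address 4, and the values 111 and 222 are all hypothetical, and the modelled cache performs no snooping.

```python
class ToySystem:
    """Toy model: main memory plus a cache without snooping (illustrative only)."""

    def __init__(self, size=16):
        self.main_memory = [0] * size
        self.cache = {}  # address -> cached value

    def cpu_read(self, addr):
        # The processor checks the cache first and falls back to main memory on a miss.
        if addr not in self.cache:
            self.cache[addr] = self.main_memory[addr]
        return self.cache[addr]

    def dma_write(self, addr, value):
        # The alternate bus master writes main memory directly; the cache is not updated.
        self.main_memory[addr] = value


system = ToySystem()
system.main_memory[4] = 111
first = system.cpu_read(4)    # miss: the cache is filled from main memory
system.dma_write(4, 222)      # main memory changes without the cache noticing
second = system.cpu_read(4)   # hit: the cached copy is returned, although it is stale
print(first, second, system.main_memory[4])
```

The second read is served from the cache and therefore still sees the old value, while main memory already holds the value written by the DMA transfer.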
During a DMA write operation, incoherency between the cache and main memory can be handled with bus “snooping” (monitoring), with instructions executed by the operating system, or with combinations thereof. In both “write-through” and “write-back” caches, bus snooping invalidates cache entries which become “stale”, that is, inconsistent with main memory, following the DMA write operation. Additionally, cache PUSH and INVALIDATE instructions can be executed by the operating system prior to the DMA write operation, to write “dirty” or altered data out to main memory and to invalidate the contents of the entire cache. Since only a single copy of the data, the one in main memory, exists once these instructions have been executed, the DMA write to main memory will not present the problem of possibly “stale” data in the cache.
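Both approaches can be sketched with a toy write-back cache. The model below is an assumption-laden illustration: the class `WriteBackCache`, the `snoop` flag, and the method names `push` and `invalidate` are hypothetical stand-ins for the PUSH and INVALIDATE instructions and for snooping logic, and the model works on single addresses rather than cache lines.

```python
class WriteBackCache:
    """Toy write-back cache in front of a list-based main memory (illustrative only)."""

    def __init__(self, main_memory, snoop=False):
        self.main_memory = main_memory
        self.lines = {}     # address -> [value, dirty flag]
        self.snoop = snoop  # hypothetical switch modelling bus snooping

    def cpu_read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = [self.main_memory[addr], False]
        return self.lines[addr][0]

    def cpu_write(self, addr, value):
        # Write-back policy: only the cache is updated and the entry is marked dirty.
        self.lines[addr] = [value, True]

    def push(self):
        # Models PUSH: write every dirty entry out to main memory.
        for addr, line in self.lines.items():
            if line[1]:
                self.main_memory[addr] = line[0]
                line[1] = False

    def invalidate(self):
        # Models INVALIDATE: discard the contents of the entire cache.
        self.lines.clear()

    def dma_write(self, addr, value):
        # The alternate bus master writes main memory; a snooping cache drops the stale entry.
        self.main_memory[addr] = value
        if self.snoop:
            self.lines.pop(addr, None)


# Software-managed coherency: push and invalidate before the DMA write.
memory = [0] * 8
cache = WriteBackCache(memory)
cache.cpu_write(1, 10)          # dirty entry, main memory still holds 0
cache.cpu_read(2)               # clean entry
cache.push()
cache.invalidate()
cache.dma_write(2, 99)
after_software = cache.cpu_read(2)   # miss, so the DMA data is read from main memory

# Hardware-managed coherency: snooping invalidates the affected entry.
snooping = WriteBackCache([0] * 8, snoop=True)
snooping.cpu_read(2)
snooping.dma_write(2, 77)
after_snoop = snooping.cpu_read(2)
print(memory[1], after_software, after_snoop)
```

In the software variant, the cost is visible in the model: every cached entry is discarded, so all subsequent reads miss, and the correct ordering of `push`, `invalidate`, and the DMA write is left to the programmer.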
The implementation of bus snooping is expensive in view of the complexity of the snooping logic, the space requirement of the logic, and its power consumption. In particular, space requirement and power consumption are subject to design constraints in systems-on-chip to be used in embedded applications. Executing cache PUSH and INVALIDATE instructions at a processor unit prior to the DMA write operation increases the load on the processor unit, increases the complexity of the operating system and applications, and is error-prone and difficult to debug, each of which is a primary concern in the field of embedded applications.
For instance, in the field of vision processing, large amounts of data are written to a continuous memory space in the main memory in a very structured and time-bound manner. The data are typically written to the memory using bus mastering or DMA write operations. Any copies of that memory space held in the cache are then inconsistent with the data stored within the continuous memory space in the main memory.
Hence, there is a need for a solution which prevents incoherency between cache and main memory in systems with an alternate bus master and which overcomes the aforementioned drawbacks.