The present invention relates to system-on-chip (SoC) applications, and more particularly, to processing modules with multilevel cache architecture.
To utilize on-chip memory effectively and to minimize the performance gap between a high-speed processor and a low-speed off-chip memory, many embedded systems exploit cache resources. There may be several levels of caches in an embedded system, for example, a level-one (L1) cache, a level-two (L2) cache, and even a level-three (L3) cache. The L1 cache is typically closest to the processor for easy access, and often has the same operating speed as a processing core circuit of the processor. Due to the cost of such high-speed on-chip memory, the size of the L1 cache is very limited, usually ranging from several kilobytes (KB) to tens of KB.
Taking an embedded system having a two-level cache architecture as an example, when a cache miss of an L1 cache within the embedded system occurs (e.g. when a request from a processor of the system corresponds to an L1 cacheable range, and the requested data corresponding to the request is not in the L1 cache), the L1 cache will ask an L2 cache within the embedded system for the requested data. If the requested data is in the L2 cache, the requested data is sent back to the L1 cache directly. If the requested data is not in the L2 cache, however, the L2 cache has to ask an external memory for the requested data. Based on this conventional architecture, data in the L2 cache must be a superset of that in the L1 cache. In addition, when the L1 cache miss occurs, the latency in obtaining the requested data is extended by the lookup time required by the L2 cache, where the size of the L2 cache typically ranges from several tens of KB to hundreds of KB, and the L2 cache has to maintain the coherence of data with respect to the L1 cache. The large size and the coherence problem of the L2 cache make this conventional approach costly and complex to design and verify. Needless to say, introducing an L3 cache into an embedded system would be even more expensive and complicated.
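The conventional lookup flow described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions, not an implementation of any particular hardware: the class name TwoLevelCache, the method read, and the cycle counts are hypothetical, and both caches are treated as unbounded so that no replacement policy is needed.

```python
from dataclasses import dataclass, field

# Illustrative latencies in cycles; assumed values, not taken from the text.
L1_LOOKUP, L2_LOOKUP, EXT_ACCESS = 1, 10, 100


@dataclass
class TwoLevelCache:
    """Hypothetical model of a conventional two-level cache hierarchy."""
    external: dict                          # models the off-chip memory
    l1: dict = field(default_factory=dict)  # unbounded for simplicity
    l2: dict = field(default_factory=dict)  # unbounded for simplicity

    def read(self, addr):
        """Return (data, cycles, source) for one processor read."""
        cycles = L1_LOOKUP
        if addr in self.l1:
            return self.l1[addr], cycles, "L1"
        # An L1 miss always pays the L2 lookup time, hit or miss.
        cycles += L2_LOOKUP
        if addr in self.l2:
            source = "L2"
        else:
            # L2 miss: the L2 cache fetches the data from external memory.
            cycles += EXT_ACCESS
            self.l2[addr] = self.external[addr]
            source = "external"
        # The L1 cache is filled from the L2 cache, so every L1 entry
        # also exists in the L2 cache (the superset property).
        self.l1[addr] = self.l2[addr]
        return self.l1[addr], cycles, source
```

In this sketch, a read that misses in both caches costs the sum of all three assumed latencies, which shows how the L2 lookup time is added to every L1 miss.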
Within an embedded system, a hardware engine (for example, a video or audio engine) may have a private memory to achieve higher performance. However, the more private memories there are within the embedded system, the greater the cost and testing effort. In order to prevent these problems, it would be helpful to replace the private memory with the resources of an L2 cache that is utilized for caching data for hardware engines within the embedded system, i.e. the L2 cache is utilized as a working buffer of the hardware engine. However, this becomes very complicated when the L2 cache can also be accessed by DMA circuitry or other hardware bus masters within the embedded system. Since more than one master accesses the L2 cache, the cache accesses become more random, which reduces the effectiveness of the L2 cache by generating more cache misses or by replacing one master's data with another master's data.
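The interference between bus masters can be illustrated with a small simulation. This is a sketch under assumptions that do not come from the text: a fully associative cache with least-recently-used (LRU) replacement, a capacity of four lines, and two hypothetical access patterns (an engine looping over a four-line working buffer and a DMA master streaming through distinct addresses). The function name hit_count is likewise hypothetical.

```python
from collections import OrderedDict


def hit_count(accesses, capacity):
    """Count hits per master for (master, addr) accesses to a shared LRU cache."""
    cache = OrderedDict()  # ordered from least to most recently used
    hits = {}
    for master, addr in accesses:
        hits.setdefault(master, 0)
        if addr in cache:
            cache.move_to_end(addr)        # mark as most recently used
            hits[master] += 1
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used line
            cache[addr] = master
    return hits


# Assumed pattern: the engine makes 10 passes over a 4-line working buffer.
engine = [("engine", a) for _ in range(10) for a in range(4)]
# Assumed pattern: the DMA master streams through 40 distinct addresses.
dma = [("dma", 1000 + i) for i in range(40)]

# The engine alone, then the same engine accesses interleaved with DMA traffic.
alone = hit_count(engine, capacity=4)
interleaved = [access for pair in zip(engine, dma) for access in pair]
shared = hit_count(interleaved, capacity=4)
```

Under these assumptions, the engine hits on 36 of its 40 accesses when it has the cache to itself, and on none of them once the DMA stream is interleaved, because each DMA access evicts a line of the working buffer before the engine returns to it.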