By exploiting the temporal or spatial locality of the programs executed by a processor, a traditional cache memory temporarily stores the most recently and/or most frequently used instructions and data close to the processor unit. When a certain instruction or datum needs to be accessed, the cache memory is accessed first, and a slower next-level memory with larger capacity is accessed only if the cache misses.
FIG. 1 shows a typical cache structure. As shown in FIG. 1, since the instructions and data loaded into a cache are updated in real time according to the dynamic execution of programs, the processor core first searches a tag matrix in the cache to confirm whether the required instructions or data are in the cache. On a cache miss, the tag search and data comparisons are wasted, and the next-level memory must then be accessed, which wastes a plurality of processor execution cycles as well as the power consumed by the cache. To increase the hit rate of the cache, a multilevel cache structure employing, for example, set associativity, complex replacement algorithms, prefetching, predictive reads and hierarchy is used. Such performance improvements, however, depend entirely on increased hardware complexity and chip area overhead. Since the cache structure shown in FIG. 1 is well known, the functions and principles of its respective parts will not be explained in detail here.
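The tag search and the cost of a miss described above can be illustrated with a minimal sketch of a set-associative cache with least-recently-used replacement. This is not the structure of FIG. 1; the line size, set count, associativity and cycle counts below are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass, field

# All parameters are assumptions for illustration, not taken from FIG. 1.
LINE_BYTES = 32      # bytes per cache line
NUM_SETS = 4         # number of sets
WAYS = 2             # lines per set (associativity)
HIT_CYCLES = 1       # cycles for the tag search and data read
MISS_PENALTY = 20    # extra cycles to reach the next-level memory


@dataclass
class Cache:
    # sets[i] holds the tags resident in set i, least recently used first.
    sets: list = field(default_factory=lambda: [[] for _ in range(NUM_SETS)])

    def access(self, addr: int) -> int:
        """Return the cycles spent on one access to byte address addr."""
        line = addr // LINE_BYTES
        index = line % NUM_SETS
        tag = line // NUM_SETS
        ways = self.sets[index]
        if tag in ways:
            # Hit: move the tag to the most-recently-used position.
            ways.remove(tag)
            ways.append(tag)
            return HIT_CYCLES
        # Miss: the tag search was still paid for, then the next level is accessed.
        if len(ways) == WAYS:
            ways.pop(0)  # evict the least recently used line
        ways.append(tag)
        return HIT_CYCLES + MISS_PENALTY
```

In this sketch, a first access to address 0 costs 21 cycles and a following access to address 4 (same line) costs 1 cycle, which shows how widely the cost of an access can vary with the cache contents.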
Another disadvantage of a cache is that the access delays for a hit and a miss are completely different, so the delay of a cache access is difficult to predict. For this reason, Tightly Coupled Memory (TCM) is used in many cases. TCM is a Static Random Access Memory (SRAM) located close to the processor core and characterized by high speed and fixed delay. However, the contents of TCM cannot be replaced in real time, and its capacity is fixed and small. Refreshing the TCM depends entirely on software scheduling: before a refresh, software must determine when to perform it and carry out the corresponding configuration operations, during which the TCM is inaccessible. These factors limit the application of TCM.
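The TCM behavior described above (fixed delay, fixed capacity, software-driven refresh during which the memory is inaccessible) can be modeled roughly as follows. The class name, method names and the one-cycle latency are hypothetical and do not correspond to any particular processor.

```python
class Tcm:
    """Toy model of a tightly coupled memory; all names and values are assumed."""

    ACCESS_CYCLES = 1  # assumed fixed latency of every access

    def __init__(self, base: int, size: int):
        self.base = base              # first address mapped to the TCM
        self.size = size              # fixed capacity in bytes
        self.data = bytearray(size)
        self.reloading = False

    def read(self, addr: int):
        """Return (byte, cycles); the cycle count never varies."""
        if self.reloading:
            raise RuntimeError("TCM is inaccessible while being refreshed")
        if not (self.base <= addr < self.base + self.size):
            raise ValueError("address is outside the TCM window")
        return self.data[addr - self.base], self.ACCESS_CYCLES

    def begin_reload(self):
        # Software decides when to refresh; accesses are blocked from here on.
        self.reloading = True

    def end_reload(self, new_base: int, contents: bytes):
        # Software installs the new contents and re-enables access.
        if len(contents) != self.size:
            raise ValueError("contents must match the fixed TCM capacity")
        self.base = new_base
        self.data[:] = contents
        self.reloading = False
```

Between `begin_reload` and `end_reload` every read fails, which corresponds to the window in which the TCM cannot serve the processor.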
Content Addressable Memory (CAM) is a dedicated memory. As a general-purpose module, it cannot deliver its full performance in some specific application scenarios. Further, comparing all memory entries with the input entry in parallel leads to highly complicated and expensive hardware.
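The cost of a CAM search can be made concrete with a small software model. In hardware the comparisons happen in parallel, one comparator per entry; the sketch below performs them sequentially and only counts them. The class and method names are hypothetical.

```python
class Cam:
    """Toy content addressable memory; names are illustrative assumptions."""

    def __init__(self, size: int):
        self.entries = [None] * size  # None marks an empty slot

    def write(self, slot: int, key: int):
        self.entries[slot] = key

    def search(self, key: int):
        """Return (matching slot or None, number of comparisons performed)."""
        # Every entry is compared with the input key, whether or not it matches.
        matches = [entry is not None and entry == key for entry in self.entries]
        comparisons = len(self.entries)
        slot = matches.index(True) if True in matches else None
        return slot, comparisons
```

Whether the search hits or misses, the number of comparisons equals the number of entries, so the comparator count (and hence hardware cost) grows linearly with the CAM capacity.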
Therefore, it is difficult to improve performance by relying entirely on hardware complexity, power consumption or software intervention. Further, given the fine granularity of processor execution and memory access (per instruction), resources are statically categorized and partitioned, which is inefficient and wastes the memory resources of the system. By closely combining software and hardware, it is possible to perform flexible and intelligent processing according to program execution and the features of data structures, so that performance can be improved greatly while performance, power consumption, cost and the like are better balanced.