To improve the performance of a central processing unit (CPU) system, a cache structure is generally used to temporarily store recently and frequently used instructions or data. In this way, the memory does not need to be accessed every time an instruction is fetched or a data operation is performed, which significantly reduces operation latency. A typical CPU system is shown in FIG. 1. A level-1 cache (L1 cache) is closest to the CPU and has the highest access speed, but its capacity is generally small. A level-2 cache (L2 cache) is disposed at the periphery of the L1 cache. The access speed of the L2 cache is slightly lower than that of the L1 cache, but its capacity is considerably larger. A large multi-core CPU system may also include a level-3 cache (L3 cache) or even a level-4 cache (L4 cache). The memory is located downstream of the last-level cache (LLC); its access speed is much lower than that of any level of cache, but its capacity is much larger.
Requests from the CPU are mainly classified into two types: instruction fetch operations and data operations. An instruction fetch operation takes the form of a read operation, and a data operation takes the form of either a read operation or a write operation. Whether the operation is a read or a write, the request carries address information. The CPU first sends the request to the L1 cache. According to the address information in the request, the L1 cache determines whether the requested data exists in the L1 cache. If it does, the operation is completed directly in the L1 cache; if it does not, the L1 cache sends the request to next-level storage (which may be a next-level cache or the memory).
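The request flow described above can be sketched as a chain of storage levels. This is a minimal illustration only: the `Level` class, its `lines` dictionary, and the example addresses are hypothetical names chosen for the sketch, and no fill or replacement policy is modeled because none is described here.

```python
class Level:
    """One level of storage (hypothetical model): `lines` maps address -> data."""

    def __init__(self, name, lines=None, next_level=None):
        self.name = name
        self.lines = dict(lines or {})
        self.next_level = next_level  # next-level cache or memory; None for memory

    def read(self, addr):
        """Return (data, name of the level that completed the request)."""
        if addr in self.lines:
            # The requested data exists at this level: complete the read here.
            return self.lines[addr], self.name
        if self.next_level is None:
            raise KeyError(f"address {addr:#x} not present in {self.name}")
        # Otherwise forward the request to the next-level storage.
        return self.next_level.read(addr)


# Assumed example contents: memory holds everything, L2 holds one line, L1 is empty.
memory = Level("memory", {0x10: "a", 0x20: "b"})
l2 = Level("L2", {0x20: "b"}, next_level=memory)
l1 = Level("L1", {}, next_level=l2)
```

With these assumed contents, a read of address 0x20 issued to `l1` is completed by the L2 cache, and a read of address 0x10 is forwarded all the way to the memory.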
There are mainly three types of cache structures: direct mapped, fully associative, and set associative. The set associative structure is the most widely used. A typical set associative cache structure is shown in FIG. 2. Taking a received read request as an example, the processing is as follows. First, a set is found according to an index field in the address information of the read request, where each set includes several ways, and each way may store the requested data. Then, a tag field in the address information of the request is compared with the tag information stored in each way. If the tag field is consistent with the tag information of a way, that way is hit and the requested data is stored in that way. If the tag field is not consistent with the tag information of any way, a miss occurs: this level of cache does not include the requested data, and the request needs to be sent to next-level storage.
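The set associative lookup described above can be sketched as follows. The field widths (`OFFSET_BITS`, `INDEX_BITS`), the `Way` record, and the per-way valid bit are assumptions made for illustration; the description does not fix any of them.

```python
from dataclasses import dataclass
from typing import Optional

OFFSET_BITS = 6  # assumption: 64-byte cache lines
INDEX_BITS = 2   # assumption: 4 sets


@dataclass
class Way:
    valid: bool = False          # assumed valid bit, not stated in the description
    tag: int = 0
    data: Optional[bytes] = None


def split_address(addr):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def lookup(sets, addr):
    """Return (way number, data) on a hit, or None on a miss."""
    tag, index, _ = split_address(addr)
    # The index field selects the set; the tag field is compared with every way.
    for way_no, way in enumerate(sets[index]):
        if way.valid and way.tag == tag:
            return way_no, way.data
    return None  # miss: the request must go to the next-level storage
```

For example, under these assumed widths the address 0x1234 splits into tag 18, index 0, and offset 52, so it hits only if some way of set 0 holds tag 18.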
In the design of a large-capacity cache structure, the following manner is generally used to reduce power consumption. A tag random-access memory (RAM) is looked up first to determine hit/miss, and whether to read a data RAM is then decided according to the lookup result. To improve throughput, the tag RAM is generally looked up using a pipeline structure. Currently, the following manner is commonly used.
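The tag-first access order described above can be illustrated with a small model in which the data RAM is read only after the tag RAM reports a hit. The `TagFirstCache` class and its `data_ram_reads` counter are hypothetical and serve only to make the avoided data RAM reads visible; they are not taken from any particular design.

```python
class TagFirstCache:
    """Hypothetical model: separate tag RAM and data RAM, tag looked up first."""

    def __init__(self, num_sets, num_ways):
        self.tag_ram = [[None] * num_ways for _ in range(num_sets)]
        self.data_ram = [[None] * num_ways for _ in range(num_sets)]
        self.data_ram_reads = 0  # counts how often the data RAM is actually read

    def install(self, index, way, tag, data):
        """Place a line into a given set and way (no replacement policy modeled)."""
        self.tag_ram[index][way] = tag
        self.data_ram[index][way] = data

    def read(self, index, tag):
        # Step 1: look up the tag RAM only, to determine hit/miss.
        hit_way = None
        for way, stored_tag in enumerate(self.tag_ram[index]):
            if stored_tag is not None and stored_tag == tag:
                hit_way = way
                break
        if hit_way is None:
            return None  # miss: the data RAM is never touched
        # Step 2: read the data RAM only for the way that hit.
        self.data_ram_reads += 1
        return self.data_ram[index][hit_way]
```

In this sketch a miss leaves `data_ram_reads` unchanged, which is the source of the power saving: the data RAM is activated only when the lookup result shows that it holds the requested data.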
All tag RAMs are used as one pipeline, and a lookup request is received in each clock cycle. When a tag is looked up according to the lookup request, the information of all ways in the indexed set is accessed simultaneously, and after the information is read, hit/miss is determined for all ways together. The pipeline can receive a lookup request in each clock cycle and can work in each clock cycle. The following two problems exist in the foregoing manner when the quantity of ways of a cache increases: (1) because more tag RAMs need to be accessed concurrently during the lookup, the logic for determining hit/miss becomes more complex, which leads to low processing efficiency; and (2) because the quantity of tag RAMs that need to work simultaneously is relatively large, peak power consumption increases.
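The behavior of this single pipeline can be approximated with a cycle-stepped sketch. The two-stage split (read all ways, then compare) and the function name `run_pipeline` are assumptions made for illustration; the point is only that every accepted request activates one tag RAM per way in the same cycle.

```python
def run_pipeline(tag_ram, requests):
    """Simulate a two-stage tag lookup pipeline (assumed staging).

    tag_ram[index] is the list of tags stored in the ways of that set.
    requests is a list of (index, tag) pairs, one accepted per clock cycle.
    Returns (hit way or None per request, tag RAMs active per cycle).
    """
    results = []
    active_per_cycle = []
    read_stage = None  # request whose tags were read in the previous cycle

    # One extra cycle with no new request drains the last lookup.
    for req in list(requests) + [None]:
        # Stage 2: compare the tags of all ways at once to determine hit/miss.
        if read_stage is not None:
            (_, tag), tags = read_stage
            hits = [way for way, stored in enumerate(tags) if stored == tag]
            results.append(hits[0] if hits else None)
        # Stage 1: read every way of the indexed set concurrently.
        if req is not None:
            index, _ = req
            tags = list(tag_ram[index])
            active_per_cycle.append(len(tags))  # one tag RAM per way is active
            read_stage = (req, tags)
        else:
            active_per_cycle.append(0)
            read_stage = None
    return results, active_per_cycle
```

In this model the number of tag RAMs active in a cycle equals the number of ways, and the comparison stage has one comparator input per way, so both quantities grow as the quantity of ways increases.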