Field of the Invention
The invention relates in general to a cache management technology, and more particularly, to a management technology for reducing cache misses.
Description of the Related Art
In a computer system, a cache is utilized for temporarily storing a small amount of data that has been recently used or may be used later. Compared to a main memory having a larger capacity, the cache has a faster data access speed but a higher cost. In general, a main memory is implemented by a dynamic random access memory (DRAM), and a cache is implemented by a static random access memory (SRAM). When a certain set of data is required, a processor first looks for the data in the cache, and turns to the main memory only if the data is not found in the cache.
A cache includes multiple cache lines for storing data contents captured from a main memory. Each cache line has a tag, an index and an offset. For the data stored in the cache lines, the addresses of the data as originally stored in the main memory are distributed across the three fields of tag, index and offset. Taking video data stored in a cache for example, FIG. 1 shows the contents of the three fields above. In this example, each of the cache lines stores an image block. The address of each block includes the coordinates of a start position in the horizontal and vertical directions. These coordinates are each represented in 12 binary bits, as x[11:0] and y[11:0]. Further, the field of the offset includes 5 bits (x[4:0]), indicating that each cache line is capable of storing image data of 32 (=2^5) pixels on a same horizontal line of a picture. For example, assuming the image data of one single pixel is 8 bits and the field of the offset includes 5 bits, the capacity of each cache line is then 256 (=8*32) bits.
As seen from FIG. 1, the coordinate x[11:0] of the horizontal start position is divided into three parts, x[11:7], x[6:5] and x[4:0], which are stored into the three fields of tag, index and offset, respectively. x[11:7] are the five most significant bits of the coordinate x[11:0], x[4:0] are the five least significant bits, and x[6:5] are the two remaining middle bits. Further, the coordinate y[11:0] of the vertical start position is divided into two parts, y[11:6] and y[5:0], which are stored into the two fields of tag and index, respectively. As shown in FIG. 1, the field of tag in this example further stores a time coordinate indicating the time sequence of the picture that includes a particular image block. The time coordinate is represented in six binary bits as t[5:0]. Combining the contents of the three fields of tag, index and offset, the processor is able to obtain the complete address information of the image block.
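The address decomposition above can be sketched in code as follows. This is a minimal illustration assuming the field widths of FIG. 1 (a 5-bit offset x[4:0], an 8-bit index formed by concatenating y[5:0] with x[6:5], and a 17-bit tag holding t[5:0], y[11:6] and x[11:7]); the function name is hypothetical.

```python
def split_address(x, y, t):
    """Split block coordinates (x, y) and time coordinate t into the
    tag/index/offset fields per the mapping of FIG. 1 (illustrative helper)."""
    offset = x & 0x1F                                  # x[4:0], 5 bits
    index = ((y & 0x3F) << 2) | ((x >> 5) & 0x3)       # y[5:0] . x[6:5], 8 bits
    tag = ((t & 0x3F) << 11) | (((y >> 6) & 0x3F) << 5) | ((x >> 7) & 0x1F)
    # tag = t[5:0] . y[11:6] . x[11:7], 17 bits
    return tag, index, offset
```

With these widths the index can take 256 (=2^8) distinct values, matching the 256 cache lines of the example.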
The size of a cache is usually quite limited, and cannot accommodate all the image blocks of one picture. Taking a direct mapped cache for example, image blocks having the same contents in the field of index are stored into the same cache line when the image blocks are captured from the main memory into the cache. For the example in FIG. 1, the field of index includes eight bits, which represent 256 (=2^8) possibilities (00000000˜11111111), indicating that the cache includes a total of 256 cache lines. For example, for two image blocks A and B, given that the eight bits formed by y[5:0] and x[6:5] are 00101100, even when the other parts of the addresses (the remaining bits of x and y, and t) of the two image blocks A and B are different, these two image blocks A and B are both mapped to the cache line with the index 00101100. In practice, if the image block A is previously stored in that cache line, the processor overwrites the data of the image block A when writing the image block B into that cache line.
The contents in the field of tag of the same cache line are different at the time points at which the image block A and the image block B are stored into that cache line. To search for a set of target data in a cache, the processor first identifies the corresponding cache line according to the index, and determines whether a validity field indicates that the contents of the cache line are correct. The processor then determines whether the contents in the field of tag match the address of the target data. Only when the tag and the index both match is a cache hit achieved; otherwise, a cache miss has occurred. In the event of a cache miss, the processor needs to capture the target data from the main memory instead, and stores the target data into the corresponding cache line for subsequent use.
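The lookup sequence just described (locate the line by index, check validity, then compare the tag) can be modeled with a small sketch; the class and method names are illustrative assumptions, not taken from the source.

```python
class DirectMappedCache:
    """Minimal model of a direct mapped cache: one line per index,
    each line holding (valid, tag, data)."""

    def __init__(self, num_lines=256):
        self.lines = [(False, None, None)] * num_lines

    def read(self, tag, index, fetch_from_memory):
        valid, stored_tag, data = self.lines[index]
        if valid and stored_tag == tag:
            return data, True                    # cache hit
        # Cache miss: capture the data from main memory and
        # overwrite whatever the line previously held.
        data = fetch_from_memory(tag, index)
        self.lines[index] = (True, tag, data)
        return data, False
```

Reading the same (tag, index) twice hits on the second access, while a different tag at the same index evicts the previous occupant, which is exactly the collision behavior described above.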
In a motion picture decoding system, a cache is often utilized for temporarily storing a reference picture that is required by a motion compensation process. Motion compensation is a technology extensively applied in the field of motion picture compression. A picture to be decoded is divided into multiple same-sized image blocks (e.g., 16*16 pixels). For each of the image blocks, an encoder identifies a most similar reference region from the reference picture, and determines a motion vector between the image block and the corresponding reference region. Apart from the motion vector, the encoder further determines the image content difference between the image block and the corresponding reference region. Such an image content difference is referred to as a residual. An encoded image block is represented by the motion vector and the residual. Correspondingly, the motion compensation process at a decoder needs to reconstruct the complete contents of individual image blocks according to the motion vector, the residual and the reference picture.
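As a rough sketch of the decoder-side reconstruction just described, the image block is recovered by adding the residual to the reference region that the motion vector points to; the function name and the nested-list picture representation are illustrative assumptions.

```python
def reconstruct_block(reference, block_x, block_y, mv, residual, size=16):
    """Decoder-side motion compensation sketch (illustrative only):
    copy the reference region displaced by the motion vector mv = (dx, dy)
    from the block's start position, then add the residual per pixel."""
    dx, dy = mv
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            ref_pixel = reference[block_y + dy + j][block_x + dx + i]
            row.append(ref_pixel + residual[j][i])
        block.append(row)
    return block
```

A real decoder would additionally handle sub-pixel interpolation and clipping, which are omitted here for brevity.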
To further enhance compression efficiency, a motion compensation process that involves multiple reference pictures has been adopted by many motion compensation standards in recent years. That is, an encoder is allowed to identify an optimal motion vector and residual from multiple reference pictures (e.g., five preceding pictures and five subsequent pictures of a current picture containing the image block). Thus, when decoding different image blocks of a same picture, the processor may need to capture the contents of multiple reference pictures from a main memory into a cache. FIG. 2 shows an example of the corresponding relationship among multiple pictures. The Nth, (N−1)th and (N−2)th pictures are temporally adjacent motion pictures. Assume that an image block A1 in the Nth picture is encoded on the basis of a reference block R1 in the (N−1)th picture, and an image block A2 in the Nth picture is encoded on the basis of a reference block R2 in the (N−2)th picture. The processor first captures the reference block R1 into the cache in the decoding process of the image block A1, and then captures the reference block R2 into the cache in the decoding process of the image block A2.
It is seen from FIG. 2 that, although having different time coordinates, the start position coordinates of the reference block R1 in the (N−1)th picture are identical to those of the reference block R2 in the (N−2)th picture. According to a current cache mapping configuration (e.g., the cache mapping configuration shown in FIG. 1), the reference blocks R1 and R2 are mapped to the same cache line. Although one or multiple cache lines store the reference block R1, the processor fails to find the reference block R2 in the cache and determines that a cache miss has occurred. Thus, the processor captures the reference block R2 from the main memory into the cache, overwriting the reference block R1 originally stored in the one or multiple cache lines. If another image block (e.g., A3) in the Nth picture is subsequently decoded also on the basis of the reference block R1 in the (N−1)th picture, a cache miss is again incurred, such that the processor at the decoder needs to again capture the reference block R1 from the main memory into the cache, overwriting the reference block R2.
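The ping-pong eviction described above can be illustrated with a small self-contained simulation: because the index is derived only from y[5:0] and x[6:5] while the time coordinate t lives in the tag, R1 and R2 (same start coordinates, different t) collide in the same cache line, so the access pattern R1, R2, R1 misses every time. All names and coordinate values below are illustrative.

```python
def index_of(x, y):
    # Index depends only on y[5:0] and x[6:5]; t is excluded (FIG. 1 mapping).
    return ((y & 0x3F) << 2) | ((x >> 5) & 0x3)

def count_misses(accesses, num_lines=256):
    """Count misses for a sequence of (x, y, t) block accesses in a
    direct mapped cache whose tag includes t but whose index does not."""
    lines = [None] * num_lines
    misses = 0
    for x, y, t in accesses:
        idx = index_of(x, y)
        tag = (t & 0x3F, (y >> 6) & 0x3F, (x >> 7) & 0x1F)
        if lines[idx] != tag:
            misses += 1
            lines[idx] = tag      # fetch from main memory and overwrite
    return misses
```

For reference blocks R1 = (64, 11, 1) and R2 = (64, 11, 0), which differ only in t, the sequence R1, R2, R1 produces three misses out of three accesses, whereas repeated accesses to R1 alone would miss only once.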
As is generally known to one person skilled in the art, overall system performance degrades as the cache miss rate gets higher. As proven by simulation experiments, a high cache miss rate often occurs during a motion compensation process based on multiple reference pictures at the decoder when a current cache mapping configuration is adopted.