1. Field of the Invention
The present invention relates to data and instruction access in a computer system and, more particularly, to a method and an architecture capable of adaptively accessing data and instructions.
2. Description of Related Art
The processing speed of the CPU of a modern computer has increased significantly, and this trend is continuing. A corresponding increase in memory access speed is required to raise the overall data and/or instruction access efficiency of the computer. In other words, a relatively slow memory is a bottleneck for the efficiency of the computer. To solve this problem, the cache memory was developed, in which a memory access unit is defined to have a constant length composed of a predetermined number of instructions or data; such a unit is called a cache line. The length of the unit is critical. For example, in a memory having a burst transfer capability, multiple data accesses can be performed by giving only one address and the associated settings, so that a data string having the assigned burst length is transferred continuously. As a result, the initial delay prior to data transfer is decreased. In such a memory, the length of the cache line is related to the burst length.
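The effect of the burst length on the initial delay can be illustrated with a simple cost model. The following is a minimal sketch only; the function name `transfer_cycles`, the parameter names, and the cycle counts are assumptions chosen for illustration and do not describe any particular memory device.

```python
def transfer_cycles(total_words, burst_length, initial_delay, cycles_per_word=1):
    """Estimate the cycles needed to move total_words from a burst-capable memory.

    Assumed model: every burst pays initial_delay once (address and setup),
    then streams burst_length words at cycles_per_word each.
    """
    if total_words < 0 or burst_length <= 0:
        raise ValueError("total_words must be >= 0 and burst_length must be > 0")
    bursts = -(-total_words // burst_length)  # ceiling division
    setup = bursts * initial_delay            # one initial delay per burst
    streaming = bursts * burst_length * cycles_per_word
    return setup + streaming


# Moving 32 words with an assumed initial delay of 6 cycles per burst:
single_word = transfer_cycles(32, burst_length=1, initial_delay=6)  # 32 initial delays
burst_of_8 = transfer_cycles(32, burst_length=8, initial_delay=6)   # 4 initial delays
```

Under these assumed numbers, the longer burst pays the initial delay 4 times instead of 32 times, which is the reduction the paragraph above refers to.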
FIG. 1 schematically shows a conventional processor architecture having the above cache capability. As shown, when a cache line containing the required data or instructions is in the cache module 11, the processor kernel 14 can fetch the required data or instructions from the cache module 11 directly with little or no delay. However, if the required data or instructions are not in the cache module 11, a cache miss occurs. The processor kernel 14 then has to command the cache module 11 to read the required data or instructions from a memory device 13. Such an operation is called a cache refill. A significant system delay (called the cache miss penalty) thus occurs, since the required cache lines have to be stored in the cache module 11.
The cache miss penalty often occurs repeatedly when the processor kernel 14 accesses a certain section of program code or data for the first time. This can adversely affect the performance of the computer system. To solve this problem, prefetching has been proposed. As shown in FIG. 2, a prefetch module 12 is provided between the cache module 11 and the memory device 13. The prefetch module 12 predicts the sections of program code or data likely to be used next by the processor kernel 14 and reads them into the prefetch module 12. When the processor kernel 14 is unable to get the required data or instructions from the cache module 11 (i.e., a cache miss has occurred), the prefetch module 12 is searched for the data or instructions. If the required data or instructions are already in the prefetch module 12, the access succeeds, and the required cache lines are read from the prefetch module 12 and stored in the cache module 11. As a result, the cache miss penalty is greatly reduced. However, a prefetch miss may still occur if the required data or instructions are not in the prefetch module 12. In that case, the required cache lines must still be fetched from the external memory device 13, and a significant system delay (called the prefetch miss penalty) occurs.
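The lookup order described above (cache module 11 first, then prefetch module 12, then memory device 13) can be sketched as follows. This is an illustrative model only: the class name `TwoLevelFetch`, the dictionary-based storage, and the next-line prediction rule are assumptions, since the text does not specify how the prefetch module 12 makes its prediction.

```python
CACHE_HIT = "cache hit"
PREFETCH_HIT = "prefetch hit"
PREFETCH_MISS = "prefetch miss"


class TwoLevelFetch:
    """Toy model of a cache module backed by a prefetch module and a memory."""

    def __init__(self, memory, line_words=4):
        self.memory = memory          # list of words; stands in for memory device 13
        self.line_words = line_words  # fixed cache line length, in words
        self.cache = {}               # line index -> words; stands in for cache module 11
        self.prefetch = {}            # line index -> words; stands in for prefetch module 12

    def _read_line(self, line):
        start = line * self.line_words
        return self.memory[start:start + self.line_words]

    def access(self, address):
        """Return (word, outcome) for one access by the processor kernel."""
        line, offset = divmod(address, self.line_words)
        if line in self.cache:
            outcome = CACHE_HIT
        elif line in self.prefetch:
            # Cache miss, prefetch hit: refill the cache from the prefetch module.
            self.cache[line] = list(self.prefetch[line])
            outcome = PREFETCH_HIT
        else:
            # Cache miss and prefetch miss: refill from the external memory.
            self.cache[line] = self._read_line(line)
            outcome = PREFETCH_MISS
        # Assumed predictor: speculatively read the next sequential line.
        nxt = line + 1
        in_range = nxt * self.line_words < len(self.memory)
        if in_range and nxt not in self.cache and nxt not in self.prefetch:
            self.prefetch[nxt] = self._read_line(nxt)
        return self.cache[line][offset], outcome


model = TwoLevelFetch(memory=list(range(100, 116)), line_words=4)
first = model.access(0)    # nothing is loaded yet, so memory must be read
second = model.access(1)   # same line, now in the cache
third = model.access(4)    # next line, already read by the assumed predictor
fourth = model.access(12)  # a line the predictor did not anticipate
```

In this sketch only the first and fourth accesses reach the memory device; the third access is served from the prefetch module, which is the case in which the cache miss penalty is reduced.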
Conventionally, the architecture of the prefetch module 12 is configured to be the same as that of the cache module 11, and thus the cache line is employed as the data length of the prefetch module 12. In other words, the length of a burst transfer in a dynamic random access memory (DRAM) is taken as the data transfer unit. However, neither the interface between the prefetch module 12 and the cache module 11 nor the interface between the prefetch module 12 and the processor kernel 14 is a DRAM interface. Hence, it is not necessary to take the cache line as the data transfer unit on these interfaces. In practice, the data transfer rate may be significantly lowered if the cache line is used as the data transfer unit.
Specifically, three interfaces are provided in the processor structure having the cache module 11 and the prefetch module 12. The first interface 15 is an external interface between the prefetch module 12 and the external memory device 13. The second interface 16 is provided between the prefetch module 12 and the cache module 11. The third interface 17 is provided between the cache module 11 and the processor kernel 14 for transferring data/instructions from the cache module 11 to the processor kernel 14. Conventionally, the data transfer unit used on each of the first and second interfaces 15 and 16 is the same as the data length of the cache line. As for data access via the third interface 17, if it depends on a data access via the first or second interface, it can be performed only after the cache line has been accessed. However, the data length of the cache line is not an optimum data transfer unit between the prefetch module 12 and any one of the memory device 13, the cache module 11, and the processor kernel 14. This is because the length of the cache line is determined by the structure of the cache module 11. In principle, the length of the cache line is fixed during the working cycles of the processor kernel 14, whereas the way the processor kernel 14 accesses data/instructions changes dynamically during execution. Hence, the optimum performance of the processor kernel 14 is not obtained if a cache line having a fixed length is taken as the data transfer unit. As a result, resources are wasted.
For example, the following problems have been found when a cache line having a fixed length is taken as the data transfer unit:
(1) In the process of data transfer via the interface, it may be known that a long data string is about to be accessed and that its data length is longer than the data length of the current cache line. However, because the data length of the cache line is fixed, a longer burst length cannot be set and the number of initial delays cannot be reduced, so that time is wasted.
(2) In the process of data transfer via the interface, it may be known that a short data string is about to be accessed and that its data length is shorter than the data length of the current cache line. However, as stated above, the data length of the cache line is fixed. As a result, the data must still be accessed with the length of the cache line as the access unit, so that unnecessary data is accessed and limited resources are wasted.
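Problems (1) and (2) can be quantified with a small calculation. The sketch below is illustrative only: the function name `fixed_line_overhead`, the cycle counts, and the baseline (a hypothetical single burst of exactly the requested length) are assumptions introduced for comparison and are not taken from the text.

```python
def fixed_line_overhead(request_words, line_words, initial_delay):
    """Overhead of serving a request with fixed-length cache line transfers.

    Returns (bursts, wasted_words, extra_delay_cycles), where the baseline is
    a hypothetical single burst of exactly request_words words.
    """
    if request_words <= 0 or line_words <= 0:
        raise ValueError("request_words and line_words must be positive")
    bursts = -(-request_words // line_words)           # ceiling division
    wasted_words = bursts * line_words - request_words  # words fetched but not needed
    extra_delay_cycles = (bursts - 1) * initial_delay   # initial delays beyond the first
    return bursts, wasted_words, extra_delay_cycles


# Problem (1): a 32-word string with an 8-word cache line pays 4 initial delays.
long_case = fixed_line_overhead(32, line_words=8, initial_delay=6)

# Problem (2): a 2-word string with an 8-word cache line still moves 8 words.
short_case = fixed_line_overhead(2, line_words=8, initial_delay=6)
```

Under these assumed numbers, the long access incurs three additional initial delays, and the short access transfers six words that are never used.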