A processor of a computer is generally equipped with a secondary cache, a primary data cache, a primary instruction cache and the like to improve the performance of access to a main memory.
In a processor, an instruction read out from the main memory via the secondary cache and the primary instruction cache is sent to an instruction decoder and decoded.
If the decoded instruction is a memory access instruction such as a load instruction, a store instruction or a memory copy instruction, an operand address generator calculates the memory address to be accessed, and the primary data cache is accessed with the calculated address.
Here, at the time of execution of a memory copy instruction, data at the copy source address (referred to as “address A”) in the memory is copied to the copy destination address (referred to as “address B”). Since the instruction length is fixed, there is a maximum copy size that can be specified at a time by a memory copy instruction.
When a data size that is equal to or smaller than the data transfer capacity in one cycle between the main memory and the secondary cache or between the secondary cache and the primary data cache is specified in one instruction as the copy size, a process illustrated in FIG. 1A is performed for example. That is, the decoded memory copy instruction is sequentially registered in the order of decoding in an instruction queue called CSE (Commit Stack Entry). In the example in FIG. 1A, it is assumed that the memory copy instruction is registered in an entry CSE0 of the CSE.
In each entry of the CSE, an IID (instruction identifier) for identifying each instruction and a valid flag indicating the validity or invalidity of the registered instruction are registered. The CSE has, for example, several dozen entries. In addition to the CSE, the processor is equipped with an instruction queue called an RS (Reservation Station), in which each instruction can be registered with a priority and executed out-of-order. An IID identifying each instruction is also registered in each entry of the RS. The memory copy instruction is processed in the operand address generator via the RS, and a memory copy process according to the memory copy instruction is performed. In this case, the instruction registered in the CSE in the order of decoding and the instruction executed out-of-order via the RS are linked by the IID. When execution of an instruction via the RS is completed, the IID registered in the RS entry corresponding to the instruction is compared with the entries of the CSE, and the valid flag of the CSE entry in which the same IID is registered is changed to a value indicating invalidity, which completes the execution of the instruction. Through this linked control, the CSE ensures the order of the instructions executed out-of-order via the RS.
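The linkage between the CSE and the RS by the IID can be illustrated with a minimal sketch. The names `CseEntry`, `complete` and `commit` are hypothetical and introduced only for illustration, and retiring entries from the head of the queue is an assumption about how the decode order is ensured; the description above states only that the valid flag of the matching entry is changed to a value indicating invalidity.

```python
from dataclasses import dataclass

@dataclass
class CseEntry:
    iid: int            # instruction identifier shared with the RS entry
    valid: bool = True  # True while the instruction is still in flight

def complete(cse, iid):
    """Invalidate the CSE entry whose IID matches a completed RS entry."""
    for entry in cse:
        if entry.iid == iid and entry.valid:
            entry.valid = False
            return
    raise KeyError(iid)

def commit(cse):
    """Assumed retirement step: remove finished entries from the head only,
    so that instructions leave the CSE in decode order."""
    retired = []
    while cse and not cse[0].valid:
        retired.append(cse.pop(0).iid)
    return retired

cse = [CseEntry(0), CseEntry(1), CseEntry(2)]  # registered in decode order
complete(cse, 2)      # IID 2 finishes first via the RS (out-of-order)
first = commit(cse)   # IID 0 is still in flight, so nothing retires yet
complete(cse, 0)
complete(cse, 1)
second = commit(cse)  # all three entries now retire in decode order
```

In this sketch an instruction that finishes early simply waits in the CSE until every older entry has also been invalidated.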
In FIG. 1A, the data transfer capacity of a memory copy instruction is, for example, 16 bytes (16 B), and a “16 B memory copy” instruction denotes a data transfer instruction for up to 16 bytes.
On the other hand, if a data size that exceeds the data transfer capacity in one cycle between the main memory and the secondary cache, or between the secondary cache and the primary data cache, is specified in one instruction as the copy size, a process illustrated in FIG. 1B is executed. In this case, the instruction decoder executes a process called multi-flow expansion for, for example, a “32 B memory copy” instruction that is a data transfer instruction for 32 bytes. In the multi-flow expansion, the “32 B memory copy” instruction is separated into two “16 B memory copy” instructions. Each of the “16 B memory copy” instructions obtained by this separation is registered in an individual CSE entry, CSE0 or CSE1, as illustrated in FIG. 1B. Each of the “16 B memory copy” instructions registered respectively in CSE0 and CSE1 is executed out-of-order via an individual RS entry linked by the corresponding IID registered together with the instruction, and is individually subjected to a pipeline process in the operand address generator. As a result, each 16-byte memory copy process is executed.
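The multi-flow expansion described above can be sketched as a simple split of one copy into flows of at most the transfer capacity. The function name `multi_flow_expand` and the tuple layout are assumptions made for illustration; the actual decoder operates in hardware.

```python
def multi_flow_expand(src, dst, size, transfer=16):
    """Split one memory copy into (src, dst, length) flows, each of at
    most `transfer` bytes (16 B in the example of FIG. 1B)."""
    flows = []
    offset = 0
    while offset < size:
        length = min(transfer, size - offset)
        flows.append((src + offset, dst + offset, length))
        offset += length
    return flows

# A "32 B memory copy" becomes two "16 B memory copy" flows.
flows_32 = multi_flow_expand(0x1000, 0x2000, 32)
# A copy that fits in one transfer is not split.
flows_16 = multi_flow_expand(0x1000, 0x2000, 16)
```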
Here, when data exceeding the maximum size that can be specified by a memory copy instruction is to be copied, memory copy instructions are described successively in the program. That is, a memory copy process for a large size is described as a plurality of successive memory copy instructions. Furthermore, when the data size specified by each memory copy instruction exceeds the data transfer capacity in one cycle between the secondary cache and the primary data cache, each memory copy instruction is subjected to multi-flow expansion and executed. For example, it is assumed that the data transfer capacity between the secondary cache and the primary data cache is 16 bytes, and the maximum data size specified by one memory copy instruction is 256 bytes. In this case, a memory copy process for 1024 bytes, for example, is described as four successive 256-byte memory copy instructions, and each of the 256-byte memory copy instructions is subjected to multi-flow expansion into sixteen 16-byte memory copy instructions.
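The arithmetic of this example can be checked with a short sketch. The helper `plan_copy` is hypothetical; it returns, for each successive memory copy instruction, the number of flows produced by multi-flow expansion under the sizes assumed above.

```python
def plan_copy(total, max_copy=256, transfer=16):
    """Return the number of flows per memory copy instruction for a copy
    of `total` bytes, given the per-instruction and per-flow limits."""
    counts = []
    remaining = total
    while remaining > 0:
        size = min(max_copy, remaining)
        counts.append(-(-size // transfer))  # ceiling division
        remaining -= size
    return counts

plan = plan_copy(1024)  # four instructions of sixteen flows each
```

For the 1024-byte copy this gives four instructions and 64 flows in total.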
In this case, there are significant differences in data access time, as illustrated in FIG. 2, among the case in which the primary data cache is hit in the memory access according to each memory copy instruction, the case in which the primary data cache is missed and the secondary cache is hit, and the case in which both are missed. In FIG. 2, “L1$HIT” indicates the case in which the primary data cache is hit. In addition, “L1$miss, L2$HIT” indicates the case in which the primary data cache is missed and the secondary cache is hit. Furthermore, “L1$, L2$miss” indicates the case in which both the primary data cache and the secondary cache are missed. Of course, memory access is faster when “L1$miss, L2$HIT” occurs more frequently than “L1$, L2$miss”, and faster still when “L1$HIT” occurs more frequently than “L1$miss, L2$HIT”.
Therefore, when successive memory copy instructions are described and a memory copy instruction of the maximum size (256 bytes for example) that can be specified with one instruction is specified as each of the memory copy instructions, control as described below is performed. Meanwhile, in the following description, the memory copy instruction obtained by performing multi-flow expansion for each memory copy instruction is referred to as an MF memory copy instruction.
For the execution of the first MF memory copy instruction obtained by performing multi-flow expansion for each memory copy instruction, a prefetch request is issued. No prefetch request is issued at the time of execution of the second and subsequent MF memory copy instructions obtained by performing multi-flow expansion for each memory copy instruction.
As a result, upon execution of the first MF memory copy instruction obtained by performing multi-flow expansion for each memory copy instruction, if both the primary data cache and the secondary cache are missed (L1$, L2$miss), a fetch operation and a prefetch operation as described below are performed.
That is, first, memory data of an address range of several blocks from the memory address specified by the first MF memory copy instruction is fetched from the main memory to the secondary cache, and a part of the memory data is further fetched to the primary data cache. The address range of several blocks is, for example, an address range corresponding to one data transfer from the main memory to the secondary cache, such as 256 bytes.
Together with this operation, a prefetch operation is performed based on the miss of the primary data cache (L1$miss) at the time of execution of the first MF memory copy instruction and on the prefetch request issued with the instruction. As a result, memory data of the address range of the next several blocks, beyond the several blocks starting from the memory address specified by the first MF memory copy instruction, is prefetched to the secondary cache in advance.
When the primary data cache is hit (L1$HIT) for the first MF memory copy instruction, no prefetch operation is performed regardless of the prefetch request described above.
For the second and subsequent MF memory copy instructions other than the first MF memory copy instruction obtained by performing multi-flow expansion for each memory copy instruction, since no prefetch request is issued, the prefetch operation described above is not performed. When the primary data cache is missed (L1$miss) at the time of executing the second and subsequent MF memory copy instructions, the normal fetch operation for the secondary cache or the main memory is performed.
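The control rule described so far can be summarized in a small decision function. This is a sketch under the assumption that the rule depends only on three conditions (whether the flow is the first MF memory copy instruction, whether the primary data cache is hit, and whether the secondary cache is hit); the name `access_actions` and the returned dictionary are hypothetical.

```python
def access_actions(is_first_flow, l1_hit, l2_hit):
    """Return where the data is fetched from and whether a prefetch
    operation is performed for one MF memory copy instruction."""
    if l1_hit:
        # L1$HIT: no fetch is needed, and any prefetch request is ignored.
        return {"fetch_from": None, "prefetch": False}
    source = "secondary cache" if l2_hit else "main memory"
    # Only the first flow carries a prefetch request, and the request
    # takes effect only together with an L1$miss.
    return {"fetch_from": source, "prefetch": is_first_flow}

first_miss = access_actions(True, False, False)
first_hit = access_actions(True, True, True)
later_miss = access_actions(False, False, True)
```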
Here, the case is considered in which, after one memory copy instruction is subjected to multi-flow expansion and executed, the next memory copy instruction is executed successively. In this case, even if the primary data cache is missed (L1$miss) for an MF memory copy instruction corresponding to the next memory copy instruction, it becomes more likely that the memory data corresponding to that MF memory copy instruction has already been fetched to the secondary cache. That is, there is a high possibility that the secondary cache is hit (L2$HIT). Accordingly, control is performed so as to reduce the penalty due to a cache miss (L2$miss) for the second and subsequent memory copy instructions.
Meanwhile, when executing the first MF memory copy instruction corresponding to the next memory copy instruction described above, a prefetch request is issued again. Accordingly, when the primary data cache is missed (L1$miss) at the time of execution of the next memory copy instruction, the prefetch operation is performed further for the following memory copy instruction. As a result, the memory data for the memory copy instruction following the memory copy instruction currently being executed by multi-flow expansion is prefetched sequentially to the secondary cache.
The first case in which a memory copy instruction of the maximum size is performed based on the multi-flow expansion in the prefetch control process is described more specifically, based on the operation illustrated in FIG. 3.
In the example of case 1 in FIG. 3, the memory block of one data transfer from the secondary cache to the primary data cache is 64 bytes (64 B), and the maximum data size that can be specified with one memory copy instruction is 256 bytes. In addition, one large-size memory copy process is performed with successive 256-byte memory copy instructions. Then, assuming that each of the addresses A and B is located at a block boundary of the memory blocks, the copy source start address is A and the copy destination start address is B in the first 256-byte memory copy instruction in the memory copy process.
In FIG. 3, first, a prefetch request is issued at the time of execution of the first MF memory copy instruction obtained by performing multi-flow expansion for the first (1st) memory copy instruction. In the first MF memory copy instruction, the copy source start address is A, and the copy destination start address is B. No prefetch request is issued for the second and subsequent MF memory copy instructions obtained by performing multi-flow expansion for the first (1st) memory copy instruction.
As a result, if the primary data cache and the secondary cache are both missed (L1$, L2$miss) at the time of executing the first MF memory copy instruction corresponding to the first (1st) memory copy instruction, a fetch operation and a prefetch operation as described below are performed.
That is, first, copy source memory data of the address range of 4 memory blocks from the memory address A specified by the first MF memory copy instruction corresponding to the first (1st) memory copy instruction is fetched from the main memory to the secondary cache. The address range corresponds to 64 B×4 memory blocks=256 bytes, from A to A+255. Furthermore, a part of memory blocks in the memory data fetched to the secondary cache is also fetched to the primary data cache. In addition, the copy destination memory area of the address range (from B to B+255) corresponding to 4 memory blocks from the memory address B specified by the first MF memory copy instruction is reserved (fetched) in the secondary cache.
Next, based on the miss of the primary data cache (L1$miss) at the time of executing the first MF memory copy instruction corresponding to the first (1st) memory copy instruction, and based on the prefetch request issued for the instruction, a prefetch operation is performed. That is, copy source memory data of the address range corresponding to the next 4 memory blocks, beyond the 4 memory blocks starting from the memory address specified by the first MF memory copy instruction described above, is prefetched from the main memory to the secondary cache. The address range is from A+256 to A+511. The same applies to the reservation (prefetch) of the area in the secondary cache for the copy destination memory data (from B+256 to B+511).
For the second and subsequent MF memory copy instructions other than the first MF memory copy instruction obtained by performing multi-flow expansion for the first (1st) memory copy instruction, since no prefetch request is issued, the prefetch operation described above is not performed. When the primary data cache is missed (L1$miss) at the time of executing the second and subsequent MF memory copy instructions, the normal fetch operation is performed. In this case, at the time of executing the first MF memory copy instruction corresponding to the first (1st) memory copy instruction, a fetch operation for the address range corresponding to 4 memory blocks from the memory address A (or B) from the main memory to the secondary cache has been performed. For this reason, in the fetch operation in the case in which the primary data cache is missed (L1$miss) at the time of executing the second and subsequent MF memory copy instructions, the secondary cache is hit, realizing a high-speed memory access.
Here, the case is considered in which, after the first (1st) memory copy instruction is subjected to multi-flow expansion and executed, the second (2nd) memory copy instruction is executed successively. In this case, even if the primary data cache is missed (L1$miss) for each MF memory copy instruction corresponding to the second (2nd) memory copy instruction, the memory data corresponding to each of these MF memory copy instructions has already been prefetched to the secondary cache. That is to say, the secondary cache is hit. Accordingly, control is performed so as to reduce the penalty due to a miss of the secondary cache (L2$miss) for the second (2nd) memory copy instruction.
Here, at the time of execution of the first MF memory copy instruction corresponding to the second (2nd) memory copy instruction, a prefetch request is issued again. Therefore, if the primary data cache is missed (L1$miss) at the time of execution of the first MF memory copy instruction corresponding to the second (2nd) memory copy instruction, a prefetch operation for the third (3rd) memory copy instruction is performed based on the prefetch request. Accordingly, a prefetch operation from the main memory to the secondary cache is performed for the address ranges from A+512 to A+767 and from B+512 to B+767.
As described above, each time the primary data cache is missed (L1$miss) at the time of execution of the first MF memory copy instruction corresponding to a memory copy instruction, the prefetch operation for the memory copy instruction following the one currently being executed is performed, and this is repeated sequentially.
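The address ranges prefetched in case 1 follow a simple pattern, shown here as a worked calculation. The helper `prefetch_range` is hypothetical, addresses are expressed as byte offsets from A (the same offsets apply from B), and the block-aligned start address of case 1 is assumed.

```python
COPY = 256  # bytes per memory copy instruction (4 memory blocks of 64 B)

def prefetch_range(n, base=0):
    """Inclusive byte range prefetched when the first flow of the n-th
    memory copy instruction (n >= 1) misses the primary data cache."""
    start = base + n * COPY
    return start, start + COPY - 1

r1 = prefetch_range(1)  # during the 1st instruction: A+256 to A+511
r2 = prefetch_range(2)  # during the 2nd instruction: A+512 to A+767
```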
Next, the second case in which memory copy instructions of the maximum size are sequentially performed, based on the multi-flow expansion, in the prefetch control process is described more specifically, based on the operation illustrated in FIG. 4.
In the example of case 2 in FIG. 4, similar to case 1 in FIG. 3, the memory block of one data transfer from the secondary cache to the primary data cache is 64 bytes (64 B), and the maximum data size that can be specified with one memory copy instruction is 256 bytes. In addition, similar to the case in FIG. 3, one large-size memory copy process is performed with successive 256-byte memory copy instructions. In the case in FIG. 4, assuming that each of the addresses A and B is located at a block boundary of the memory blocks, the copy source start address is A+16 and the copy destination start address is B+16 in the first 256-byte memory copy instruction in the memory copy process. That is, while in case 1 in FIG. 3 the start address of the memory copy process is located on a block boundary (address A, B), in case 2 in FIG. 4 the start address is not located on a block boundary.
In case 2 illustrated in FIG. 4, similar to case 1 in FIG. 3, first, a prefetch request is issued at the time of execution of the first MF memory copy instruction obtained by performing multi-flow expansion for the first (1st) memory copy instruction. In the first MF memory copy instruction, the copy source start address is A+16, and the copy destination start address is B+16. No prefetch request is issued for the second and subsequent MF memory copy instructions obtained by performing multi-flow expansion for the first (1st) memory copy instruction, similar to case 1 in FIG. 3.
As a result, when the primary data cache and the secondary cache are both missed (L1$, L2$miss) at the time of executing the first MF memory copy instruction corresponding to the first (1st) memory copy instruction, a fetch operation and a prefetch operation as described below are performed.
That is, first, copy source memory data of the address range of 4 memory blocks from the memory address A+16 specified by the first MF memory copy instruction corresponding to the first (1st) memory copy instruction is fetched from the main memory to the secondary cache. The address range corresponds to 64 B×4 memory blocks=256 bytes, from A to A+255. Furthermore, a part of memory blocks in the memory data fetched to the secondary cache is also fetched to the primary data cache. The same applies to reservation (fetch) of the areas in the secondary cache for the copy destination memory data (from B to B+255).
Next, based on the miss of the primary data cache (L1$miss) at the time of executing the first MF memory copy instruction corresponding to the first (1st) memory copy instruction, and based on the prefetch request issued for the instruction, a prefetch operation is performed. That is, copy source memory data of the address range corresponding to the next 4 memory blocks, beyond the 4 memory blocks starting from the memory address specified by the first MF memory copy instruction described above, is prefetched from the main memory to the secondary cache. The address range is also specified in units of memory blocks, and is from A+256 to A+511. The same applies to the reservation (prefetch) of the area in the secondary cache for the copy destination memory data (from B+256 to B+511).
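Because the ranges are specified in units of memory blocks, a start address of A+16 yields the same fetch and prefetch ranges as a start address of A. The sketch below illustrates this; the helper name is hypothetical, addresses are byte offsets from the block boundary A, and rounding down to the start of the containing 64-byte block is an assumption consistent with the ranges stated above.

```python
BLOCK = 64            # bytes per memory block
BLOCKS_PER_RANGE = 4  # memory blocks per fetch or prefetch

def fetch_and_prefetch_ranges(addr):
    """Return the inclusive (fetch, prefetch) byte ranges for an access
    at `addr`, both aligned to memory block boundaries."""
    first = (addr // BLOCK) * BLOCK  # round down to the block boundary
    span = BLOCK * BLOCKS_PER_RANGE
    fetch = (first, first + span - 1)
    prefetch = (first + span, first + 2 * span - 1)
    return fetch, prefetch

case1 = fetch_and_prefetch_ranges(0)   # start address A
case2 = fetch_and_prefetch_ranges(16)  # start address A+16
```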
Here, the case is considered in which, after the first (1st) memory copy instruction is subjected to multi-flow expansion and executed, the second (2nd) memory copy instruction is executed successively.
When executing the first MF memory copy instruction corresponding to the second (2nd) memory copy instruction, a prefetch request is issued again. Here, in the first MF memory copy instruction corresponding to the second (2nd) memory copy instruction, the copy source start address is A+272, and the copy destination start address is B+272. The memory block in which these addresses are included is the same one as the memory block that was accessed when the last MF memory copy instruction corresponding to the first (1st) memory copy instruction was executed. Therefore, in the case 2 in FIG. 4, at the time of executing the first MF memory copy instruction corresponding to the second (2nd) memory copy instruction, the primary data cache is hit (L1$HIT) without being missed. The prefetch operation from the main memory to the secondary cache is performed only when a prefetch request has been issued to the primary data cache and the primary data cache is missed (L1$miss). Therefore, at the time of execution of the first MF memory copy instruction corresponding to the second (2nd) memory copy instruction, although a prefetch request has been issued, a prefetch operation for the third (3rd) memory copy instruction is not to be performed.
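The reason the primary data cache is hit can be verified by computing memory block indices. In the sketch below, which assumes 64-byte blocks, 256-byte instructions and 16-byte flows as in FIG. 4, the hypothetical helper `shared_block` compares the block touched by the first flow of the n-th memory copy instruction with the block touched by the last flow of the preceding one. Addresses are byte offsets from the block boundary A.

```python
BLOCK = 64   # bytes per memory block
COPY = 256   # bytes per memory copy instruction
FLOW = 16    # bytes per MF memory copy instruction

def shared_block(start, n):
    """Return (first, prev_last, same): the block index of the first flow
    of instruction n (n >= 2), the block index of the last flow of
    instruction n-1, and whether the two coincide."""
    first_addr = start + (n - 1) * COPY
    prev_last_addr = first_addr - FLOW
    first = first_addr // BLOCK
    prev_last = prev_last_addr // BLOCK
    return first, prev_last, first == prev_last

case2_second = shared_block(16, 2)  # A+272 versus A+256
case1_second = shared_block(0, 2)   # A+256 versus A+240
```

With a start address of A+16 both addresses fall in the same block, so the block is already in the primary data cache; with a block-aligned start address they fall in different blocks.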
As a result, when the primary data cache is first missed (L1$miss) during execution of the MF memory copy instructions obtained by multi-flow expansion of the third (3rd) memory copy instruction, no memory data for the third (3rd) memory copy instruction exists in the secondary cache. For this reason, the primary data cache and the secondary cache are both missed (L1$, L2$miss), and there arises a need to fetch the memory data for the third (3rd) memory copy instruction from the main memory to the secondary cache. The execution of each MF memory copy instruction corresponding to the third (3rd) memory copy instruction is then delayed until the fetch operation is completed, generating a large memory access penalty.
Furthermore, in the first MF memory copy instruction corresponding to the third (3rd) memory copy instruction, the copy source start address is A+528, and the copy destination start address is B+528. The memory block in which these addresses are included is the same one as the memory block that was accessed when the last MF memory copy instruction corresponding to the second (2nd) memory copy instruction was executed. Therefore, in case 2 in FIG. 4, at the time of executing the first MF memory copy instruction corresponding to the third (3rd) memory copy instruction, the primary data cache is also hit (L1$HIT) without being missed. For this reason, also at the time of execution of the first MF memory copy instruction corresponding to the third (3rd) memory copy instruction, although a prefetch request has been issued, a prefetch operation for the fourth (4th) memory copy instruction is not performed.
By such a negative spiral, in the case 2 in FIG. 4, for all of the second (2nd) and subsequent memory copy instructions, no prefetch operation is to be performed even though a prefetch request is issued in the first MF memory copy instruction corresponding to each memory copy instruction. As a result, there has been a problem that the memory access efficiency of the memory copy instruction significantly decreases.
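The difference between case 1 and case 2 can be reproduced with a deliberately simplified simulation. The model below is an assumption made for illustration, not a description of the actual hardware: it tracks only copy source blocks, treats both caches as unbounded sets of block indices, and fetches or prefetches 4 blocks at a time starting from the accessed block. In this model a demand miss is counted at whichever flow first touches a block that is in neither cache. The name `simulate` is hypothetical.

```python
BLOCK = 64             # bytes per memory block
COPY = 256             # bytes per memory copy instruction
FLOW = 16              # bytes per MF memory copy instruction
RANGE = COPY // BLOCK  # blocks per fetch or prefetch (4)

def simulate(start, instructions):
    """Count L2 misses and prefetch operations for successive memory copy
    instructions beginning at byte offset `start` from address A."""
    l1, l2 = set(), set()
    l2_misses = prefetches = 0
    for i in range(instructions):
        base = start + i * COPY
        for j in range(COPY // FLOW):
            block = (base + j * FLOW) // BLOCK
            if block in l1:
                continue                 # L1$HIT: any request is ignored
            if block not in l2:          # L1$, L2$miss: fetch from memory
                l2_misses += 1
                l2.update(range(block, block + RANGE))
            l1.add(block)
            if j == 0:                   # only the first flow has a request
                prefetches += 1
                l2.update(range(block + RANGE, block + 2 * RANGE))
    return {"l2_misses": l2_misses, "prefetches": prefetches}

case1 = simulate(0, 4)   # start address on a block boundary
case2 = simulate(16, 4)  # start address A+16
```

Under these assumptions, four instructions starting on a block boundary incur a single L2 miss and four prefetch operations, whereas four instructions starting at A+16 incur four L2 misses and only the initial prefetch operation.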
Related art is described, for example, in Japanese Laid-open Patent Publication No. 59-218691 and Japanese Laid-open Patent Publication No. 58-169384.