The present invention relates in general to a microprocessor and to a compiler for use with it. More particularly, the present invention relates to a microprocessor having a function of dynamically changing the execution of instructions in accordance with the behavior of memory accesses during execution of a program, and to a method of compiling programs.
The performance of recent microprocessors has been greatly enhanced by improvements in instruction-level parallelism, in operating frequency, and the like. The performance of main storage, on the other hand, has not improved to the same degree. For this reason, the main storage access cycles incurred on a cache miss have become one of the main causes of degraded program performance. In a general microprocessor, the cache memory operates transparently to the application program. In other words, the application program cannot determine whether or not a cache miss has occurred. However, as described above, memory behavior such as a cache miss exerts a great influence on the performance of the application program. It is therefore desirable to control the execution of the application program finely in accordance with the detailed memory access behavior observed during execution.
As a technique by which the number of cache misses or the like in memory accesses can be referenced from an application program, a performance monitoring function is known. For example, a function of counting the number of cache misses or the like in a special register and reading out this value is disclosed in "ARCHITECTURE AND PROGRAMMING MANUAL, INTEL", Pentium Family Developer's Manual, the Last Volume. The same function is also disclosed in, for example, "POWER PC 604 USER'S GUIDE, IBM MICROELECTRONICS AND MOTOROLA". In these functions, the register which counts the number of cache misses is defined as a special register such as a performance counter. In order to determine whether or not a cache miss has occurred for an individual memory access, the contents of the performance counter must be read out to a general purpose register both before and after the memory access instruction is issued, and the two values must be compared to judge whether the count has changed. Thus, complicated processing is required.
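The before-and-after comparison described above can be sketched in C as follows. This is an illustrative model only: the type `ToyCpu`, the functions `toy_load`, `read_miss_counter` and `load_missed`, and the cache geometry (8 direct-mapped lines of 16 bytes) are hypothetical and are not taken from either cited manual; the `miss_counter` field merely stands in for a special performance counter register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: direct-mapped cache, 8 lines of 16 bytes each. */
#define NUM_LINES  8
#define LINE_SHIFT 4

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    uint64_t miss_counter;   /* stands in for the special performance counter */
} ToyCpu;

/* Models a memory access instruction: on a miss the line is filled
 * and the performance counter is incremented. */
static void toy_load(ToyCpu *cpu, uint32_t addr)
{
    uint32_t line = addr >> LINE_SHIFT;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;

    if (!(cpu->valid[idx] && cpu->tag[idx] == tag)) {
        cpu->valid[idx] = true;
        cpu->tag[idx]   = tag;
        cpu->miss_counter++;
    }
}

/* Models reading the special register out to a general purpose register. */
static uint64_t read_miss_counter(const ToyCpu *cpu)
{
    return cpu->miss_counter;
}

/* Three steps are needed to learn the outcome of ONE access:
 * read the counter, perform the access, read the counter again. */
static bool load_missed(ToyCpu *cpu, uint32_t addr)
{
    assert(cpu != NULL);
    uint64_t before = read_miss_counter(cpu);
    toy_load(cpu, addr);
    uint64_t after = read_miss_counter(cpu);
    return after != before;
}
```

In this sketch every monitored access costs two additional counter reads and one comparison, which is the overhead the passage refers to as complicated processing.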
A technique for coping with this problem, which allows the behavior of individual memory accesses to be referenced simply and with low overhead, is disclosed, for example, in M. Horowitz et al.: "INFORMING MEMORY OPERATIONS: PROVIDING MEMORY PERFORMANCE FEEDBACK IN MODERN PROCESSORS", In Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996. This article discloses a technique wherein it is judged for every memory access instruction whether or not a cache miss, as one kind of memory access behavior, has occurred, and if it is judged that a cache miss has occurred, a branch is made to a handler code. According to the article, the following three methods have been proposed for monitoring whether or not a cache miss has occurred when a certain memory access is carried out.
(a) When a memory access causes a cache miss, a condition code for branching is set, and code is arranged after the memory access which refers to the condition code and carries out a conditional branch. In this method, for example, a flag indicating whether or not a cache miss occurred during execution of a load instruction is stored in a condition code area. The next instruction checks the condition code area. If it is judged that a cache miss has occurred, a routine which executes the processing for a cache miss is called.
(b) When the memory access causes a cache hit, the instruction following the memory access is invalidated. In this method, for example, an instruction which calls the routine executing the processing for a cache miss is inserted immediately after the load instruction. If no cache miss occurs when the load instruction is executed, execution of the following call instruction is inhibited. On the other hand, if a cache miss occurs when the load instruction is executed, the call instruction is executed, and the routine executing the processing for a cache miss is then executed.
(c) When the memory access causes a cache miss, an exception is generated so that control is passed to an exception handler routine specified by a special register. In this method, the address of the routine to be executed when a cache miss occurs (the exception handler routine) is set in advance in an MHAR (Miss Handler Address Register) serving as the special register. If a cache miss occurs when the load instruction is executed, the exception handler routine at the address set in the MHAR is called.
In these methods, the routine which is executed on a cache miss performs, for example, processing such as incrementing a counter which counts the number of cache misses.
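The three methods (a) to (c) can be contrasted in a small software model, given below in C. It is a sketch under stated assumptions, not the mechanism of the cited article: `ToyCpu`, `access_cache`, `load_method_a`, `load_method_b`, `load_method_c` and `count_miss` are hypothetical names, the `mhar` field is modelled as a C function pointer, and the miss routine simply increments a counter as in the example mentioned above. In software the three variants look alike; what differs in hardware is where the decision is taken (an explicit branch, nullification of the next instruction, or an exception).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  8   /* hypothetical: 8 direct-mapped lines */
#define LINE_SHIFT 4   /* hypothetical: 16-byte lines */

typedef void (*MissHandler)(void);

typedef struct {
    bool        valid[NUM_LINES];
    uint32_t    tag[NUM_LINES];
    bool        miss_flag;  /* condition code used by method (a) */
    MissHandler mhar;       /* models the Miss Handler Address Register of (c) */
} ToyCpu;

static unsigned miss_count;                  /* counter kept by the miss routine */
static void count_miss(void) { miss_count++; }

/* Returns true on a miss and fills the line. */
static bool access_cache(ToyCpu *cpu, uint32_t addr)
{
    uint32_t line = addr >> LINE_SHIFT;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;
    bool miss = !(cpu->valid[idx] && cpu->tag[idx] == tag);
    if (miss) {
        cpu->valid[idx] = true;
        cpu->tag[idx]   = tag;
    }
    return miss;
}

/* (a) The load sets a condition code; the next instruction branches on it. */
static void load_method_a(ToyCpu *cpu, uint32_t addr)
{
    assert(cpu != NULL);
    cpu->miss_flag = access_cache(cpu, addr);
    if (cpu->miss_flag)          /* explicit conditional branch */
        count_miss();
}

/* (b) A call instruction follows the load; a hit nullifies it. */
static void load_method_b(ToyCpu *cpu, uint32_t addr)
{
    assert(cpu != NULL);
    bool nullify_next = !access_cache(cpu, addr);
    if (!nullify_next)           /* the call executes only after a miss */
        count_miss();
}

/* (c) A miss raises an exception routed to the handler held in the MHAR. */
static void load_method_c(ToyCpu *cpu, uint32_t addr)
{
    assert(cpu != NULL);
    if (access_cache(cpu, addr) && cpu->mhar != NULL)
        cpu->mhar();
}
```

In all three variants the outcome of an access is only observable by performing the access itself.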
Of the techniques described above as the prior art, methods (a) and (c) cause a branch when a cache miss occurs. In general, a microprocessor carries out branch prediction and speculatively executes the instruction path predicted to be taken more frequently, so that a branch instruction which transfers execution to the path predicted to be taken less frequently incurs a penalty in instruction execution. In method (b), on the other hand, whether or not an instruction is executed is decided on the basis of the cache hit or miss of the instruction immediately preceding it, which gives rise to the problem that the hardware and software become complex. In addition, in each of the above methods, the detection of the memory access behavior is paired with the memory access itself, so that the penalty of a memory access is always incurred. There therefore arises the problem that, for example, whether or not a previously issued memory access request has been completed cannot be determined without accessing the memory.
In order to solve the above-mentioned problems associated with the prior art, a processor according to an aspect of the present invention comprises: a register file having at least one register for storing therein data which is used in an arithmetic operation or data resulting from an arithmetic operation; a cache memory for holding therein a copy of a part of the data stored in a memory; a hit/miss judgement circuit for judging whether or not data which is to be accessed in accordance with an access instruction is present in the cache memory; and an arithmetic operation unit which receives as its inputs a judgement result obtained from the hit/miss judgement circuit and data read out from the register file, and which carries out a predetermined arithmetic operation. The result of the arithmetic operation carried out by the arithmetic operation unit is stored in a register within the register file as information indicating the behavior of the access. In a preferred aspect of the present invention, the processor has an access instruction which is accompanied by the processing of storing the result of the access operation in the register.
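By way of illustration only, the data path summarized above may be modelled in C as follows. The names `ToyProcessor`, `judge_miss` and `load_with_status`, the sizes of the register file, cache and memory, and in particular the choice of addition as the predetermined arithmetic operation are all assumptions made for this sketch and are not specified in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS   8    /* hypothetical register file size */
#define NUM_LINES  8    /* hypothetical: 8 direct-mapped cache lines */
#define LINE_SHIFT 4    /* hypothetical: 16-byte lines */
#define MEM_WORDS  64   /* hypothetical: 256 bytes of main memory */

typedef struct {
    uint32_t regs[NUM_REGS];     /* register file */
    bool     valid[NUM_LINES];   /* cache tag store */
    uint32_t tag[NUM_LINES];
    uint32_t mem[MEM_WORDS];     /* main memory */
} ToyProcessor;

/* Hit/miss judgement circuit: returns true on a miss and fills the line. */
static bool judge_miss(ToyProcessor *p, uint32_t addr)
{
    uint32_t line = addr >> LINE_SHIFT;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;
    bool miss = !(p->valid[idx] && p->tag[idx] == tag);
    if (miss) {
        p->valid[idx] = true;
        p->tag[idx]   = tag;
    }
    return miss;
}

/* Access instruction that also records the behavior of the access:
 *   regs[rd] <- memory word at addr
 *   regs[rs] <- regs[rs] + (1 if the access missed, else 0)
 * The addition stands in for the "predetermined arithmetic operation";
 * it is an assumption of this sketch. */
static void load_with_status(ToyProcessor *p, unsigned rd,
                             uint32_t addr, unsigned rs)
{
    assert(rd < NUM_REGS && rs < NUM_REGS && rd != rs);
    bool miss = judge_miss(p, addr);
    p->regs[rd] = p->mem[(addr / 4u) % MEM_WORDS];
    p->regs[rs] = p->regs[rs] + (miss ? 1u : 0u);
}
```

Because the outcome is accumulated in an ordinary register, the program can inspect it later with normal instructions, without a branch being forced at the point of the access.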
According to another aspect of the present invention, in a method of compiling programs which generates, from an inputted program, program codes which can be understood by a computer, the inputted program is analyzed, and on the basis of the analysis result, a part of the program in which a processing delay may occur due to a cache miss is extracted. A memory access instruction contained in the extracted part of the program is converted into an access instruction which acquires the result of the memory access operation, and a code is generated which dynamically selects the processing to be performed on the basis of the acquired result of the memory access operation.
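The shape of the code that such a compiling method might generate for a simple summation loop is sketched below in C. This is a hypothetical illustration: `ToyCache`, `load_with_status`, `LoopResult` and `sum_transformed` are invented names, the cache is a software model, and the two selectable processings are reduced to counters so that the selection itself is visible. The array is assumed to start at a cache line boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  8   /* hypothetical: 8 direct-mapped lines */
#define LINE_SHIFT 4   /* hypothetical: 16-byte lines, 4 words per line */

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} ToyCache;

/* Models the converted access instruction: returns the loaded value and
 * reports through *missed whether the access missed in the cache. */
static uint32_t load_with_status(ToyCache *c, const uint32_t *base,
                                 size_t i, bool *missed)
{
    assert(c != NULL && base != NULL && missed != NULL);
    uint32_t addr = (uint32_t)(i * sizeof(uint32_t)); /* offset from base */
    uint32_t line = addr >> LINE_SHIFT;
    uint32_t idx  = line % NUM_LINES;
    uint32_t tag  = line / NUM_LINES;

    *missed = !(c->valid[idx] && c->tag[idx] == tag);
    if (*missed) {
        c->valid[idx] = true;
        c->tag[idx]   = tag;
    }
    return base[i];
}

typedef struct {
    uint32_t sum;
    unsigned hit_path;    /* iterations that took the processing for a hit  */
    unsigned miss_path;   /* iterations that took the processing for a miss */
} LoopResult;

/* Source loop:       for (i = 0; i < n; i++) sum += a[i];
 * Transformed loop:  the plain load is replaced by load_with_status and
 *                    a selection between two processings is emitted. */
static LoopResult sum_transformed(ToyCache *c, const uint32_t *a, size_t n)
{
    LoopResult r = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        bool missed;
        r.sum += load_with_status(c, a, i, &missed);
        if (missed)
            r.miss_path++;   /* e.g. a place to schedule other useful work */
        else
            r.hit_path++;    /* ordinary fast path */
    }
    return r;
}
```

With 16 sequential words and 4 words per line, this model takes the miss processing once per line and the hit processing for the remaining accesses.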