1. Field of the Invention
The present invention relates generally to a language processing of a computer. More particularly, the invention relates to a language processing method and a language processing system for improving use efficiency of a cache memory.
2. Description of the Related Art
It is a typical method of speeding up a computer to reduce the frequency of access to program memory (hereinafter simply referred to as xe2x80x9cmemoryxe2x80x9d) by holding data in a cache memory (hereinafter simply referred to as xe2x80x9ccachexe2x80x9d). However, since the cache memory has a limited capacity, it is not possible to hold all data in the cache. Therefore, it is possible for a miss-hit (cache miss) to occur due to access contention of the cache memory.
On the other hand, in the recent years, due to improvements in the computing speed of microprocessors, it frequently occurs that memory access becomes a bottle neck in improvement of operation speed of the computer. Therefore, it is necessary to reduce the number of times that memory is accessed by avoiding cache miss as much as possible.
For avoiding cache miss, there has been known a memory access optimizing method to schedule codes in descending order of the number that they are called. Scheduling is done during compiling of a source (prime) program written in a programming language, such as FORTRAN, COBOL, C language or assembly language or the like, into an object program to be executed in the computer having cache memory. Examples are found in Japanese Unexamined Patent Publication No. Heisei 5-324281, Japanese Unexamined Patent Publication No. Heisei 7-84799, and Japanese Unexamined Patent Publication No. Heisei 8-328870.
Furthermore, as a method for improving the use of memory through scheduling of codes by the compiler in order to enhance the use efficiency of an instruction cache memory, an algorithm for determining an arrangement of functions that takes into consideration call relationships between functions (sub-routines) is disclosed in A. H. Hashemi, D. R. Kaeli, B. Calder, xe2x80x9cEfficient Procedure Mapping Using Cache Line Coloringxe2x80x9d, ACM SIGPLAN, 1997.
In the above-identified report, a direct mapping system, in which lower bits of an address of a memory block directly correspond to an address of a cache block where the memory block is mapped, has been discussed. However, this algorithm is easily applicable even in the case of a set associative system, in which entries of cache blocks equal to associative degree is determined by a lower bit of on address of the memory block. In this algorithm, as function call information, a function call graph, having a weight indicative of the number of times the function is called, and information of the code side per function, are input. The function call information includes profile information collected by executing a program and information obtained from a result of static analysis of the call relationship of the functions in the source program.
Then, the functions are divided into a group having high utilization and a group having low utilization. Concerning the group having high utilization, arrangement is determined in sequential order from functions on both ends of the side having the largest weight in the function call graph. Upon determining an arrangement, coloring (virtual indicator marking) is performed for the cache blocks occupied by respective functions to avoid conflict in the cache memory. In such arrangement, gaps in the cache memory where data is not written are filled by the functions having low utilization.
However, in the function arrangement system by coloring disclosed in the report, pairs of caller side functions/callee side functions (caller/callee) are arranged in the memory in descending order based on the number of times each function is called. Therefore, if there are functions that are called at high frequency by a large number of functions, an arrangement in which a conflict of such functions is caused in the cache block can occur.
For example, in the case of a processor having no floating point instruction, the compiler provides functions for conversion using arithmetical operations or an integer type floating point library and enables use of such by generating codes that call library functions for floating point operations in a user program. Such library functions are referred to as runtime routines, have relatively smaller size than normal library functions, and are called from many functions at particularly high frequency.
However, in the conventional compiler device, such runtime routines are arranged in on the memory together with general functions in descending order from the one having the largest number of calls with respect to the pairs of caller side functions/callee side functions. Therefore, if there are a plurality of functions calling the same runtime routine at a high frequency, an arrangement where a plurality of runtime routines causes conflict on the cache memory may occur.
FIGS. 7A and 7B are explanatory illustrations of the conventional function arrangement method for discussing occurrence of conflict of the runtime routines on the cache memory.
Here, it is assumed that a call relationship between a function A, a function B, a runtime routine addf and a runtime routine mulf are as shown in a function call graph of FIG. 7A. On the other hand, in a memory 201 storing a program, as shown in FIG. 7B, the runtime routine addf as identified by 205 is optimally arranged relative to the function A as identified by 203, and, in the similar manner, the runtime routine mulf as identified by 206 is optimally arranged relative to the function B as identified by 204. In this case, in order to perform a process relating to the function B, the function B as identified by 204A and the runtime routine mulf as identified by 206A are loaded to the cache memory 202 from the memory 201, and the runtime routine addf as identified by 205A is further loaded. However, the runtime routine addf and the runtime routine mulf have overlapping addresses in the cache memory, and so a conflict is caused.
Particularly, the arithmetical operation or so forth in the runtime routine is generally performed repeatedly in the function. Therefore, between the runtime routines conflicting in the cache memory, a condition in which counterpart runtime routines expel each other may occur. Such condition is referred to as xe2x80x9cthrashingxe2x80x9d. The expelled function has to be loaded from the memory again, at next call. Therefore, if thrashing is caused, the execution speed of the program becomes significantly lower.
An object of the present invention is to provide a language processing method, a language processing unit and a storage medium recording a language processing program which can avoid occurrence of thrashing between runtime routines upon generation of an object program by translating a source program.
In accordance with a preferred embodiment of the invention, callee side functions that overlap in cache memory and are called by the same caller side function are detected, and all but one of the overlapping callee side functions called by a given caller side function are replaced with a replica of the original callee side function that will be stored in the cache in a non-overlapping position. The resulting set of callee side functions called by the caller side function are non-overlapping when stored in the cache, and thus the thrashing problem is avoided. This procedure may be repeated a predetermined number of times, or until all instances of overlap have been eliminated through the use of replicated callee side functions.
Other objects, features and advantages of the present invention will become clear from the detailed description given hereinafter.