The present invention relates to an optimization method of a compiler program in a computer system. More particularly, the present invention relates to a new method for handling memory allocation system calls (MASCs) inside a counted loop by grouping these calls into a single call.
A computer system typically consists of a processor, a main memory and an I/O device through which the computer system communicates with an end-user. The end-user provides the computer system with a computer program written in one of several different computer languages. The program typically consists of a set of instructions or codes directing the processor to perform a series of tasks. Different computer languages serve a wide variety of applications. For example, a number of computer languages are designed primarily for scientific and engineering applications, while others are designed for graphics-intensive environments. Regardless of the application, however, a computer program written in a high-level language must be translated into machine language before the computer system can execute it. The translation is accomplished by a computer program called a compiler.
A compiler takes a source program as input and produces an object program as output. To do this, the source program goes through several phases. Each phase transforms the program from one representation to another until it has been translated into an equivalent object program understandable by the computer system. In performing the translation, a compiler typically identifies errors in the source program and eliminates inefficiencies from it.
Improving the efficiency of computer systems has been a goal of computer system designers and architects since the inception of modern computer systems. One area widely affected by this goal is the reduction of memory latency through the use of cache memory. Memory latency is a time inefficiency that arises because the central processing unit (CPU) of a computer system operates at a much faster data rate than the corresponding memory unit. The difference in speed leaves the CPU idle while the slower memory delivers the requested data. To reduce memory latency, a faster but smaller level of intermediate memory known as cache has been developed.
A cache works as follows. When the CPU requests data, that data is transferred from memory to the cache and then from the cache to the CPU, so that a copy of the data remains in the cache. On the next CPU request for data, the much faster cache is checked, before the request is sent to memory, to see whether the requested data is available locally in the cache. If it is, there is no need to retrieve the data from memory and the CPU can satisfy its request from the cache (a cache hit). On the other hand, when the cache does not contain the requested data or code, a cache miss occurs. In this case, the data is retrieved from memory, and the CPU gains none of the time savings that a cache hit would provide. Thus it is extremely desirable to reduce cache misses, or equivalently to increase cache hits.
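The hit and miss behavior described above can be illustrated with a small software model. The following is a minimal sketch only, assuming a direct-mapped cache with 64-byte lines and 256 lines; these sizes, and the names ToyCache, toy_cache_reset and toy_cache_access, are hypothetical choices made for illustration and are not taken from any particular hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u   /* assumed cache line size in bytes */
#define NUM_LINES  256u  /* assumed number of lines in the cache */

/* Hypothetical model of a direct-mapped cache: each memory line can
 * live in exactly one slot, selected by (line number % NUM_LINES). */
typedef struct {
    bool          valid[NUM_LINES];
    uint64_t      tag[NUM_LINES];
    unsigned long hits;
    unsigned long misses;
} ToyCache;

/* Empty the cache and clear the counters. */
static void toy_cache_reset(ToyCache *c)
{
    assert(c != NULL);
    for (size_t i = 0; i < NUM_LINES; i++) {
        c->valid[i] = false;
        c->tag[i] = 0;
    }
    c->hits = 0;
    c->misses = 0;
}

/* Model one CPU request for the byte at addr.
 * Returns true on a cache hit. On a miss, the line is "fetched from
 * memory" and a copy is kept in the cache, replacing whatever line
 * previously occupied that slot. */
static bool toy_cache_access(ToyCache *c, uint64_t addr)
{
    assert(c != NULL);
    uint64_t line  = addr / LINE_BYTES;
    size_t   index = (size_t)(line % NUM_LINES);
    uint64_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;
    c->tag[index] = tag;
    c->misses++;
    return false;
}
```

In this model, a first call to toy_cache_access for a given address is a miss, and a second call for the same address, or for another address in the same 64-byte line, is a hit, provided no conflicting line has been installed in between.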
Several methods have been suggested to reduce cache misses. Some of these methods involve hardware while others involve software. For example, software prefetching can be an effective technique for reducing cache misses. A common prefetching technique, known as inline or next-in-sequence prefetching, is to prefetch the next consecutive cache line on a cache access. This technique takes advantage of a phenomenon known as spatial locality, which refers to the fact that a program tends to access memory locations near those it has accessed recently, because most computer code executes repetitively out of a small area. This area is not necessarily a single address range of main memory, but may be spread around quite significantly.
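The effect of next-in-sequence prefetching on a sequential access pattern can be sketched with a similar software model. This is an illustrative sketch only: the cache geometry and the names PfCache, pf_present, pf_install and pf_sequential_misses are assumptions, and the model ignores timing, so a prefetched line is treated as available immediately, which real hardware does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PF_LINE_BYTES 64u   /* assumed cache line size in bytes */
#define PF_NUM_LINES  256u  /* assumed number of lines in the cache */

/* Hypothetical direct-mapped cache, tracking only which lines are present. */
typedef struct {
    bool     valid[PF_NUM_LINES];
    uint64_t tag[PF_NUM_LINES];
} PfCache;

/* Is the given memory line currently held in the cache? */
static bool pf_present(const PfCache *c, uint64_t line)
{
    size_t idx = (size_t)(line % PF_NUM_LINES);
    return c->valid[idx] && c->tag[idx] == line / PF_NUM_LINES;
}

/* Bring the given memory line into the cache. */
static void pf_install(PfCache *c, uint64_t line)
{
    size_t idx = (size_t)(line % PF_NUM_LINES);
    c->valid[idx] = true;
    c->tag[idx] = line / PF_NUM_LINES;
}

/* Walk addresses 0, stride, 2*stride, ... below n_bytes and count
 * demand misses. When prefetch_next is true, every access also
 * installs the next consecutive line (next-in-sequence prefetch). */
static unsigned long pf_sequential_misses(size_t n_bytes, size_t stride,
                                          bool prefetch_next)
{
    assert(stride > 0);
    PfCache c = {0};            /* start with an empty cache */
    unsigned long misses = 0;

    for (uint64_t addr = 0; addr < n_bytes; addr += stride) {
        uint64_t line = addr / PF_LINE_BYTES;
        if (!pf_present(&c, line)) {
            misses++;
            pf_install(&c, line);
        }
        if (prefetch_next) {
            pf_install(&c, line + 1);
        }
    }
    return misses;
}
```

Under these assumptions, a sequential walk over 4096 bytes in 8-byte steps touches 64 lines and incurs 64 demand misses without prefetching but only one with it, because every later line has already been brought in by the access to its predecessor. With a stride of 128 bytes, the walk skips every other line, the prefetched lines are never used, and prefetching does not reduce the miss count, which shows that the technique depends on spatial locality.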
Spatial locality is particularly applicable when programs contain loops, which may in turn contain other loops, nested to any depth. A loop is simply a sequence of instructions that is repeated according to the trip-count of the loop or according to another criterion.
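As an illustration of a counted loop, and of the general idea of grouping allocation calls made inside such a loop into a single call, consider the following sketch. It is not a statement of the claimed method. It assumes the C library routines malloc and calloc as stand-ins for a memory allocation system call, and the Point type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type used only for this illustration. */
typedef struct {
    double x;
    double y;
} Point;

/* Ungrouped form: one allocation call per iteration of a counted loop
 * whose trip-count is known on entry. Returns the number of allocation
 * calls made, or -1 on failure. The caller frees each out[i]. */
static int alloc_per_iteration(Point **out, size_t trip_count)
{
    assert(out != NULL);
    int calls = 0;
    for (size_t i = 0; i < trip_count; i++) {
        out[i] = malloc(sizeof(Point));
        if (out[i] == NULL) {
            while (i > 0) {         /* undo earlier allocations */
                free(out[--i]);
            }
            return -1;
        }
        calls++;
    }
    return calls;
}

/* Grouped form: a single allocation sized by the trip-count is made
 * before the loop, and each iteration takes its own slice of that
 * block. Returns the number of allocation calls made, or -1 on
 * failure. The caller frees only out[0], which is the block's start. */
static int alloc_grouped(Point **out, size_t trip_count)
{
    assert(out != NULL);
    if (trip_count == 0) {
        return 0;
    }
    /* calloc checks the trip_count * sizeof(Point) product for overflow. */
    Point *block = calloc(trip_count, sizeof(Point));
    if (block == NULL) {
        return -1;
    }
    for (size_t i = 0; i < trip_count; i++) {
        out[i] = &block[i];
    }
    return 1;
}
```

For a trip-count of 8, the first function makes eight allocation calls and the second makes one. In the grouped form the eight elements are also contiguous in memory, so a later loop that visits them in order benefits from the spatial locality discussed above.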