The present invention relates to computer systems and, more particularly, to an improved method for generating computer code which executes more efficiently on computers having cache memories and the like.
Conventional computing systems include a central processing unit which executes instructions which are stored in a memory. The cost of providing higher speed central processing units has decreased much faster than the cost of computer memory. This decrease in cost at the central processing level has resulted from such innovations as reduced instruction set architectures and instruction pipelining. Similar advances have not taken place in memory circuitry. In addition, the size of computer programs has increased as more complex tasks are set up for computer implementation. This increased program size further aggravates the mismatch between the central processor and the memory.
One solution to the mismatch between central processing speed and the cost of computer memory having comparable speed is the introduction of intermediate memory units. Small high speed cache memories are placed between the central processing unit and the computer's main memory. The cache memory has a speed which is comparable to that of the central processing unit; hence, instruction fetches from the cache do not introduce processing delays. The size of the cache is much less than that of the main memory; hence, the cost of a slow main memory and a cache is much less than that of a main memory having the speed of the central processing unit.
When the central processing unit requests an instruction, the request is intercepted by the cache. If the instruction in question is already stored in the cache, it is returned to the central processing unit. If the instruction is not present, the cache loads from the main memory a small block of memory locations which includes the requested instruction. The cache then returns the requested instruction. The need to load the small block, referred to as a "line", delays the execution of the program; hence, such cache loads must be limited to a small fraction of the instruction fetches if the cache architecture is to provide a substantial speed enhancement. This will be the case if the computer code is organized such that the execution sequence of the code is the same as the storage sequence. That is, if instruction B is to be executed after instruction A, instruction B should be stored immediately after instruction A. In this case, each instruction in each line of code will be executed in the order stored and cache loads will only be needed at the end of each line of code. Furthermore, the cache loads at the end of lines could then be anticipated in advance and thus performed before the lines in question were actually requested.
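The effect of storage order on the number of line loads may be illustrated by a minimal simulation. The sketch below is illustrative only and is not part of the described method; the line size, the cache capacity, the least-recently-used replacement policy, and the name `count_line_loads` are all assumptions chosen for the example.

```python
from collections import OrderedDict

LINE_SIZE = 4   # instructions per cache line (assumed for illustration)
NUM_LINES = 2   # number of lines the cache can hold (assumed)


def count_line_loads(addresses, line_size=LINE_SIZE, num_lines=NUM_LINES):
    """Count how many line loads a sequence of instruction fetches causes.

    Hypothetical model: a fully associative cache with least-recently-used
    replacement. Real caches may be organized differently.
    """
    cache = OrderedDict()  # keys are line numbers, oldest use first
    loads = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            # Hit: mark the line as most recently used.
            cache.move_to_end(line)
        else:
            # Miss: load the line, evicting the least recently used one if full.
            loads += 1
            cache[line] = None
            if len(cache) > num_lines:
                cache.popitem(last=False)
    return loads


# Twelve fetches executed in the same order in which they are stored.
sequential = list(range(12))

# Twelve fetches that hop between three distant regions of memory.
scattered = [base + i for i in range(4) for base in (0, 8, 16)]

print(count_line_loads(sequential))  # 3
print(count_line_loads(scattered))   # 12
```

Under these assumptions, the sequentially stored code requires one line load per four fetches, whereas the scattered execution order requires a line load on every fetch.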
The presence of JUMP instructions in the computer code results in difficulties in providing this ideal code organization. When the central processing unit encounters a JUMP instruction, the next instruction will be at a point in the code which may be very distant from the location at which the JUMP instruction is stored. Hence, the cache may not contain the line which includes the next instruction. Furthermore, if the JUMP instruction is a conditional one, there is no method of determining in advance whether or not the jump will be executed.
JUMP instructions cause additional problems for pipelined central processing units. In a conventional pipelined processor, a number of instructions are being processed at any given time. Each time an instruction is executed, a new instruction is introduced into the "pipe". If a jump is executed, the next instruction in the pipe is not likely to be the target of the jump. Hence, a portion of the pipe must be emptied and reloaded starting with the target instruction. Consequently, computer code which minimizes the number of executed JUMP instructions is highly desirable.
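The cost of executed jumps in a pipelined processor may be estimated with a simple model. The sketch below is a rough illustration, not a description of any particular processor; it assumes one cycle per instruction, a fixed hypothetical `flush_penalty` for each executed jump, and treats any fetch that does not follow its predecessor in storage as an executed jump.

```python
def count_taken_jumps(trace):
    """Count transitions where the next address is not the next stored one."""
    return sum(1 for cur, nxt in zip(trace, trace[1:]) if nxt != cur + 1)


def estimated_cycles(trace, flush_penalty=3):
    """Estimate cycles: one per instruction plus a penalty per executed jump.

    flush_penalty is an assumed number of cycles lost while the pipe is
    emptied and reloaded; the real value depends on the processor.
    """
    return len(trace) + count_taken_jumps(trace) * flush_penalty


straight = [0, 1, 2, 3, 4, 5]    # executed in storage order
jumpy = [0, 1, 10, 11, 4, 5]     # same instruction count, two executed jumps

print(estimated_cycles(straight))  # 6
print(estimated_cycles(jumpy))     # 12
```

In this model, both sequences execute six instructions, but the two executed jumps double the estimated cycle count.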
Broadly, it is the object of the present invention to provide an improved computer code optimization method.
It is another object of the present invention to provide a code optimization method that places code blocks which are executed in sequence in memory locations which are close to each other.
It is yet another object of the present invention to provide a code optimization method that reduces the number of JUMP instructions in the executed computer code.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.