1. Technical Field
The present invention generally relates to computer processing systems and, in particular, to methods for renaming stack references in a computer processing system.
2. Background Description
A memory serves as a repository of information in a computer processing system. FIG. 1 is a block diagram illustrating a typical layout of a memory 100 of a computer program according to the prior art. The layout consists of distinct memory areas, including a program text area 104, a program data area 106, a heap 108, and a program stack 110. Program text area 104 is used to store program text (i.e., computer instructions). Program data area 106 is used to store program data (for static data references). Heap 108 is used for dynamically allocated objects and program stack 110 is used for function-local variables.
As shown, memory 100 stores different types of data in distinct memory areas. The following different mechanisms are used to access these memories:
1. Program text area 104 stores the computer instructions describing the actions of a program, and possibly program constants. Program text area 104 is usually read-only and accessed using the program counter.
2. Program data area 106 holds static data references, e.g., global program variables. Program data area 106 is accessed using either a global data pointer or a table of contents data structure.
3. Heap 108 holds dynamically allocated objects and is accessed using pointers held in any of the processor registers.
4. Program stack 110 usually holds function-local variables and is accessed using special-purpose registers, such as the stack pointer (SP), frame pointer (FP), or argument pointer (AP).
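The four areas listed above can be illustrated with a short C sketch. The names global_counter, triple, and demo_memory_areas are hypothetical and chosen only for illustration, and the placement noted in each comment reflects how a typical compiler and runtime lay out a program, not a guarantee of the C language.

```c
#include <stdlib.h>

/* Typically placed in the program data area (static data reference). */
static int global_counter = 7;

/* The instructions of this function typically reside in the program text area. */
static int triple(int x)
{
    return x * 3;
}

/* Touches all four areas; returns -1 if the heap allocation fails. */
int demo_memory_areas(void)
{
    int local = 5;                          /* function-local: program stack */
    int *dynamic = malloc(sizeof *dynamic); /* dynamically allocated: heap   */
    if (dynamic == NULL)
        return -1;

    *dynamic = triple(local);               /* heap object reached via a pointer */
    int result = *dynamic + global_counter; /* combines heap and static data     */

    free(dynamic);
    return result;
}
```

With the values assumed here, demo_memory_areas returns 22, i.e., 5 * 3 + 7.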
Usually, all program memory can be accessed through the use of pointers which are stored in a register. However, the access mechanisms described above are generally used for each area in typical programs.
In general, a processor accesses information from the memory, performs computations thereon, and stores the results back to memory. Unfortunately, memory access incurs a number of costs. A description of some of these costs will now be given.
When a memory access operation is first detected, the address to be accessed must be resolved. Moreover, the registers employed for the address computation must be available.
If the processor wants to reorder memory read operations with respect to other memory operations, and it cannot be determined that the read addresses are different at the time when they are to be reordered, then checks for memory address ambiguities need to be performed.
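The address ambiguity described above can be seen in a minimal C sketch. The function name store_then_load is hypothetical. Whether the read may be moved ahead of the store depends on whether the two pointers refer to the same location, which in general is known only once both addresses have been resolved.

```c
/* Stores through p, then reads through q.
 * If p and q differ, the read is independent of the store and could be
 * performed first. If p and q are equal, the read must observe the value
 * just stored, so reordering would produce a wrong result. */
int store_then_load(int *p, int *q)
{
    *p = 1;     /* store to an address held in a register             */
    return *q;  /* load whose address may or may not match the store  */
}
```

Called with two distinct objects, the function returns the prior contents of the second object; called with the same object twice, it returns 1.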
In addition, since store operations modify the processor state, they typically have to be performed in-order. This causes further slowdowns in achievable processor performance by serializing operations when multiple live ranges are assigned to the same memory location. Thus, limitations are typically imposed on the degree of reordering that can be performed in a superscalar processor, when multiple independent values are assigned to the same memory address.
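The following C sketch contrasts two independent values assigned to one memory location with the same computation using separate locations. The function names are hypothetical, and the volatile qualifier is an assumption used only to force the compiler to keep the accesses in memory for the purpose of illustration.

```c
/* Two unrelated live ranges share one memory location. The second store
 * must wait until the first value has been read back, although the two
 * computations do not depend on each other. */
int sum_shared_slot(int a, int b)
{
    volatile int slot;

    slot = a * 3;      /* first live range begins    */
    int t1 = slot;     /* first live range ends      */
    slot = b + 1;      /* same location, new value   */
    int t2 = slot;
    return t1 + t2;
}

/* The same computation with one location per value. Neither store
 * constrains the other, so the two halves could proceed in any order. */
int sum_separate_slots(int a, int b)
{
    volatile int slot1 = a * 3;
    volatile int slot2 = b + 1;
    return slot1 + slot2;
}
```

Both functions return the same result for the same inputs; only the ordering constraints between their memory operations differ.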
Moreover, load and store operations typically require access to a cache(s). However, accessing a cache is slower in comparison to accessing processor registers, which represent a higher level in the memory hierarchy of a computer processing system.
Many of the serializing effects of memory references result from the way in which programs are written by programmers. However, serializing effects of memory references may also result from the way programs are translated from their source level representation to the actual machine. In such a case, references are made to the program stack.
The program stack stores stack frames, that is, records containing the values for local variables of functions, as well as parameters passed between functions. Stack locations are reused frequently, with different functions using memory locations with the same address to store unrelated objects.
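The reuse of stack locations can be sketched with a small software model of a downward-growing stack. The model is an assumption made for illustration; the names model_stack, model_sp, call_with_one_arg, and slot_value are hypothetical and do not describe any particular processor.

```c
#include <stddef.h>

enum { STACK_WORDS = 64 };

static int model_stack[STACK_WORDS];   /* modeled stack memory            */
static size_t model_sp = STACK_WORDS;  /* modeled stack pointer, grows down */

/* Models a caller that pushes one argument, invokes a callee, and then
 * releases the argument slot. Returns the index of the slot that held
 * the argument. */
size_t call_with_one_arg(int arg)
{
    model_stack[--model_sp] = arg;  /* push the argument           */
    size_t slot = model_sp;         /* location the callee reads   */
    model_sp += 1;                  /* release the slot on return  */
    return slot;
}

/* Reads the current contents of a modeled stack slot. */
int slot_value(size_t slot)
{
    return model_stack[slot];
}
```

Two successive calls return the same slot index, and the slot then holds the argument of the second call: unrelated values have occupied one and the same address.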
Consider the following example code written in the C programming language:
int mult3(int a)
{
    return a * 3;
}

int inc(int b)
{
    return b + 1;
}

int compute(int a, int b)
{
    int tmp1, tmp2;

    tmp1 = mult3(a);
    tmp2 = inc(b);
    return tmp1 + tmp2;
}
When this code is translated to Intel x86 machine code, the following instructions will be generated:
 1  mult3:
 2      imull $3,4(%esp),%eax
 3      ret
 4
 5  inc:
 6      movl 4(%esp),%eax
 7      incl %eax
 8      ret
 9
10  compute:
11      pushl %esi
12      pushl %ebx
13      movl 12(%esp),%eax
14      movl 16(%esp),%ebx
15      pushl %eax
16      call mult3
17      addl $4,%esp
18      movl %eax,%esi
19      pushl %ebx
20      call inc
21      addl $4,%esp
22      addl %esi,%eax
23      popl %ebx
24      popl %esi
25      ret

The immediately preceding code illustrates several examples of the inefficiencies of holding the processor stack in memory:
1. The values of registers ESI and EBX are stored on the stack at instructions 11 and 12 and restored at instructions 23 and 24. These values could have been held in processor-internal registers.
2. The parameters a and b which were pushed onto the stack by the calling function must be read from the stack into a processor register, and then stored on the stack for functions mult3 and inc, respectively.
3. The parameters a and b for functions mult3 and inc, respectively, are stored at the same stack location, so operations from function inc cannot be scheduled at the same time as the instructions for function mult3. This serialization is not necessary.
The serializing effects of stack references due to the reuse of memory locations and the manipulation of the stack pointer are described by Postiff et al., in “The Limits of Instruction Level Parallelism in SPEC95 Applications”, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII) Workshop on Interaction between Compilers and Computer Architectures (INTERACT-3), October 1998. Postiff et al. further describe the performance improvements which can be achieved by resolving these serializing effects.
3. Problems with the State of the Art
It is to be appreciated that previous memory renaming has been based on renaming of general memory references, and has tended to ignore multiprocessor effects. Some of these prior art approaches will now be described.
It is to be appreciated that memory renaming typically includes the prediction of data dependencies. A mechanism to predict data dependencies dynamically without computation of the address is described by A. Moshovos and G. Sohi, in “Streamlining Inter-operation Memory Communication via Data Dependence Prediction”, Proceedings of the 30th Annual International Symposium on Microarchitecture, Research Triangle Park, N.C., December 1997. Predicting dependencies is necessary because the addresses of load and store operations may be unresolved. To ensure correctness of the predictions, these memory operations need to be eventually executed. A similar approach for predicting dependencies is described by G. Tyson and T. Austin, in “Improving the Accuracy and Performance of Memory Communication Through Renaming”, Proceedings of the 30th Annual International Symposium on Microarchitecture, Research Triangle Park, N.C., December 1997. Moshovos & Sohi and Tyson & Austin provide a general technique for promoting accesses to memory into processor-internal registers. This requires hardware of significant complexity. Moreover, prediction is used, which is not as accurate as actual decoding of the instruction, and may require expensive repair actions. An address resolution buffer which supports out-of-order execution of memory operations and memory renaming is described by M. Franklin and G. Sohi, in “ARB: A Hardware Mechanism for Dynamic Reordering of Memory References”, IEEE Transactions on Computers, Vol. 45, No. 5, May 1996. Disadvantageously, this buffer is expensive, the required hardware is complex, and the buffer does not consider multiprocessor systems and their consistency requirements.
U.S. Pat. No. 5,911,057, entitled “Superscalar Microprocessor Having Combined Register and Memory Renaming Circuits, Systems, and Methods”, issued on Jun. 8, 1999, the disclosure of which is incorporated herein by reference, describes an architecture for renaming memory and register operands in uniform fashion. Memory coherence is based on “snooping” memory requests. While this approach is sufficient for the in-order execution of memory operations in a multiprocessor computing system, out-of-order operation in a multiprocessor system may generate incorrect results. U.S. Pat. No. 5,838,941, entitled “Out-of-order Superscalar Microprocessor with a renaming Device that Maps Instructions from memory to Registers”, issued on Nov. 17, 1998, the disclosure of which is incorporated herein by reference, describes symbolic renaming of memory references. The invention deals with equivalence of all types, and requires lookup of an associative array to establish equivalence between expressions and names. This results in a complex architecture with potentially severe cycle time impact.
Thus, it would be desirable and highly advantageous to have a method for eliminating serializing effects resulting from stack references. It would be further desirable and advantageous if such a method were applicable in a multiprocessor system.