1. Field of the Invention
The present invention relates generally to computing systems and, in particular, to super-scalar stack based computing systems.
2. Discussion of Related Art
Most computing systems are coupled to a random access memory system for storing and retrieving data. Various ways to increase the speed of computing systems using random access memory systems are well known in the art. For example, using caches between a central processing unit of a computing system and the memory system can improve memory throughput. Furthermore, super-scalar architectures and pipelining can improve the performance of central processing units.
However, other memory architectures such as stacks are also used in computing systems. As shown in FIG. 1, a stack based computing system 110, which can implement, for example, the JAVA Virtual Machine, is coupled to a stack 120. In classical stack architectures, data is either "pushed" onto the stack or "popped" off the stack by stack based computing system 110. For example, to add the numbers 4 and 5, stack based computing system 110 first pushes the number 4 onto the top of stack 120. Then, stack based computing system 110 pushes the number 5 onto the top of stack 120. Then, stack based computing system 110 performs an add operation, which pops the number 5 off stack 120 and the number 4 off stack 120 and pushes the number 9 onto the top of stack 120. A major advantage of stack based computing system 110 is that operations using data at the top of the stack do not need to use memory addresses. The top of the stack is also referred to as the first location of the stack, and the location just under the top of the stack is also referred to as the second location of the stack. Similarly, the memory location in the stack just under the second location is also referred to as the third location of the stack.
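The push, push, add sequence described above can be sketched as follows. This is a minimal illustration only; the names operand_stack, push, and add are hypothetical and do not correspond to any element of FIG. 1.

```python
# Hypothetical model of a classical stack: the end of the list is the top.
operand_stack = []

def push(value):
    """Push a value onto the top of the stack."""
    operand_stack.append(value)

def add():
    """Pop the top two values and push their sum; no addresses are needed."""
    first = operand_stack.pop()   # first location (top of stack)
    second = operand_stack.pop()  # second location (just under the top)
    operand_stack.append(second + first)

push(4)
push(5)
add()
# The stack now holds the single value 9 at its top.
```

Note that neither push nor add names a memory address; every operand is found implicitly at the top of the stack.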
Stack based computing system 110 can be made more flexible by allowing it to use some random access techniques with stack 120. Thus, in some implementations of stack based computing system 110 and stack 120, the memory locations in stack 120 are part of a random-access memory architecture, and each memory location in stack 120 has a memory address. As used herein, a memory location having a memory address equal to x is referred to as memory location x.
Even in stack based computing systems using random-access techniques, most operations by the stack based computing system use data from or near the top of stack 120. For example, assume a value V1 from a memory location ADDR1 is to be added to a value V2 from a memory location ADDR2, and the sum is to be stored at a memory location ADDR3. Stack based computing system 110 first executes a stack load instruction, which retrieves value V1 from memory location ADDR1 and pushes value V1 onto the top of stack 120. Next, stack based computing system 110 executes another stack load instruction, which retrieves value V2 from memory location ADDR2 and pushes value V2 onto the top of stack 120. Then, stack based computing system 110 executes an add instruction, which pops the top two locations of stack 120, which now contain value V1 and value V2, and pushes the sum of value V1 and value V2 onto the top of stack 120. Finally, stack based computing system 110 executes a stack store instruction, which pops the value from the top of stack 120, i.e., the sum of value V1 and value V2, and stores the value in memory location ADDR3.
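The four-instruction sequence above (load, load, add, store) can be sketched as follows. This is a simplified model under stated assumptions: the dictionary memory, the sample values 7 and 35, and the function names stack_load, stack_add, and stack_store are all hypothetical and chosen only for illustration.

```python
# Hypothetical addressable memory; the stored values are illustrative only.
memory = {"ADDR1": 7, "ADDR2": 35, "ADDR3": None}
stack = []  # end of the list is the top of the stack

def stack_load(address):
    """Retrieve the value at the address and push it onto the stack."""
    stack.append(memory[address])

def stack_add():
    """Pop the top two values and push their sum."""
    v2 = stack.pop()
    v1 = stack.pop()
    stack.append(v1 + v2)

def stack_store(address):
    """Pop the top of the stack and store it at the address."""
    memory[address] = stack.pop()

stack_load("ADDR1")   # pushes V1
stack_load("ADDR2")   # pushes V2
stack_add()           # pops V1 and V2, pushes V1 + V2
stack_store("ADDR3")  # pops the sum and writes it to ADDR3
```

Only the load and store instructions carry a memory address; the add instruction still operates purely on the top two stack locations.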
Some of the techniques used to improve the performance of random access memory systems can be adapted to improve stack performance. For example, as shown in FIG. 2, stack 120 can contain a data cache 210, a stack cache 220, a stack cache management unit 240, and a memory circuit 230. Data cache 210 is formed with fast memory circuits, such as SRAMs, to improve the throughput of memory circuit 230. Stack cache 220 specifically caches a top portion of stack 120 using fast memory circuits, such as SRAMs. Stack cache management unit 240 manages stack cache 220 by copying data from memory circuit 230 into stack cache 220 as data is popped off stack 120, or by spilling data from stack cache 220 to memory circuit 230 as data is pushed onto stack 120. Thus, stack cache 220 maintains the top of stack 120 in fast memory circuits, so that a stack based computing system can perform stack operations with low stack latency. Specific implementations of stack caches and stack management units are described in U.S. patent application Ser. No. 08/828,899, entitled "Stack Caching Circuit with Overflow/Underflow unit", by Sailendra Koppala, now U.S. Pat. No. 6,167,400, which is hereby incorporated by reference.
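The spill and fill behavior attributed to stack cache management unit 240 can be illustrated with a simplified software model. This sketch is an assumption-laden approximation, not the circuit of the incorporated reference: the class name CachedStack, the cache_size parameter, and the policy of spilling or filling exactly one entry per operation are hypothetical.

```python
from collections import deque

class CachedStack:
    """Hypothetical model: a small fast cache holds the top of the stack,
    and a slower backing memory holds everything beneath it."""

    def __init__(self, cache_size):
        self.cache_size = cache_size  # assumed to be at least 1
        self.cache = deque()          # right end is the top of the stack
        self.memory = []              # end of list sits just under the cache

    def push(self, value):
        # Spill the deepest cached entry to memory when the cache is full.
        if len(self.cache) == self.cache_size:
            self.memory.append(self.cache.popleft())
        self.cache.append(value)

    def pop(self):
        value = self.cache.pop()
        # Fill the freed cache entry from memory, if anything was spilled.
        if self.memory:
            self.cache.appendleft(self.memory.pop())
        return value

s = CachedStack(cache_size=2)
for v in (1, 2, 3):
    s.push(v)          # pushing 3 spills 1 to the backing memory
top = s.pop()          # returns 3 and fills 1 back into the cache
```

In this model every push and pop touches only the cache on the critical path, which is the property that keeps stack latency low.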
Once stack latency is reduced, the operating speed of a stack based computing system may be limited by the rate at which stack operations can be performed. In general-purpose processing units, such as RISC microprocessors, pipelining and super-scalar implementations are used to improve the performance of the processing units. However, the techniques used for RISC processors are not easily adapted to stack based computing systems. For example, in super-scalar architectures, data dependencies determine which instructions can be issued simultaneously. However, for stack based computing systems, most stack operations use the top of the stack and would thus have a data dependency conflict. Hence, there is a need for a stack based computing system architecture that improves the performance of stack based computing systems.
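The dependency problem can be made concrete with a small sketch that tracks which instruction last wrote each stack location. The model is hypothetical: each instruction is described only by how many values it pops and pushes, and the function name raw_dependencies is introduced here for illustration.

```python
# Each entry is (name, values popped, values pushed); a hypothetical encoding.
program = [("load", 0, 1), ("load", 0, 1), ("add", 2, 1), ("store", 1, 0)]

def raw_dependencies(instructions):
    """Return (producer, consumer) index pairs where the consumer pops a
    stack location that the producer pushed."""
    depth = 0
    last_writer = {}  # stack location -> index of the instruction that wrote it
    deps = []
    for i, (_name, pops, pushes) in enumerate(instructions):
        for slot in range(depth - pops, depth):
            deps.append((last_writer[slot], i))
        depth -= pops
        for slot in range(depth, depth + pushes):
            last_writer[slot] = i
        depth += pushes
    return deps

deps = raw_dependencies(program)
# deps == [(0, 2), (1, 2), (2, 3)]
```

Under this model the add depends on both loads and the store depends on the add, so a conventional super-scalar issue rule would serialize most of the sequence.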