In one important application, stack memories provide temporary memory for procedures, also called subroutines or routines, and programs used in processing systems. Stack memories are generally ‘first in last out’ structures. When data A is ‘pushed’ or placed onto the stack and then data B is pushed onto the stack, data B must be ‘popped’ or removed before data A can be popped from the stack.
In the simplest model of a stack, a register referred to here as the stack pointer (SP) in the processor maintains the address of the “top” entry in the region of memory designated as the stack. In a stack that grows downwards with each push, the stack pointer is generally set to the highest address to be used for the stack at program initialization, such as 0xffffffff in a 32-bit system where 0 would be the lowest address. This implementation of a stack is only one of many possible variants and is for discussion purposes only. Embodiments of the invention may be applied to stack architectures that grow by either increasing or decreasing addresses, where the stack base is not at one end of the address space, etc.
In this discussion of a simple stack, pushing an item onto the stack moves the stack pointer to a lower address, by an amount dictated by the size of the item pushed. Pushing data onto the stack causes the amount of in-use data on the stack to grow. Popping an item from the stack moves the stack pointer to a higher address, by an amount dictated by the size of the item popped. Popping data from the stack causes the amount of in-use data on the stack to shrink.
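By way of illustration only, the push and pop behavior described above may be sketched as follows. The class name DownwardStack, the 4-byte word size, and the convention that an empty stack has its stack pointer at an address just above the highest stack word are assumptions made for this sketch and are not taken from any particular embodiment.

```python
WORD_SIZE = 4  # bytes per stack word (assumed: 32-bit words)


class DownwardStack:
    """Toy model of a stack that grows toward lower addresses."""

    def __init__(self, base, size_bytes):
        # Assumption: 'base' is the address just above the highest stack word,
        # so an empty stack has sp == base.
        self.base = base
        self.limit = base - size_bytes  # lowest address the stack may use
        self.sp = base
        self.memory = {}  # address -> word, standing in for real memory

    def push(self, word):
        # A push moves the stack pointer to a lower address, then stores.
        if self.sp - WORD_SIZE < self.limit:
            raise OverflowError("stack overflow")
        self.sp -= WORD_SIZE
        self.memory[self.sp] = word

    def pop(self):
        # A pop reads the top entry, then moves the pointer to a higher address.
        if self.sp >= self.base:
            raise IndexError("stack underflow")
        word = self.memory.pop(self.sp)
        self.sp += WORD_SIZE
        return word

    def in_use_bytes(self):
        # The amount of in-use data grows with pushes and shrinks with pops.
        return self.base - self.sp
```

In this sketch, pushing data A and then data B leaves B at the stack pointer, so B must be popped before A can be popped.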
Items on the stack are typically referenced by an offset relative to the current stack pointer. For example, if word A and then word B are pushed, B is “at” the stack pointer, having an offset of zero words. Word A is one word prior to the stack pointer, having an offset of one word. There are many other embodiments possible, such as one in which B is interpreted as having an offset of one word, and A as having an offset of two words.
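As a minimal sketch of the first convention above, assuming a downward-growing stack with 4-byte words, an item at an offset of n words may be located at the stack pointer plus n words. The helper names address_of, read_at_offset, and build_example, and the starting address 0x1000, are hypothetical and chosen only for this illustration.

```python
WORD_SIZE = 4  # bytes per stack word (assumed)


def address_of(sp, offset_words):
    """Address of the item at the given word offset from the stack pointer.

    Assumes a downward-growing stack, so earlier pushes sit at higher
    addresses than the current top entry.
    """
    return sp + offset_words * WORD_SIZE


def read_at_offset(memory, sp, offset_words):
    """Read a stack word by its offset relative to the stack pointer."""
    return memory[address_of(sp, offset_words)]


def build_example():
    """Push word A and then word B; return the memory and final pointer."""
    memory = {}
    sp = 0x1000  # hypothetical initial stack pointer
    for word in ("A", "B"):
        sp -= WORD_SIZE  # a push moves the pointer to a lower address
        memory[sp] = word
    return memory, sp
```

With the memory and stack pointer returned by build_example, an offset of zero words yields B and an offset of one word yields A. The alternative convention mentioned above would simply add one to each offset.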
Stacks typically have a word size and an alignment that are a function of the size of the general registers of the CPU. For example, an architecture that has 32-bit wide registers would preferably have 32-bit stack words.
On high-performance processors, the use of a cache often decreases the access latency for frequently-referenced regions of memory. Because the stack holds locally-relevant temporary data, the region of external memory, such as in a DRAM, corresponding to the active portion of the stack often ends up residing in the cache. The active portion of the stack is the portion used by the current routine. Similarly, other portions of the stack, such as those belonging to parents of the currently-active routine, may also be in the cache until they are replaced by other, more-frequently-referenced data. Note that the external memory as referenced in the embodiments could be implemented as one or more similar or disparate types of memory. The type of external memory does not matter.
Caches typically hold blocks of memory that are a power of two in size and alignment, such as a 16-byte block aligned on a 16-byte boundary. A cache line may store a block of external memory in the cache while the contents of that block are being frequently read or written by the CPU. If the cache line is written or modified by the CPU, it is considered to be “dirty” and must be written back to the external memory when it is evicted from the cache. Otherwise the external memory would not have the latest copy of the data and the modifications to that cache line by the CPU would be lost. There are many properties of caches, such as size, degree of associativity, allocation and replacement policies, and external memory writing policies, that can be varied within the scope of the embodiments.
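The block alignment and dirty write-back behavior described above may be illustrated by the following sketch. It assumes 16-byte lines, models external memory as a dictionary of byte values, and makes eviction an explicit call rather than modeling any replacement policy. The names TinyCache and line_base are hypothetical and do not correspond to any particular cache design.

```python
LINE_SIZE = 16  # bytes per cache line; must be a power of two


def line_base(address):
    """Address of the aligned block that contains the given address."""
    return address & ~(LINE_SIZE - 1)


class TinyCache:
    """Write-back cache sketch with no capacity limit; eviction is explicit."""

    def __init__(self, external):
        self.external = external  # dict: address -> byte, the external memory
        self.lines = {}           # block base address -> line state

    def _fill(self, base):
        # On a miss, copy the whole aligned block in from external memory.
        if base not in self.lines:
            data = [self.external.get(base + i, 0) for i in range(LINE_SIZE)]
            self.lines[base] = {"data": data, "dirty": False}
        return self.lines[base]

    def read(self, address):
        base = line_base(address)
        return self._fill(base)["data"][address - base]

    def write(self, address, value):
        base = line_base(address)
        line = self._fill(base)
        line["data"][address - base] = value
        line["dirty"] = True  # the cached copy is now newer than external memory

    def evict(self, base):
        """Remove a line; return True if a write-back was required."""
        line = self.lines.pop(base)
        if not line["dirty"]:
            return False  # clean line: external memory is already up to date
        for i, value in enumerate(line["data"]):
            self.external[base + i] = value
        return True
```

In this sketch, a write to address 0x1237 marks the line based at 0x1230 as dirty, and the new value reaches external memory only when that line is evicted. Evicting a line that was only read requires no external memory write.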
It is possible to implement memory policies recognizing some of the special properties of stack data to determine when external memory reads and writes of stack data are not needed, and to avoid these unnecessary external memory reads and writes, increasing the overall efficiency of the system.