1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for call stack protection.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
One of the areas in which large advances have been made is memory management, including memory management in parallel computer systems with high performance compute nodes having little tolerance for TLB or cache misses. Such systems often include, for example, advanced measures for call stack protection. A call stack is a data structure in computer memory that stores information about the active subroutines of a computer program. The active subroutines are those which have been called but have not yet completed execution by returning. This kind of stack is also known as an execution stack, control stack, function stack, or run-time stack, and is often shortened to just ‘the stack.’ In this specification, however, such a stack, for clarity of explanation, is generally referred to as a ‘call stack.’ Since the call stack is organized as a stack-type data structure, a calling routine pushes its return address, and optionally other information, onto the stack, and a called subroutine, when it finishes, pops that return address off the call stack and transfers processor control to that address. If a called subroutine calls yet another subroutine, it will push its own return address onto the call stack, and so on, with the information stacking up and unstacking, pushing and popping, as the application program dictates.
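The push-and-pop discipline described above can be illustrated with a minimal sketch in Python. The names CallStack, call, and ret, and the numeric addresses, are hypothetical and chosen only for illustration; an actual call stack is maintained by the processor and the compiled program, not by a Python list.

```python
class CallStack:
    """Toy model of a call stack that holds only return addresses."""

    def __init__(self):
        self._frames = []

    def call(self, return_address):
        # The calling routine pushes the address at which it will resume.
        self._frames.append(return_address)

    def ret(self):
        # The called subroutine pops the most recent return address;
        # control would then transfer to that address.
        return self._frames.pop()

    def depth(self):
        # Number of subroutines that have been called but have not returned.
        return len(self._frames)


stack = CallStack()
stack.call(0x1000)            # main calls A; main will resume at 0x1000
stack.call(0x2000)            # A calls B; A will resume at 0x2000
first_return = stack.ret()    # B finishes and returns into A
second_return = stack.ret()   # A finishes and returns into main
```

In this sketch the addresses come back in the reverse of the order in which they were pushed, which is the last-in, first-out behavior the paragraph describes.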
There is typically one call stack associated with each thread of a process. If the number of active subroutines grows very large or if large amounts of data are pushed onto the stack, then the storage occupied by the stack may spill into other areas of process storage which may be allocated for other uses such as the program heap space. Conversely, allocations of non-stack storage such as heap may inadvertently or maliciously be extended into the current stack space.
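The conflict between stack storage and heap storage can be sketched as follows. This is a toy model under stated assumptions: the class ProcessMemory and its methods are hypothetical, and the layout in which the heap grows upward from low addresses while the stack grows downward from high addresses is a common convention assumed here, not something this specification requires.

```python
class ProcessMemory:
    """Toy address space: heap grows up from 0, stack grows down from `size`."""

    def __init__(self, size):
        self.heap_top = 0          # first address above the heap
        self.stack_bottom = size   # lowest address currently used by the stack

    def grow_heap(self, nbytes):
        # A heap allocation extends the heap toward higher addresses.
        self.heap_top += nbytes

    def grow_stack(self, nbytes):
        # Pushing data extends the stack toward lower addresses.
        self.stack_bottom -= nbytes

    def collided(self):
        # The regions conflict once the heap reaches into the stack's range.
        return self.heap_top > self.stack_bottom


mem = ProcessMemory(size=1024)
mem.grow_heap(400)        # heap occupies addresses 0..399
mem.grow_stack(500)       # stack occupies addresses 524..1023
before = mem.collided()   # regions are still disjoint
mem.grow_stack(200)       # stack now reaches down to address 324
after = mem.collided()    # stack has spilled into heap storage
```

Without some form of guard, nothing in this model prevents the second call to grow_stack from overwriting heap storage; the overlap is only visible if it is explicitly checked.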
Many computer systems implement a guard mechanism to detect these types of conflicts. These guard mechanisms are implemented by inserting unmapped or inaccessible address ranges within the address translation tables for the process within the computing system. These additional mappings further fragment the address ranges within the translation table. Within a computing system, there is usually a hardware address translation mechanism containing a fixed number of address translation mappings, typically referred to as a Translation Look-aside Buffer, or ‘TLB.’ If an address being referenced is not in the TLB, a miss condition occurs in which the operating system must obtain the correct mapping from the table and load it into the TLB so that the hardware can translate the address. In computing systems where pages of storage are faulted in from disk, using TLB mappings to implement a guard area is sufficient, since the performance penalty for handling the additional TLB misses is small compared to the overall time to handle a page fault.
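The interaction between a fixed-size TLB, the translation table, and an unmapped guard range can be sketched as a small simulation. Everything named here is an assumption made for illustration: the 4096-byte page size, the Tlb class, the GuardPageFault exception, and the oldest-entry-first replacement policy are hypothetical and do not describe any particular hardware or operating system.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, for illustration only


class GuardPageFault(Exception):
    """Raised when an address falls in an unmapped (guard) range."""


class Tlb:
    """Toy TLB with a fixed number of entries, backed by a page table."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # virtual page number -> physical frame
        self.entries = OrderedDict()   # mappings currently held in the TLB
        self.misses = 0

    def translate(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in self.entries:
            # Miss: the operating system consults the translation table.
            self.misses += 1
            if page not in self.page_table:
                # No mapping exists, so the access hit a guard range.
                raise GuardPageFault(hex(vaddr))
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the oldest mapping
            self.entries[page] = self.page_table[page]
        return self.entries[page] * PAGE_SIZE + offset


# Page 2 is deliberately left unmapped to act as a guard between pages 1 and 3.
page_table = {0: 7, 1: 8, 3: 9}
tlb = Tlb(capacity=2, page_table=page_table)

paddr = tlb.translate(0x1010)   # miss on page 1, then mapped to frame 8
tlb.translate(0x1020)           # hit: page 1 is already in the TLB
try:
    tlb.translate(2 * PAGE_SIZE)   # access lands in the guard page
    guard_hit = False
except GuardPageFault:
    guard_hit = True
```

The guard is detected only on the miss path, which is why a mechanism of this kind depends on the system being able to take TLB misses at all.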
However, at large compute node counts in highly parallel systems, for example, a phenomenon called “OS Noise” becomes a dominant term and can steal significant performance from applications, as random processor interrupts (such as TLB misses) take cycles away from the total peak performance of the system. Ultrascaling high performance computing systems, such as those that implement IBM's BlueGene architecture, have been carefully designed to avoid TLB misses entirely by statically allocating the TLB layout. Since no TLB misses will be taken, the traditional guard page mechanism cannot be implemented on these newer systems.
Another limitation of traditional guard area implementations is that the granularity of the protection is typically limited to a multiple of the page size. Also, the location of the guard area is typically fixed for the process, so it does not adapt to changing memory usage within the process and does not make the most efficient use of the available memory.
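The granularity limitation can be made concrete with a short arithmetic sketch. The function name guard_bytes_needed and the 4096-byte page size are assumptions chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def guard_bytes_needed(requested, page_size=PAGE_SIZE):
    """Round a requested guard size up to a whole number of pages."""
    pages = -(-requested // page_size)  # ceiling division
    return pages * page_size


small = guard_bytes_needed(64)     # a 64-byte guard still costs a full page
large = guard_bytes_needed(4097)   # one byte over a page costs two pages
```

Under these assumptions, a guard that only needs to cover a few bytes still removes an entire page from the memory available to the process.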