A subprogram is a program that invokes the execution of another program and to which execution eventually returns from the invoked program, or a program whose execution is invoked by another program and which ends by returning execution to another program, such as the program that invoked it. Of course, a program can be both an invoking and an invoked subprogram. Subprograms are also known by other names, such as routines, subroutines, functions, and procedures. The term "subprogram" as used herein encompasses all such variations.
Subprograms are extensively used in structured, or modular, programming--a commonly-used programming technique that breaks a task up into a sequence of sub-tasks. Structured programming can result in long chains of subprograms, wherein an initial, or main, subprogram invokes subprogram A, which in turn invokes subprogram B, which invokes subprogram C, and so on. Such chains of subprograms are characteristic of certain types of application programs, such as protocol handlers, transaction handlers, database query servers, and call processing systems. These types of application programs are also characterized by throughput constraints. All other things being equal, throughput is directly proportional to the speed of program execution. Consequently, it is important that the invocations of subprograms and the returns from those invocations be executable as quickly as possible.
Of course, one way of speeding up program execution is to use a faster computer. But computer speed is typically directly proportional to computer cost, and hence this approach is costly. Furthermore, technology invariably sets practical limits to the speed of program execution that can be achieved at any time in the continuum of technological development. Consequently, it is important that the invocation of subprograms and the returns from those invocations be implemented as efficiently as possible in order to achieve the fastest possible program execution with a given computer or computer technology.
The traditional and ubiquitous manner of implementing an invocation of a subprogram and a return from that invocation is through the CALL and RETURN statements. These high-level instructions work together with a stack--a last-in, first-out memory structure. Each subprogram that has not completed execution has a frame of information on the stack. Stored in the stack frame is information such as arguments or parameters, local variables, return values, and other information associated with the subprogram. The CALL statement results in the storage of the context of the calling subprogram in a stack frame, and in the creation of a stack frame for the called subprogram. The context includes information such as general register contents, instruction pointer contents, and a condition code. The RETURN statement results in the deletion of the stack frame of the returning, previously-called, subprogram from the stack, and in the restoration of the processor to the stored context of the returned-to, previously-calling, subprogram.
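The CALL and RETURN behavior described above can be illustrated with a minimal sketch in C. The sketch models a toy machine, not any real processor or compiler: the names Context, Frame, Machine, machine_call, and machine_return are hypothetical, and for brevity the saved caller context and the callee's data are assumed to share a single frame.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical processor context: the kinds of state the text lists. */
typedef struct {
    long   general_regs[4];
    size_t instruction_pointer;
    int    condition_code;
} Context;

/* Hypothetical stack frame: saved caller context plus callee data. */
typedef struct {
    Context saved_caller;  /* context of the calling subprogram */
    long    argument;      /* parameter passed to the called subprogram */
    long    local;         /* a local variable of the called subprogram */
} Frame;

enum { MAX_FRAMES = 64 };

typedef struct {
    Frame   frames[MAX_FRAMES];
    size_t  depth;         /* number of frames currently on the stack */
    Context cpu;           /* live processor context */
} Machine;

/* CALL: save the caller's context in a new frame, then enter the callee. */
void machine_call(Machine *m, size_t target, long argument)
{
    assert(m->depth < MAX_FRAMES);
    Frame *f = &m->frames[m->depth++];
    f->saved_caller = m->cpu;
    f->argument = argument;
    f->local = 0;
    m->cpu.instruction_pointer = target;
}

/* RETURN: delete the callee's frame and restore the caller's context. */
void machine_return(Machine *m)
{
    assert(m->depth > 0);
    m->cpu = m->frames[--m->depth].saved_caller;
}
```

In this sketch every call copies a whole Context and every return copies it back, and the value of depth grows by one for each subprogram in the chain that has not yet returned.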
All of this manipulation of memory contents and processor state is time-consuming, and hence detracts from system throughput. Furthermore, in a long chain of subprogram calls, the stack can grow to occupy a significant area of memory, thereby reducing the amount of memory available for other uses. Aside from the obvious memory limitations that this can impose, it can also detract from system throughput by increasing the frequency of occurrence of certain activities, such as swapping of pages to and from main memory.
In view of these disadvantages of the conventional subprogram call-and-return arrangement, attempts have been made to improve upon it, but with limited success.
One known assembler and link editor combination employs a feature known as "leaf proc", which causes the link editor to change a standard call instruction into a branch-and-link (BAL) instruction. The BAL instruction leaves the address of the following (in terms of compilation, as opposed to execution, order) instruction stored in a predetermined off-stack location, such as a general-purpose register, and then performs a conventional branch operation to the target instruction specified by the operand of the BAL instruction. The state of the stack thereby remains unchanged. A return is then accomplished with a branch instruction whose operand is the general-purpose register in which the BAL instruction had stored the return address. However, use of the BAL instruction is limited to calls to subprograms that are written in assembly language, that do not call other subprograms, including themselves, and that do not require more than a few general-purpose registers.
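The difference between a conventional call and a branch-and-link can be sketched with a toy model in C. This is an illustrative simulation only, not the instruction set of any particular processor: ToyCpu and the four toy_ functions are hypothetical names, and instruction addresses are assumed to be consecutive integers.

```c
#include <assert.h>
#include <stddef.h>

enum { STACK_MAX = 16 };

/* Hypothetical toy processor state. */
typedef struct {
    size_t stack[STACK_MAX]; /* return addresses pushed by conventional calls */
    size_t depth;            /* current stack depth */
    size_t link_reg;         /* off-stack register written by BAL */
    size_t pc;               /* address of the instruction being executed */
} ToyCpu;

/* Conventional call: push the following instruction's address, then branch. */
void toy_call(ToyCpu *c, size_t target)
{
    assert(c->depth < STACK_MAX);
    c->stack[c->depth++] = c->pc + 1;
    c->pc = target;
}

/* Conventional return: pop the return address and branch to it. */
void toy_return(ToyCpu *c)
{
    assert(c->depth > 0);
    c->pc = c->stack[--c->depth];
}

/* Branch-and-link: store the following instruction's address in the
   link register, then branch. The stack is not touched. */
void toy_bal(ToyCpu *c, size_t target)
{
    c->link_reg = c->pc + 1;
    c->pc = target;
}

/* Return from a BAL: branch to the address held in the link register. */
void toy_branch_link(ToyCpu *c)
{
    c->pc = c->link_reg;
}
```

In this model a second toy_bal issued before the first has returned overwrites link_reg, so the first return address is lost. That is one way to see why the feature is restricted to subprograms that call no other subprograms.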
The UNIX® operating system employs two pairs of features known as "setjmp" and "longjmp". One pair is implemented as algorithms of the operating system itself, while the other pair is implemented as library functions in the user interface to the operating system.
At the operating system kernel level, the "setjmp" algorithm performs a context switch, but instead of storing the saved context on the stack, saves it in the new process memory area which contains process control information that need be accessed only in the context of that process (the u area). Execution then continues in the context of the old process. When the kernel wishes to resume the context it had saved, it uses the "longjmp" algorithm, which restores the saved context from the u area. This technique is confined to implementations of operating system kernels that include the notion of processes. It is a system-level function available only to the operating system kernel, and is not accessible for use by application programmers.
At the user interface level, the "setjmp" function performs a context switch and saves the old context on the stack, but stores pointers to that context in a predetermined off-stack location. Thereafter, the "longjmp" function, when given the address of the predetermined off-stack location as a parameter, restores and returns to the stored context. While the "longjmp" function can save much of the overhead of the conventional RETURN statement, it does nothing to reduce the overhead of the conventional CALL statement, such as CPU instruction cycles and stack memory consumption. Also, any subprogram that calls the "longjmp" function must have an ancestor subprogram (a preceding subprogram in the chain of subprogram invocations) that has called the "setjmp" function. Consequently, a subprogram that uses the "longjmp" function cannot be called "normally", that is, without restriction from any context. Furthermore, the "setjmp" function cannot be used within a recursive invocation chain (a sequence of subprogram invocations wherein the last invocation of the sequence invokes the subprogram that made the first invocation in the sequence, thereby forming a loop), though it can be used by an ancestor subprogram of the recursive chain.
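The user-level pair can be illustrated with the standard C library's setjmp and longjmp from <setjmp.h>. The following is a minimal sketch: the names resume_point, deepest_level, level_a, level_b, level_c, and run_chain are hypothetical, and the static jmp_buf is assumed to play the role of the predetermined off-stack location.

```c
#include <setjmp.h>

jmp_buf resume_point;  /* off-stack location holding the saved context */
int deepest_level;     /* records how far down the chain execution got */

static void level_c(void)
{
    deepest_level = 3;
    /* Jump straight back to the ancestor that called setjmp,
       bypassing the individual returns through level_b and level_a. */
    longjmp(resume_point, 1);
}

static void level_b(void)
{
    deepest_level = 2;
    level_c();
    deepest_level = -2;  /* not reached: level_c never returns normally */
}

static void level_a(void)
{
    deepest_level = 1;
    level_b();
    deepest_level = -1;  /* not reached for the same reason */
}

/* Returns 1 if control came back via longjmp, 0 if the chain
   returned normally. */
int run_chain(void)
{
    if (setjmp(resume_point) != 0) {
        return 1;        /* arrived here from longjmp in level_c */
    }
    level_a();           /* each of these calls is still a conventional call */
    return 0;
}
```

Note that level_c is only safe to call while run_chain, the ancestor that called setjmp, is still active, and that the calls down the chain still build their stack frames in the usual way. Only the returns are bypassed.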
Finally, certain known interpreters make use of a feature known as "tail recursion". Usable only in a self-referential recursive function, and usable only when the last action that the recursive function performs before returning is to call itself, this feature suppresses generation of a stack frame for the called iteration of the function and instead reuses the frame of the calling iteration of the function. While effective in eliminating much of the overhead of a conventional CALL-and-RETURN statement sequence, the "tail recursion" feature is limited to self-referential recursive function calls, and therefore has limited applicability and usefulness.
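The effect of the "tail recursion" feature can be sketched in C by placing a tail-recursive function beside a hand-written loop that does what frame reuse amounts to. The function names sum_to_tail and sum_to_loop are hypothetical, and the loop is an illustration of the idea rather than the output of any particular interpreter or compiler.

```c
/* Tail-recursive form: the recursive call is the last action the
   function performs before returning. */
unsigned long long sum_to_tail(unsigned long long n, unsigned long long acc)
{
    if (n == 0) {
        return acc;
    }
    return sum_to_tail(n - 1, acc + n);
}

/* Frame-reusing form: instead of creating a new frame for the next
   iteration, overwrite the parameters in place and start over. */
unsigned long long sum_to_loop(unsigned long long n, unsigned long long acc)
{
    for (;;) {
        if (n == 0) {
            return acc;
        }
        acc += n;
        n -= 1;
    }
}
```

Both functions compute the sum of the integers from 1 to n when called with acc equal to 0. The second uses a single frame regardless of n, whereas the first, absent the feature, uses one frame per iteration.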