Field of the Invention
The present invention relates generally to systems and methods for increasing the reliability and improving the behavior of software programs. More particularly, the present invention relates to exception-handling systems and methods which assist software developers in the task of ensuring that programs operative on digital computers can recover from exceptional conditions and runtime program errors.
Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a "computer program," direct the operation of the computer.
Computers essentially only understand "machine code," that is, the low-level instructions for performing specific tasks interpreted as specific instructions by the computer's microprocessor. Since machine language or machine code is the only language computers actually understand, all other programming languages represent ways of structuring "human" language so that humans can get computers to perform specific tasks.
While it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available programming languages. The most widely-used programming languages are the "high-level" languages, such as C++/C or Pascal.
A program called a "compiler" translates these instructions into the requisite machine language. In the context of this translation, the program written in the high-level language is called the "source code" or source program. The ultimate output of the compiler is an "object module," which includes instructions for execution by a target processor. Although an object module includes code for instructing the operation of a computer, the object module itself is not in a form which may be directly executed by a computer. Instead, it must undergo a "linking" operation before the final executable program is created.
Linking may be thought of as the general process of combining or linking together one or more compiled object modules to create an executable program. This task usually falls to a program called a "linker." In typical operation, a linker receives, either from the user or from an integrated compiler, a list of object modules desired to be included in the link operation. The linker scans the object modules from the object and library files specified. After resolving interconnecting references as needed, the linker constructs an executable image by organizing the object code from the modules of the program in a format understood by the operating system program loader. The end result of linking is executable code (typically an .exe file) which, after testing and quality assurance, is passed to the user with appropriate installation and usage instructions, or to a factory for installation in products with embedded computer systems.
Development of programs is largely a trial and error process. Errors that emerge from this program development cycle can be divided into broad classes, including compile-time errors, linkage errors, runtime errors, and errors arising at runtime due to unexpected failures beyond programmer control. Examples of such unexpected failures include failures at external resources shared via a network, and failure of the network. Proper development methodologies and quality controls will remove both compile-time errors (such as syntax and format violations) and linkage errors (such as library and global naming inconsistencies), but runtime errors are less amenable to systematic elimination. Indeed, the supreme importance of runtime errors stems from the fact that they are usually discovered by, and provide major frustration to, the end user. Unless handled properly, runtime errors simply abort (terminate) execution, leaving the system in a questionable state and the user uncertain as to what went wrong and what to do next.
There are many reasons for the intractability of the runtime error problem. First, it is difficult to predict every user action during program execution. Although the conscientious programmer guides the user with helpful menus and prompts, and aims to insert code that checks the validity of each user response, in practice, it remains a major programming challenge to anticipate and respond to arbitrary user input.
Second, it is difficult, and often impossible, to predict the availability of the diverse hardware and software resources required as program execution unfolds. For instance, the running program might request RAM (random access memory) and disk storage allocations at diverse points of its execution, in the absence of which the program cannot usefully continue. Similarly, the running program might call operating system, library, or other routines that are, for various reasons beyond the programmer's control, unavailable at that moment. A common error, for instance, occurs when a program seeks access to a file that is not available due to a network failure. As with hardware resource exceptions, the program must either take evasive action or simply terminate (exit or abort). Exceptions of this type are especially common in modern computing environments where a set of independent user applications, or a set of independently executing threads within the same program, must share the same resources.
Apart from resource availability and unpredicted user actions, a further source of runtime errors involves genuine coding bugs not detectable during compilation or linkage. For example, an arithmetical expression, accepted as legal by the compiler, may produce a runtime error for certain values of its variable components. Typical cases are the "divide-by-zero" error and similar situations where the expression cannot be correctly evaluated. Such errors are predictable and avoidable in theory. In practice, however, traditional exception-handling solutions have involved a hard-coded plethora of conditional tests of variable values before each expression is evaluated, followed by ad hoc routines to bypass invalid evaluations. The approach is at best tedious and prone to error.
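The traditional approach described above can be sketched in C as follows. This is a minimal illustration only: the names checked_div( ) and average( ) are hypothetical and do not appear in the text, and the fallback value of zero is an arbitrary choice made for the example.

```c
#include <limits.h>

/* Hypothetical helper: tests the operands before evaluating num / den.
   Returns 0 and stores the quotient in *out on success; returns -1 when
   the expression cannot be correctly evaluated. */
int checked_div(int num, int den, int *out)
{
    if (den == 0)
        return -1;                      /* divide-by-zero */
    if (num == INT_MIN && den == -1)
        return -1;                      /* quotient would overflow int */
    *out = num / den;
    return 0;
}

/* Hypothetical caller: every call site repeats the test and supplies its
   own ad hoc routine to bypass the invalid evaluation. */
int average(int total, int count)
{
    int avg;
    if (checked_div(total, count, &avg) != 0)
        return 0;                       /* ad hoc fallback for this caller */
    return avg;
}
```

Each additional expression of this kind needs its own test and its own bypass, which is the tedium the passage refers to.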
Most of the high-level languages currently used for program development exploit the concept of modularity whereby a commonly required set of operations can be encapsulated in a separately named subroutine, procedure, or function. Once coded, such subroutines can be reused by "calling" them from any point in the main program. Further, a subroutine may call a subroutine, and so on, so that in most cases an executing program is seldom a linear sequence of instructions. In the C language, for example, a main( ) program is written which calls a sequence of functions, each of which can call functions, and so on. If all goes well, control eventually returns to main( ). This nesting of function calls simplifies the construction of programs but, at the same time, complicates the handling of exceptions. The essence of a function call is that it must pass any arguments (or parameters) to the target function, transfer control to the memory section holding the function's executable code, return the result of the call, and at the same time, store sufficient information to ensure that subsequent execution resumes immediately after the point where the original function call was made. This function-calling mechanism, as is well-known in the art, is usually achieved by pushing and pulling data and memory addresses on and off a stack prior to, during, and after, the call. A stack is simply a dedicated portion of memory usually organized as a LIFO (last in, first out) data structure. The stack is not normally manipulated directly by the programmer, but its contents are changed as a result of the function calls coded by the programmer. Programs do have direct access to another portion of memory, often called the heap, and a key element in exception handling involves the management of this vital resource.
After a successful function call, the stack is unwound, that is to say, all data which were "pushed" onto the stack are "popped" off in reverse order, leaving the stack in its pre-call state ready for further function calls; execution resumes in the function which made the call. Note that, since function calls can be nested to arbitrary levels, the stack must maintain a vital, complex sequence of return values and instruction pointers essential to the proper execution of the program. Eventually, absent any problems, control ends back in main( ), and after the final successful function call in main( ), the program terminates. Any interruption to this unwinding process leads to an unbalanced stack with unpredictable results. For instance, a called function expects to find its arguments in a particular section, known as the function's stack frame, at the top of the stack; if the stack is unbalanced, the function will pull off erroneous data, further compounding the runtime error.
Clearly, exceptional conditions and errors occurring in a nested function can create a particularly difficult problem. Several exception-handling approaches have been attempted to address the problem. One approach, for instance, is to have each function return an error indication, either in a separate variable, or as a special range of values for the normal return value. The immediate onus of exception handling then rests on the calling function. If the calling function is unable to cope, it must return an error indication to its calling function, and so on up the chain until either a function is reached that can handle the exception, or until main( ) is reached. If main( ) cannot correct the problem, it terminates as gracefully as possible, perhaps displaying an explanatory message for the user.
As an illustration, suppose that main( ) calls funcA( ) which, in turn, calls funcB( ). funcB( ) is programmed to return, say, zero for success or a positive number indicating the reason for failure. For example, funcB( ) might return 1 for "insufficient memory," 2 for "file not found," and so on. funcA( ) always tests the value returned by funcB( ). If this test indicates success, funcA( ) carries on and eventually returns control to main( ). If funcA( ) detects that funcB( ) suffered an "insufficient memory" error, it may well be able to correct the situation (by "collecting garbage" or by defragmenting the heap) and then call funcB( ) again. But if funcA( ) detects the "file not found" error, it may have no means of handling this situation other than displaying a warning. Unable to continue, funcA( ) must then return an error value to main( ). What, if anything, main( ) can do with this error will, of course, depend on the particular application.
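The chain just described might look like the following C sketch. The error codes 0, 1, and 2 follow the text; everything else is assumed for illustration: the two global flags stand in for real resource checks, collect_garbage( ) is a hypothetical recovery routine, and app_main( ) plays the role of main( ).

```c
#include <stdio.h>

enum { ERR_OK = 0, ERR_NO_MEMORY = 1, ERR_FILE_NOT_FOUND = 2 };

/* Hypothetical stand-ins for real resource checks. */
int memory_available = 0;
int file_present = 1;

/* Hypothetical recovery routine: frees enough memory for a retry. */
void collect_garbage(void) { memory_available = 1; }

/* Returns zero for success or a positive reason code for failure. */
int funcB(void)
{
    if (!memory_available)
        return ERR_NO_MEMORY;
    if (!file_present)
        return ERR_FILE_NOT_FOUND;
    return ERR_OK;
}

/* Always tests funcB()'s result; handles what it can, passes the rest up. */
int funcA(void)
{
    int err = funcB();
    if (err == ERR_NO_MEMORY) {
        collect_garbage();
        err = funcB();                  /* retry once after recovery */
    }
    if (err == ERR_FILE_NOT_FOUND)
        fprintf(stderr, "warning: file not found\n");
    return err;                         /* unhandled errors go to the caller */
}

/* Plays the role of main(): terminates as gracefully as it can. */
int app_main(void)
{
    int err = funcA();
    if (err != ERR_OK) {
        fprintf(stderr, "cannot continue (error %d)\n", err);
        return 1;
    }
    return 0;
}
```

Note that every level of the chain carries its own test of the returned value, even when it can do nothing but forward the error.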
The merit of this "error chaining" scheme is that the stack is always unwound correctly, but there are several serious disadvantages. Each function in the chain is saddled with code that "looks" for exceptions occurring in its called functions. This code must also "decide" which exceptions can be handled and which ones have to be returned to the calling function. When the function calls are deeply nested, and the number of different exception types increases, the testing and chaining of exceptions becomes a major, error-prone programming headache. A significant obstacle to well-formulated, easy-to-read, maintainable code is apparent from the simple example outlined above. If main( ) is left to handle an exception returned by funcA( ), it may need to know both the type of exception and where the exception occurred. The type of exception is clear from the error code, but the fact that it occurred in funcB( ) and not in funcA( ) or, as the program is changed and extended, some other function in the chain, is not immediately apparent without additional error encoding.
One response to this problem is the global (or long) goto label instruction, which can transfer control from any point of any function to a routine residing anywhere in memory, at the address given by the identifier, label. Under this regime, the funcB( ) of the preceding example need not return error codes up the function chain but, on detecting an error, can send control directly to an appropriate exception handler.
For example, an exception handler routine at no-mem-handler is presumed to handle all "insufficient memory" errors and, if necessary, use the value of serr to determine in which function the error occurred. In the current terminology, funcB( ) "throws" the "insufficient memory" exception, while the routine at no-mem-handler "catches" the exception.
This simple global goto approach has the merit of offering a single, readable place for each exception handler, but in practice it creates other problems. First, the standard goto instruction in the C and C++ languages operates only within a function; it lacks the required, long-distance power to transfer control between functions. Second, as it stands, the direct transfer to a handler fails to correctly unwind the stack, as described earlier. Finally, and related to the first two objections, additional mechanisms are needed to allow control to return, if necessary, to the throwing function. In order to resume execution in the throwing function on those occasions when the handler is able to "correct" the error, the exception-handling mechanism must allow the preservation and restoration of the state or context of the throwing function.
When funcB( ) throws an exception, for example, its local variables will hold particular values. As the name implies, the scope and existence of local variables are limited to the "life-span" of the function: they disappear when the function yields control. These local values and other parameters, such as the current values in the registers of the central processor, constitute the state of funcB( ). In particular, the state includes the stack status and the current IP (instruction pointer) that marks the place in memory where execution must be resumed. This state must be completely saved before the handler is called, and then completely restored before execution of funcB( ) can be safely resumed.
Some of the deficiencies of the global goto "solution" have been alleviated by the introduction of two Standard C library functions, setjmp( ) and longjmp( ). setjmp( ) can be called in any function at the point at which control should be resumed if a matching longjmp( ) is called in another function. Typically, longjmp( ) is called when an exception is thrown. setjmp( ) takes as an argument the address of (pointer to) a programmer-supplied memory buffer in which the state of the current function will be saved. As discussed earlier, this state holds the processor registers, including the current instruction pointer IP (also called program counter PC), needed to resume execution immediately after the setjmp( ) call. longjmp( ), unlike goto, can transfer control across different functions as follows: longjmp( ) takes as one of its arguments the same buffer address which is used in the matching setjmp( ). When longjmp( ) is called, it recovers the state saved by setjmp( ), and transfers control to the address found in the stored IP, namely the instruction following the setjmp( ) call. Further, longjmp( ) takes a second numeric argument which can be tested in the function that called setjmp( ), thereby providing a mechanism for determining which particular longjmp( ) caused the jump.
In funcA( ), funcB( ), or in any function they call, or in any function these functions call (and so on), the statement "longjmp(aJmpBuf, status);" ensures that the setjmp( ) in funcA( ) will be "recalled" under special circumstances in order to return value status in retval, following which, control will revert to the if (retval) line in funcA( ). In the absence of any longjmp( ) calls in subsequent functions, setjmp( ) returns zero (false), so that the if (retval) test fails. Thus, the setjmp( ) and longjmp( ) pair offer a global goto method for exception handling. Exception handlers can be encapsulated into any convenient set of functions, and after suitable handling, control can, if required, be safely transferred back to the functions in which the exception occurred.
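A minimal C sketch of this setjmp( )/longjmp( ) pairing follows. The names funcA( ), funcB( ), aJmpBuf, and status come from the text; the parameter passing and the specific case values are assumptions made for illustration. Where the text stores the result in retval and tests it with if (retval), the sketch uses setjmp( ) directly as the controlling expression of a switch, because ISO C only guarantees the behavior of setjmp( ) in a few such contexts.

```c
#include <setjmp.h>

/* Programmer-supplied buffer shared by setjmp() and longjmp(). */
static jmp_buf aJmpBuf;

/* "Throws" by jumping back to the setjmp() in funcA(); a zero status
   means no error, so the function simply returns. */
void funcB(int status)
{
    if (status != 0)
        longjmp(aJmpBuf, status);       /* does not return */
}

/* Saves its state, calls funcB(), and "catches" any jump.
   Returns 0 on success, the thrown code for the two known errors,
   or -1 for a code it does not recognize. */
int funcA(int status)
{
    switch (setjmp(aJmpBuf)) {
    case 0:                             /* direct return: normal path */
        funcB(status);
        return 0;
    case 1:                             /* e.g. "insufficient memory" */
        return 1;
    case 2:                             /* e.g. "file not found" */
        return 2;
    default:                            /* any other longjmp() value */
        return -1;
    }
}
```

The second argument of longjmp( ) is what selects the case after the jump, which is how the catching function determines which longjmp( ) caused it.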
However, the setjmp( )/longjmp( ) solution also has disadvantages. First, there is no guarantee that the function to which longjmp( ) returns is still active. In the previous example, it is possible that funcA( ) has already returned, relinquishing its place on the stack, before a matching longjmp( ) is encountered. The only solution to this problem is to restrict setjmp( ) calls to the main( ) program. Second, the stack unwinding problem in the presence of nested setjmp( )s and longjmp( )s requires careful explicit programming. Finally, many popular program overlaying and virtual memory techniques employ special stacks, so that a function's status is not completely stored by setjmp( ). All told, present-day approaches have failed to adequately address the problem of handling exceptions.
The state of local variables is even more complex in languages like C++, where local variables or objects may have special associated functions, known as destructors, which must be called before they disappear.
Some programming languages, for example C++ and other high-level languages, have specified mechanisms to ease programming for exceptions, replacing and augmenting the previously described schemes. However, the implementation of these mechanisms is complicated. It involves a trade-off between speed and space, which is well documented in the prior art.
A specific problem relates to how to optimally map the return address in the calling function to the information necessary to unwind the stack of calling functions to the point of a handler, or to the point of a decision to call the function "terminate ( )". The information which must be mapped to the return address comprises a pointer to a table which holds the data necessary for unwinding the stack-frame, that is, restoring registers, restoring the stack pointer, and information regarding the general stack-frame layout, and includes a description of allowed and caught exceptions in this frame. Alternatively, the table information could be compacted and stored instead of the pointer. The optimal layout of a stack-frame is highly dependent on the function which calls it, and cannot be guessed without more information than the return address and the value of the stack pointer and/or frame pointer as applicable to a particular implementation. It is desirable for an implementation of exception handlers to impose as little overhead as possible when exceptions are not thrown, as they are meant to be used only in exceptional situations.
The predominant implementation in the prior art relies on program-counter-based tables. However, the time needed to look up the table of program counter ranges using the current value of the return address depends on the size of the program. With a binary search, which is typically used, the search time is logarithmic in the number of calls in the program. Of course, this search could be optimized using hash functions and similar techniques known in the art. However, no known implementation uses the hash function solution, probably because it would introduce an extra linker step and because the search time is not considered important by many designers.
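The table lookup described above can be sketched in C as a binary search over sorted, non-overlapping address ranges. The PcRange structure, its field names, and find_range( ) are hypothetical and serve only to illustrate the technique; an integer stands in for the pointer to the unwinding table, and actual table layouts vary by implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one range of code addresses and a stand-in
   for the pointer to that range's unwinding information. */
typedef struct {
    uintptr_t start;        /* first address covered (inclusive) */
    uintptr_t end;          /* one past the last address (exclusive) */
    int       unwind_info;  /* placeholder for the unwind-table pointer */
} PcRange;

/* Binary search of a table sorted by start address with non-overlapping
   ranges. Returns the entry containing ret_addr, or NULL if none does.
   The number of steps grows logarithmically with the entry count n. */
const PcRange *find_range(const PcRange *table, size_t n, uintptr_t ret_addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ret_addr < table[mid].start)
            hi = mid;                   /* search the lower half */
        else if (ret_addr >= table[mid].end)
            lo = mid + 1;               /* search the upper half */
        else
            return &table[mid];         /* start <= ret_addr < end */
    }
    return NULL;
}
```

During unwinding, a lookup of this kind would be repeated for each return address found on the stack, so the cost is paid only when an exception is actually thrown.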
An alternative implementation makes the unwinding information easy to locate by storing it in the calling code, close to the return address. However, with conventional calling techniques, the prior art approaches cost program space or processor overhead to skip the extra information in the calling code. For example, the skipping could be implemented at the return with any instruction having no visible effect on the data flow. One option is a no-operation (NOP) instruction with an unused data or address field, which costs program space. Another is a move instruction which moves otherwise unused data, which costs both program space and processor overhead. In either case, the unused field or data holds the desired information for the exception handler. See Chase, "Implementation of Exception Handling, Part 1," The Journal of C Language Translation (ISSN 1042-5721), Volume 5, Number 4, June 1994 (second part in Volume 6, Number 1, September 1994).