The present invention relates generally to systems and methods for increasing the reliability and improving the behavior of software programs. More particularly, the present invention relates to exception-handling systems and methods which assist software developers in the task of ensuring that programs operative on digital computers can recover from exceptions (also known as runtime errors).
Before a digital computer may accomplish a desired task, it must receive an appropriate set of instructions. Executed by the computer's microprocessor, these instructions, collectively referred to as a "computer program," direct the operation of the computer. Expectedly, the computer must understand the instructions which it receives before it may undertake the specified activity.
Owing to their digital nature, computers essentially only understand "machine code," that is, the low-level, minute instructions for performing specific tasks--the sequence of ones and zeros that are interpreted as specific instructions by the computer's microprocessor. Since machine language or machine code is the only language computers actually understand, all other programming languages represent ways of structuring "human" language so that humans can get computers to perform specific tasks.
While it is possible for humans to compose meaningful programs in machine code, practically all software development today employs one or more of the available programming languages. The most widely-used programming languages are the "high-level" languages, such as C or Pascal. These languages allow data structures and algorithms to be expressed in a style of writing which is easily read and understood by fellow programmers.
A program called a "compiler" translates these instructions into the requisite machine language. In the context of this translation, the program written in the high-level language is called the "source code" or source program. The ultimate output of the compiler is an "object module," which includes instructions for execution by a target processor. Although an object module includes code for instructing the operation of a computer, the object module itself is not in a form which may be directly executed by a computer. Instead, it must undergo a "linking" operation before the final executable program is created.
Linking may be thought of as the general process of combining or linking together one or more compiled object modules to create an executable program. This task usually falls to a program called a "linker." In typical operation, a linker receives, either from the user or from an integrated compiler, a list of object modules desired to be included in the link operation. The linker scans the object modules from the object and library files specified. After resolving interconnecting references as needed, the linker constructs an executable image by organizing the object code from the modules of the program in a format understood by the operating system program loader. The end result of linking is executable code (typically an .exe file) which, after testing and quality assurance, is passed to the user with appropriate installation and usage instructions.
Development of programs is largely a trial and error process. Errors that emerge from this program development cycle can be divided into three broad classes: compile-time errors, linkage errors, and runtime errors. Proper development methodologies and quality controls will remove both compile-time errors (such as syntax and format violations) and linkage errors (such as library and global naming inconsistencies), but runtime errors are less amenable to systematic elimination. Indeed, the supreme importance of runtime errors stems from the fact that they are usually discovered by, and provide major frustration to, the end user. Unless handled properly, runtime errors simply abort (terminate) execution, leaving the system in a questionable state and the user uncertain as to what went wrong and what to do next.
There are many reasons for the intractability of the runtime error problem. First, it is difficult to predict every user action during program execution. Although the conscientious programmer guides the user with helpful menus and prompts, and aims to insert code that checks the validity of each user response, in practice, considering the complexities of current graphical user interfaces, it remains a major programming challenge to anticipate and respond to arbitrary user input.
Second, it is difficult, and often impossible, to predict the availability of the diverse hardware and software resources required as program execution unfolds. For instance, the running program might request RAM (random access memory) and disk storage allocations at diverse points of its execution, in the absence of which the program cannot usefully continue. Similarly, the running program might call operating system, library, or other routines that are, for various reasons beyond the programmer's control, unavailable at that moment. A common error, for instance, occurs when a program seeks access to a file that is not available. As with hardware resource exceptions, the program must either take evasive action or simply terminate (exit or abort). Exceptions of this type are especially common in modern computing environments where a set of independent user applications, or a set of independently executing threads within the same program, must share the same resources.
Apart from resource availability and unpredicted user actions, a further source of runtime errors involves genuine coding bugs not detectable during compilation or linkage. For example, an arithmetical expression, accepted as legal by the compiler, may produce a runtime error for certain values of its variable components. Typical cases are the "divide-by-zero" error and similar situations where the expression cannot be correctly evaluated. Such errors are predictable and avoidable in theory. In practice, however, traditional exception-handling solutions have involved a hard-coded plethora of conditional tests of variable values before each expression is evaluated, followed by ad hoc routines to bypass invalid evaluations. For example:
______________________________________
if (X != 0)
    Y/X;       // OK to divide by X
else {
    // problem: how/where to handle the
    // divide-by-zero exception?
}
______________________________________
The approach is at best tedious and prone to error.
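As a minimal sketch of the hard-coded testing described above, the guard can be wrapped in a helper that reports failure to its caller instead of dividing. The name checked_divide() and its signature are assumptions made for illustration only; the second test (INT_MIN divided by -1) is one of the "similar situations" in which an integer quotient cannot be represented.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not part of any library): divides y by x only
   when the quotient can be evaluated. On success it stores the result
   in *out and returns true; otherwise it returns false and leaves
   *out untouched. */
bool checked_divide(int y, int x, int *out)
{
    if (x == 0) {
        return false;               /* divide-by-zero */
    }
    if (y == INT_MIN && x == -1) {
        return false;               /* quotient would overflow int */
    }
    *out = y / x;
    return true;
}
```

Every call site must still test the returned flag and decide what to do on failure, which is precisely the tedium the passage above describes.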
Most of the high-level languages currently used for program development exploit the concept of modularity whereby a commonly required set of operations can be encapsulated in a separately named subroutine, procedure, or function. Once coded, such subroutines can be reused by "calling" them from any point in the main program. Further, a subroutine may call a subsubroutine, and so on, so that in most cases an executing program is seldom a linear sequence of instructions. In the C language, for example, a main() program is written which calls a sequence of functions, each of which can call functions, and so on. If all goes well, control eventually returns to main(). This nesting of function calls simplifies the construction of programs but, at the same time, complicates the handling of exceptions. The essence of a function call is that it must pass any arguments (or parameters) to the target function, transfer control to the memory section holding the function's executable code, return the result of the call, and at the same time, store sufficient information to ensure that subsequent execution resumes immediately after the point where the original function call was made. This function-calling mechanism, as is well-known in the art, is usually achieved by pushing and pulling data and memory addresses on and off a stack prior to, during, and after, the call. A stack is simply a dedicated portion of memory organized as a LIFO (last in, first out) data structure. The stack is not normally manipulated directly by the programmer, but its contents are changed as a result of the function calls coded by the programmer. Programs do have direct access to another portion of memory, called the heap, and a key element in exception handling involves the management of this vital resource.
After a successful function call, the stack is unwound, that is to say, all data which were "pushed" onto the stack are "popped" off in reverse order, leaving the stack in its pre-call state ready for further function calls; execution resumes in the function which made the call. Note that, since function calls can be nested to arbitrary levels, the stack must maintain a vital, complex sequence of return values and instruction pointers essential to the proper execution of the program. Eventually, absent any problems, control ends back in main(), and after the final successful function call in main(), the program terminates. Any interruption to this unwinding process leads to an unbalanced stack with unpredictable results. For instance, a called function expects to find its arguments in a particular section, known as the function's stack frame, at the top of the stack; if the stack is unbalanced, the function will pull off erroneous data, further compounding the runtime error.
Clearly, exceptions occurring in a nested function can create a particularly difficult problem. Expectedly, several exception-handling approaches have been attempted to address the problem. One approach, for instance, is to have each function return an error indication, either in a separate variable, or as a special range of values for the normal return value. The immediate onus of exception handling then rests on the calling function. If the calling function is unable to cope, it must return an error indication to its calling function, and so on up the chain until either a function is reached that can handle the exception, or until main() is reached. If main() cannot correct the problem, it terminates as gracefully as possible, perhaps displaying an explanatory message for the user.
As an illustration, suppose that main() calls funcA() which, in turn, calls funcB(). funcB() is programmed to return, say, zero for success or a positive number indicating the reason for failure. For example, funcB() might return 1 for "insufficient memory," 2 for "file not found," and so on. funcA() always tests the value returned by funcB(). If this test indicates success, funcA() carries on and eventually returns control to main(). If funcA() detects that funcB() suffered an "insufficient memory" error, it may well be able to correct the situation (by "collecting garbage" or by defragmenting the heap) and then call funcB() again. But if funcA() detects the "file not found" error, it may have no means of handling this situation other than displaying a warning. Unable to continue, funcA() must then return an error value to main(). What, if anything, main() can do with this error will, of course, depend on the particular application.
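The chain just described might be sketched in C as follows. This is an illustration under stated assumptions, not a definitive implementation: the error-code names, the simulated_failure and memory_reclaimed variables, and reclaim_memory() are all hypothetical stand-ins that let funcB() fail on demand and let funcA() "correct" a memory shortage.

```c
#include <stdio.h>

/* Hypothetical error codes matching the illustration in the text. */
enum { ERR_OK = 0, ERR_NO_MEMORY = 1, ERR_FILE_NOT_FOUND = 2 };

/* Hypothetical knobs used to simulate failures in funcB(). */
int simulated_failure = ERR_OK;
int memory_reclaimed = 0;

/* Stand-in for garbage collection or heap defragmentation. */
void reclaim_memory(void)
{
    memory_reclaimed = 1;
}

/* Returns zero for success or a positive code giving the reason. */
int funcB(void)
{
    if (simulated_failure == ERR_NO_MEMORY && !memory_reclaimed) {
        return ERR_NO_MEMORY;
    }
    if (simulated_failure == ERR_FILE_NOT_FOUND) {
        return ERR_FILE_NOT_FOUND;
    }
    return ERR_OK;
}

/* Tests funcB()'s result, recovers where it can, chains the rest. */
int funcA(void)
{
    int rc = funcB();
    if (rc == ERR_NO_MEMORY) {
        reclaim_memory();
        rc = funcB();               /* retry once after recovery */
    }
    if (rc == ERR_FILE_NOT_FOUND) {
        fprintf(stderr, "warning: file not found\n");
    }
    return rc;                      /* ERR_OK, or an error for main() */
}
```

A main() built on this sketch would test the value returned by funcA() in the same way, terminating as gracefully as it can when the code is nonzero.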
The merit of this "error chaining" scheme is that the stack is always unwound correctly, but there are several serious disadvantages. Each function in the chain is saddled with code that "looks" for exceptions occurring in its called functions. This code must also "decide" which exceptions can be handled and which ones have to be returned to the calling function. When the function calls are deeply nested, and the number of different exception types increases, the testing and chaining of exceptions becomes a major, error-prone programming headache. A significant obstacle to well-formulated, easy-to-read, maintainable code is apparent from the simple example outlined above. If main() is left to handle an exception returned by funcA(), it may need to know both the type of exception and where the exception occurred. The type of exception is clear from the error code, but the fact that it occurred in funcB() and not in funcA() or, as the program is changed and extended, some other function in the chain, is not immediately apparent without additional error encoding.
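One possible form of the "additional error encoding" mentioned above is a record that carries the originating function's name alongside the error code. The struct and the function names below are assumptions introduced purely for illustration; the text does not prescribe any particular encoding.

```c
#include <stddef.h>

/* Hypothetical error record: the code plus where it arose. */
struct error_info {
    int code;            /* 0 means success */
    const char *origin;  /* function that raised the error, or NULL */
};

/* Stand-in for funcB(): fails with the code it is asked to simulate. */
struct error_info funcB_located(int simulate_code)
{
    struct error_info e;
    e.code = simulate_code;
    e.origin = (simulate_code != 0) ? "funcB" : NULL;
    return e;
}

/* Stand-in for funcA(): unable to handle the error, it forwards the
   record unchanged so that the origin survives up to main(). */
struct error_info funcA_located(int simulate_code)
{
    return funcB_located(simulate_code);
}
```

Even in this small form, every function in the chain must agree on the record layout and remember to forward it, which adds to the bookkeeping burden rather than removing it.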
One response to this problem is the global (or long) goto label instruction that can transfer control from any point of any function to a routine residing anywhere in memory, at the address given by the identifier, label. Under this regime, the funcB() of the preceding example need not return error codes up the function chain, but, on detecting an error can send control directly to an appropriate exception handler:
______________________________________
retVal funcB()      // suggestive code only; not legal C
{
    serr = 'B';     // optional: identify this function
    . . .           // request memory here
    if (no_mem) goto no_mem_handler;
    . . .           // try to access file here
    if (no_file) goto no_file_handler;
    . . .           // funcB does its thing
    return result;  // return to calling function
}

main()
{
    . . .
no_mem_handler:
    {
        // check value of serr and handle exception
    }
no_file_handler:
    {
        // check value of serr and handle exception
    }
    . . .
}
______________________________________
The routine at no_mem_handler is presumed to handle all "insufficient memory" errors and, if necessary, use the value of serr to determine in which function the error occurred. In the current terminology, funcB() "throws" the "insufficient memory" exception, while the routine at no_mem_handler "catches" the exception.
This simple global goto approach has the merit of offering a single, readable place for each exception handler, but in practice it creates more problems than it solves. First, the standard goto instruction in the C and C++ languages operates only within a function; it lacks the required, long-distance power to transfer control between functions. Second, as it stands, the direct transfer to a handler fails to correctly unwind the stack, as described earlier. Finally, and related to the first two objections, additional mechanisms to allow control to return, if necessary, to the throwing function are needed. In order to resume execution in the throwing function on those occasions when the handler is able to "correct" the error, the exception-handling mechanism must allow the preservation and restoration of the state or context of the throwing function.
When funcB() throws an exception, for example, its local variables will hold particular values. As the name implies, the scope and existence of local variables is limited to the "life-span" of the function: they disappear when the function yields control. These local values and other parameters such as the current values in the registers of the central processor constitute the state of funcB(). In particular, the state includes the stack status and the current IP (instruction pointer) that marks the place in memory where execution must be resumed. This state must be completely saved before the handler is called, and then completely restored before execution of funcB() can be safely resumed.
Some of the deficiencies of the global goto "solution" have been alleviated by the introduction of two Standard C library functions, setjmp() and longjmp(). setjmp() can be called in any function at the point at which control should be resumed if a matching longjmp() is called in another function. Typically, longjmp() is called when an exception is thrown. setjmp() takes as an argument the address of (pointer to) a programmer-supplied memory buffer in which the state of the current function will be saved. As discussed earlier, this state holds the processor registers, including the current IP (instruction pointer), needed to resume execution immediately after the setjmp() call. longjmp(), unlike goto, can transfer control across different functions as follows: longjmp() takes as one of its arguments the same buffer address which is used in the matching setjmp(). When longjmp() is called, it recovers the state saved by setjmp(), and transfers control to the address found in the stored IP, namely the instruction following the setjmp() call. Further, longjmp() takes a second numeric argument which can be tested in the function that called setjmp(), thereby providing a mechanism for determining which particular longjmp() caused the jump. The following program snippet illustrates the use of these functions:
______________________________________
#include <setjmp.h>   // makes the setjmp() and longjmp() library
                      // functions available
#include <stdio.h>    // printf()
#include <stdlib.h>   // exit()

jmp_buf aJmpBuf;      // create a state (jump) buffer

void fna()
{
    int retval;

    retval = setjmp(aJmpBuf);   // store current state in state buffer
    if (retval) {
        printf("Got here via longjmp()\n");
        exit(-1);
    }
    fnb();
    fnc();
}
______________________________________
In fnb(), fnc(), or in any function they call, or in any function these functions call (and so on), the statement
    longjmp(aJmpBuf, status);
ensures that the setjmp() in fna() will be "recalled" under special circumstances in order to return value status in retval, following which, control will revert to the if (retval) line in fna(). In the absence of any longjmp() calls in subsequent functions, setjmp() returns zero (false), so that the if (retval) test fails. Thus, the setjmp() and longjmp() pair offer a global goto method for exception handling. Exception handlers can be encapsulated into any convenient set of functions, and after suitable handling, control can, if required, be safely transferred back to the functions in which the exception occurred.
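The round trip can be sketched as a pair of small functions. This is a hedged illustration rather than a definitive implementation: fna() and fnb() are given an int parameter here (an assumption, so that the "throw" can be triggered on demand), and fna() returns the status instead of printing and exiting.

```c
#include <setjmp.h>

jmp_buf aJmpBuf;                    /* state (jump) buffer */

/* Hypothetical worker: "throws" by calling longjmp() whenever it is
   handed a nonzero status; otherwise it returns normally. */
void fnb(int status)
{
    if (status != 0) {
        longjmp(aJmpBuf, status);   /* does not return */
    }
}

/* Returns 0 if fnb() completed normally, or the status that was
   passed to longjmp() if control came back through setjmp(). */
int fna(int status)
{
    int retval = setjmp(aJmpBuf);   /* 0 on the direct call */
    if (retval != 0) {
        return retval;              /* arrived here via longjmp() */
    }
    fnb(status);
    return 0;
}
```

Under these assumptions, fna(0) returns 0 and fna(2) returns 2. Strictly, ISO C guarantees setjmp() only in a few contexts, such as the controlling expression of an if or switch statement; the assignment form is kept because it mirrors the snippet above and is accepted by common compilers.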
However, the setjmp()/longjmp() solution has pronounced disadvantages. First, there is no guarantee that the function to which longjmp() returns is still active. In the previous example, it is possible that fna() has already returned, relinquishing its place on the stack, before a matching longjmp() is encountered. The only solution to this problem is to restrict setjmp() calls to the main() program. Second, the stack unwinding problem in the presence of nested setjmp() and longjmp() calls requires careful explicit programming. Third, setjmp() and longjmp() are not compatible with the Microsoft Windows API (Applications Programming Interface). Finally, many popular program overlaying and virtual memory techniques employ special stacks, so that a function's status is not completely stored by setjmp(). All told, present-day approaches have failed to adequately address the problem of handling exceptions.