Most computer users have experienced times when their computer seemingly has “lost its mind” and starts behaving in seemingly unexplainable ways. For example, sometimes we command the computer to do something—but instead of doing what we ask, the computer “stops responding” and needs to be “rebooted” (e.g., turned off and back on again). This process can waste significant time while the computer restarts. Work product is sometimes lost—frustrating users to no end.
Productivity and efficiency are significant problems, but there are contexts in which undesired or unexpected software behavior can have even more significant impact. Imagine for example a situation in which an emergency call from a police officer or someone in the military does not get through due to a software failure. Consider a situation in which a hospital life support device ceases to operate properly because of a software defect. Suppose an aircraft or spacecraft automatic pilot or computer-based navigation system ceases to operate properly because of a problem with the way the software was written. Imagine the result if an electrical power grid fails due to a software failure.
Ultimately, most such problems are caused by programming errors (sometimes called “bugs”). As computer programs become increasingly complex, it is more difficult for the people writing the computer code to take into account every possible condition that the computer program may encounter. Unfortunately, a computer program will “break” if the code encounters an undefined condition it does not “know” how to handle.
Another range of problems relates to attackers taking advantage of undefined computer program behavior to do harm. Several of the undefined behaviors of C and C++ have received much attention in the popular press as well as technical journals, because their effects have inflicted billions of dollars of damage in the USA and worldwide. In particular, the “buffer overflow” (also known as “buffer overrun”) and “null pointer indirection” behaviors have created vulnerabilities in widely-used software from many different vendors. This problem of buffer overflows is no longer an obscure technical topic. This is the vulnerability through which most worms and viruses attack. The worldwide total costs due to malicious hacker attacks during 2002 have been estimated to be between 40 and 50 billion USD; costs for 2003 were estimated between 120 and 150 billion USD. See e.g., David Berlind, “Ex-cybersecurity czar Clarke issues gloomy report card” (ZDNet TechUpdate Oct. 22, 2003).
Much work has been done in the past to make software more robust and reliable. However, further improvements are possible and desirable. The technology herein provides new and useful techniques that can be used individually and/or in combination to test and/or certify that software (including but not limited to software written in the “C” family of programming languages) is safe, secure and/or substantially defect-free. While the techniques disclosed herein can advantageously be incorporated into purely automatic, machine-operated scenarios (i.e., one piece of software can be used to test other software) to provide a comprehensive testing and certification solution, these techniques can also be used together or separately in less automated contexts (e.g., in conjunction with analysis and review by humans) to provide testing and/or verification capabilities. Such techniques can be used to test and certify any kind of software, including but not limited to compilers, that performs any kind of functionality imaginable. The software being tested could be intended to run on any type of computing device including for example personal computers, embedded controllers, mainframe computers, networking contexts, servers or any other type of computing device. The techniques herein are generally applicable to a wide range of problems and solutions, and should by no means be limited to the particular scenarios described below.
For example and in greater detail, an international standard has been developed for the programming language C, which is designated ISO/IEC 9899:2002(E) (“the ISO C99 standard”, i.e., “the C standard”). Similarly, an international standard has been developed for the programming language C++, which is designated ISO/IEC 14882:2003(E) (“the ISO C++ standard”, i.e., “the C++ standard”). The previous international standard for the programming language C was designated ISO/IEC 9899:1990(E) (“the ISO C90 standard”). Each of these standards defines certain situations using the category of “undefined behavior”.
The C Standard contains the following definition: “3.4.3 undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).”
The C++ Standard contains a similar definition: “1.3.12 undefined behavior: behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. Undefined behavior may also be expected when this International Standard omits the description of any explicit definition of behavior. [Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed.]”
Some undefined behaviors can be eliminated by using techniques already known in the current art. The next sections will describe some exemplary such techniques.
Design-Time Choices [dt]
Several undefined behaviors can be addressed by design choices; these undefined behaviors are marked with “dt” in column one of the table below. The guiding principle behind these design choices is that non-portable behavior is generally not as bad as undefined (unsafe) behavior. For example, byte-ordering affects the numeric value of results, but so long as address bounds are not exceeded, byte-ordered integer values produce something well-defined on each hardware platform.
a. The representation of a null pointer can be all-bits-zero.
b. The representation of pointers can be binary two's-complement with non-signaling wraparound.
c. Every possible binary value can be interpreted as a valid data element. Every data value can be fetched safely; in that sense, there are no “trap representations”. A “trap” can result if fetch or store of an invalid pointer is attempted, but not upon calculation or comparison of addresses. Therefore, uninitialized memory can be fetched safely. An incompletely-read buffer after a read error (such as in Standard C subclauses 7.19.7.2, 7.19.7.7, 7.24.3.2, etc.) still contains data bytes which will not cause traps upon fetch. If any hardware datatype does contain “trap representations” at the assembler-code level, then the implementation can catch any such trap (invisibly to the C/C++ code) and replace the value in the register with a value that conforms to Safe Secure design-time choices (such as a “quiet NaN” for floating-point values).
d. A request to the allocation functions malloc and calloc to allocate zero bytes can cause the allocation of the smallest non-zero allocation.
e. If the number-of-elements argument is zero, string and wide-string and sorting and searching functions can do nothing gracefully.
f. The sorting and searching functions can be limited to no more than an implementation-defined maximum number of iterations.
g. The algorithms for converting between wide characters and (narrow) characters can produce deterministic results for all inputs, in either direction. Therefore, when a stream was written wide-oriented and read byte-oriented, the behavior can be implementation-defined and not undefined, and similarly for a stream written byte-oriented and read wide-oriented.
h. The wcstok function can be implemented so that, if it is invoked with a null pointer, then the pointer argument need not be equal to the pointer argument of the previous call, but can require only that the “saved” pointer must designate some non-const array of characters, null-terminated.
i. The wcstok and strtok functions can be implemented so that, if the first invocation passes a null pointer, the function can ignore it and return a null pointer; alternatively, the function can invoke a “Code-Generation Choice” (see below).
j. The compiler can be configured for each accompanying set of Standard C++ Library functions, so that several undefined behaviors can be eliminated by design-time choices.
k. The compiler can issue a fatal diagnostic for all visible attempts to modify a string literal. When a string literal has become the target of a pointer, the methods shown in this Application will ensure that the pointer will not be used to modify storage outside the bounds of the string literal's array. In-bound modifications made to that array will exhibit well-defined behavior according to the underlying machine model: if the array has been allocated in a ROM or write-protected segment, the attempt to write will cause either a Code-Generation Choice or a no-op.
l. [reserved; no L]
m. The allocation functions can always return one minimum-sized storage allocation in response to the request to allocate zero bytes. The Requirement of any subsequent fetch-or-store through that pointer must be met, regarding both range and type.
n. Each static variable can be accompanied by an initialization-guard flag. Upon entry to the construction or destruction of a block-scope, file-scope, or dynamically-loaded object with static storage duration, the flag is set. This flag is cleared when construction or destruction is complete.
o. The implementation can analyze the code of each C++ special function (constructor and destructor) to determine whether any undefined behavior would result from re-entering that function before a prior invocation has returned. If so, the generated code for that function shall test the initialization-guard flag to prevent such re-entry.
p. The implementation can provide a dummy function to be invoked any time the user program erroneously calls a pure virtual C++ function, which will invoke a Code-Generation Choice. In non-Debug mode, a no-op can be performed.
q. The implementation can provide an API which will incorporate the functionality of the C atexit function, along with extra information to allow the execution of destructors for static objects in the reverse order of construction, even including dynamic libraries.
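By way of non-limiting illustration, design-time choices (d), (e) and (m) above might be sketched in C as follows. The wrapper names ss_malloc and ss_memcpy are hypothetical and do not appear in the C Standard or elsewhere in this Application; this is one possible sketch under those assumptions, not a required implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper (the name is an assumption): a request for zero
   bytes is rounded up to the smallest non-zero allocation, so a non-null
   result always designates at least one usable byte. */
void *ss_malloc(size_t n)
{
    if (n == 0)
        n = 1;
    return malloc(n);
}

/* Hypothetical wrapper (the name is an assumption): when the element
   count is zero the function does nothing gracefully, even if either
   pointer is null. */
void *ss_memcpy(void *dst, const void *src, size_t n)
{
    if (n == 0)
        return dst;
    /* Debug-build check: a non-empty copy needs two valid pointers. */
    assert(dst != NULL && src != NULL);
    return memcpy(dst, src, n);
}
```

A caller can then store through the pointer returned by ss_malloc(0) within its one-byte range, and can pass a zero count to ss_memcpy without first validating the pointer arguments.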
In this Application, by way of non-limiting example only, the undefined behaviors of C and C++ are itemized in several tables. In each table, the first column is headed “SSM#” and represents the “Safe-Secure Method Name”; for example, in the following table, each entry in column one specifies “dt” for the “Design-time choices [dt]” subsection of this Application. The second column is headed either “C-Std #” for “C Standard Number” or “C++-Std #” for “C++ Standard Number”, i.e., the subclause number of the ISO/IEC standard for C or C++. The third column is headed “Description” and describes the specific undefined behavior.
The methods shown in this section can be used to eliminate the following undefined behaviors:
SSM# | C-Std# | Description
dt | c7.19.2 | A byte input/output function is applied to a wide-oriented stream, or a wide character input/output function is applied to a byte-oriented stream
dt | c7.13.2.1 | After a longjmp, there is an attempt to access the value of an object of automatic storage class with non-volatile-qualified type, . . .
dt | c7.13.2.1 | . . . local to the function containing the invocation of the corresponding setjmp macro, that was changed between the setjmp invocation and longjmp call
dt | c6.5.16.1 | An object containing no pointers is assigned to an inexactly overlapping object or to an exactly overlapping object with incompatible type
dt | c6.5.16.1 | An object containing pointers is assigned to an inexactly overlapping object or to an exactly overlapping object with incompatible type
dt | c7.14.1.1 | A signal occurs other than as the result of calling the abort or raise function, and the signal handler refers to an object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or . . .
dt | c7.14.1.1 | . . . calls any function in the standard library other than the abort function, the _Exit function, or the signal function (for the same signal number)
dt | c6.2.6.1 | A trap representation is produced by a side effect that modifies any part of the object using an lvalue expression that does not have character type
dt | c6.2.6.1 | A trap representation is read by an lvalue expression that does not have character type
dt | c6.3.1.4 | Conversion to or from an integer type produces a value outside the range that can be represented
dt | c6.3.1.5 | Demotion of one real floating type to another produces a value outside the range that can be represented
dt | c6.4.5 | The program attempts to modify a string literal
dt | c6.5 | Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored
dt | c6.5.6 | The result of subtracting two pointers is not representable in an object of type ptrdiff_t
dt | c6.5.7 | An expression having signed promoted type is left-shifted and either the value of the expression is negative or the result of shifting would not be representable in the promoted type
dt | c6.5.7 | An expression is shifted by a negative number or by an amount greater than or equal to the width of the promoted expression
dt | c6.5acc | An object has its stored value accessed other than by an lvalue of an allowable type
dt | c6.7.3 | An attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type
dt | c6.7.3 | An attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type
dt | c6.7.8 | The value of an unnamed member of a structure or union is used
dt | c6.9.1 | The } that terminates a function is reached, and the value of the function call is used by the caller
dt | c7.11.1.1 | The program modifies the string pointed to by the value returned by the setlocale function
dt | c7.11.2.1 | The program modifies the structure pointed to by the value returned by the localeconv function
dt | c7.13.2.1 | The longjmp function is invoked to restore a nonexistent environment
dt | c7.14.1.1 | A signal handler returns when the signal corresponded to a computational exception
dt | c7.14.1.1 | A signal is generated by an asynchronous signal handler
dt | c7.14.1.1 | A signal occurs as the result of calling the abort or raise function, and the signal handler calls the raise function
dt | c7.14.1.1 | The value of errno is referred to after a signal occurred other than as the result of calling the abort or raise function and the corresponding signal handler obtained a SIG_ERR return from a call to the signal function
dt | c7.19.5.2 | The stream for the fflush function points to an input stream or to an update stream in which the most recent operation was input
dt | c7.19.6.1, c7.19.6.2, c7.24.2.1, c7.24.2.2 | A % conversion specifier is encountered by one of the formatted input/output functions, but the complete conversion specification is not exactly %%
dt | c7.19.6.2, c7.24.2.2 | A c, s, or [ conversion specifier with an l qualifier is encountered by one of the formatted input functions, but the input is not a valid multibyte character sequence that begins in the initial shift state
dt | c7.19.7.2, c7.19.7.7, c7.24.3.2 | The contents of the array supplied in a call to the fgets, gets, or fgetws function are used after a read error occurred
dt | c7.19.8.1 | A partial element read by a call to the fread function is used
dt | c7.19.8.1, c7.19.8.2 | The file position indicator for a stream is used after an error occurred during a call to the fread or fwrite function
dt | c7.20.3 | A non-null pointer returned by a call to the calloc, malloc, or realloc function with a zero requested size is used to access an object
dt | c7.20.3.3 | The value of the object allocated by the malloc function is used
dt | c7.20.3.4 | The value of any bytes in a new object allocated by the realloc function beyond the size of the old object are used
dt | c7.20.4.5, c7.21.6.2 | The string set up by the getenv or strerror function is modified by the program
dt | c7.20.5 | The comparison function called by a searching or sorting utility function alters the contents of the array being searched or sorted, or returns ordering values inconsistently
dt | c7.20.5.1 | The array being searched by the bsearch function does not have its elements in proper order
SSM# | C++-Std# | Description
dt | 2.13.4 para 2 | Attempting to modify a string literal
dt | 3.6.1 para 4 | std::exit is called to end a program during the destruction of an object with static storage duration.
dt | 3.7.3.1 para 2 | Dereferencing a pointer returned as a request for zero size
dt | 4.8 para 1 | Floating-point conversion, source value out-of-range of target
dt | 4.9 para 1 | Floating-point to integer conversion, source value out-of-range of target
dt | 5.3.4 para 6 | In a direct-new-declarator the expression evaluates to a negative value
dt | 6.7 para 4 | Control re-enters initialization recursively.
dt | 14.6.4.2 para 1 | Function lookup would have been ill-formed, or better match, if all translation units were considered.
dt | 17.4.3.6 para 2 | A replacement function that does not implement Required behavior
dt | 17.4.3.6 para 2 | A handler function that does not implement Required behavior
dt | 17.4.3.6 para 2 | A template argument does not implement Requirements
dt | 17.4.3.6 para 2 | Replacement function, handler function, or dtor throws an exception (unless specifically allowed)
dt | 18.1 para 5 | Taking offsetof of a non-POD type
dt | 26.2 para 3 | Result of function is not mathematically defined, or not in range of representable values
dt | 27.4.2.7 para 1 | ios_base object is destroyed before basic_ios::init initializes the members
dt | 27.4.4 para 2 | error value P(O(−1)) is used as arg to . . . member that accepts traits::pos_type
dt | 27.4.4.1 para 2 | basic_ios object is destroyed before init initializes the members
dt | 27.7.1.3 para 14 | the sp arg to seekpos has not been obtained by previous successful call to a positioning function
Text Streams and Character Representations [code]
An exemplary implementation can use a specific choice among the Unix/POSIX/Linux encoding of text files (with LF line terminators), the Macintosh encoding of text files (with CR line terminators), or the Microsoft Windows encoding of text files (with CR/LF line terminators). All mbstate_t conversions can produce implementation-defined results, even after changing the LC_CTYPE category.
An implementation can make truncated-result behavior well-defined in strxfrm, strftime, wcsxfrm, or wcsftime.
The multibyte functions can behave gracefully when given a sequence not in the initial shift state, or when given any mbstate_t object.
The wide-character classifying and conversion functions can be well-defined for any wint_t input and for any LC_CTYPE setting.
The Standard C++ Library can be designed to provide a valid result for operator* at end-of-stream.
The methods shown in this section can be used to eliminate the following undefined behaviors:
SSM# | C-Std# | Description
code | c7.19.2 | Use is made of any portion of a file beyond the most recent wide character written to a wide-oriented stream
code | c7.19.6.1, c7.19.6.2, c7.23.3.5, c7.24.2.1, c7.24.2.2, c7.24.5.1 | The format in a call to one of the formatted input/output functions or to the strftime or wcsftime function is not a valid multibyte character sequence that begins and ends in its initial shift state
code | 24.5.3 para 2 | The result of operator* on end-of-stream
Secure Library [slib]
The secure (or “Bounds-checking”) library enhancements being standardized by ISO/IEC JTC 1 SC22/WG14 will eliminate many opportunities for undefined behavior (see www.open-std.org/jtc1/sc22/wg14/www/docs/n1093.pdf). Furthermore, if a formatted I/O function produces more than INT_MAX chars of output, then it can return INT_MAX.
The methods shown in this section can be used to eliminate the following undefined behaviors:
SSM# | C-Std# | Description
slib | c7.19.6.1, c7.19.6.3, c7.19.6.8, c7.19.6.10 | The number of characters transmitted by a formatted output function is greater than INT_MAX
Ss_unwind [longj]
The longjmp function (and any other functions which “unwind” the stack) can check whether execution of atexit-registered functions has started. If so, one of the following implementation-defined actions can be performed: cause a return from the function that invoked the unwind or longjmp function; invoke an “extreme exit” cleanup function; or invoke the abort function. Optionally, at the point of catching the ss_unwind, a system sanity check can be performed before continuing or re-starting. Another option for the implementation of an ss_unwind capability is provided by the new “Bounds-checking” library of C (see “Secure Library [slib]”), known as the “abort” version of the “constraint handler”; this handler causes either a breakpoint in a debugger or immediate execution of an abort. In C++, a similar constraint handler can optionally cause a breakpoint in a debugger, abort, or throw a specified exception. (These are the “SSCC-compatible” constraint handlers.) The constraint handler can be invoked by code compiled as Debug mode or as Production mode. Therefore, each instance where distinctions are made between Debug and Production mode is revised to an implementation-specified choice among the following alternative behaviors: (1) invoke the current SSCC-compatible constraint handler; (2) invoke an implementation-specified “unwind” function (which has been generically referred to as “ss_unwind” herein); (3) execute an implementation-specified form of “Keep-On-Running” behavior such as Modwrap, Saturation, or ZeroBound. This implementation-specified choice among behaviors is referred to as an “unwind” or an “ss_unwind” or the “Code-Generation Choice” herein.
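By way of non-limiting illustration, the three-way Code-Generation Choice might be sketched in C for an out-of-range array index. Every identifier below (ss_choice, ss_handler, ss_recovery_point, ss_check_index) is hypothetical, and ss_unwind is given only one assumed implementation based on longjmp. This section does not define Saturation; treating it as clamping the index to the last valid element is an assumption made only for this sketch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* The three alternatives of the Code-Generation Choice (names assumed). */
typedef enum {
    SS_CHOICE_HANDLER,      /* (1) invoke the current constraint handler */
    SS_CHOICE_UNWIND,       /* (2) invoke the "unwind" function          */
    SS_CHOICE_KEEP_RUNNING  /* (3) "Keep-On-Running" behavior            */
} ss_choice;

/* Recovery point; the caller must set it with setjmp before any unwind. */
jmp_buf ss_recovery_point;

/* Current constraint handler; the "abort" version is the default here. */
void (*ss_handler)(void) = abort;

/* One assumed form of ss_unwind: transfer control to the recovery point. */
void ss_unwind(void)
{
    longjmp(ss_recovery_point, 1);
}

/* Returns an index that is always valid for an array of n elements. */
size_t ss_check_index(size_t i, size_t n, ss_choice choice)
{
    assert(n > 0);                 /* an empty array has no valid index */
    if (i < n)
        return i;
    if (choice == SS_CHOICE_HANDLER)
        ss_handler();              /* the default handler does not return */
    else if (choice == SS_CHOICE_UNWIND)
        ss_unwind();               /* does not return */
    /* Keep-On-Running, assumed Saturation: clamp to the last element.
       Also reached if a user-supplied handler chooses to return. */
    return n - 1;
}
```

In this sketch a caller would establish ss_recovery_point with setjmp at a point where a sanity check can be performed, and then route each subscript through ss_check_index with the choice selected for the build.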
The methods shown in this section can be used to eliminate the following undefined behavior:
SSM# | C-Std# | Description
longj | c7.20.4.3 | During the call to a function registered with the atexit function, a call is made to the longjmp function that would terminate the call to the registered function
SSM# | C++-Std# | Description
longj | 18.7 para 4 | If autos would be destroyed by thrown exception transferring to destination, longjmp to that destination has undefined behavior
Special Behavior of Atexit Functions [atex]
The exit function can check whether execution of the exit function has previously started. If so, one of the following implementation-defined actions can be performed: invoke an “extreme exit” cleanup function; or invoke the abort function.
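By way of non-limiting illustration, the re-entry check might be sketched in C as follows. The names ss_exit, ss_exit_enter and ss_exit_state are hypothetical, and the sketch selects the abort alternative; an “extreme exit” cleanup function could be substituted at the marked point.

```c
#include <stdlib.h>

/* Outcome of attempting to begin exit processing (names assumed). */
typedef enum { SS_EXIT_PROCEED, SS_EXIT_REENTERED } ss_exit_state;

/* Set once exit processing has started. */
static int ss_exit_started = 0;

/* Records that exit processing has begun and reports whether this is
   the first call or a repeated one. */
ss_exit_state ss_exit_enter(void)
{
    if (ss_exit_started)
        return SS_EXIT_REENTERED;
    ss_exit_started = 1;
    return SS_EXIT_PROCEED;
}

/* Hypothetical replacement for exit: a second call, for example from an
   atexit-registered function, aborts instead of re-running cleanup. */
void ss_exit(int status)
{
    if (ss_exit_enter() == SS_EXIT_REENTERED)
        abort();   /* or invoke an "extreme exit" cleanup function */
    exit(status);
}
```

Separating ss_exit_enter from ss_exit keeps the decision itself observable without terminating the program.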
The methods shown in this section can be used to eliminate the following undefined behavior:
SSM# | C-Std# | Description
atex | c7.20.4.3 | The program executes more than one call to the exit function
Arithmetic Exceptions [exc]
If at compile-time the right operand of division or remainder is zero, a fatal diagnostic message can be produced. In Debug mode, if at run-time the right operand of division or remainder is zero, an “unwind” (such as ss_unwind) can be invoked, and the implementation may throw an exception of an implementation-defined type. In non-Debug mode, if at run-time the right operand of division or remainder is zero, the result can be the maximum value of the result type, which for a floating-point type may be an infinity.
If at compile-time the left operand of division or remainder is the maximum negative value of its type and the right operand is −1, a fatal diagnostic message can be produced. In Debug mode, if at run-time the left operand of division or remainder is the maximum negative value of its type and the right operand is −1, an “unwind” (such as ss_unwind) can be invoked, and the implementation may throw an exception of an implementation-defined type. In non-Debug mode, if at run-time the left operand of division or remainder is the maximum negative value of its type and the right operand is −1, the result can be the maximum value of the result type.
If at compile-time the result of an integral arithmetic operation is too large for its type, a fatal diagnostic message can be produced. In Debug mode, if at run-time the result of an integral arithmetic operation is too large for its type, an “unwind” (such as ss_unwind) can be invoked, and the implementation may throw an exception of an implementation-defined type. In non-Debug mode, if at run-time the result of an integral arithmetic operation is too large for its type, the result can be the value of the two's-complement operation with wrap-around.
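By way of non-limiting illustration, the non-Debug run-time results described above might be sketched in C for the type int. The function names ss_div and ss_add are hypothetical. The sketch assumes a two's-complement int whose unsigned counterpart has the same width.

```c
#include <limits.h>

/* Division with the non-Debug results described above: a zero divisor,
   or INT_MIN divided by -1, yields the maximum value of the type. */
int ss_div(int a, int b)
{
    if (b == 0)
        return INT_MAX;
    if (a == INT_MIN && b == -1)
        return INT_MAX;
    return a / b;
}

/* Addition with two's-complement wrap-around. The sum is formed in
   unsigned arithmetic, which is well-defined modulo UINT_MAX + 1, and
   then mapped back to int without an out-of-range conversion. */
int ss_add(int a, int b)
{
    unsigned int r = (unsigned int)a + (unsigned int)b;
    if (r <= (unsigned int)INT_MAX)
        return (int)r;
    /* r represents a negative value: recover it arithmetically. */
    return -(int)(UINT_MAX - r) - 1;
}
```

Under these assumptions ss_add(INT_MAX, 1) wraps to INT_MIN, while ss_div never executes a hardware division that could trap.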
The methods shown in this section can be used to eliminate the following undefined behaviors:
SSM# | C-Std# | Description
exc | c6.5.5 | The value of the second operand of the / or % operator is zero
exc | c6.5exc | An exceptional condition occurs during the evaluation of an expression
Control of Dangling Pointers [dang]
One category of undefined behavior arises from accessing freed storage. Furthermore, each freed pointer must previously have been allocated.
These undefined behaviors can be eliminated by use of garbage collection, either conservative (see, e.g., Hans-J Boehm, “A Garbage Collector for C and C++”) or accurate (see, e.g., Fergus Henderson, “Accurate Garbage Collection in an Uncooperative Environment”, ISMM'02, June 2002, Berlin, Germany, ACM 1581135394/02/0006), supplemented with the following special treatment of pointers to terminated stack frames. Directly assigning an address in the current function's stack frame to a longer-life pointer can be prohibited. Define a pointer-retainer function as a function which stores a pointer argument in heap or static storage. Passing a pointer to stack to a pointer-retainer function can be prohibited. (Whatever data resides in the stack can be copied to heap or to static, to avoid the prohibition.)
Memory that could contain pointers can be initialized to zeroes. Therefore, (as in Boehm conservative garbage-collection) malloc allocates space that might have pointers in it, so the space is zero-filled. There can be a new attribute to describe a state named e.g. “not_ptrs” for any storage which is guaranteed not to contain pointers, and a different version of malloc can be used for such storage (equivalent to GC_malloc_atomic in the Boehm library):
void *malloc_not_ptrs(size_t n);
If storage with the not_ptrs attribute is cast to pointer-to-anything, then a fatal diagnostic message can be produced. The not_ptrs attribute can be removed from any storage by assigning zero to the bytes of the storage; a byte-oriented alias is mandatory (char, or unsigned char, or a library function such as memset which modifies the bytes of memory).
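By way of non-limiting illustration, the allocation side of this scheme might be sketched in C as follows. No garbage collector is implemented here; the sketch shows only the assumed contract of the two allocators. The name ss_malloc_ptrs is hypothetical, and rounding a zero-byte request up to one byte is an added assumption.

```c
#include <stdlib.h>

/* Storage that may later hold pointers: it is zero-filled, so no stale
   bit pattern can be mistaken for a live pointer. The name is assumed. */
void *ss_malloc_ptrs(size_t n)
{
    return calloc(1, n ? n : 1);
}

/* Storage declared never to hold pointers: a collector would not need
   to scan it, so its contents can be left uninitialized. */
void *malloc_not_ptrs(size_t n)
{
    return malloc(n ? n : 1);
}
```

A buffer of character data would be obtained from malloc_not_ptrs, while a node of a linked structure would be obtained from the zero-filling allocator.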
An alternative method for prevention of dangling pointers is known (see e.g., Todd M. Austin et al., Efficient Detection of All Pointer and Array Access Errors, Proceedings of the ACM SIGPLAN '94 Conference on Programming Language Design and Implementation, June 1994), which is a feasible solution for an implementation which operates entirely in BSAFE mode (see below).
The methods shown in this section can be used to eliminate the following undefined behaviors:
SSM# | C-Std# | Description
dang | c7.20.3.2, c7.20.3.4 | The pointer argument to the free or realloc function does not match a pointer earlier returned by calloc, malloc, or realloc, or the space has been deallocated by a call to free or realloc
dang | c7.20.3 | The value of a pointer that refers to space deallocated by a call to the free or realloc function is used
dang | c6.2.4 | An object is referred to outside of its lifetime
dang | c6.2.4 | The value of a pointer to an object whose lifetime has ended is used
Inclusion of C 1999 Extensions [c99]
In C99 programs which are not C++ programs, some undefined behaviors can be eliminated by using techniques already known in the current art. The next paragraphs will describe some exemplary such techniques.
The compiler can produce a fatal diagnostic message for the following situations which can be detected at compile-time: a function with external linkage is declared with an inline function specifier, but is not also defined in the same translation unit; the CX_LIMITED_RANGE, FENV_ACCESS, or FP_CONTRACT pragma is used in any context other than outside all external declarations or preceding all explicit declarations and statements inside a compound statement; an argument to a floating-point classification or comparison macro is not of real floating type; a complex argument is supplied for a generic parameter of a type-generic macro that has no corresponding complex function; the type of an argument to a type-generic macro is not compatible with the type of the corresponding parameter of the selected function; part of the program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma off.
The implementation of library functions can in Debug mode strictly validate argument values, and in non-Debug mode either strictly validate or adjust argument values to acceptable argument values.
The compiler can use Saturation semantics to produce well-defined results for the following situation: the value of the result of an integer arithmetic or conversion function cannot be represented.
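By way of non-limiting illustration, Saturation semantics for two of the affected library functions might be sketched in C as follows. The names ss_abs and ss_atoi are hypothetical. Saturation is taken here to mean clamping the result to the nearest representable value of the result type, which is an assumption of this sketch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Saturating absolute value: abs(INT_MIN) is not representable in a
   two's-complement int, so the result is clamped to INT_MAX. */
int ss_abs(int j)
{
    if (j == INT_MIN)
        return INT_MAX;
    return j < 0 ? -j : j;
}

/* Saturating string-to-int conversion. strtol already clamps to the
   range of long on overflow; the result is clamped again to int. */
int ss_atoi(const char *s)
{
    long v;
    assert(s != NULL);             /* Debug-build argument validation */
    v = strtol(s, NULL, 10);
    if (v > INT_MAX)
        return INT_MAX;
    if (v < INT_MIN)
        return INT_MIN;
    return (int)v;
}
```

With these wrappers, an out-of-range input produces a well-defined boundary value in place of the undefined result of abs or atoi.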
The methods shown in this section can be used to eliminate the following undefined behaviors:
SSM# | C-Std# | Description
c99 | c6.7.4 | A function with external linkage is declared with an inline function specifier, but is not also defined in the same translation unit
c99 | c7.3.4, c7.6.1, c7.12.2 | The CX_LIMITED_RANGE, FENV_ACCESS, or FP_CONTRACT pragma is used in any context other than outside all external declarations or preceding all explicit declarations and statements inside a compound statement
c99 | c7.12.3, c7.12.14 | An argument to a floating-point classification or comparison macro is not of real floating type
c99 | c7.22 | A complex argument is supplied for a generic parameter of a type-generic macro that has no corresponding complex function
c99 | c7.22 | The type of an argument to a type-generic macro is not compatible with the type of the corresponding parameter of the selected function
c99 | c7.6.1 | Part of the program tests floating-point status flags, sets floating-point control modes, or runs under non-default mode settings, but was translated with the state for the FENV_ACCESS pragma off
c99 | c7.6.2 | The exception-mask argument for one of the functions that provide access to the floating-point status flags has a nonzero value not obtained by bitwise OR of the floating-point exception macros
c99 | c7.6.2.4 | The fesetexceptflag function is used to set floating-point status flags that were not specified in the call to the fegetexceptflag function that provided the value of the corresponding fexcept_t object
c99 | c7.6.4.3, c7.6.4.4 | The argument to fesetenv or feupdateenv is neither an object set by a call to fegetenv or feholdexcept, nor is it an environment macro
c99 | c7.8.2.1, c7.8.2.2, c7.8.2.3, c7.8.2.4, c7.20.6.1, c7.20.6.2, c7.20.1 | The value of the result of an integer arithmetic or conversion function cannot be represented
Conditionally-Defined Behaviors [cdef]
Many of the situations defined as undefined behavior could be more precisely delineated by permitting a reduced range of the alternatives (as has been described in various places in the current art). The compiler can implement a choice for each behavior: either produce a fatal diagnostic message, or produce a specified implementation-defined behavior, for each of the situations coded with “cdef” in column one of the following table.
The methods shown in this section will eliminate the following undefined behaviors:
SSM# | C-Std# | Description
cdef | c5.1.1.2 | A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment
cdef | c6.6 | A constant expression in an initializer is not, or does not evaluate to, one of the following: an arithmetic constant expression, . . .
cdef | c6.6 | . . . a null pointer constant, an address constant, or an address constant for an object type plus or minus an integer constant expression
cdef | c6.6 | An arithmetic constant expression does not have arithmetic type; has operands that are not integer constants, floating constants, . . .
cdef | c6.6 | . . . enumeration constants, character constants, or sizeof expressions; or contains casts (outside operands to sizeof operators) other than conversions of arithmetic types to arithmetic types
cdef | c6.6 | An expression that is required to be an integer constant expression does not have an integer type; has operands that are not integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, or . . .
cdef | c6.6 | . . . sizeof expressions whose results are integer constants, or immediately-cast floating constants; or contains casts (outside operands to sizeof operators) other than conversions of arithmetic types to integer types
cdef | c6.7.5.3 | In a context requiring two function types to be compatible, they do not have compatible return types, or . . .
cdef | c6.7.5.3 | . . . their parameters disagree in use of the ellipsis terminator or the number and type of parameter (after default argument promotion, when there is no parameter type list or when one type is specified by a function definition with an identifier list)
cdef | c5.1.1.2 | A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character . . .
cdef | c5.1.1.2 | A nonempty source file . . . or ends in a partial preprocessing token
cdef | c5.1.1.2 | Token concatenation produces a character sequence matching the syntax of a universal character name
cdef | c5.1.2.2.1 | A program in a hosted environment does not define a function named main using one of the specified forms
cdef | c5.2.1 | A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
cdef | c5.2.1.2 | An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state
cdef | c6.10.1 | The token defined is generated during the expansion of a #if or #elif preprocessing directive, or the use of the defined unary operator does not match one of the two specified forms prior to macro replacement
cdef | c6.10.2 | The #include preprocessing directive that results after expansion does not match one of the two header name forms
cdef | c6.10.2 | The character sequence in an #include preprocessing directive does not start with a letter
cdef | c6.10.3 | There are sequences of preprocessing tokens within the list of macro arguments that would otherwise act as preprocessing directives
cdef | c6.10.3.2 | The result of the preprocessing operator # is not a valid character string literal
cdef | c6.10.3.3 | The result of the preprocessing operator ## is not a valid preprocessing token
cdef | c6.10.4 | The #line preprocessing directive that results after expansion does not match one of the two well-defined forms, or its digit sequence specifies zero or a number greater than 2147483647
cdef | c6.10.6 | A #pragma STDC preprocessing directive does not match one of the well-defined forms
cdef | c6.10.8 | The name of a predefined macro, or the identifier defined, is the subject of a #define or #undef preprocessing directive
cdef | c6.2.2 | The same identifier has both internal and external linkage in the same translation unit
cdef | c6.2.6.2 | The arguments to certain operators are such that could produce a negative zero result, but the implementation does not support negative zeros
cdef | c6.3.2.1 | A non-array lvalue with an incomplete type is used in a context that requires the value of the designated object
cdef | c6.3.2.1 | An lvalue having array type is converted to a pointer to the initial element of the array, and the array object has register storage class
cdef | c6.3.2.2 | An attempt is made to use the value of a void expression, or an implicit or explicit conversion (except to void) is applied to a void expression
cdef | c6.3.2.3 | Conversion between two pointer types produces a result that is incorrectly aligned
cdef | c6.4 | An unmatched ' or " character is encountered on a logical source line during tokenization
cdef | c6.4.1 | A reserved keyword token is used in translation phase 7 or 8 for some purpose other than as a keyword
cdef | c6.4.2.1 | A universal character name in an identifier does not designate a character whose encoding falls into one of the specified ranges
cdef | c6.4.2.1 | The initial character of an identifier is a universal character name designating a digit
cdef | c6.4.2.1 | Two identifiers differ only in nonsignificant characters
cdef | c6.4.2.2 | The identifier __func__ is explicitly declared
cdef | c6.4.7 | The characters ', \, ", //, or /* occur in the sequence between the < and > delimiters, or the characters ', \, //, or /* occur in the sequence between the " delimiters, in a header name preprocessing token
cdef | c6.5.4 | A pointer is converted to other than an integer or pointer type
cdef | c6.6 | The value of an object is accessed by an array-subscript [ ], member-access . or ->, address &, or indirection * operator or a pointer cast in creating an address constant
cdef | c6.7 | An identifier for an object is declared with no linkage and the type of the object is incomplete after its declarator, or after its init-declarator if it has an initializer
cdef | c6.7.1 | A function is declared at block scope with an explicit storage-class specifier other than extern
cdef | c6.7.2.1 | A structure or union is defined as containing no named members
cdef | c6.7.2.3 | When the complete type is needed, an incomplete structure or union type is not completed in the same scope by another declaration of the tag that defines the content
cdef | c6.7.3 | The specification of a function type includes any type qualifiers
cdef | c6.7.3 | Two qualified types that are required to be compatible do not have the identically qualified version of a compatible type
cdef | c6.7.5.1 | Two pointer types that are required to be compatible are not identically qualified, or are not pointers to compatible types
cdef | c6.7.5.2 | In a context requiring two array types to be compatible, they do not have compatible element types, or their size specifiers evaluate to unequal values
cdef | c6.7.5.2 | The size expression in an array declaration is not a constant expression and evaluates at program execution time to a nonpositive value
cdef | c6.7.5.3 | A storage-class specifier or type qualifier modifies the keyword void as a function parameter type list
cdef | c6.7.8 | The initializer for a scalar is neither a single expression nor a single expression enclosed in braces
cdef | c6.7.8 | The initializer for a structure or union object that has automatic storage duration is neither an initializer list nor a single expression that has compatible structure or union type
cdef | c6.7.8 | The initializer for an aggregate or union, other than an array initialized by a string literal, is not a brace-enclosed list of initializers for its elements or members
cdef | c6.9.1 | A function definition includes an identifier list, but the types of the parameters are not declared in a following declaration list
cdef | c6.9.1 | A function that accepts a variable number of arguments is defined without a parameter type list that ends with the ellipsis notation
cdef | c6.9.1 | An adjusted parameter type in a function definition is not an object type
cdef | c6.9.2 | An identifier for an object with internal linkage and an incomplete type is declared with a tentative definition
cdef | c7.1.2 | A header is included within an external declaration or definition
cdef | c7.2 | The argument to the assert macro does not have a scalar type
cdef | c7.17 | The member designator parameter of an offsetof macro is an invalid right operand of the . operator for the type parameter, or designates a bit-field
cdef | c7.18.4 | The argument in an instance of one of the integer-constant macros is not a decimal, octal, or hexadecimal constant, or it has a value that exceeds the limits for the corresponding type

Dynamic Monitoring of Allocated Storage [dyna]
The methods described below will in some cases require a fatal diagnostic for situations in which the compiler and linker are given insufficient information to determine that fetch or store operations do not introduce undefined behavior. A recently published article describes a method that can alternatively be applied to these most difficult cases: “A Practical Dynamic Buffer Overflow Detector”, by O. Ruwase and M. S. Lam (http://suif.stanford.edu/papers/tunji04.pdf). In this alternative, unverifiable fetch-or-store operations can be checked by the cited methods, which requires that all potential fetched-or-stored objects be entered into the cited tables.
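The general idea of checking a fetch or store against a table of known objects can be sketched as follows. This is a deliberately simplified stand-in, not the data structure or algorithm of the cited article: it uses a small fixed-size array searched linearly, and the names dyna_object, dyna_register, dyna_check_access, and DYNA_MAX_OBJECTS are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DYNA_MAX_OBJECTS 64  /* hypothetical fixed capacity for this sketch */

typedef struct {
    uintptr_t base;  /* address of the object's first byte */
    size_t size;     /* size of the object in bytes */
} dyna_object;

static dyna_object dyna_table[DYNA_MAX_OBJECTS];
static size_t dyna_count = 0;

/* Enters an object into the table.
   Returns 0 on success, -1 if the arguments are invalid or the table is full. */
int dyna_register(const void *base, size_t size)
{
    if (base == NULL || size == 0 || dyna_count == DYNA_MAX_OBJECTS)
        return -1;
    dyna_table[dyna_count].base = (uintptr_t)base;
    dyna_table[dyna_count].size = size;
    dyna_count++;
    return 0;
}

/* Returns 1 if the n bytes starting at p lie wholly inside a single
   registered object, and 0 otherwise. Assumes a flat address space in
   which converting a pointer to uintptr_t preserves ordering. */
int dyna_check_access(const void *p, size_t n)
{
    uintptr_t addr = (uintptr_t)p;
    for (size_t i = 0; i < dyna_count; i++) {
        const dyna_object *o = &dyna_table[i];
        if (addr >= o->base) {
            uintptr_t offset = addr - o->base;
            if (offset < o->size && n <= o->size - offset)
                return 1;
        }
    }
    return 0;
}
```

A checked program would call dyna_register for each object that an unverifiable pointer might designate, and would call dyna_check_access before each unverifiable fetch or store, treating a result of 0 as a fatal run-time condition.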
It would be desirable to eliminate further undefined behaviors in the execution of programs in the “intersection” of C and C++; that is, of C programs which use only the features described in the C++ standard, and of C++ programs which use only the features described in the C standard.
Furthermore, it would be desirable to eliminate undefined behaviors in the execution of programs in “full C++”, i.e., of C++ programs which use features which are not described in the C standard.
Additionally, it would be desirable to eliminate further undefined behaviors in the execution of programs in “full C99”, i.e., of C99 programs which use features which are not described in the C++ standard or in the 1990 C standard.
It would furthermore be desirable to automate (e.g., through compiler design) techniques to provide safe secure development of software, including but not limited to techniques for addressing undefined behavior in the full C and C++ programming languages.
Advantageous features provided by exemplary illustrative non-limiting implementations of the technology herein include:
    A Safe Secure Compiler (“SSC”) which produces Safe Secure Object Files or fatal diagnostic messages.
    A Safe Secure Inputs Check-List (“SSICL”) which records checksum information for the inputs to the execution of a Safe Secure Compiler.
    A Safe Secure Bounds Data File (“SSBDF”) which records Requirements and Guarantees for the defined and undefined symbols in one or more corresponding object files, as well as checksum information.
    A Safe Secure Linker (“SSL”) which combines object files and the corresponding Safe Secure Bounds Data Files, producing either fatal link-time diagnostics or a Safe Secure Executable Program.
    A Safe Secure Semantic Analyzer (“SSSA”) which uses the parse tree to determine Requirements and Guarantees.
    A Safe Secure Diagnostic Generator (“SSDG”) which generates fatal diagnostic messages in situations where undefined behavior would result and generates various warning messages to call the programmer's attention to various other situations.
    A Safe Secure Code Generator (“SSCG”) which generates object code which is free from the designated sets of undefined behaviors (including “buffer overflow” and “null pointer indirection”).
    A Safe Secure Pointer Attribute Hierarchy (“SSPAH”) which controls the inference of attributes based upon other attributes.
    A Safe Secure Pointer Attribute Predicate Table (“SSPAPT”) which controls the determination of attributes resulting from predicate expressions.
    A Safe Secure Bounds Data Table (“SSBDT”) which tabulates the Guarantees and Requirements for expressions, sub-expressions, declarations, identifiers, and function prototypes.
    A Safe Secure Interface Inference Table (“SSIIT”) which controls the inference of Requirements on the interface of each externally-callable function.
    A Safe Secure Bounds Data Symbol Table (“SSBDST”) which tabulates the Requirements and Guarantees for defined and undefined symbols during the Safe Secure Linking process.
    A Safe Secure Link-Time Analyzer (“SSLTA”) which matches Requirements to Guarantees for function-call, external array, and external pointer linkage contexts.
    A Safe Secure Link Diagnostic Generator (“SSLDG”) which generates a fatal diagnostic at link-time if any Requirement is unsatisfied; this prevents the production of any executable program.
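The link-time matching of Requirements to Guarantees can be illustrated with a small sketch. The record layout below is an assumption made purely for illustration: it reduces each Requirement or Guarantee to a symbol name and an element count for an external array, which is far simpler than what a bounds data file would need to record. The names bounds_record and count_unsatisfied are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record attaching a bound to a symbol. For a Guarantee,
   `elements` is the number of array elements the defining object file
   provides; for a Requirement, it is the minimum number of elements the
   referencing object file relies on. */
typedef struct {
    const char *symbol;
    size_t elements;
} bounds_record;

/* Matches each Requirement against the Guarantees for the same symbol.
   Returns the number of unsatisfied Requirements; a nonzero result
   corresponds to a fatal link-time diagnostic, so no executable is produced. */
size_t count_unsatisfied(const bounds_record *reqs, size_t nreqs,
                         const bounds_record *guars, size_t nguars)
{
    size_t unsatisfied = 0;
    for (size_t i = 0; i < nreqs; i++) {
        int satisfied = 0;
        for (size_t j = 0; j < nguars && !satisfied; j++) {
            if (strcmp(reqs[i].symbol, guars[j].symbol) == 0 &&
                guars[j].elements >= reqs[i].elements)
                satisfied = 1;
        }
        if (!satisfied)
            unsatisfied++;
    }
    return unsatisfied;
}
```

Under these assumptions, a Requirement that names a symbol with no matching Guarantee counts as unsatisfied, in the same way as a Requirement whose bound exceeds the guaranteed size.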