Common classes of programming errors include those that cause a thread to perform memory reads and/or writes beyond allocated memory (i.e., an out-of-bounds memory access). For example, one class of programming errors is caused by improper use of memory reserved by memory allocation functions. Many programming languages and/or libraries provide one or more memory allocation function calls (e.g., malloc() in the C Standard Library) that enable a process to request allocation of a block of memory of a specified size (e.g., from a pool of available memory, such as a heap), along with one or more memory deallocation function calls (e.g., free() in the C Standard Library) to later deallocate that memory. In general, memory allocation functions locate and reserve a contiguous block of available memory of the specified size from a memory pool, and return a pointer to a memory address at the beginning of the block. The thread can then access memory locations within this reserved block of memory based on integer offsets from that pointer. However, many programming languages provide little to no protection against the thread actually accessing memory addresses outside of a reserved block. If a thread writes to memory outside of its reserved block, there is a risk that it could improperly overwrite valid memory values (e.g., values that are part of a different data structure and/or that are used by another thread). If a thread reads from memory outside of its reserved block, there is a risk that it could read unintended data (e.g., data from a different data structure and/or that was written by another thread), read undefined data (e.g., a memory location that has not yet been written to), or cause an access violation by attempting to access inaccessible memory.
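By way of illustration only, the following C sketch models how an out-of-bounds write can silently overwrite a neighboring block without faulting. It is a simplified model under stated assumptions, not an actual allocator: the names pool, pool_alloc, and overflow_demo are hypothetical, and a fixed byte array stands in for the heap so that the out-of-bounds offset remains well defined in C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size pool standing in for a heap (an assumption
 * made for illustration; real allocators are far more involved). */
#define POOL_SIZE 64
static unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;

/* Reserve `size` contiguous bytes and return a pointer to the start of
 * the block. No bounds information travels with the returned pointer. */
static unsigned char *pool_alloc(size_t size)
{
    assert(size <= POOL_SIZE - pool_used);
    unsigned char *block = &pool[pool_used];
    pool_used += size;
    return block;
}

/* Allocate two adjacent 8-byte blocks, then write one element past the
 * end of the first. Returns the first byte of the second block so the
 * caller can observe the silent corruption. */
static unsigned char overflow_demo(void)
{
    pool_used = 0;
    unsigned char *a = pool_alloc(8);
    unsigned char *b = pool_alloc(8);
    memset(a, 0xAA, 8);
    memset(b, 0xBB, 8);

    /* Offset 8 is outside block `a` (valid offsets are 0..7), but it is
     * still inside memory the thread may access, so nothing faults. */
    a[8] = 0x00;

    return b[0]; /* no longer 0xBB: block `b` was overwritten */
}
```

In this model, the write through the first pointer lands on the first byte of the second block, which corresponds to the case described above in which an out-of-bounds write overwrites valid values belonging to a different data structure.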
Another common class of programming errors involves those that cause a thread to improperly access a memory location after its validity state has transitioned. For example, in many computer architectures each executing thread is associated with a memory region called a “stack,” which stores temporary local information as the thread executes. In general, a new “stack frame” is added to the stack each time a function is called, and that function's stack frame is removed from the stack when the function terminates. Thus, the stack dynamically grows and shrinks during execution of the thread. Each stack frame allocates one or more memory locations for any of the function's local variables. These memory locations are “valid” for the function to use while the function executes, but become “invalid” for any function to use when the stack frame is removed from the stack. However, coding errors may result in accesses (reads and/or writes) to those memory locations even after the stack frame has been removed from the stack (and the memory locations have become invalid). Programming languages may provide little to no protection against the thread performing these types of improper stack-based memory accesses.
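By way of illustration only, the following C sketch models a stack whose frames are added and removed, and shows a stale reference to a local variable being read after its frame has been removed. It is a simplified model under stated assumptions: the stack is simulated with an integer array so that the behavior is well defined, and all names (stack_mem, push_frame, pop_frame, callee_a, callee_b, stale_access_demo) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulated stack: an array of slots plus an index of the
 * first free slot. Real stacks are managed by the compiler and CPU. */
#define STACK_SLOTS 32
static int stack_mem[STACK_SLOTS];
static size_t stack_top = 0;

/* Add a frame with `nlocals` slots; returns the index of its first slot. */
static size_t push_frame(size_t nlocals)
{
    assert(nlocals <= STACK_SLOTS - stack_top);
    size_t base = stack_top;
    stack_top += nlocals;
    return base;
}

/* Remove the most recent frame; its slots become invalid but keep
 * whatever values were last written to them. */
static void pop_frame(size_t nlocals)
{
    assert(nlocals <= stack_top);
    stack_top -= nlocals;
}

/* Models a function that leaks a reference to its own local variable. */
static size_t callee_a(void)
{
    size_t base = push_frame(1);
    stack_mem[base] = 42;  /* local variable */
    pop_frame(1);
    return base;           /* stale reference escapes the function */
}

/* Models an unrelated function whose frame reuses the same slot. */
static void callee_b(void)
{
    size_t base = push_frame(1);
    stack_mem[base] = 7;
    pop_frame(1);
}

/* Reads the stale slot twice: once before and once after the slot is
 * reused by another frame. Neither read faults. */
static void stale_access_demo(int *first_read, int *second_read)
{
    stack_top = 0;
    size_t stale = callee_a();
    *first_read = stack_mem[stale];   /* still holds the old value */
    callee_b();
    *second_read = stack_mem[stale];  /* now holds another frame's data */
}
```

In this model, the first stale read happens to return the value the removed frame left behind, while the second returns data written by a different function. In neither case does execution fail, which is one reason such accesses may go unnoticed.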
These types of improper memory accesses can be particularly difficult to locate and debug, since they may not actually cause a thread's execution to fail (fault) in all situations. As used herein, an improper memory access that causes a fault is one that causes an error (e.g., a segmentation fault, an access violation, an unhandled exception, etc.) that leads the thread's execution to terminate. This is in contrast to proper memory accesses that may cause events that are commonly termed “faults” (e.g., page faults), but that do not actually cause a thread's execution to terminate in error. A fault that causes execution to fail may occur when invalid data is read and relied upon and causes the execution to “derail” in some manner, or may occur when a thread accesses a memory location that it is not permitted to access or that does not actually correspond to a legal memory address. However, not every access beyond allocated memory, and not every access to memory that is no longer valid, will necessarily cause one of these faults to occur. For example, even though a memory access may be improper, it may read valid data (e.g., data that the thread previously wrote and which was not subsequently overwritten), it may be to a memory location the thread is permitted to access, etc.
Thus, for the purposes of this specification, the term “non-faulting” is inclusive of page faults that are not errors (such as used in most virtual memory systems to allow memory to be “paged out” temporarily). Similarly, the term “execution faulting” (or variants thereof) is more restrictive by excluding non-error page faults, and is intended to cover faults that impact the ability to continue execution (e.g., access violations, segmentation faults, unhandled exceptions, and the like). Of course, the term “non-execution-faulting” (or variants thereof) thus indicates the inverse of “execution faulting”.
Prior attempts have been made to locate improper non-faulting memory accesses, but they are not able to detect all instances and may adversely alter program execution state. For example, one attempt is to use debuggers to set write breakpoints to observe each memory write, and manually determine if it is within bounds. However, this is tedious and is not practical to do on production software. Another attempt is to parse through memory dumps after a program has faulted, in order to try to determine the cause of the fault. However, this is again tedious, and a memory dump does not preserve the execution state that led up to the fault.
Other prior attempts try to encourage some of these improper memory accesses to fault. For example, some tools insert memory page(s) within a thread's address space adjacent to an allocated buffer (e.g., within the heap and/or after a thread's stack space), in which these page(s) comprise memory addresses that are not legal addresses or that the thread is not permitted to access. A fault (e.g., a segmentation fault, an access violation, etc.) will then occur if the thread tries to read too far beyond an allocated buffer and into one of these “guard pages.” However, guard pages would only be able to detect the first class of programming errors (i.e., reading beyond an allocated buffer), and even then, guard pages cannot be used to detect all accesses beyond an allocated buffer. For example, there could still be memory locations the thread is permitted to access that exist between the allocated buffer and the guard page. These memory locations could include padding for memory alignment, other allocated buffers, etc. Another prior attempt to encourage improper memory accesses to fault involves pre-filling stack locations with a predefined arbitrary value, to increase the chance of causing a fault if an uninitialized value is read. However, pre-filling stack locations introduces additional execution overhead, and does not catch all uses of uninitialized values.
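By way of illustration only, the following C sketch computes how many accessible bytes can lie between the end of an allocated buffer and a guard page. It assumes, purely for the sake of the example, that the guard page begins at the first page boundary at or after the end of the buffer and spans one page. The function names slack_before_guard and access_hits_guard are hypothetical, and addresses are modeled as plain offsets rather than real pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Number of accessible bytes between the end of a buffer and the guard
 * page, assuming the guard page starts at the first page boundary at or
 * after the buffer's end (an assumption made for this sketch). */
static size_t slack_before_guard(size_t start, size_t size, size_t page_size)
{
    assert(page_size > 0);
    size_t end = start + size;
    size_t guard = ((end + page_size - 1) / page_size) * page_size;
    return guard - end;
}

/* Returns 1 if an access at `offset` bytes from the buffer start lands
 * inside the one-page guard region (and would therefore fault in this
 * model), or 0 otherwise. */
static int access_hits_guard(size_t start, size_t size, size_t page_size,
                             size_t offset)
{
    size_t guard = start + size + slack_before_guard(start, size, page_size);
    size_t addr = start + offset;
    return addr >= guard && addr < guard + page_size;
}
```

Under these assumptions, a 100-byte buffer at offset 0 with 4096-byte pages leaves 3996 bytes between the buffer and the guard page. An access at offset 100 is out of bounds yet does not reach the guard page, while an access at offset 4096 does.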
Further, recent developments in debugging technology have centered on what is frequently referred to as “time travel” tracing (TTT). In general, TTT involves recording a bit-accurate trace of live execution of one or more threads of an application program, enabling a full and accurate replay of the prior execution of these thread(s) at a later time. Thus, TTT enables creation of “time travel” debuggers, which are able to faithfully replay prior execution of one or more threads in both forward and reverse directions and perform other types of rich analysis.