With the proliferation of the internet and electronic commerce (“eCommerce”), businesses have begun to rely on the continuous operation of their computer systems. Even small disruptions of computer systems can have disastrous financial consequences as customers opt to go to other web sites or take their business elsewhere.
One reason that computer systems become unavailable is failure in the application or operating system code that runs on them. Failures in programs can occur for many reasons, including, but not limited to, illegal operations such as dividing by zero, accessing invalid memory locations, going into an infinite loop, running out of memory, writing into memory that belongs to another user, accessing an invalid device, and so on. These problems are often due to program bugs.
Ayers, Agarwal and Schooler (hereafter “Ayers”), “A Method for Back Tracing Program Execution,” U.S. application Ser. No. 09/246,619, filed on Feb. 8, 1999 and incorporated by reference herein in its entirety, focuses on aiding rapid recovery in the face of a computer crash. When a computer runs an important aspect of a business, it is critical that the system recover from a crash as quickly as possible and that the cause of the crash be identified and fixed, both to prevent further crashes and, even more importantly, to prevent the underlying problem from causing other damage such as data corruption. Ayers discloses a method for recording a sequence of instructions executed during a production run of the program and outputting this sequence upon a crash.
Traceback technology is also important for purposes other than crash recovery, such as performance tuning and debugging, in which case a system event, program event, or termination condition can trigger the writing out of an instruction trace.
The preferred method for traceback disclosed by Ayers is binary instrumentation, in which instrumentation code is inserted into an executable. The instrumentation code writes out the trace.
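The idea of recording recent execution and writing it out on a failure can be illustrated with a small sketch. The code below is not the method disclosed by Ayers and does not instrument a binary. It is a source-level analogy in Python, in which hand-placed probe calls stand in for the instructions an instrumenter would insert. All names (TraceBuffer, probe, run_instrumented, run_with_traceback) and the block labels are hypothetical, chosen only for illustration.

```python
from collections import deque


class TraceBuffer:
    """Fixed-size circular buffer holding the most recent probe records."""

    def __init__(self, capacity=8):
        # A deque with maxlen silently discards the oldest record when full.
        self.records = deque(maxlen=capacity)

    def probe(self, block_id):
        # Hypothetical stand-in for the few instructions an instrumenter
        # would insert at the start of a code block.
        self.records.append(block_id)

    def dump(self):
        # Return the recorded blocks, oldest first.
        return list(self.records)


def run_instrumented(trace, values):
    """A toy 'program' with probes placed by hand at each block."""
    trace.probe("entry")
    total = 0
    for v in values:
        trace.probe("loop_body")
        total += 100 // v  # raises ZeroDivisionError when v == 0
    trace.probe("exit")
    return total


def run_with_traceback(values, capacity=8):
    """Run the toy program; on failure, return the recorded trace."""
    trace = TraceBuffer(capacity)
    try:
        return run_instrumented(trace, values), None
    except ZeroDivisionError:
        # The failure triggers writing out the recent execution history.
        return None, trace.dump()
```

Calling run_with_traceback([5, 0]) fails on the second loop iteration and returns the blocks executed up to that point, which is the kind of information a traceback is meant to preserve. A bounded buffer is used here on the assumption that only the most recent history needs to be kept. This is a design choice of the sketch, not a statement about the referenced application.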
Agarwal, “Test Protection, and Repair Through Binary-Code Augmentation,” U.S. Pat. No. 5,966,541, issued on Oct. 12, 1999 and incorporated by reference herein in its entirety, discloses a method of binary instrumentation for aiding in testing programs through test coverage. The instrumentation marks instructions which were executed during a test run. Software test engineers or other testers can then write specific tests for the untested code, thereby improving overall test quality. One of the key aspects of the instrumentation technology is that it introduces virtually no overhead, since it adds very few extra instructions directly into the code and does not involve expensive procedure calls. Improved testing also helps to discover and fix bugs, thereby resulting in higher availability for the system.
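The coverage idea described above, marking what was executed during a test run so that untested code can be found, can likewise be sketched at the source level. The following Python is an illustration under stated assumptions rather than the patented binary-level technique: the CoverageMap class, the classify function, and the block labels are all hypothetical, and the marks are placed by hand instead of being inserted into an executable.

```python
class CoverageMap:
    """Records which labelled code blocks have executed at least once."""

    def __init__(self, block_ids):
        # Every known block starts out unmarked.
        self.executed = {block_id: False for block_id in block_ids}

    def mark(self, block_id):
        # Hypothetical stand-in for an inserted 'set flag' instruction.
        self.executed[block_id] = True

    def untested(self):
        # Blocks never reached by any test, in declaration order.
        return [b for b, hit in self.executed.items() if not hit]


def classify(cov, n):
    """Toy function under test, with one mark per block."""
    cov.mark("entry")
    if n < 0:
        cov.mark("negative")
        return "negative"
    if n == 0:
        cov.mark("zero")
        return "zero"
    cov.mark("positive")
    return "positive"
```

For example, after running classify with the inputs 5 and 0 against a map declared with the blocks "entry", "negative", "zero", and "positive", untested() reports only "negative", telling a tester which case still needs a test. Each mark here is a single dictionary store, loosely mirroring the low-overhead goal described above, although a Python method call is of course far more expensive than a few inserted machine instructions.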