Code coverage is a measure used in software testing that indicates the degree to which the source code of a computer program has been tested. Current code-coverage tools typically either use a modified execution environment (virtualized execution) or instrument the entire binary code, such as by inserting code to log coverage at the start of every basic block. Each of these current methods, however, has a non-zero runtime overhead. Runtime is the period during which a computer program is executing.
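As a rough illustration of coverage logging by instrumentation, the following sketch uses Python's `sys.settrace` hook to record which lines of a function execute. This is line-level tracing in an interpreted language standing in for the basic-block logging described above, not any particular tool's method. The names `run_with_coverage` and `classify` are hypothetical and chosen only for this example.

```python
import sys

def run_with_coverage(func, *args):
    """Run func(*args); return (result, set of executed line offsets)."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function under test.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Store offsets relative to the "def" line so they are stable.
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        # Always restore the previous hook, even if func raises.
        sys.settrace(previous)
    return result, executed

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
```

Calling `run_with_coverage(classify, -1)` and `run_with_coverage(classify, 1)` yields different sets of line offsets, one per branch. The tracer is invoked on every executed line, which is the kind of per-event cost that gives such instrumentation its runtime overhead.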
Code-coverage tools often use software breakpoints to record the execution of code deemed interesting by the user. In general, a breakpoint is a means of acquiring knowledge about a program during its execution. More particularly, a breakpoint is an intentional stopping or pausing place in a program that is placed there for debugging purposes. Breakpoints normally are inserted manually by the programmer, who indicates instruction addresses or offsets, function names, and so forth. During the pause the programmer inspects the test environment to determine whether the program is functioning as expected.
One type of testing is fuzz testing. Conventional fuzz testing, or “fuzzing,” is a technique used to test for security and reliability problems in software. It is an automated or semi-automated technique that uses invalid, unexpected, or random data as inputs to a computer program. This can be achieved by mutating good input for a program into possibly bad input. For example, fuzzing may involve changing small parts of a file and delivering that content to an application in an attempt to cause the application to crash. The program then is monitored for exceptions such as crashes or failing built-in code assertions.
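The mutate-and-monitor loop described above can be sketched as follows. This is a minimal example under stated assumptions: the target `parse_record` and its one-byte length-prefixed format are hypothetical, and a failing built-in assertion stands in for a crash.

```python
import random

def mutate(data: bytes, rng: random.Random, n_flips: int = 1) -> bytes:
    """Return a copy of data with n_flips randomly chosen bits flipped."""
    buf = bytearray(data)
    for _ in range(n_flips):
        if not buf:
            break
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)

def parse_record(data: bytes) -> int:
    """Hypothetical target: first byte is a length, the rest is payload."""
    length = data[0]
    payload = data[1:]
    assert length == len(payload), "length field does not match payload"
    return length

def fuzz(seed_input: bytes, iterations: int, seed: int = 0):
    """Mutate a known-good input repeatedly and collect inputs that fail."""
    rng = random.Random(seed)
    failures = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            parse_record(candidate)
        except Exception as exc:
            # Monitor the target for exceptions and failed assertions.
            failures.append((candidate, exc))
    return failures
```

For example, `fuzz(b"\x03abc", 200)` starts from a well-formed record and reports every mutated input whose length byte no longer matches its payload.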
“Smart” fuzzing, which is similar to conventional fuzzing, uses knowledge of the structure of the input data or feedback from the program under test to inform test case generation. Smarter fuzzing often enhances the code coverage when delivering fuzzed content by providing input that will match the expected input data structure more closely. Smart fuzzing is usually achieved either by requiring an extensive input structure definition to be provided at the start of fuzzing or by using expensive runtime instrumentation and monitoring. Creating the input structure definitions requires significant engineering time. Typical runtime instrumentation and monitoring significantly increase the time needed to execute the program under test, which in turn reduces the fuzzing throughput.
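To illustrate the structure-aware variant, the sketch below mutates only the payload of a record and then rebuilds the length field, so that each fuzzed input still matches the expected layout. The record format (one length byte followed by the payload) and the names `build_record` and `smart_mutate` are assumptions made for this example; a real input structure definition would be far more extensive.

```python
import random

MAX_PAYLOAD = 255  # one length byte can describe at most 255 payload bytes

def build_record(payload: bytes) -> bytes:
    """Serialize a record: one length byte followed by the payload."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([len(payload)]) + payload

def smart_mutate(record: bytes, rng: random.Random) -> bytes:
    """Mutate the payload, then recompute the length so the record stays valid."""
    payload = bytearray(record[1:])
    # Only offer operations that are legal for the current payload size.
    ops = ["insert"] if not payload else ["flip", "insert", "delete"]
    if len(payload) >= MAX_PAYLOAD:
        ops.remove("insert")
    op = rng.choice(ops)
    if op == "flip":
        pos = rng.randrange(len(payload))
        payload[pos] ^= 1 << rng.randrange(8)
    elif op == "insert":
        payload.insert(rng.randrange(len(payload) + 1), rng.randrange(256))
    else:
        del payload[rng.randrange(len(payload))]
    # Knowledge of the format is applied here: the header is regenerated.
    return build_record(bytes(payload))
```

Because the length byte is always regenerated, a target that rejects inconsistent headers would not discard these inputs at its first check, which is how structural knowledge can help fuzzed content reach more of the code.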
One problem with conventional fuzzing and smart fuzzing techniques is that they are only as good as the input received. Both techniques typically start with a static set of inputs and then fuzz from this static set, which means that they usually are fuzzing from the same starting point. This makes it difficult for the fuzzing to get better over time. Beyond detecting actual crashes, one challenge is how to make progress into new areas of the code that otherwise are not covered. Detecting new coverage is desirable because it indicates an opportunity to find new bugs in the parts of the execution code previously untested through fuzzing.
One current technique that attempts to increase code coverage uses a constraint solver to try to solve the constraints generated from execution traces. In other words, during the execution trace the tool logs all conditional branches in the execution flow and derives symbolic representations of each conditional (what is being compared). The constraint solver can then try to solve the inverse of that conditional (to figure out what input would cause the alternate branch to be taken). However, this constraint solver technique is expensive, degrades performance, and has limitations on what it can solve.
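The idea of logging a branch and then solving for its inverse can be sketched with a toy example. Here a brute-force search over a small input domain stands in for a real constraint solver, and a Python callable stands in for a symbolic representation of the conditional. The function names and the particular condition are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A logged branch: the condition that was evaluated and the outcome observed.
Branch = Tuple[Callable[[int], bool], bool]

def check_header(x: int, trace: List[Branch]) -> str:
    """Hypothetical target with one conditional branch that it logs."""
    predicate = lambda v: (v * 3 + 1) % 256 == 0x40
    taken = predicate(x)
    trace.append((predicate, taken))
    return "deep path" if taken else "early exit"

def solve_negation(branch: Branch, domain: Iterable[int]) -> Optional[int]:
    """Search the domain for an input that takes the alternate branch."""
    predicate, taken = branch
    for candidate in domain:
        if predicate(candidate) != taken:
            return candidate
    return None  # no input in the domain flips this branch
```

Running `check_header(0, trace)` logs a branch that was not taken, and `solve_negation(trace[0], range(256))` then searches for an input that takes the other side. Exhaustive search is feasible only because the domain here is a single byte, which hints at why solving realistic constraints is expensive and sometimes infeasible.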
Another current technique modifies the binary code being tested to insert code. This inserted code then notifies the monitoring process of which code is actually being executed. However, this again is expensive and has the disadvantage that it modifies the binary code.