Modern computer programs are often enormously complex logical systems, designed to perform some useful function based on one or more input data streams. A program may operate correctly when the input data streams are close to what the program's designer anticipated, but some unexpected configurations of data may cause the program to operate erratically or to crash. Since erratic program operation may expose data or protected facilities in an undesirable way, malicious users often seek out data streams that cause unexpected behavior. Similarly, programmers hoping to improve the security and robustness of their programs may also search for problematic data input streams, as these can often help locate bugs and incorrect assumptions embodied in the code.
Software testing traditionally has relied on hand-specification of input streams based on a priori knowledge of the expected input structure. For example, if a program is designed to receive a text string at a certain point in its input, the tester might provide a text string with unusual characters (e.g., two-byte characters from an Asian character set), or a zero-length string, or an extremely long string. A “correct” or “expected” response to the given input is specified, and the program's actual response is compared to the expected response to determine whether the program is behaving properly. As a program is refined and extended, a library of these test inputs may be collected, and a new version of the program may be qualified by confirming that it produces the expected responses to all of the test inputs.
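The hand-specified approach described above can be illustrated with a minimal sketch. The function under test (`normalize_name`), its 64-character limit, and the helper `qualify` are hypothetical names chosen only for illustration; they do not come from any particular program. Each test pairs an input with its expected response, and a new version is qualified only if every pair matches.

```python
def normalize_name(text: str) -> str:
    """Hypothetical function under test: trim whitespace, cap at 64 characters."""
    return text.strip()[:64]


# Hand-specified test library: (input, expected response) pairs,
# covering the kinds of unusual strings a tester might choose.
TEST_LIBRARY = [
    ("Alice", "Alice"),          # typical input
    ("", ""),                    # zero-length string
    ("山田太郎", "山田太郎"),      # multi-byte characters
    ("x" * 10_000, "x" * 64),    # extremely long string
]


def qualify(func, library):
    """Run every test input and return (input, expected, actual) for each mismatch."""
    failures = []
    for given, expected in library:
        actual = func(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


if __name__ == "__main__":
    failures = qualify(normalize_name, TEST_LIBRARY)
    print("qualified" if not failures else f"{len(failures)} test(s) failed")
```

An empty failure list means the version under test reproduced every expected response in the library; it says nothing about inputs the tester did not think to include.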
This testing method is effective for finding errors that have directly-observable consequences, but it can miss many other classes of errors. In addition, it is often a time-consuming task to design a set of tests that provide adequate coverage of so-called “boundary cases”: tests that exercise the program when various portions of its state are at or near minima or maxima. (For example, a banking program that used a sixteen-bit integer to hold an account balance would need extensive testing for balances near −32,768 (−2¹⁵) and 32,767 (2¹⁵−1).)
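The sixteen-bit example can be made concrete with a short sketch. It assumes a signed, two's-complement sixteen-bit balance field that wraps around silently on overflow; the names `wrap_int16`, `deposit`, `boundary_values`, and `overflowing_balances` are hypothetical and serve only to show where boundary tests would need to probe.

```python
INT16_MIN = -(2 ** 15)    # -32768
INT16_MAX = 2 ** 15 - 1   # 32767


def wrap_int16(value: int) -> int:
    """Simulate storing a value in a signed 16-bit field (two's-complement wraparound)."""
    return (value + 2 ** 15) % 2 ** 16 - 2 ** 15


def deposit(balance: int, amount: int) -> int:
    """Hypothetical banking operation whose result is held in a 16-bit field."""
    return wrap_int16(balance + amount)


def boundary_values(lo: int, hi: int, margin: int = 1) -> list[int]:
    """Values at each limit and within `margin` of it, on both sides."""
    values = set()
    for limit in (lo, hi):
        for offset in range(-margin, margin + 1):
            values.add(limit + offset)
    return sorted(values)


def overflowing_balances(amount: int) -> list[int]:
    """In-range boundary balances for which the stored result differs from the true sum."""
    return [
        balance
        for balance in boundary_values(INT16_MIN, INT16_MAX)
        if INT16_MIN <= balance <= INT16_MAX
        and deposit(balance, amount) != balance + amount
    ]


if __name__ == "__main__":
    print(overflowing_balances(1))    # balances that overflow on a deposit of 1
    print(overflowing_balances(-1))   # balances that underflow on a withdrawal of 1
```

Under these assumptions, a deposit of 1 misbehaves only at a balance of 32,767 and a withdrawal of 1 only at −32,768, which is why tests clustered around typical balances would not reveal the defect.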
Automated approaches that can provide more comprehensive testing may be of use in detecting subtle programming errors.