Fuzz testing is a technique for testing computer programs using randomized input. For instance, fuzz-based testing techniques may be used to generate and modify test inputs, including documents, that conform with a defined text format such as HyperText Markup Language (HTML), Portable Document Format (PDF) or Cascading Style Sheets (CSS). When such a document is provided to an application for processing, the application may be monitored for unexpected or undesirable behaviors, such as crashing or exposing data to unauthorized access.
Certain generation-based fuzz techniques may randomly generate or change test documents based on a manually specified grammar. For example, the requirements of a defined format may be written as a set of computer instructions that generate or change a sequence of random values such that the sequence remains fully consistent with the format. With complicated formats, it may be difficult and cumbersome to create computer instructions that fully implement the grammar, e.g., instructions that are capable of iterating through all of the requirements or of iterating through the requirements in unexpected ways. Moreover, small changes to the requirements of the defined format may require substantial changes to the computer instructions.
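By way of illustration only, a generation-based technique of the kind described above might be sketched as follows. The grammar shown is a hypothetical toy subset resembling HTML and is not a grammar of any real format; the names GRAMMAR, generate and max_depth are assumptions introduced for this sketch.

```python
import random

# Hypothetical, manually specified toy grammar (NOT real HTML).
# Keys are nonterminals; each maps to a list of alternative expansions.
# Any symbol that is not a key is emitted as literal text.
# Convention assumed here: the FIRST alternative of each rule is
# non-recursive, so it can be used to force termination.
GRAMMAR = {
    "DOC": [["<html><body>", "ELEMENTS", "</body></html>"]],
    "ELEMENTS": [["ELEMENT"], ["ELEMENT", "ELEMENTS"]],
    "ELEMENT": [
        ["<p>", "TEXT", "</p>"],
        ["<div>", "ELEMENTS", "</div>"],
        ["<b>", "TEXT", "</b>"],
    ],
    "TEXT": [["fuzz"], ["hello"], [""]],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Randomly expand `symbol` into a string that conforms to GRAMMAR."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: emit literally
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, always take the non-recursive alternative.
        expansion = alternatives[0]
    else:
        expansion = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in expansion)


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so a run can be reproduced
    for _ in range(3):
        print(generate("DOC", rng))
```

Even in this toy form, every requirement of the format has to be written out by hand as a rule, and adding a single new element type means editing the rule table. This is consistent with the maintenance burden noted above for complicated formats.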
Certain mutation-based fuzz techniques may make small changes to an existing test document, analyze the results and then repeat the process. By way of example, a mutation-based fuzz technique may involve: selecting a document that conforms with a defined text format; mutating (e.g., modifying) the selected document by randomly changing characters (e.g., by bit flipping or byte incrementing), deleting characters, adding characters, or swapping strings of characters; processing the document using the application being tested; scoring the document based on its coverage (e.g., the identity of routines and the number of unique lines of code that were executed in the application as a result of processing the document); and using the score as a fitness function in a genetic algorithm or the like to determine whether the document should be further mutated and scored. Documents that result in crashes or allow potentially malicious actions (e.g., a buffer overflow) may also be selected for additional mutation and testing. Although mutation-based fuzz techniques are effective for certain formats such as media formats, they may be less effective than generation-based fuzz techniques when used in connection with complicated text formats.
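By way of illustration only, the mutate, process and score steps described above might be sketched as follows. The function toy_target is a hypothetical stand-in for the application being tested and simply reports labels for the branches it takes; a real harness would obtain coverage from instrumentation. In place of a full genetic algorithm, the sketch uses a simplified rule that keeps a mutated document only if it reaches coverage not seen before. All names are assumptions introduced for this sketch.

```python
import random


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one randomly chosen small mutation to a document."""
    buf = bytearray(data)
    op = rng.choice(["flip", "incr", "delete", "insert", "swap"])
    if not buf:
        op = "insert"  # nothing to change or delete in an empty document
    if op == "flip":  # flip one random bit
        buf[rng.randrange(len(buf))] ^= 1 << rng.randrange(8)
    elif op == "incr":  # increment one byte, wrapping at 256
        i = rng.randrange(len(buf))
        buf[i] = (buf[i] + 1) % 256
    elif op == "delete":  # delete one byte
        del buf[rng.randrange(len(buf))]
    elif op == "insert":  # add one random byte
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif len(buf) >= 2:  # swap two adjacent substrings
        i, j, k = sorted(rng.sample(range(len(buf) + 1), 3))
        buf = buf[:i] + buf[j:k] + buf[i:j] + buf[k:]
    return bytes(buf)


def toy_target(data: bytes) -> set:
    """Hypothetical application under test; returns labels of branches taken."""
    covered = {"entry"}
    if data.startswith(b"<"):
        covered.add("open")
        if b">" in data:
            covered.add("close")
            if b"</" in data:
                covered.add("endtag")
    if len(data) > 32:
        covered.add("long")
    return covered


def fuzz(seed_doc: bytes, target, rounds: int, rng: random.Random):
    """Mutate, run and score documents; keep those that add coverage or crash."""
    corpus = [seed_doc]
    seen = set(target(seed_doc))
    crashes = []
    for _ in range(rounds):
        child = mutate(rng.choice(corpus), rng)
        try:
            coverage = target(child)
        except Exception:
            # Crashing documents are recorded and kept for further mutation.
            crashes.append(child)
            corpus.append(child)
            continue
        new = coverage - seen  # score: coverage not reached before
        if new:
            seen |= new
            corpus.append(child)
    return corpus, seen, crashes


if __name__ == "__main__":
    corpus, seen, crashes = fuzz(b"<p>hi</p>", toy_target, 500, random.Random(0))
    print(len(corpus), sorted(seen), len(crashes))
```

Because each mutation operates on raw bytes with no knowledge of the format, most mutated documents in a strict text format are likely to be rejected early by the parser. This is one plausible reason such techniques may be less effective for complicated text formats, as noted above.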