The ability to trace the execution of individual instructions, or to follow control flow in a computer program, is valuable because it allows software developers to step through program logic in a debugging mode. With tracing, developers can examine program states during the program's execution and resolve logic and programming problems.
While the capability to trace a computer program's control flow and view program states has previously existed, tracing dataflow within large computer programs has been a more difficult problem to address. Tracking dataflow, or data propagation, in a computer program is harder than tracing control flow because a dataflow tracing tool may need to interpret the data propagation effects of a large number, or even all, of the machine instructions the computer program executes. For example, tracking dataflow can include tracking the effects on memory and registers modified by machine instructions. Typical modern programs execute several billion machine instructions in even the simplest runs. The combination of a large number of instructions and program states introduces a high level of complexity and performance issues for dataflow tracking. Therefore, tracking data propagation across large numbers of machine instructions can be computationally expensive and time consuming.
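As an illustration of why each executed instruction must be interpreted, the following is a minimal sketch of per-instruction dataflow tracking over a toy instruction trace. The instruction names (`input`, `const`, `mov`, `load`, `store`, `add`, `xor`), the location names, and the `DataflowTracker` class are hypothetical and chosen only for illustration; they do not correspond to any particular tool or instruction set.

```python
from dataclasses import dataclass, field


@dataclass
class DataflowTracker:
    # Maps a location name (a register such as "r1" or a memory cell such as
    # "mem[0x10]") to the set of input labels that influenced its current value.
    deps: dict = field(default_factory=dict)

    def sources(self, loc):
        """Return the input labels currently influencing a location."""
        return self.deps.get(loc, frozenset())

    def step(self, op, dst, *srcs):
        """Apply the dataflow effect of one (hypothetical) instruction."""
        if op == "input":
            # dst receives fresh input data identified by the label srcs[0].
            self.deps[dst] = frozenset({srcs[0]})
        elif op == "const":
            # A constant overwrites dst and clears its dependencies.
            self.deps[dst] = frozenset()
        elif op in ("mov", "load", "store"):
            # A copy transfers the source's dependencies to the destination.
            self.deps[dst] = self.sources(srcs[0])
        elif op in ("add", "xor"):
            # A two-operand result depends on the union of both operands.
            self.deps[dst] = self.sources(srcs[0]) | self.sources(srcs[1])
        else:
            raise ValueError(f"unknown instruction: {op}")


def run_trace(trace):
    """Interpret every instruction in the trace, in execution order."""
    tracker = DataflowTracker()
    for instruction in trace:
        tracker.step(*instruction)
    return tracker


TRACE = [
    ("input", "r1", "in[0]"),
    ("input", "r2", "in[1]"),
    ("const", "r3"),
    ("add", "r4", "r1", "r3"),
    ("xor", "r5", "r4", "r2"),
    ("store", "mem[0x10]", "r5"),
    ("mov", "r1", "r3"),
]

tracker = run_trace(TRACE)
print(sorted(tracker.sources("mem[0x10]")))  # ['in[0]', 'in[1]']
```

Even in this simplified model, every instruction updates the tracking state, so the cost of tracking grows with the number of instructions executed.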
Tracking the propagation and influence of data in a computer program is desirable, but existing tools do not provide sufficient dataflow tracking capabilities. An area of particular interest is tracking the dataflow of tainted data. Data received from untrusted sources (including a user) can be referred to as tainted data or tainted information.
Some dataflow tracking systems have been available in runtime environments or hardware configurations. However, existing runtime solutions that perform dynamic dataflow and data taint tracking have suffered from performance problems. Extensively instrumented code in a compiled program can slow execution by up to 40 times compared with uninstrumented execution. Specialized hardware has also been used for tracking tainted information. However, specialized chip hardware for dataflow tracking is expensive to design and manufacture.
For programs with large volumes of data as inputs, simply keeping track of which input bytes affect other bytes in a program state at any point in time may use more memory than is practically available on typical software development hardware.
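The scale of this memory cost can be illustrated with a rough estimate. The sketch below assumes a naive dense representation that stores one bit for every pair of state byte and input byte; this representation and the function name are assumptions made for illustration only, and practical tools may use more compact encodings.

```python
def dense_dependency_bitmap_bytes(input_bytes: int, state_bytes: int) -> int:
    """Bytes needed to store one bit per (state byte, input byte) pair."""
    bits = input_bytes * state_bytes
    return (bits + 7) // 8  # round up to whole bytes


MIB = 1024 * 1024
GIB = 1024 ** 3

# Assumed example sizes: 1 MiB of input and 1 MiB of tracked program state.
estimate = dense_dependency_bitmap_bytes(MIB, MIB)
print(estimate / GIB)  # 128.0
```

Under these assumptions, even a 1 MiB input tracked against 1 MiB of program state would require 128 GiB for the dependency map alone.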