For a number of software engineering applications, it is helpful to know how two related versions of a computer program compare. In particular, when changes are made to a “baseline version” of a program, resulting in a newer or updated version, and source code is available for both versions, the source code difference between the baseline and updated versions is easy to obtain through standard textual comparison tools, such as the UNIX “diff” command.
There are two major problems with this approach. First, the source code may not be available, especially for the older baseline version. Second, and more fundamentally, a source-code difference does not directly point out all the portions of a program that may have different semantics. For instance, if the type, or format, of a program variable is changed, then all the executable code, i.e., computation and logic, that mentions or references that variable will in general be different as well, even though the source text of those references is itself unchanged and therefore does not appear in the textual difference.
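The variable-type example above can be illustrated with a minimal sketch. The two C fragments, the variable name `total`, and the helper functions `changed_lines` and `lines_mentioning` are hypothetical and chosen only for illustration. The sketch uses Python's standard `difflib` module as a stand-in for a textual comparison tool, and a simple identifier search as a rough approximation of "code that references the variable"; it does not compile anything.

```python
import difflib
import re

# Hypothetical baseline and updated sources: only the declared type of
# `total` changes (int -> double); every other line is textually identical.
BASELINE = """\
int total;
void add(int x) { total = total + x; }
int half(void) { return total / 2; }
int answer(void) { return 42; }
""".splitlines()

UPDATED = """\
double total;
void add(int x) { total = total + x; }
int half(void) { return total / 2; }
int answer(void) { return 42; }
""".splitlines()


def changed_lines(old, new):
    """Return 0-based indices of lines in `new` that a textual diff flags."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    flagged = set()
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            flagged.update(range(j1, j2))
    return flagged


def lines_mentioning(lines, name):
    """Return 0-based indices of lines that mention identifier `name`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return {i for i, line in enumerate(lines) if pattern.search(line)}


textual = changed_lines(BASELINE, UPDATED)    # what the textual diff reports
affected = lines_mentioning(UPDATED, "total")  # lines whose compiled code may change
missed = affected - textual                    # affected lines the diff does not report

print(sorted(textual))  # [0]
print(sorted(missed))   # [1, 2]
```

In this sketch the textual difference reports only the declaration line, while the two functions that reference `total` are not flagged, although the machine code generated for them would in general differ after the type change.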
For software testing applications, it is desirable to know which code should be re-tested when a program is modified. As explained above, the source code difference is generally insufficient. While this problem can be addressed through additional source-level tools, such as dataflow slicing, that is, determining the dataflow representation for a program, a more direct approach is to compare the executable program binaries. Because the binaries are obtained by compiling the source code into machine code, they incorporate the effects of any changes, such as, for example, variable format changes.