Software maintenance is widely recognized as the dominant cost phase in the lifecycle of computer programs. One primary activity in software maintenance is migrating code from one platform to another. Substantial effort goes into determining the boundaries of the software being migrated within the myriad code artifacts resident on the source platform. Application files that have to be migrated in their entirety are relatively simple to identify, and tools exist to assist with similar inventory tasks (e.g., GNU Autoconf determines the support available in a given environment; refer to http://www.gnu.org/software/autoconf/manual/autoconf-2.57/ps/autoconf.ps.gz). Determining the bounds of application code where it merges with third-party software packages is harder.
The diff utility finds the differences between two files and presents its results line by line in any of several formats selectable by command-line options. The diff algorithm described in Hunt, J. W., and McIlroy, M. D., “An algorithm for differential file comparison”, Computing Science Tech. Rep. 41, AT&T Bell Laboratories, Murray Hill, N.J., June 1976, uses a longest common subsequence (LCS) technique. However, interpreting the textual differences produced by diff can be difficult.
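The LCS idea behind a line-oriented comparison can be illustrated with a minimal sketch. It uses a plain dynamic-programming table rather than the more efficient candidate-based method of the Hunt and McIlroy report, and it is not the diff utility's actual implementation. The function names `lcs_table` and `line_diff`, and the two-character output prefixes, are assumptions chosen for illustration.

```python
def lcs_table(a, b):
    """Return a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    rows, cols = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    # Fill from the bottom-right corner so each cell can use its successors.
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Walk the LCS table and label each line as kept, deleted, or added.

    The "  ", "- ", and "+ " prefixes are an illustrative format only.
    """
    table = lcs_table(a, b)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append("  " + a[i])  # line belongs to the common subsequence
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append("- " + a[i])  # line present only in the first file
            i += 1
        else:
            out.append("+ " + b[j])  # line present only in the second file
            j += 1
    # Whatever remains on either side has no counterpart in the other file.
    out.extend("- " + line for line in a[i:])
    out.extend("+ " + line for line in b[j:])
    return out


if __name__ == "__main__":
    old = ["alpha", "beta", "gamma"]
    new = ["alpha", "gamma", "delta"]
    for entry in line_diff(old, new):
        print(entry)
```

Even in this small form, the output is a flat list of line-level changes. It says nothing about which changes belong to the application and which to a third-party package, which is the interpretation difficulty noted above.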
Given the growing use of open-source software, the code merge problem is becoming more prevalent over time.