Legacy applications, i.e., existing sets of executable instructions, such as application software, for execution by a processor, are often written in verbose languages, e.g., common business-oriented language (COBOL), algorithmic language (ALGOL), etc., and comprise a million or more lines of code. These applications have been modified over long periods of time, e.g., many years or even decades. In many instances, frameworks or libraries that could have curtailed the proliferation of repetitive and duplicative code were unavailable to, or unused by, application developers.
Different approaches have been used to decompose legacy computer applications in order to discover duplicate source code within them. The information obtained from duplicate source code discovery may serve as the basis for further decomposition tasks, such as creating reengineering specifications for sections of the examined legacy application that the discovery reveals.
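As a rough illustration only, and not as the method of any particular prior approach, exact duplicate code can be located by fingerprinting fixed-size windows of normalized source lines and grouping identical fingerprints. The sketch below assumes each source member is available as a list of lines; the function names, the three-line window, and the whitespace/case normalization are illustrative assumptions.

```python
from collections import defaultdict
from hashlib import sha1


def normalize(raw: str) -> str:
    # Assumed normalization: collapse runs of whitespace and uppercase the
    # text so formatting-only differences do not hide a duplicate.
    return " ".join(raw.split()).upper()


def find_duplicate_blocks(sources: dict[str, list[str]], window: int = 3):
    """Return groups of (member, first_line) locations whose `window`
    consecutive non-blank normalized lines are identical (hypothetical helper)."""
    index = defaultdict(list)
    for name, lines in sources.items():
        # Drop blank lines but remember each kept line's 1-based line number.
        kept = [(i + 1, normalize(raw)) for i, raw in enumerate(lines) if raw.strip()]
        for start in range(len(kept) - window + 1):
            chunk = "\n".join(text for _, text in kept[start:start + window])
            fingerprint = sha1(chunk.encode()).hexdigest()
            index[fingerprint].append((name, kept[start][0]))
    # Only fingerprints seen at more than one location indicate duplication.
    return [locations for locations in index.values() if len(locations) > 1]


# Hypothetical COBOL-like members used only to exercise the sketch.
sources = {
    "PAYROLL.cbl": ["MOVE A TO B.", "ADD 1 TO CTR.", "  DISPLAY CTR.", "STOP RUN."],
    "BILLING.cbl": ["MOVE  A TO B.", "ADD 1 TO CTR.", "DISPLAY CTR.", "GOBACK."],
}
print(find_duplicate_blocks(sources))  # [[('PAYROLL.cbl', 1), ('BILLING.cbl', 1)]]
```

Exact fingerprinting only finds verbatim (after normalization) clones; detecting near-duplicates would require a more tolerant comparison, which this sketch does not attempt.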
Prior approaches to discovering duplicate source code provide hierarchical organization and visualization tools that allow a user to view parent-child relationships among the source code artifacts that make up a legacy application. These tools may also provide search capabilities for exploring interdependencies within the source code.