Today, many companies and organizations use large-scale computer systems for a variety of purposes. Operating a large-scale computer system requires maintenance work, such as fixing bugs in the software in operation and adding required functions. The companies and organizations that use such a system need an accurate understanding of the cost of this maintenance work.
A technique is known for estimating the maintenance cost of software from the positions or the number of its code clones. A code clone is a fragment of a program's source code that has been copied elsewhere in the source code. It is generally known that the presence of a large number of code clones degrades the maintainability of the software. This is because, for example, when a bug is found in a code clone, the developer has to review whether each of the other code clones having the same content also has to be fixed in order to eliminate the bug. In a large-scale system in particular, reviewing all the code clones contained in the software takes an enormous amount of labor.
Various studies have been made on the detection of code clones. Non Patent Literature 1 classifies code clones into the following three types. Type 1 includes code clones that fully match except for blanks, parentheses, and the like. Type 2 includes code clones that differ from each other only in user-defined names, such as variable names, label names, and procedure names, and in some reserved words, such as the type of a variable. Type 3 includes code clones that would be of type 2 except that a statement has been inserted, deleted, or modified. Non Patent Literature 1 discloses a technique for detecting code clones of types 1 and 2.
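The three types can be illustrated with a minimal Python sketch. It is not the technique of Non Patent Literature 1. The function names `tokens` and `clone_type`, the placeholders `ID` and `LIT`, and the similarity threshold of 0.7 used for type 3 are assumptions made for illustration only. Because Python has no variable type declarations, the sketch covers only the user-defined-name part of type 2, and it ignores indentation, which is adequate only for flat snippets.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token kinds that carry only layout or comments, not program content.
_LAYOUT = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(source, normalize=False):
    """Return the token strings of `source`, skipping layout and comments.

    With normalize=True, user-defined names and literals are replaced by
    placeholders, so snippets that differ only in naming compare equal.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _LAYOUT:
            continue
        is_user_name = tok.type == tokenize.NAME and not keyword.iskeyword(tok.string)
        if normalize and is_user_name:
            out.append("ID")
        elif normalize and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def clone_type(a, b, threshold=0.7):
    """Classify a pair of snippets as clone type 1, 2, or 3, or None."""
    if tokens(a) == tokens(b):
        return 1  # identical apart from blanks and comments
    na, nb = tokens(a, normalize=True), tokens(b, normalize=True)
    if na == nb:
        return 2  # identical once names and literals are abstracted away
    # Assumed criterion for type 3: the normalized token sequences are
    # similar enough that only a few tokens were inserted or deleted.
    if SequenceMatcher(None, na, nb).ratio() >= threshold:
        return 3
    return None
```

For example, under these assumptions `clone_type("total = price * qty\n", "sum = cost * n\n")` yields type 2, because the two statements differ only in their variable names.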
Non Patent Literature 2 discloses a system and software for detecting code clones. The software disclosed in Non Patent Literature 2 (CC Finder X) lexically analyzes the source program to be processed and detects the code clones contained in that source program. The software is capable of detecting code clones of types 1 and 2.
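The general idea of detecting code clones by lexical analysis can be sketched as follows. This is not the algorithm of the software disclosed in Non Patent Literature 2. It is a simplified illustration that splits a C-like source text into tokens, replaces identifiers and numbers with placeholders, and reports every run of `window` consecutive tokens that occurs more than once. The names `lex`, `find_clones`, and `KEYWORDS`, the regular expression, and the window length are all assumptions.

```python
import re
from collections import defaultdict

# Assumed, deliberately small set of reserved words for a C-like language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float"}

# An identifier, a run of digits, or any single non-blank symbol.
TOKEN = re.compile(r"(?P<name>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<sym>\S)")


def lex(source):
    """Yield (line_number, normalized_token) pairs for `source`."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            text = match.group()
            if match.lastgroup == "name":
                # Reserved words are kept; user-defined names become "ID".
                yield lineno, (text if text in KEYWORDS else "ID")
            elif match.lastgroup == "num":
                yield lineno, "NUM"
            else:
                yield lineno, text


def find_clones(source, window=6):
    """Map each repeated run of `window` tokens to the lines where it starts."""
    stream = list(lex(source))
    seen = defaultdict(list)
    for i in range(len(stream) - window + 1):
        key = tuple(tok for _, tok in stream[i:i + window])
        seen[key].append(stream[i][0])
    return {key: lines for key, lines in seen.items() if len(lines) > 1}
```

Because identifiers and numbers are normalized before comparison, a sketch of this kind reports clones of types 1 and 2 but not of type 3. A practical detector would also merge overlapping windows into maximal clones, which is omitted here.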
Patent Literature 1 discloses a system for detecting code clones with respect to each function and evaluating the similarity between the code clones. The system according to Patent Literature 1 analyzes the source program function by function and detects the code clones that satisfy a similarity detection criterion designated by the user. The user can instruct the system to detect either code clones that fully match (type 1) or code clones that partially differ from each other (type 2 or type 3). The system according to Patent Literature 1 also evaluates the detected code clones from two viewpoints, namely whether the corresponding source codes match and whether the function interfaces are identical, and classifies the code clones into four categories according to the evaluation result. By looking up the category, the user can find, for example, whether there are other code clones that differ only in function interface, or whether there are other code clones with identical source code, which is useful for reusing the code clones.
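The classification into four categories follows from combining the two yes/no viewpoints. The following minimal sketch shows that combination only. The `FunctionInfo` record, the category labels, and the whitespace-insensitive comparison of function bodies are assumptions and are not taken from Patent Literature 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionInfo:
    """Hypothetical record describing one function of the source program."""
    name: str
    interface: tuple  # assumed form: (return type, parameter types...)
    body: str


# Hypothetical labels for the four combinations of the two viewpoints,
# keyed by (source matches, interface matches).
CATEGORIES = {
    (True, True): "same source, same interface",
    (True, False): "same source, different interface",
    (False, True): "different source, same interface",
    (False, False): "different source, different interface",
}


def categorize(f, g):
    """Return the category of a pair of functions detected as code clones."""
    # Assumption: bodies "match" when they are equal after ignoring blanks.
    same_source = f.body.split() == g.body.split()
    same_interface = f.interface == g.interface
    return CATEGORIES[(same_source, same_interface)]
```

A pair that falls into the "same source, different interface" category corresponds to the case mentioned above in which code clones differ only in function interface.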