For efficiency in programming, software developers often duplicate sections of source code in numerous locations within programming projects. Reusing a portion of source code via copy-and-paste, with or without some degree of modification or adaptation, is called “code cloning,” and the resulting copied portion is called a “code clone,” or more simply a “clone.” Code cloning is a common practice among software developers seeking quick code reuse. In general, the number of code clones tends to increase as the scale of a code base increases. The source code base of a programming project (e.g., an application or an operating system) is the collection of source code for the bulk of the computer programs that make up the project.
In many cases, unbridled code cloning degrades overall code quality by introducing potentially undetected cloning errors and potentially unnecessary redundancy. Consequently, unbridled code cloning increases testing and code maintenance costs. For example, developers commonly fix a bug in one piece of code but forget to apply the same fix to its clones. This problem is more severe when the bug has security implications. As another example, a high degree of code cloning may cause code bloat, which occurs when a code clone is unnecessarily repeated throughout the code base.
Efficiently detecting, analyzing, and reporting code clones in large-scale code bases, so that developers can take effective action, remains a significant challenge today.