Plagiarism of software source code is a serious problem in two distinct areas of endeavor: cheating by students at schools and intellectual property theft at corporations. Plagiarism detection programs and algorithms have existed for a number of years but have received more attention recently due to two main factors. First, the Internet and search engines like Google have made source code very easy to obtain. Second, the open source movement has grown tremendously over the past several years, allowing programmers all over the world to write, distribute, and share code.
In recent years, plagiarism detection techniques have become more sophisticated. A summary of available tools is given by Paul Clough in his paper entitled “Plagiarism in natural and programming languages: an overview of current tools and technologies.” Clough discusses tools and algorithms for finding plagiarism in generic text documents as well as in programming language source code files.
A number of plagiarism detection programs are currently available, including the Plague program, developed by Geoff Whale at the University of New South Wales; the YAP programs (YAP, YAP2, YAP3), developed by Michael Wise at the University of Sydney, Australia; the JPlag program, written by Lutz Prechelt and Guido Malpohl of the University of Karlsruhe and Michael Philippsen of the University of Erlangen-Nuremberg; and the Measure of Software Similarity (MOSS) program, developed at the University of California at Berkeley by Alex Aiken.
One deficiency of the aforementioned programs is that they only compare functional code. One program, CodeMatch®, developed by Robert Zeidman, the inventor of the present invention, overcomes this deficiency by dividing program source code into elements, including functional code (statements, identifiers, and instruction sequences) and non-functional code (comments and strings), and comparing these different elements in the source code files of different programs to each other.
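The division of source code into functional and non-functional elements can be illustrated with a minimal sketch. This is not the CodeMatch implementation; it assumes C-style comment and string syntax, and the names `TOKEN` and `split_elements` are hypothetical, introduced only for this illustration.

```python
import re

# Assumption: C-style source. Matches line comments, block comments,
# and double-quoted string literals, whichever starts first.
TOKEN = re.compile(r'''
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\\n])*")
''', re.VERBOSE | re.DOTALL)


def split_elements(source):
    """Divide source text into functional and non-functional elements."""
    comments, strings = [], []

    def collect(match):
        # Record the non-functional element and remove it from the code.
        if match.lastgroup == "comment":
            comments.append(match.group())
            return " "
        strings.append(match.group())
        return '""'  # keep an empty literal so the statement stays intact

    code = TOKEN.sub(collect, source)
    statements = [ln.strip() for ln in code.splitlines() if ln.strip()]
    # Simplification: language keywords are not filtered out here.
    identifiers = sorted(set(re.findall(r"[A-Za-z_]\w*", code)))
    return {
        "statements": statements,
        "identifiers": identifiers,
        "comments": comments,
        "strings": strings,
    }
```

Once each file is reduced to these element lists, like elements from two files (statements to statements, comments to comments, and so on) could be compared pairwise. A production tool would need a real lexer for each supported language rather than a single regular expression.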
Clever programmers will often make significant changes to the appearance, but not the functionality, of the functional source code in order to disguise copying. The resulting functional code looks very different but functions identically to the original code from which it was copied.
In some cases of disguised copying, a programmer copies a function from one program's source code into another program's source code and comments it out in order to use the code as a guide for writing a similar function. Programmers who change functional statements to disguise copying often leave the commented-out code unchanged because it is non-functional and escapes their notice. None of the previously mentioned tools will find this sure sign of plagiarism. Accordingly, it would be beneficial to have a plagiarism detection tool that can compare functional code in one source code file to non-functional code in another source code file in order to overcome the above limitations of the conventional techniques.
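The kind of cross-comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the claimed method: it handles `//` line comments only, ignores string literals and block comments, and requires an exact match after whitespace normalization. The names `split_line`, `normalize`, and `functional_vs_comment_matches`, and the `min_length` parameter, are hypothetical.

```python
import re


def split_line(line):
    """Split one source line into (functional code, comment text).

    Simplification: the first '//' is taken as the comment start.
    """
    code, _, comment = line.partition("//")
    return code.strip(), comment.strip()


def normalize(text):
    """Collapse runs of whitespace so formatting changes do not hide a match."""
    return re.sub(r"\s+", " ", text).strip()


def functional_vs_comment_matches(source_a, source_b, min_length=3):
    """Return comments in source_b that match functional lines in source_a.

    Lines shorter than min_length characters (for example a lone '}')
    are skipped because they match almost any program.
    """
    functional_a = set()
    for line in source_a.splitlines():
        code = normalize(split_line(line)[0])
        if len(code) >= min_length:
            functional_a.add(code)

    matches = []
    for line in source_b.splitlines():
        comment = normalize(split_line(line)[1])
        if len(comment) >= min_length and comment in functional_a:
            matches.append(comment)
    return matches


original = """int total = 0;
for (i = 0; i < n; i++) {
    total += price[i] * qty[i];
}
"""

# The functional code was rewritten, but the copied original
# was left behind in comments.
suspect = """// int total = 0;
// for (i = 0; i < n; i++) {
//     total += price[i] * qty[i];
// }
int sum = 0;
int k = 0;
while (k < count) {
    sum += cost[k] * units[k];
    k++;
}
"""

for hit in functional_vs_comment_matches(original, suspect):
    print(hit)
```

In this example the functional code of the second file shares no statement with the first, so a comparison limited to functional code finds nothing, while the functional-to-comment comparison reports the three copied statements.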