The advancement of technology, especially innovations that make information readily available on the Internet, has led to increased unauthorized use of information. The easy availability of software source code files has led to frequent source code plagiarism. Plagiarized code may expose an enterprise to costly penalties and other consequences. A large number of approaches exist for detecting copied code in software. Generally, some approaches leverage the semantics of the programming language, while others leverage the comments and/or metrics of the source code.
To combat source code plagiarism, many detection tools are available on the market. Typically, these tools depend on threshold percentages to report matching files or matching code segments. For a source code file to be reported as plagiarized, or as containing plagiarized code, its match must cross a certain threshold percentage. This threshold is determined by the designers of the tool.
Although some tools provide a predefined or adjustable threshold, the existing techniques do not consider the scenario in which the threshold percentage method misses a potential candidate for plagiarism. For example, a piece of code may be copied from a freely accessible source and split across various files by dividing the copied segment into smaller parts. In this scenario, if existing plagiarism checks are run on this set of files, the files that each contain only a very small piece of copied code may go unreported because they do not cross the threshold percentage. Hence, such files may not be reported as plagiarized.
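The scenario above can be illustrated with a minimal sketch. It assumes a simplified line-matching measure and a 30% threshold; the function names, file names, and numbers are hypothetical and do not describe how any particular tool computes its match percentage.

```python
def match_percentage(file_lines, reference_lines):
    """Percentage of non-blank lines in a file that also occur in the reference."""
    reference = {line.strip() for line in reference_lines if line.strip()}
    lines = [line.strip() for line in file_lines if line.strip()]
    if not lines:
        return 0.0
    matched = sum(1 for line in lines if line in reference)
    return 100.0 * matched / len(lines)


def flagged_files(files, reference_lines, threshold):
    """Names of files whose match percentage reaches the threshold."""
    return [name for name, lines in files.items()
            if match_percentage(lines, reference_lines) >= threshold]


def own_lines(prefix, count):
    """Hypothetical stand-in for lines the user actually wrote."""
    return [f"{prefix}_own_statement_{i}()" for i in range(count)]


# Hypothetical copied segment of 8 lines.
copied = [f"copied_statement_{i}()" for i in range(8)]

# Case 1: the whole segment sits in one file (8 of 16 lines match, 50%).
single = {"a.py": own_lines("a", 8) + copied}

# Case 2: the segment is split over four files (2 of 20 lines match, 10% each).
split = {
    f"f{k}.py": own_lines(f"f{k}", 18) + copied[2 * k:2 * k + 2]
    for k in range(4)
}

THRESHOLD = 30.0  # assumed value, chosen only for illustration

print(flagged_files(single, copied, THRESHOLD))  # ['a.py']
print(flagged_files(split, copied, THRESHOLD))   # []
```

In this sketch the same eight copied lines are present in both cases, yet a per-file threshold check reports only the unsplit file.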
Existing techniques are adequate for finding copied code when they compare two programs to find code clones. Additionally, some techniques detect plagiarism even when the user has modified the format of the code after copying. However, the existing techniques are not capable of detecting plagiarism if the user intelligently copies pattern-based code and splits it across source code files.
Therefore, there is a general need for a technique that detects plagiarism in program source code files based on design patterns.