1. Technical Field
The present invention relates to a data processing method, apparatus, and computer program product for performing a similarity comparison between software programs.
2. Description of the Related Art
Advances in communication systems, including the Internet, have inadvertently led to a rapid increase in software piracy.
The concept of software piracy encompasses the illegal copying of software for use or sale, as well as plagiarism or theft of software, which involves unauthorized use of some or all of the code forming a software program.
Methods of identifying software piracy can be classified into static analysis and dynamic analysis.
Static analysis may involve identifying software piracy by analyzing the binary files of a software program. Because the amount of data that must be analyzed is relatively small, the analysis can be performed relatively easily.
However, since the content of the binary files can be changed through source code obfuscation, it may be more difficult to identify illegally copied software by static analysis.
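By way of illustration only, a static comparison of this kind might be sketched as follows. The related art described here does not specify a particular comparison technique; the sketch assumes a Jaccard similarity over overlapping byte n-grams, and the names `byte_ngrams` and `static_similarity` are hypothetical.

```python
from __future__ import annotations


def byte_ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the set of overlapping n-byte windows in a binary image."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def static_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the byte n-gram sets of two binaries.

    Assumption: this measure is chosen only for illustration; it is not
    the method of any particular related-art system.
    """
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs shorter than n bytes are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Because such a measure depends directly on the bytes of the binary files, a transformation that rewrites those bytes while preserving behavior may lower the score even for copied software, which is the weakness noted above.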
Dynamic analysis may involve identifying software piracy by using the characteristics of changes that occur during the execution of a software program. Although source code obfuscation may alter the code, the characteristics observed during execution may remain unchanged, so that illegally copied software can be identified with greater accuracy.
However, the amount of data created during execution may be very large, and as such, the amount of data that must be analyzed may be significantly larger than in static analysis.
Despite these drawbacks, dynamic analysis is used more often, as it provides greater accuracy in identifying software piracy.
FIG. 1 illustrates the results of data extraction for identifying software piracy according to the related art.
FIG. 1 corresponds to a case of identifying software piracy by using the characteristics of the function call sequences of software programs. The results show the function call data collected from the start of execution to termination for Win32-based software programs.
From FIG. 1, it can be seen that extracting the function call data yields more than 1 Mbyte of data. Applying dynamic analysis to such data in order to identify software piracy can be very time consuming.
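As a minimal sketch of how function call sequences might be compared, the following assumes that each program's execution has already been recorded as an ordered list of function names and that similarity is measured as the Jaccard similarity of k-length call subsequences. The related art does not state which measure is used, so this choice is an assumption. The names `call_kgrams` and `trace_similarity` are hypothetical, and the example traces are made up for illustration and are not taken from FIG. 1.

```python
from __future__ import annotations

from collections.abc import Sequence


def call_kgrams(trace: Sequence[str], k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k consecutive function calls found in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def trace_similarity(a: Sequence[str], b: Sequence[str], k: int = 3) -> float:
    """Jaccard similarity of the k-gram sets of two call traces.

    Assumption: k-gram Jaccard similarity is used only as an example of
    a sequence-based measure.
    """
    grams_a, grams_b = call_kgrams(a, k), call_kgrams(b, k)
    if not grams_a and not grams_b:
        return 1.0  # two traces shorter than k calls are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Illustrative traces using Win32 function names (hypothetical data).
original = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]
suspect = ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"]
unrelated = ["RegOpenKeyExW", "RegQueryValueExW", "RegCloseKey", "ExitProcess"]

same_score = trace_similarity(original, suspect)
diff_score = trace_similarity(original, unrelated)
```

In this sketch the number of k-grams that must be built and compared grows with the length of each trace, so a trace of more than 1 Mbyte implies a correspondingly large amount of work.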
Accordingly, research is under way to reduce the time required for identifying software piracy by reducing the amount of data that must be analyzed when a dynamic analysis technique is applied.