The present invention, in some embodiments thereof, relates to estimating a similarity of binary records and, more particularly, but not exclusively, to estimating a similarity of binary records according to a semantic match probability between code strands decomposed from each of the compared binary records.
Identifying the origins of executable binary records is a major challenge. This is because the source code is ported, modified, compiled and/or built using various combinations of tool chains, targeting different processor architectures, using different compilers, employing different optimization schemes and/or the like. The challenge becomes even harder when the binary records are stripped of debug information to prevent code theft, duplication, reverse engineering and/or the like.
The need to identify the origins, on the other hand, is constantly rising. A plurality of applications may require the ability to compare binary records in order to identify a common source code origin and/or the like. Such applications may include, for example, deployed software maintenance and/or vulnerability analysis, code theft detection, reverse engineering, security applications for detecting common origins of malicious code agents and/or the like.