Identifying the components incorporated into a software application can be difficult. Applications today are typically assembled from existing components rather than written from scratch. Developers working in any language, on any platform, and under any methodology find that some of the software they need already exists and does not need to be written again. Frequently, developers use such components directly, e.g., exactly as obtained from a repository. An unknown component that was used as-is from a repository is easy to identify, because the entire file can be fingerprinted (for example, by hashing it). If the hashes match, the entire file is known to be the same.
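The whole-file fingerprinting described above can be illustrated with a minimal sketch. It assumes SHA-256 from Python's standard `hashlib` as the hash function. The function names `file_fingerprint` and `is_same_component`, and the file names in the usage lines, are hypothetical and serve only as illustration:

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Return the SHA-256 hex digest of a file's complete byte content."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives need not fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_same_component(candidate: Path, known: Path) -> bool:
    """Whole-file match: equal digests mean the files are byte-identical
    (barring a SHA-256 collision)."""
    return file_fingerprint(candidate) == file_fingerprint(known)


# Usage: a copy taken as-is from a repository matches the original file.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "lib.jar"
    vendored = Path(tmp) / "vendored-lib.jar"
    original.write_bytes(b"example component bytes")
    vendored.write_bytes(original.read_bytes())
    print(is_same_component(vendored, original))  # True
```

Because the digest covers every byte, the comparison is exact. That exactness makes it reliable for unmodified copies. It also means any change at all, however small, breaks the match.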
It is also common for developers to modify these components in subtle ways. For example, a component might be recompiled from its original source to comply with licensing terms. The recompiled file then differs from the original when considered as a whole (for example, because of a compiler timestamp), even though its functional contents are essentially the same. Another example is the OSGi (Open Services Gateway initiative) framework for Java, in which additional metadata is added to the components themselves so that they work in certain environments. In an OSGi setting, 99.9% of a component's contents may be identical to the original source. Even so, such a minor change produces a different hash, and a whole-file comparison therefore concludes that the files are not the same.
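The OSGi case can be illustrated with a minimal sketch. It relies on the fact that a JAR is a ZIP archive and that OSGi metadata is carried as bundle headers (such as `Bundle-SymbolicName`) in the JAR manifest. Python's standard `zipfile` module stands in for real JAR tooling. The entry name `com/example/Util.class`, the placeholder class bytes, and the helper names are hypothetical. The sketch builds two archives whose class entry is byte-identical and whose manifests differ only by added OSGi headers:

```python
import hashlib
import io
import zipfile

# Fixed entry timestamp so the two archives differ only in content.
FIXED_TIME = (2020, 1, 1, 0, 0, 0)
# Hypothetical placeholder bytes standing in for a compiled class file.
CLASS_BYTES = b"\xca\xfe\xba\xbe" + bytes(4096)
CLASS_NAME = "com/example/Util.class"


def build_jar(manifest: str) -> bytes:
    """Build an in-memory JAR (ZIP) holding a manifest and one class entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in (("META-INF/MANIFEST.MF", manifest.encode("utf-8")),
                           (CLASS_NAME, CLASS_BYTES)):
            jar.writestr(zipfile.ZipInfo(name, date_time=FIXED_TIME), data)
    return buf.getvalue()


def entry_bytes(jar_bytes: bytes, name: str) -> bytes:
    """Return the uncompressed bytes of one archive entry."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return jar.read(name)


original = build_jar("Manifest-Version: 1.0\r\n\r\n")
# Same component with OSGi bundle headers added to its manifest.
osgi_bundle = build_jar("Manifest-Version: 1.0\r\n"
                        "Bundle-ManifestVersion: 2\r\n"
                        "Bundle-SymbolicName: com.example.util\r\n"
                        "Bundle-Version: 1.0.0\r\n\r\n")

# The class entry is byte-identical in both archives...
print(entry_bytes(original, CLASS_NAME) == entry_bytes(osgi_bundle, CLASS_NAME))  # True
# ...yet the whole-file fingerprints no longer match.
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(osgi_bundle).hexdigest())  # False
```

Almost all of the archive's content is unchanged, but a hash over the entire file cannot reflect that similarity.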