Digital documents, files, and media can all be copied easily. Such copying is often not apparent on its face, and it can prove problematic if it goes undetected initially but is later discovered by a party having legal rights in the copied information.
For example, in a traditional software development environment a single entity controls the entire development of a software element, whereas in a collaborative development environment the software elements being developed are shared among a variety of entities. Copying is therefore more difficult to detect when software is developed collaboratively. This increases the risk that another entity's legal rights will be infringed, for example by developers inappropriately importing that entity's constituent software elements into their own aggregated software product.
Techniques have been developed to identify portions of an aggregated software product that are suspected of having been copied. The suspected source or sources of those portions may also be indicated. For example, it is known to compare portions of an aggregated software product, and/or subsets of information derived from the aggregated software product, against a database of constituent software elements and/or subsets of information derived from the constituent software elements. If the comparison yields one or more matches, the portion of the aggregated software product being compared against the database was likely copied from one or more of the constituent software elements.
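As a rough illustration of the known comparison technique described above, the following Python sketch assumes that the "subsets of information derived" from each software element are hashes of overlapping k-grams of source tokens. This is one common fingerprinting choice, not necessarily the one any particular tool uses. The function names `fingerprints`, `build_index`, and `find_matches`, the k-gram length `K`, and the sample elements are all hypothetical.

```python
import hashlib
import re
from collections import defaultdict

K = 5  # hypothetical k-gram length, measured in tokens


def tokenize(source: str) -> list[str]:
    # Crude normalization: keep identifiers/numbers and single punctuation
    # characters, discarding all whitespace and layout.
    return re.findall(r"\w+|[^\w\s]", source)


def fingerprints(source: str, k: int = K) -> set[str]:
    # Hash every run of k consecutive tokens; the resulting set is the
    # "subset of information derived" from the source in this sketch.
    tokens = tokenize(source)
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(tokens) - k + 1)
    }


def build_index(elements: dict[str, str]) -> dict[str, set[str]]:
    # Map each fingerprint to the names of constituent elements containing it.
    index: dict[str, set[str]] = defaultdict(set)
    for name, source in elements.items():
        for fp in fingerprints(source):
            index[fp].add(name)
    return index


def find_matches(portion: str, index: dict[str, set[str]]) -> dict[str, int]:
    # Count, per constituent element, how many of the portion's distinct
    # fingerprints it shares; any nonzero count is reported as a match.
    counts: dict[str, int] = defaultdict(int)
    for fp in fingerprints(portion):
        for name in index.get(fp, ()):
            counts[name] += 1
    return dict(counts)


# Hypothetical database: two versions of one library plus an unrelated one.
shared_fn = "int add(int a, int b) { return a + b; }"
elements = {
    "libfoo-1.0": shared_fn + "\nint sub(int a, int b) { return a - b; }",
    "libfoo-1.1": shared_fn + "\nint mul(int a, int b) { return a * b; }",
    "libbar": 'void greet(void) { puts("hello"); }',
}

index = build_index(elements)
# A portion of the aggregated product that copies the shared function verbatim.
matches = find_matches("int add(int a, int b) { return a + b; }", index)
print(matches)
```

In this toy run the copied function is reported as matching both `libfoo-1.0` and `libfoo-1.1`, because the identical function appears in each version, while `libbar` does not match at all.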
Problems can arise, however, where the database contains a large number of constituent software elements and, for example, multiple versions and/or copies of one or more of the constituent software elements exist within the database, or several of the constituent software elements include the same or very similar functions, procedures, and/or sub-programs. In such situations, comparing portions of an aggregated software product, and/or subsets of information derived from the aggregated software product, against a database of constituent software elements and/or subsets of information derived from the constituent software elements can yield an unwieldy number of matches. A user inspecting an aggregated software product for potential copying may become frustrated when presented with too many possible sources for a given portion of the product, especially if the user cannot efficiently and easily narrow down the number of possible sources.