Clone detection is the process of finding passages of text that are identical, or very similar, to other passages in a collection (corpus) of text. Such passages may be undesirable for several reasons: they may cause redundant work, they may be inconsistent with one another, or they may result from unethical behavior such as unauthorized copying. For example, in academia clone detection is used to detect plagiarism, and in that context it is also known as plagiarism detection.
In software development, clone detection is used to find instructions to the computer (source code) that have been copied from one part of the software to another. Such copying makes the software unnecessarily large, and therefore more expensive to maintain, and it increases the cost of fixing defects because a defective instruction must be fixed in every copy. In this context, clone detection is also known as copy-paste detection.
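As a rough illustration of copy-paste detection, the following minimal sketch reports runs of consecutive lines that occur more than once in a piece of source code. It is only one simple approach, assuming that clones are exact matches once blank lines are skipped and whitespace is normalized. The names `find_clones`, `window`, and `sample` are hypothetical and chosen for this example. Practical tools often compare token sequences or syntax trees instead, so that they can also catch copies with renamed variables.

```python
from collections import defaultdict


def find_clones(source: str, window: int = 3) -> list[list[int]]:
    """Return groups of 1-based starting line numbers whose next `window`
    non-blank lines are identical after whitespace normalization."""
    # Keep (line number, normalized text) for non-blank lines only.
    lines = [
        (number, " ".join(text.split()))
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()
    ]

    # Map each run of `window` consecutive lines to the places it starts.
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        seen[chunk].append(lines[i][0])

    # A run that starts in more than one place is a clone.
    return [starts for starts in seen.values() if len(starts) > 1]


# Hypothetical input: the three-line summing loop appears twice.
sample = """\
total = 0
for x in items:
    total += x
print(total)

count = 0
total = 0
for x in items:
    total += x
print(count)
"""

print(find_clones(sample))  # prints [[1, 7]]
```

Here the three-line run starting at line 1 reappears at line 7, so a defect in that loop would have to be fixed in both places. A larger `window` reports fewer, longer clones; a smaller one reports more, including short coincidental matches.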