Conventional search systems may include logic to detect and remove duplicate documents. This logic tends to be fixed and pre-defined, and tends to rely solely on text-based comparisons. Thus, these conventional systems may compare document content, document URLs, and/or document metadata to determine whether documents are duplicates. These systems may adequately identify duplicate documents that appear in different locations. However, the duplicates they identify tend to be exact duplicates (e.g., the same document stored in different locations). Some items may be so similar to each other, or may refer to items that are so closely related (e.g., a meeting and an email), that they do not justify separate hits in response to a search. Conventional systems may not identify such items as duplicates.
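
By way of illustration only, the fixed, text-based comparison described above might be sketched in Python as follows. The function names, the document fields (`url`, `content`, `metadata`), and the sample items are all hypothetical and are not taken from any particular conventional system; the sketch merely assumes that two hits are treated as duplicates when their URLs match or when their normalized content and metadata match exactly.

```python
import hashlib


def fingerprint(doc):
    """Hash the normalized content and metadata of a document (hypothetical fields)."""
    # Normalize case and whitespace so trivially different copies compare equal.
    text = " ".join(doc["content"].lower().split())
    meta = "|".join(f"{k}={v}" for k, v in sorted(doc.get("metadata", {}).items()))
    return hashlib.sha256(f"{text}\n{meta}".encode("utf-8")).hexdigest()


def remove_exact_duplicates(docs):
    """Keep the first hit for each distinct URL and each distinct fingerprint."""
    seen_urls = set()
    seen_fingerprints = set()
    unique = []
    for doc in docs:
        key = fingerprint(doc)
        if doc["url"] in seen_urls or key in seen_fingerprints:
            continue  # exact duplicate under this fixed, text-based rule
        seen_urls.add(doc["url"])
        seen_fingerprints.add(key)
        unique.append(doc)
    return unique


# Hypothetical search hits: one file stored in two locations, plus a meeting
# item and an email that refer to the same event.
docs = [
    {"url": "//share/a/report.docx", "content": "Q3 budget report",
     "metadata": {"type": "file"}},
    {"url": "//share/b/report.docx", "content": "Q3  Budget Report",
     "metadata": {"type": "file"}},
    {"url": "calendar/item/1", "content": "Budget review meeting, Friday 10:00",
     "metadata": {"type": "meeting"}},
    {"url": "mail/item/7", "content": "Reminder: budget review meeting on Friday at 10:00",
     "metadata": {"type": "email"}},
]

hits = remove_exact_duplicates(docs)
```

Under these assumptions, the second copy of the report is removed, while the meeting item and the related email both remain as separate hits because neither their URLs nor their text match exactly.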