People using e-mail, voicemail, or other messaging systems often receive nearly the same message from multiple sources. For example, on the Internet, messages of little value to the receiver (spam) are forwarded to many destinations. Also, people often have multiple e-mail addresses or change e-mail addresses frequently. Thus, users may receive the same e-mail in multiple inboxes. Duplicate messages reduce the productivity of the workforce by forcing workers to examine messages to determine whether they are duplicates. Duplicate messages also use storage space, network bandwidth, and processing time, reducing the efficiency of the messaging system and raising the cost of operating a messaging network.
An existing manual solution to the problem is to institute policies about who can send certain types of e-mail. Another manual solution is for receivers to notify all but one sender that they are receiving duplicates, or to unsubscribe from mailing lists, but this only works when the duplicate messages are being sent repeatedly. These methods and others that rely on user compliance are error-prone, ineffective, expensive, difficult to implement, and often not applicable. Another existing solution is to run a program to find duplicates within the messaging storage system on a client within a client-server system. However, messages are often not exactly duplicated because they are forwarded, come from different sources, have a signature or other sender annotations, etc. Yet another existing solution, typically used for attachments, is to detect and store the duplicates of attachments on the server. However, users still receive and must read multiple e-mails, just with a reference to the duplicate content. Also, this only works for attachments and not for e-mail messages in general.