1. Field of the Invention
The invention relates to information processing. More particularly, the invention relates to a system and a family of methods that provide for fast and reliable comparison of information content.
2. Description of Related Technology
An organization may receive thousands of emails every day. The received emails may be automatically stored in a relational database, from which customer service representatives may retrieve, read, and act upon them. For various reasons, some malicious, some accidental, and others due to errors in the infrastructure, a number of duplicate copies of an email may be received or stored in the relational database.
There are many problems with storing duplicate copies of an email. Storing a large number, sometimes thousands, of identical emails in a database severely degrades system performance and wastes personnel time. Since the received emails are typically large in size, they are usually stored as Binary Large Objects (BLOBs). The BLOBs are not searchable for determining whether they include any duplicates, and even if they were searchable, searching them would be prohibitively time consuming. That is because the emails have to be stored in the relational database before being searched, and the existing search techniques are limited by the size and type of data to be searched.
There is a need, therefore, for a fast and reliable way of detecting duplicate emails before they are stored in the system.