One of the challenges in data management is that the same entity can be represented in a database management system (DBMS) by multiple instances. For example, a customer entity “Bob Smith” may be represented in the database by several different instances, including “Bob Smith,” “Smith, Bob F.,” “Robert Symthe,” etc. Such instances are also known as fuzzy duplicates. Duplication of entities can arise for a number of reasons, such as entities originating from disparate sources, typographical errors, etc. Existing solutions for identifying and processing such fuzzy duplicates, such as approximate string matching, de-duplication, etc., are generally specialized, stand-alone applications. For example, some solutions for managing fuzzy duplicates require extracting the data out of the database and performing the entity matching in the application layer by applying the appropriate string matching logic. The matched data is then pushed back into the database for further processing. Such solutions are generally separate from the underlying DBMS where the data is stored. As a result, they do not leverage the query engine capabilities of the database for composing queries. The additional steps of extracting the data from the database and pushing it back into the database can also reduce performance efficiency and application scalability.