Enterprises commonly accumulate large amounts of data, such as customer data, product description data, etc. Such data are often accumulated by various departments and/or groups within the enterprise, wherein each department and/or group maintains a separate database. As a result, a non-trivial number of duplicate records can exist within and across the databases of an enterprise.
Moreover, in many instances, such duplicate records are not linked. For example, a person might have mobile and broadband connections from the same enterprise, but the enterprise is unaware of this because the appropriate records are not linked within and across its various databases.
Consequently, de-duplication is important for enterprises interested in a single view of their customer data, so as to provide more efficient services and more efficient customer data management. However, considerable human effort is required to manually analyze individual columns of data in order to implement a matching rule. Accordingly, a need exists for techniques to derive a multi-pass matching algorithm that implements blocking and matching steps.
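By way of illustration only, the following is a minimal sketch of what a multi-pass arrangement of blocking and matching steps can look like. It is not taken from this disclosure: the record fields, the sample data, the choice of blocking columns, the similarity thresholds, and the function names run_pass and multi_pass are all hypothetical, and the string comparison uses the Python standard library's difflib as a stand-in for whatever matching rule is ultimately derived.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical records drawn from separate departmental databases
# (illustrative data only, not from the source document).
RECORDS = [
    {"id": 1, "name": "John Smith", "zip": "10001", "phone": "5551234"},
    {"id": 2, "name": "Jon Smith", "zip": "10001", "phone": "5559999"},
    {"id": 3, "name": "Mary Jones", "zip": "94105", "phone": "5550000"},
    {"id": 4, "name": "J. Smith", "zip": "60601", "phone": "5551234"},
]


def similarity(a, b):
    """Return a case-insensitive similarity ratio in [0, 1] for two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def run_pass(records, blocking_key, match_field, threshold):
    """One pass: block on one column, then compare pairs within each block."""
    # Blocking step: group records that share the same blocking-column value,
    # so that only records in the same block are ever compared.
    blocks = defaultdict(list)
    for rec in records:
        blocks[rec[blocking_key]].append(rec)

    # Matching step: pairwise comparison on the match column inside each block.
    matches = set()
    for block in blocks.values():
        for a, b in combinations(block, 2):
            if similarity(a[match_field], b[match_field]) >= threshold:
                matches.add((min(a["id"], b["id"]), max(a["id"], b["id"])))
    return matches


def multi_pass(records, passes):
    """Run several passes and return the union of the matched id pairs."""
    matches = set()
    for blocking_key, match_field, threshold in passes:
        matches |= run_pass(records, blocking_key, match_field, threshold)
    return matches


# Assumed pass configuration: (blocking column, matching column, threshold).
PASSES = [("zip", "name", 0.8), ("phone", "name", 0.5)]
matched_pairs = multi_pass(RECORDS, PASSES)
print(sorted(matched_pairs))
```

With this toy data, the first pass links records 1 and 2, which share a postal code, and the second pass links records 1 and 4, which share a phone number but not a postal code. This is the motivation for using more than one pass: a duplicate missed under one blocking column can still be found under another, without comparing every record against every other record.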