Customer records and other varied information within complex data systems and business applications present a variety of challenges in, for example, management, identification, analysis and segregation. Industry estimates suggest that unstructured data may easily account for more than eighty percent of the data residing in planned and developing databases. Such a volume of unstructured data introduces additional complexity, particularly when the data is spread across different platforms, is of varying freshness, and may be inconsistent from one platform to another.
Determining which records are duplicates of one another can be a particularly difficult problem, for instance where a complex data system comprises multiple databases, each holding records containing particular data. Quality issues also arise, making data stewardship a central concern. Further, in customer relationship management (CRM), which often requires data integrity to realize optimal returns on data structure investments, removing “bad customer data,” and especially duplicate customer data, is of key concern.
The IBM WebSphere® Customer Center (WCC) is a real-time, service-oriented customer application that provides users with a single view of the customer, together with business processes for maintaining customer data shared between front-office and back-office systems. The WCC effectively acts as an intelligent customer data hub, managing customer data through its Customer Master Data Management (CMDH) hub. Via the CMDH, the WCC manages business rules, event detection, data validation and duplicate suspect processing (DSP). While other vendors have attempted to address the issues raised above merely by singly matching search results to transactional records, the WCC, via its DSP, enables a client to persist and process duplicate suspects of any customer (i.e., customer data) in the system.
In operation, the DSP searches for potential suspect candidates for a given customer (or party, as used herein) and then provides the suspect candidate list to a matching engine, which scores each of the candidates on the list. As part of the process, the DSP creates a suspect table (i.e., SuspectTable). The scores determined in this process are then assessed to determine similarity or dissimilarity between candidate suspects.
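The flow described above can be sketched as follows. This is a minimal illustrative sketch only, not the WCC implementation: the attribute-comparison scoring function, the thresholds, and all names (`score_candidate`, `build_suspect_table`, the field names) are assumptions introduced here for clarity.

```python
# Illustrative sketch of duplicate suspect processing (DSP): search results
# (candidates) are scored by a matching engine and recorded in a suspect
# table. All names, fields, and thresholds below are hypothetical.

def score_candidate(party, candidate):
    """Toy matching engine: fraction of attribute values that match exactly."""
    fields = ("name", "address", "phone")
    hits = sum(1 for f in fields if party.get(f) == candidate.get(f))
    return hits / len(fields)

def build_suspect_table(party, candidates, match=0.9, suspect=0.5):
    """Score every candidate on the suspect list and classify it by threshold."""
    table = []
    for cand in candidates:
        score = score_candidate(party, cand)
        if score >= match:
            status = "match"
        elif score >= suspect:
            status = "suspect"
        else:
            status = "non-match"
        table.append({"candidate": cand["id"], "score": score, "status": status})
    return table
```

In this sketch, candidates whose score falls between the two thresholds persist as “suspects,” which mirrors the DSP’s ability to retain duplicate suspects for later processing rather than resolving them immediately.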
A challenge in assessing candidate suspects is the final decision to remove customer data that only appears to be “similar” to other data in the system. In particular, deleting one set of customer data based on the similar presence of a second, different set of customer data, where the decision rests on a determinative evaluation by a single matching engine and its scoring scheme without significant confidence of accuracy, is often disconcerting to business management. Typically, then, to gain additional confidence in the initial results before making a final decision, business management conducts further analysis: additional searching, comparison, scoring and assessment are performed over time in multiple “passes” on data that is by then different in both content and time-value from the data originally analyzed.
Unfortunately, as a result, business management may be left with multiple results and suspects created for data sets that are neither static nor original in content when compared to the initial data set, creating further hesitation and confusion in making a final decision. Operatively, data users typically direct findings from the traditional process above into further integration steps and processes, thereby creating multiple review and assessment passes. Certain of these additional integration activities may involve further matching engine or analysis tool investigation, data transformation steps, and the use of Extract, Transform and Load (ETL) tools, for instance.
It is therefore desirable to have an improved method for determining and identifying suspect candidates from customer data in a single pass and with heightened confidence, in relation to the results of a plurality of predetermined matching engines via comparative assessments at a predefined time. The present invention addresses such a need.
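The single-pass, multi-engine idea can be illustrated as follows. This is a sketch under stated assumptions, not the claimed method: the two toy engines, the agreement rule (unanimity above a threshold), and the function name `compare_engines` are all hypothetical constructs used only to show how comparative assessment across a plurality of engines at one point in time could replace repeated passes.

```python
# Illustrative sketch: run several predetermined matching engines over the
# same snapshot of party data and compare their scores, so that confidence
# derives from cross-engine agreement rather than from repeated passes on
# changing data. The engines and the agreement rule are hypothetical.

def compare_engines(party, candidate, engines, threshold=0.8):
    """Score one candidate with every engine and report whether all agree."""
    scores = {name: engine(party, candidate) for name, engine in engines.items()}
    votes = sum(1 for s in scores.values() if s >= threshold)
    # Heightened confidence only when every engine flags the candidate.
    confident = votes == len(engines)
    return scores, confident
```

Because all engines evaluate the same data at the same predefined time, the comparative assessment avoids the content and time-value drift that multi-pass review introduces.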