Businesses and individuals increasingly rely on the management of contact data for communication, marketing, monitoring and other basic functions.
Management of data about persons or entities through Contact Data Management (CDM) systems is a multi-billion dollar market that experiences double-digit growth each year and that includes, but is not limited to, Customer Relationship Management (CRM) systems. Management of contact data is critical to domestic and global productivity, and businesses of all sizes increasingly depend on it for the effective implementation of all business functions. It is also critical to governments and medical institutions, and for security and tracking purposes.
A problem common to all CDM systems is the duplication of data entered into the system. It is estimated that most large-scale CDM systems experience fluctuating duplication rates of between 10 and 30 percent, and must devote considerable system resources to resolving complications caused by duplicated data.
Because most CDM systems must repeat their de-duplication processes periodically, duplicate records accumulate between runs, and the resulting fluctuation in duplication rates makes these systems more error prone.
Errors caused by duplicated data, and the redundant resources allocated to it, reduce business productivity and compromise the overall functionality of CDM systems.
A typical scenario occurs when customer records are entered inconsistently. For example, one listing may show salesperson Mary Smith at “GRP Transport, Srvcs.” A second listing may show Bob Jones at “Group Transportation Services,” which is the same company recorded differently. Several types of problems may arise from this hypothetical scenario.
One problem that may arise from this hypothetical is that the contact entity receives multiple mailings or calls from the end user. Another problem is that the end user of the CDM system may never have consistent information about each customer. The information becomes inconsistent because each time the customer's data must be updated, only one of the duplicate records is updated, and there is no assurance that the record retrieved later is the most recently updated one. Duplicated records also prevent the end user from accurately monitoring user activity. A further problem may occur when the user's staff contact multiple client personnel, so that duplicate services are rendered or inconsistent pricing and/or information is offered.
Data entered into a CDM system may come from various sources (e.g., list broker services, hand-entered data, web-crawled data, social networks, association lists, magazine subscriber lists, e-mail signatures). Each of these disparate data sources may follow different rules for how it treats data, or no rules at all. As data is gathered using increasingly sophisticated technology and data mining tools, new types of data duplication errors and data record inconsistencies (e.g., spelling, abbreviation, punctuation, deconstruction and reconstruction anomalies, and other differences between records that reference the same contact in disparate data sources) occur more frequently. De-duplication processes in existing CRM and CDM technologies cannot be effective unless the data is first conformed (“normalized”) so that records can be reliably compared. Moreover, the decisions on how to normalize data, and the conventions used for doing so, can differ greatly depending on the needs of a particular user (tenant) environment.
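The normalization step described above can be illustrated with a minimal Python sketch. It assumes a hypothetical per-tenant abbreviation table (`TENANT_CONVENTIONS`) and a hypothetical helper (`normalize_company`); neither is taken from any particular CDM product, and a practical tool would need far richer rules. The sketch shows how the two company listings from the earlier example reduce to the same comparison key once case, punctuation and abbreviations are conformed.

```python
import re

# Hypothetical per-tenant conventions: each tenant maps token variants to a
# canonical form. An empty string means "drop this token" (e.g. legal suffixes).
TENANT_CONVENTIONS = {
    "tenant_a": {
        "grp": "group",
        "transport": "transportation",
        "srvcs": "services",
        "svcs": "services",
        "inc": "",
        "incorporated": "",
    },
}


def normalize_company(name: str, tenant: str) -> str:
    """Return a comparison key for a company name under one tenant's conventions."""
    conventions = TENANT_CONVENTIONS.get(tenant, {})
    # Lower-case and turn punctuation into spaces, so "Srvcs." becomes "srvcs".
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Replace each token with its canonical form; unknown tokens pass through.
    canonical = (conventions.get(t, t) for t in tokens)
    # Drop tokens the tenant maps to "" and rejoin with single spaces.
    return " ".join(t for t in canonical if t)


a = normalize_company("GRP Transport, Srvcs.", "tenant_a")
b = normalize_company("Group Transportation Services", "tenant_a")
print(a)       # group transportation services
print(a == b)  # True
```

Because the conventions are keyed by tenant, another tenant could map the same tokens differently, or not at all, without affecting the first, which reflects the point that normalization conventions depend on the tenant environment.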
Most CRM and CDM de-duplication technologies rely on “library-style” retrieval, hash codes and character comparisons to detect duplicates. These components are not effective for screening massive amounts of data from disparate data sources during an update process. The de-duplication components of known CDM and CRM systems are also not designed to be updated dynamically to accommodate an unlimited number of disparate data sources and tenants with unique requirements.
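The weakness of exact hash-code comparison noted above can be shown with a short Python sketch. For illustration only, it assumes duplicates are detected by hashing the raw company field with `hashlib.sha256` and grouping records that share a hash; the records and the helpers `company_hash` and `find_duplicate_groups` are hypothetical. Because the hash changes with any difference in spelling, case or punctuation, only the identically typed listings are grouped, and Mary Smith's record is never linked to them.

```python
import hashlib
from collections import defaultdict

# Hypothetical contact records: (person, company as entered).
records = [
    ("Mary Smith", "GRP Transport, Srvcs."),
    ("Bob Jones", "Group Transportation Services"),
    ("Ann Lee", "Group Transportation Services"),
]


def company_hash(company: str) -> str:
    # Exact-match key on the raw text: any variation yields a different hash.
    return hashlib.sha256(company.encode("utf-8")).hexdigest()


def find_duplicate_groups(rows):
    """Group people by the hash of their raw company field."""
    groups = defaultdict(list)
    for person, company in rows:
        groups[company_hash(company)].append(person)
    # Report only buckets holding more than one record as duplicates.
    return [people for people in groups.values() if len(people) > 1]


print(find_duplicate_groups(records))  # [['Bob Jones', 'Ann Lee']]
```

The variant spelling “GRP Transport, Srvcs.” lands in its own bucket, so this approach cannot recognize it as the same company unless the data has been normalized before hashing.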
It is desirable to have a data normalization tool that can interface with various CDM systems and process data from an unlimited number of disparate data sources.
It is further desirable to have a tool which can normalize data specific to the needs of one or more tenants, and which can be adapted for multi-tenant environments.
It is further desirable to have a tool that can be dynamically updated to normalize data and to address an evolving range of potential data entry variations that can result from changes in data retrieval technology.