1. Field of the Invention
Embodiments of the invention described herein pertain to the field of computer systems and software. More particularly, but not by way of limitation, one or more embodiments of the invention enable systems and methods for adaptive matching of similar data in a data repository to determine if two or more data items are related in accordance with configurable criteria and to learn which match criteria settings are appropriate based on previous user input or results.
2. Description of the Related Art
Use of large data repositories in making business decisions is a common strategy employed by successful businesses. Such businesses need business data that is as accurate as possible so that effective business decisions can be made. When the data in these systems is not consistent, problems arise. Keeping data consistent across multiple distributed enterprise-wide computer systems is non-trivial. Establishing effective communication links between heterogeneous systems is the first step toward making the data consistent. However, simply allowing all computer systems within an organization to communicate does not solve the problem. Even when data is shared throughout an enterprise, problems still arise since data may exist in different forms in different locations within the enterprise. Since the goal of absolutely accurate data is elusive, it is common for companies to maintain data in independent computer systems. For example, because of the difficulties associated with identifying and matching similar data, some companies maintain data for each corporate division in independent computational zones and only utilize such data within a division to make a business decision associated with that particular division. It is also common, after one company acquires another, for the computer systems of each company to remain autonomous. Thus, the possibility of identifying and matching common data items held in each repository is generally very low.
To solve the problem of having data in multiple similar forms, businesses attempt to identify similar data and integrate the data in a way that ensures the data remains consistent. Performing the integration is difficult, and the integration breaks down when new corporate computer systems are added through acquisition or when changes in business systems and software occur. One method used by some organizations is to maintain "master data". Master data may be, for example, an organization's ideal form of a data item. Solutions for keeping the data consistent throughout the organization, i.e., propagating master data throughout the organization, are generally non-robust, brute-force communication schemes that do not allow new data entries to be matched against existing data items to effectuate data consolidation at data entry time.
The inability to keep master data items consistent harms an organization's ability to leverage its assets and lower the cost of doing business. All areas of a business are affected by the inability to keep data as accurate as possible. In summary, existing computer systems and methods lack effective mechanisms for performing data matching in a way that allows the system to learn when data matches are appropriate. For example, existing systems and methods do not have an ability to learn and consolidate two data items that were originally thought to be independent, but which have been matched above a threshold. The ability to learn which patterns in data are actually indicative of a match between two data items is not found in existing enterprise computing solutions.
Because of the limitations described above, there is a need for a system and method for adaptive matching of similar data in a data repository.