1. Field of the Invention
The present invention generally relates to ranking multiple data sources, and more particularly, to methods, systems, and machine-readable media for ranking multiple regulated and non-regulated data sources.
2. Description of the Related Art
In information sharing and processing environments, many applications have been developed to process information for purposes such as making a decision or evaluating the information based on one or more criteria. The reliability of these applications is often limited by their ability to acquire accurate information. For example, acquiring accurate customer information is important for businesses to serve their customers efficiently. The customer information includes, for example, data elements such as demographic information (e.g., postal address, age, year of birth) and customer history (e.g., credit history or purchase history). The evolution of distributed network environments (such as the Internet) has resulted in an explosion of both the quantity and availability of customer information from various sources. These data sources can be regulated data sources (e.g., credit bureaus, consumer reporting agencies) or non-regulated data sources (e.g., banks, mortgage issuers, credit union lenders, property lease information repositories, and customer surveys conducted by various business units). The regulated data sources are often considered more accurate and more reliable than the non-regulated data sources. Many organizations rely on a single data source to obtain customer information. The problem with this approach is that different data sources may have different accuracies for different segments of customers. For example, different data sources may have different accuracies based on geographical location. In particular, Equifax® may present more accurate data for the Eastern states of the USA, while Experian may present more accurate data for the Western states of the USA. Further, different data sources may have different accuracies for different demographic groups. For example, mortgage bank repositories may provide more accurate data for customers aged 50 and above, but not for customers aged between 20 and 30.
To overcome this problem, some organizations use multiple data sources, and rely on a priority order of the data sources prepared by their employees to select customer information. Unavailability of customer information at a high priority data source may prompt the use of customer information available from a lower priority data source. The employees manually assess various data sources to prioritize the data sources for selecting customer information. Such assessment may involve the historic accuracy of the data source, or a control set of records from the data source, verified with the customers themselves. For example, the employees may manually assess the accuracy of date of birth information obtained from various data sources based on a sample set of customers. The problem with this approach is that an assessment based on a sample fails to provide a true representation of all the data. Moreover, it is difficult to handle a large volume of data using the manual process and data verification with the customers themselves.
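The conventional approach described above can be illustrated with a minimal sketch. It assumes a small control set of values verified with the customers, ranks the data sources by their accuracy on that sample, and then falls back to a lower priority source when a higher priority source has no value. All names here (`sample_accuracy`, `priority_order`, `select_value`, and the source and customer identifiers) are hypothetical and are not taken from any particular system.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: records[source][customer_id] -> value (None if unavailable).
Records = Dict[str, Dict[str, Optional[str]]]


def sample_accuracy(source_values: Dict[str, Optional[str]],
                    verified: Dict[str, str]) -> float:
    """Fraction of the verified control set that the source reports correctly."""
    if not verified:
        return 0.0
    matches = sum(1 for cid, truth in verified.items()
                  if source_values.get(cid) == truth)
    return matches / len(verified)


def priority_order(records: Records, verified: Dict[str, str]) -> List[str]:
    """Order the sources from most to least accurate on the control set."""
    return sorted(records,
                  key=lambda s: sample_accuracy(records[s], verified),
                  reverse=True)


def select_value(records: Records, order: List[str],
                 customer_id: str) -> Optional[Tuple[str, str]]:
    """Return (source, value) from the highest priority source that has a value."""
    for source in order:
        value = records.get(source, {}).get(customer_id)
        if value:  # None and empty strings are treated as unavailable
            return source, value
    return None


# Illustrative date of birth data (made up for this sketch).
records: Records = {
    "source_a": {"c1": "1970-01-01", "c2": "1985-06-30", "c3": None},
    "source_b": {"c1": "1970-01-01", "c2": "1985-07-30", "c3": "1990-12-12"},
}
verified = {"c1": "1970-01-01", "c2": "1985-06-30"}

order = priority_order(records, verified)
chosen = select_value(records, order, "c3")
```

In this sketch, `source_a` matches both verified records and `source_b` matches only one, so `source_a` is ranked first. For customer `c3`, `source_a` has no value, so the selection falls back to `source_b`. The ranking rests entirely on the two verified records, which illustrates why a small sample may not represent all the data.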
Therefore, a long-felt need exists for a method and system that overcome these and other problems associated with current techniques for determining the most reliable data source and the data sources with lower reliability.