Multiple data sources exist in the art for tracking various types of information concerning individuals. For example, credit bureaus are designed to track the credit histories of many individuals who are consumers in commerce. In addition to the credit history of each individual, the credit bureaus also track the mailing addresses and other contact information for each individual.
Unfortunately, information concerning individuals can change frequently, for example on a monthly basis. Changes over such a short time span make it difficult to track the locations and mailing addresses of these individuals over time, and difficult to track the individuals accurately at such intervals.
Further, tracking individual consumers produces enormous amounts of data, which can slow processing by a single computer. For example, processing data files comprising 280,000,000 records is not uncommon for credit bureaus. Files of this size also decrease the speed at which records can be processed.
In addition to the problems associated with the amount of data being tracked for a group of individuals, it is well understood in the conventional art that third party data sources other than credit bureaus can contain inaccurate information. Such inaccurate information can be obtained by the third party data sources from consumers who complete voluntary surveys. For example, information found in third party data sources can include information gathered from warranty registration cards, tickets for raffles, and other like survey acquisition methods.
With inaccurate data, inaccurate linking between files or records within a database can occur. For example, while it is possible to have multiple addresses for one person, it is also possible that two or more files track more than one person even though the information contained within the files, such as addresses, appears to indicate that the people associated with these files should be grouped together. At the other extreme is the absence of linking data where two files are intended to track the same individual.
For example, a first credit reporting agency may have new or recent information on new accounts that have been opened by consumer X. Meanwhile, a second credit reporting agency has old data about consumer X. If the information of the first credit reporting agency is not linked with that of the second credit reporting agency, and if someone relies completely on the data maintained by the second credit reporting agency, then consumer X may receive a business product that would not be suitable for a consumer with a deeper credit history.
While inaccurate linking between files or records within the conventional art poses a significant problem, another problem lies in how the conventional art perceives two or more files that are similar but not identical to each other. More specifically, the conventional art considers two such files to be duplicates and often drops one of the perceived duplicates when collecting or combining data.
Such a process is often referred to as a merge/purge process. The merge/purge process of the conventional art is typically performed once, and the results of one iteration will usually differ from the results of a second iteration because different data sources are typically used for the second iteration.
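The merge/purge behavior described above can be illustrated with a minimal sketch. The record fields, the similarity measure (a character-level ratio from Python's standard `difflib` module), and the 0.85 threshold are all assumptions chosen for illustration; they are not taken from any particular conventional system.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a ratio in [0, 1] comparing the name and address text of two records."""
    text_a = f"{a['name']} {a['address']}".lower()
    text_b = f"{b['name']} {b['address']}".lower()
    return SequenceMatcher(None, text_a, text_b).ratio()


def merge_purge(records, threshold=0.85):
    """Keep the first record of each group of similar records and drop the rest.

    The threshold is a hypothetical value; a record that is merely similar to
    one already kept is treated as a duplicate and discarded.
    """
    kept, dropped = [], []
    for record in records:
        if any(similarity(record, k) >= threshold for k in kept):
            dropped.append(record)
        else:
            kept.append(record)
    return kept, dropped


# Hypothetical records: the first two differ only in apartment number.
records = [
    {"name": "John Q. Smith", "address": "12 Oak Street Apt 4"},
    {"name": "John Q. Smith", "address": "12 Oak Street Apt 9"},
    {"name": "Maria Lopez", "address": "700 Pine Avenue"},
]

kept, dropped = merge_purge(records)
for record in dropped:
    print("dropped as perceived duplicate:", record)
```

In this sketch the second record is discarded although it carries a different apartment number, so the variation it contained is lost once the purge has run.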
In addition to the merge/purge problems in the conventional art, the conventional art utilizes encryption to address the volume of records and the processing power needed to handle that volume. The conventional art typically truncates the information relating to individuals or creates a key for certain fields in order to reduce the number of characters in a field being processed. This truncation drops data and reduces variation in order to conserve processing power.
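As a rough illustration of how truncation into a key reduces variation, consider the following sketch. The key layout (four letters of the last name, the first initial, and a ZIP code) and the field names are hypothetical and are used only to show the effect.

```python
def truncated_key(record, name_len=4):
    """Build a short match key from a few truncated fields.

    The layout is an assumption for illustration: the first `name_len`
    letters of the last name, the first initial, and the ZIP code.
    """
    last = record["last_name"].upper()[:name_len]
    first_initial = record["first_name"].upper()[:1]
    return f"{last}{first_initial}{record['zip']}"


# Two hypothetical, distinct individuals in the same ZIP code.
person_a = {"first_name": "Robert", "last_name": "Johnson", "zip": "30301"}
person_b = {"first_name": "Rachel", "last_name": "Johnston", "zip": "30301"}

key_a = truncated_key(person_a)
key_b = truncated_key(person_b)

print(key_a, key_b)
print("keys collide:", key_a == key_b)
```

Both records reduce to the same ten-character key, so a process that compares only keys cannot distinguish the two individuals. The shorter field is cheaper to compare, but the data that separated the records has been dropped.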
While encryption offers some advantages, it is well known that encryption algorithms can become very complex and require expensive programmer development time. For systems utilizing an all-encryption environment, these complexities can be very cumbersome; meanwhile, such an environment may not yield consistent and accurate results. Further, in an all-encryption environment, modifications to rules and programming can be difficult and time-consuming.