The invention relates to a system and method for creating and maintaining data records. More particularly, the invention relates to a system and method for processing data that is received from different sources in various formats in order to create records.
It is a common experience to call a telephone operator at a call center for information assistance. In a typical information assistance call, a customer identifies to the operator the name and address of a party whose telephone number is desired. In response, the operator locates the desired destination number using a computer database and the destination number is provided to the customer.
Typically, the computer database includes data from various data providers. These data providers may be telecommunication companies, such as Pacific Bell, GTE, or AT&T, to name a few. The data usually includes records having listing names, addresses, and telephone numbers of individuals and businesses throughout the United States and other countries. When millions of records (e.g., telephone listings) are processed, the reliability of the records becomes an issue. For example, a 5% error rate in data comprising 300,000,000 records yields 15,000,000 erroneous records. Since each data provider submits similar data, combining data from the various data providers is useful in creating accurate records. However, combining data is difficult because each data provider uses a different data format and the quality of data varies from provider to provider. Data is often inconsistent among the various data providers. For example, data representing a listing "Diana Elizabeth Nicholls" of "20 West 64TH Street" from one data provider may appear as "Nicholls, Diana E." of "20 W 64th St" from another data provider. Even though these listings represent the same person, the differences in format and nomenclature make it difficult for a computer database to determine whether the listings represent the same person or entity, and therefore to compare and combine the data. Accordingly, there is a need for a technique for processing data having various formats, which effectively selects and combines the data to create accurate records.
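By way of illustration only, the difficulty described above may be sketched in code. The following minimal Python example is not the technique of the invention. It assumes a small, hypothetical abbreviation table (ABBREVIATIONS) and hypothetical helper functions (normalize_address, name_key, same_listing) to show how two differently formatted listings might be reduced to a common form before they are compared. A practical system would require far more extensive rules.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch);
# real provider data would need a much larger mapping.
ABBREVIATIONS = {
    "west": "w", "east": "e", "north": "n", "south": "s",
    "street": "st", "avenue": "ave",
}


def normalize_address(address: str) -> str:
    """Lowercase the address, drop punctuation, and abbreviate known words."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def name_key(name: str) -> tuple:
    """Reduce a name to (surname, first name, middle initial).

    Accepts either "First Middle Last" or "Last, First M." ordering.
    """
    if "," in name:
        last, _, rest = name.partition(",")
        given = rest.split()
    else:
        parts = name.split()
        if not parts:
            return ("", "", "")
        last, given = parts[-1], parts[:-1]
    given = [g.strip(".").lower() for g in given]
    first = given[0] if given else ""
    middle = given[1][:1] if len(given) > 1 else ""
    return (last.strip().lower(), first, middle)


def same_listing(a: tuple, b: tuple) -> bool:
    """Compare two (name, address) pairs after normalization."""
    return (name_key(a[0]) == name_key(b[0])
            and normalize_address(a[1]) == normalize_address(b[1]))


listing_a = ("Diana Elizabeth Nicholls", "20 West 64TH Street")
listing_b = ("Nicholls, Diana E.", "20 W 64th St")
print(same_listing(listing_a, listing_b))
```

In this sketch the middle name is compared only by its initial, so the example treats "Elizabeth" and "E." as equivalent. That choice may also merge listings for different persons, which illustrates why selecting and combining such data reliably is not trivial.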