1. Technical Field
The present invention relates in general to a method and system for determining demographic data accuracy. More particularly, the present invention relates to a system and method for assessing the accuracy of selected demographic data elements that may be purchased from third party data vendors about households and individuals in those households.
2. Description of the Related Art
Understanding customers and what may explain their behavior and preferences is a primary factor in being successful in serving those customers. Because businesses typically have little information about their customers other than name, address, and transaction history, it is useful to add information about the customer from third party sources (“consumer appending” vendors).
Consumer data is compiled from a variety of sources including surveys, phone books, credit applications, public records, and other self-reported information. Consumer appending vendors attempt to complete a demographic profile of every household by combining data from these sources. However, because many sources are used and households cannot always be matched reliably across them, there is some inherent level of inaccuracy. Reasons for inaccuracy range from misspellings to deliberate statements of misinformation.
There are some data inaccuracies that may not be overcome, although attempts are being made to improve the technology used in data compilation. Among these “data holes” are incompleteness, inaccuracy, and mismatched data. For example, consumers often misunderstand survey questions or fill in the wrong blank by accident. Some of these errors are the result of programming mismatches, but many are related to the actual sources that contribute data to the data provider. Perfect data may not be possible, but techniques can be used to improve the accuracy of the data.
Data providers often use the term “data quality” to describe data accuracy. Data quality is further described in terms of Overall Match Rate, Elemental Match Rates, and Accuracy. These are often the factors that companies consider when purchasing data or conducting a test of data quality.
Overall Match Rate refers to the number of records received from the data provider relative to the number submitted for enhancement. Enhancement is defined as the addition of information to an individual consumer record (i.e., a “household”). The Overall Match Rate is determined by matches on last name and address, so it is affected by the quality of these fields in the data submitted for enhancement. For example, if a list of 1,000 customer names is sent to a data provider and the data provider returns data on 800 customer names, the overall match rate is 80%. This figure reflects the total number of records with appended data, not the number of data elements appended to each record.
When comparing data providers, many companies find match rates to be an important variable, which is why consumer appending vendors often provide (at no cost to the buyer) overall and data element match rates on a sample of data. Low match rates may mean that the data provider does not have a large enough representation of a customer base to provide the desired information.
Elemental Match Rates compare the number of elements appended to a file with the number of elements requested for each record. An element is a unit of data, a “demographic data field,” such as age of householder, household income, or whether a household owns or rents property. One record will have many elements, one for each demographic field potentially appended. Some data providers have more elements in their database than others. For this reason, a data provider offering a 100% overall match rate but returning only half of the requested elements may not be the data provider of choice.
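The overall match rate described above can be illustrated with a minimal sketch. The field names `last_name` and `address`, the normalization rules, and the helper names are assumptions made for illustration only; the matching logic actually used by any data provider is not described here.

```python
def match_key(record: dict) -> tuple[str, str]:
    """Build a comparison key from last name and address (hypothetical fields)."""
    last_name = record["last_name"].strip().lower()
    # Collapse repeated whitespace so "12  Oak St" and "12 Oak St" compare equal.
    address = " ".join(record["address"].lower().split())
    return (last_name, address)


def overall_match_rate(submitted: list[dict], provider: list[dict]) -> float:
    """Share of submitted records that the provider can match."""
    if not submitted:
        raise ValueError("at least one record must be submitted")
    provider_keys = {match_key(r) for r in provider}
    matched = sum(1 for r in submitted if match_key(r) in provider_keys)
    return matched / len(submitted)


submitted = [
    {"last_name": "Smith", "address": "12 Oak St"},
    {"last_name": "Jones", "address": "4 Elm Ave"},
    {"last_name": "Lee", "address": "9 Pine Rd"},
    {"last_name": "Garcia", "address": "77 Main St"},
    {"last_name": "Smyth", "address": "30 Lake Dr"},  # misspelled last name
]
provider = [
    {"last_name": "smith", "address": "12  Oak St"},
    {"last_name": "Jones", "address": "4 Elm Ave"},
    {"last_name": "Lee", "address": "9 Pine Rd"},
    {"last_name": "Garcia", "address": "77 Main St"},
    {"last_name": "Smith", "address": "30 Lake Dr"},
]

rate = overall_match_rate(submitted, provider)
print(f"{rate:.0%}")  # prints 80%
```

The misspelled last name in the submitted list prevents the fifth record from matching, which illustrates how the quality of the submitted name and address fields affects the overall match rate.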
Data elements may not be returned because they are not collected or, more likely, because the corresponding information could not be found; i.e., the data element is missing. It is also useful to look at the average number of elements returned per record for the elements provided. A 100% overall match rate with a 50% elemental match rate for a given element implies that half of the returned records are missing that element. Data providers do not all measure elemental match rate the same way. Some providers measure it as the ratio of elements appended to matched records. In the 1,000 record example described above, suppose a requested element is appended to 600 of the 800 matched records. Such a provider reports 600/800, which computes to a 75% elemental match rate. Another provider may measure elemental match rate as the ratio of elements appended to total records submitted. In the example above, the elemental match rate using this method is 600/1000, or 60%.
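The two conventions for elemental match rate can be compared side by side using the figures from the example above. This is a minimal sketch; the function and parameter names are hypothetical and do not correspond to any provider's reporting tool.

```python
def elemental_match_rates(submitted: int, matched: int, appended: int) -> tuple[float, float]:
    """Return the elemental match rate under both conventions.

    submitted: records sent to the data provider
    matched:   records for which the provider returned data
    appended:  matched records that received the requested element
    """
    if not 0 <= appended <= matched <= submitted or matched == 0:
        raise ValueError("expected 0 <= appended <= matched <= submitted, with matched > 0")
    per_matched = appended / matched      # convention 1: relative to matched records
    per_submitted = appended / submitted  # convention 2: relative to all submitted records
    return per_matched, per_submitted


per_matched, per_submitted = elemental_match_rates(submitted=1000, matched=800, appended=600)
print(f"{per_matched:.0%}")    # prints 75%
print(f"{per_submitted:.0%}")  # prints 60%
```

Because the same file yields 75% under one convention and 60% under the other, the denominator needs to be known before elemental match rates from different providers are compared.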
Accuracy refers to how correct the information in the appended elements is for the households. To determine the accuracy, a random sample of sufficient size is chosen from the total household record list. The sample should be representative of the full list; for example, if the total household list is nationwide, the sample should be drawn from many states rather than from just one. The sample is then verified against a valid benchmark to determine the accuracy of the file.
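The sampling-and-verification step can be sketched as follows. This is an illustration under stated assumptions: household identifiers are strings, the benchmark is a dictionary of trusted values for a single element, a simple random sample is used, and an appended value counts as accurate only if it equals the benchmark value exactly. None of these choices is specified above, and the function name is hypothetical.

```python
import random
from typing import Optional


def sample_accuracy(appended: dict, benchmark: dict, sample_size: int,
                    seed: Optional[int] = None) -> float:
    """Estimate accuracy of one appended element from a random sample.

    appended:  household id -> value supplied by the data provider
    benchmark: household id -> trusted value used for verification
    """
    # Only households present in both sources can be verified (assumption).
    ids = sorted(set(appended) & set(benchmark))
    if not 0 < sample_size <= len(ids):
        raise ValueError("sample_size must be between 1 and the number of verifiable households")
    rng = random.Random(seed)
    sample = rng.sample(ids, sample_size)
    correct = sum(1 for hid in sample if appended[hid] == benchmark[hid])
    return correct / sample_size


appended = {"h1": "own", "h2": "rent", "h3": "own", "h4": "own", "h5": "rent"}
benchmark = {"h1": "own", "h2": "rent", "h3": "rent", "h4": "own", "h5": "rent"}

estimate = sample_accuracy(appended, benchmark, sample_size=3, seed=0)
print(f"estimated accuracy: {estimate:.0%}")
```

A simple random sample over a nationwide list will usually span many states, but it does not guarantee coverage of each one; a stratified draw by state would be one way to enforce that, at the cost of a slightly more involved estimate.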
A challenge found with existing art is that there is no common standard for comparing data accuracy across data vendors. Because of this, it is difficult to decide which data vendor is the right one for a given consumer list analysis. Some data vendors may be better than others in various areas. For example, Data Vendor A may have more accurate household financial information, while Data Vendor B may have more accurate household marital status information. What is needed, therefore, is a way to accurately compare demographic data between data vendors to determine which data vendor provides the best accuracy for a given consumer list.