The contact data industry typically sells lists of contacts that match specific criteria, for example, “C-level executives in the Washington, D.C. metropolitan area”, “human resources executives in a technology company with more than about 500 employees”, etc. However, it is difficult to estimate how many contacts such a list would contain if it included every person in the world who matches the criteria.
Given an arbitrary specification for a list, there is a need for a computer implemented method and system that estimates the size of a target subpopulation, that is, the number of people who would match the criteria if a provider's contact database had an entry for every person in the world, that is, the population.
Subpopulation size estimates allow list recipients to assess the coverage of a list. If a purchased list contains, for example, about 2,000 contacts, but the size of the target subpopulation is about 200,000, then the list covers only about 1% of the target audience, and another list provider or vendor, or alternative methods other than direct electronic mail (email) for reaching the target audience, are needed. If the target subpopulation contains, for example, about 2,200 contacts, then the list of about 2,000 contacts covers about 91% of the subpopulation and may be considered adequate. Estimates of subpopulation size also help list providers establish list value. When the subpopulation is small and hard to reach, the more complete a list is, the more valuable it is. List providers can also focus collection efforts on subpopulations where coverage is poor.
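The coverage comparison in the preceding paragraph reduces to a simple ratio of list size to estimated subpopulation size. The following is a minimal sketch of that arithmetic only; the function name `list_coverage` and its parameters are hypothetical and chosen for illustration, and the inputs are the example figures given above.

```python
def list_coverage(list_size: int, subpopulation_size: int) -> float:
    """Return the fraction of the estimated target subpopulation that a list reaches."""
    if subpopulation_size <= 0:
        raise ValueError("subpopulation_size must be positive")
    # Cap at 1.0 in case the size estimate is smaller than the list itself.
    return min(list_size / subpopulation_size, 1.0)


# Example figures from the text: a list of about 2,000 contacts.
low = list_coverage(2_000, 200_000)   # large target subpopulation
high = list_coverage(2_000, 2_200)    # small target subpopulation

print(f"coverage against 200,000: {low:.1%}")
print(f"coverage against 2,200:   {high:.1%}")
```

Because the subpopulation size is itself an estimate, a coverage figure computed this way carries the same uncertainty as that estimate.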
A conventional method for estimating the composition of a population estimates, for example, the demographic proportions of the population that accesses a web page, such as the proportion of males versus females. However, the sources of information on user demographics are biased, and these biases must be accounted for to obtain more accurate estimates of the population composition. Another conventional population estimation method, used in the healthcare industry, employs mark and recapture estimation and conducts surveys to actively collect information and identify healthcare professionals with a predefined attribute. A further population estimation method performs probabilistic population size estimation and overlap determination, estimating the number of individuals in a population when those individuals have multiple distinct properties, for example, cars with different makes, models, years, and colors, but no unique identifiers such as license plate numbers. In this method, the input is a database comprising information about the number of times each property is observed. However, this method requires that the distribution of the individual properties in the population be known.
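For orientation, mark and recapture estimation in its simplest textbook form, the two-sample Lincoln-Petersen estimator, infers a population size from the overlap between two independently collected samples. The sketch below illustrates only that general technique; it is not the specific conventional method referred to above, and the function name and the integer identifiers are hypothetical.

```python
def lincoln_petersen(first_sample: set, second_sample: set) -> float:
    """Estimate population size as n1 * n2 / m, where m is the overlap of the samples."""
    overlap = len(first_sample & second_sample)
    if overlap == 0:
        raise ValueError("samples share no individuals; the estimate is undefined")
    return len(first_sample) * len(second_sample) / overlap


# Hypothetical identifiers: 60 individuals in the first sample,
# 80 in the second, and 20 appearing in both.
first = set(range(0, 60))
second = set(range(40, 120))

estimate = lincoln_petersen(first, second)
print(estimate)  # 60 * 80 / 20 = 240.0
```

The estimator assumes that the two samples are drawn independently and that every individual is equally likely to be captured, assumptions that often do not hold for actively collected data.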
Hence, there is a long felt but unresolved need for a computer implemented method and system that estimates a size of a target subpopulation and quantifies size estimation uncertainty.