1. Field of the Invention
The present invention relates to data sampling and, more specifically, to a system and method for generating and implementing an optimal sampling plan for data drawn from populations of collectives, for purposes such as opinion research.
2. Introduction
In many market or opinion research applications, one goal is to make statistically valid statements about the distribution of some opinion measure or other type of measurement in the population under study. Sometimes the population under study consists of individual entities, each of which may itself be a collective having multiple members. For example, the population under study might consist of business firms, each of which may be viewed as a collective consisting of all the firm's employees (or some subset of employees of particular interest, e.g., those whose job function is in the Information Technology category). Other examples include populations of schools, households, churches, counties, and other public or private institutions. Within a collective there is typically some variability of opinion among its members. Despite this variability, each collective is frequently assigned a single summary number that is representative in some sense of the opinions of the collective's relevant members, e.g., the average rating. This simplifies the task of making valid statements about the distribution of opinion across all the collectives in the population under study, or of comparing distributions from different populations, which is often the main goal of the research. If not all of the relevant members of a collective are sampled, there is likely to be some degree of error in the summary measure calculated for that collective (within-collective sampling error).
Similarly, if not all collectives are sampled, there will be some error in estimating the characteristics of the distribution of the summary measure over collectives (between-collective sampling error), even if the summary measure calculated for each sampled collective were free of error. What is needed is a sampling procedure, together with associated procedures for statistical inference, that takes into account both within-collective and between-collective sampling error and makes optimal tradeoffs between the two sources of error.
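The two sources of error can be illustrated with the standard textbook approximation for two-stage sampling. This is not the method of the present invention; it is a minimal sketch under stated assumptions. It assumes simple random sampling of n collectives from an effectively infinite population of collectives, simple random sampling of the same number m of members within each sampled collective, the collective mean as the summary measure, and no finite-population corrections. Under these assumptions, the variance of the average of the sampled collectives' summary measures splits into a between-collective term, sigma_b^2 / n, and a within-collective term, sigma_w^2 / (n m). The names `overall_mean_variance` and `ErrorBreakdown`, and the variance values used below, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBreakdown:
    between: float  # between-collective sampling variance component
    within: float   # within-collective sampling variance component

    @property
    def total(self) -> float:
        # Total approximate variance of the overall mean over collectives.
        return self.between + self.within


def overall_mean_variance(sigma2_between: float, sigma2_within: float,
                          n_collectives: int, m_members: int) -> ErrorBreakdown:
    """Approximate variance of the mean of per-collective summary measures.

    Hypothetical helper. Assumes simple random sampling at both stages,
    equal m_members per collective, and no finite-population corrections.
    """
    if n_collectives < 1 or m_members < 1:
        raise ValueError("need at least one collective and one member per collective")
    # Error that remains even if every collective's summary were exact.
    between = sigma2_between / n_collectives
    # Error from estimating each collective's summary from only m members,
    # averaged over n independent collectives.
    within = sigma2_within / (n_collectives * m_members)
    return ErrorBreakdown(between, within)


if __name__ == "__main__":
    # Hypothetical variance components, for illustration only.
    sb2, sw2 = 1.0, 4.0
    # Four designs that each use 200 interviews in total (n * m = 200).
    for n, m in [(10, 20), (40, 5), (100, 2), (200, 1)]:
        b = overall_mean_variance(sb2, sw2, n, m)
        print(f"n={n:3d} m={m:2d} between={b.between:.4f} "
              f"within={b.within:.4f} total={b.total:.4f}")
```

Under these simplifying assumptions, and holding the total number of interviews fixed, spreading interviews over more collectives lowers the total variance. In practice, recruiting each additional collective usually carries its own cost, which is one reason the two sources of error must be traded off against each other rather than simply minimized separately.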