Sampling is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for purposes of statistical inference. The source from which a sample is drawn is called a sampling frame. A sampling frame is a list of (or a means of listing) all those within a population who can be sampled. In defining the frame, practical, economic, ethical and technical issues may need to be addressed.
Having established a sampling frame, there are a number of ways to organize it to improve efficiency and effectiveness. In a simple sampling case, all elements of the frame are treated equally and the frame is not subdivided or partitioned. A sampling method is applied to the whole frame. Where the population embraces a number of distinct categories, the frame can be organized by these categories into separate strata or demographics. A sampling method is then applied to each stratum separately. Major gains in efficiency (either lower sample size or higher precision) can be achieved by varying the sampling fraction from stratum to stratum. Some general rules are: the sample sizes could be made proportional to the stratum standard deviation or variance, and strata should be chosen to have means which differ substantially from one another. Where items in the population are clustered, sampling can reflect this to minimize costs and/or for ease of data collection. For example, a simple random sample of telephone calls is difficult to obtain from a telephone exchange, but it is easy to take a sample of customer lines and observe all of the calls on the sampled lines. That is, calls are naturally clustered by lines. The sample units within clusters tend to be more similar than randomly chosen sample units, and thus clustering requires larger sample sizes to compensate for this reduction in the amount of information per sample element.
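The gain from varying the sampling fraction by stratum can be illustrated with a short Python sketch. It compares proportional allocation with an allocation proportional to stratum size times stratum standard deviation (commonly known as Neyman allocation). The function names, the two-stratum example and the omission of the finite population correction are assumptions made for illustration only.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Allocate the sample in proportion to stratum size (same fraction everywhere)."""
    population = sum(stratum_sizes)
    return [total_sample * size / population for size in stratum_sizes]


def neyman_allocation(stratum_sizes, stratum_sds, total_sample):
    """Allocate the sample in proportion to stratum size times stratum standard deviation."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    return [total_sample * w / total_weight for w in weights]


def stratified_mean_variance(stratum_sizes, stratum_sds, allocation):
    """Variance of the stratified sample mean, ignoring the finite population correction."""
    population = sum(stratum_sizes)
    return sum(
        (size / population) ** 2 * sd ** 2 / n
        for size, sd, n in zip(stratum_sizes, stratum_sds, allocation)
    )


# Hypothetical frame: two equally sized strata, the second three times as variable.
sizes = [1000, 1000]
sds = [1.0, 3.0]

prop = proportional_allocation(sizes, 100)
neyman = neyman_allocation(sizes, sds, 100)

print("proportional:", prop, stratified_mean_variance(sizes, sds, prop))
print("neyman:      ", neyman, stratified_mean_variance(sizes, sds, neyman))
```

For the same total sample size, shifting observations toward the more variable stratum yields a smaller variance for the estimated mean, which is the efficiency gain described above.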
In any of the types of sample frames identified, a variety of sampling methods can be employed, individually or in combination. In probability sampling, for example, every combination of items from the frame or stratum has a known probability of occurring, but these probabilities are not necessarily equal. A common example of a probability sample is a random sample. In any form of sampling there is a risk that the sample will not adequately represent the population, but with probability sampling there is a large body of statistical theory which quantifies the risk and thus enables an appropriate sample size to be chosen. Furthermore, once the sample has been taken the sampling error associated with the measured results can be computed. With non-random sampling there is no measure of the associated sampling error. While such methods may be cheaper, this is largely meaningless since there is no measure of quality. There are several forms of probability sampling. For example, in simple random sampling, each element has an equal probability of occurring. This may be infeasible in many practical situations. Other examples of probability sampling include proportionate stratified sampling, disproportionate stratified random sampling and multistage sampling. Some probability samples do not have good methods of measuring quality. For example, selecting every tenth name from the telephone directory is an example of systematic sampling. Though systematic sampling is simple to implement, asymmetries and biases in the structure of the data can lead to bias in results. Systematic sampling is a type of probability sampling, but does not have a good way of measuring quality unless the directory itself is randomized before selection.
Mechanical sampling occurs in sampling solids, liquids and gases, using collection devices such as grabs, scoops, probes, etc. Mechanical sampling is not random unless the material being sampled is known to be randomized. In many applications the sampler simply assumes that the material sampled is randomized, in which case the method is in fact a type of non-probability sampling. Care is needed to ensure that the sample is representative of the frame. Convenience sampling, sometimes called grab sampling, is the method of choosing items arbitrarily and in an unstructured manner from the population. Though almost impossible to treat rigorously, it is the method commonly employed in many practical situations.
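The contrast between simple random sampling and systematic sampling described above can be sketched in Python. The example uses a hypothetical frame with a period of ten, so that taking every tenth element lines up with the structure of the data. The function names and the frame are assumptions for illustration. The standard error shown is the usual formula for a simple random sample drawn without replacement.

```python
import math
import random
import statistics


def simple_random_sample(frame, n, rng):
    """Draw n elements without replacement, each element equally likely."""
    return rng.sample(frame, n)


def systematic_sample(frame, step, rng):
    """Pick a random start in [0, step) and then take every step-th element."""
    start = rng.randrange(step)
    return frame[start::step]


def srs_standard_error(sample, population_size):
    """Standard error of the sample mean for a simple random sample."""
    n = len(sample)
    sample_variance = statistics.variance(sample)
    finite_population_correction = 1 - n / population_size
    return math.sqrt(finite_population_correction * sample_variance / n)


# Hypothetical periodic frame: every tenth element is 10, the rest are 0.
frame = [10 if i % 10 == 0 else 0 for i in range(1000)]
rng = random.Random(0)

srs = simple_random_sample(frame, 100, rng)
sys_sample = systematic_sample(frame, 10, rng)

print("true mean:      ", statistics.mean(frame))
print("SRS mean and SE:", statistics.mean(srs), srs_standard_error(srs, len(frame)))
print("systematic mean:", statistics.mean(sys_sample))
```

Because the sampling interval matches the period of the frame, the systematic sample contains only one distinct value, so its mean is either 0 or 10 depending on the random start, and the sample itself gives no indication of how far off it is. Shuffling the frame before selection removes this alignment.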
Multiple data sources are sometimes available as potential sampling frames for population surveys. In some situations the use of a multiple frame sample design is more advantageous than using a single sampling frame. For example, one often finds that any one frame by itself may be inadequate to completely cover all units (households, persons, etc.) in the target population. It has been found that by overlapping a list frame and an area frame, more complete coverage of the target population may be ensured. Dual frame sample designs are also appropriate for situations in which the target population densely populates one incomplete frame but forms only a minority of elements in another complete frame. The use of multiple sampling frames, however, has variance and bias implications, as well as sampling, data collection, and logistical considerations.
A schematic representation of a dual frame sample is illustrated in FIG. 1. In this case, the population is defined by the contents of both Frame 1 and Frame 2. Note that if there is no overlap between Frame 1 and Frame 2 a simple stratified sampling model may be employed. However, in dual frame sampling the two different frames may overlap. That is, certain elements of the universal population encompassed by the two frames may be in both frames. For example, as illustrated in FIG. 1, there are three subsets of the universal sampled population: population A, which is found only in Frame 1; population B, which is found in both Frame 1 and Frame 2; and population C, which is found only in Frame 2. In order to obtain an accurate analysis using such a dual frame sample, the sample data must be weighted appropriately. Such weighting may be used to obtain a weighted estimate which takes into account the overlap of the sample frames.
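The partition into populations A, B and C can be expressed with set operations. The following Python sketch assumes each frame is available as a collection of unit identifiers. The function name and the example identifiers are hypothetical.

```python
def split_domains(frame1, frame2):
    """Partition the union of two frames into the three populations of FIG. 1."""
    units1, units2 = set(frame1), set(frame2)
    return {
        "A": units1 - units2,  # found only in Frame 1
        "B": units1 & units2,  # found in both frames
        "C": units2 - units1,  # found only in Frame 2
    }


# Hypothetical frames of household identifiers.
domains = split_domains(["h1", "h2", "h3"], ["h3", "h4"])
print(domains)
```

When population B is empty the two frames do not overlap, which corresponds to the case in which a simple stratified sampling model may be employed.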
An estimator for a population parameter, such as the population total of both frames of a dual frame sample, was presented by H. O. Hartley in 1962 as follows:

$$\tilde{X}_{\text{pop}} = \tilde{X}_{\text{frame 1 only}} + P\,\tilde{X}_{\text{frame overlap}} + Q\,\tilde{X}_{\text{frame 2 only}} \qquad (1)$$

where $P + Q = 1$, P is the proportion of Frame 1 which overlaps with Frame 2, and $\tilde{X}$ is an estimator for a population parameter such as the population total. In this case we can think of this overlapping dual frame sample as four samples: a sample from A from Frame 1, a sample from B from Frame 1, a sample from B from Frame 2, and a sample from C from Frame 2.
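The four-sample view above can be sketched in Python. This is a minimal sketch of the commonly cited form of Hartley's dual frame estimator, in which population B is estimated once from each frame and the two estimates are blended with weights P and Q = 1 - P. It assumes an equal-probability sample within each frame with known sampling fractions. The function names, the parameters `fraction1` and `fraction2`, and the example values are hypothetical.

```python
def expansion_total(values, sampling_fraction):
    """Estimate a population total by scaling the sample sum by 1 / sampling fraction."""
    return sum(values) / sampling_fraction


def dual_frame_total(a_from_1, b_from_1, b_from_2, c_from_2,
                     fraction1, fraction2, p):
    """Combine the four domain samples into one estimate of the population total."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    q = 1.0 - p
    total_a = expansion_total(a_from_1, fraction1)    # A, sampled via Frame 1
    total_b1 = expansion_total(b_from_1, fraction1)   # B, as estimated from Frame 1
    total_b2 = expansion_total(b_from_2, fraction2)   # B, as estimated from Frame 2
    total_c = expansion_total(c_from_2, fraction2)    # C, sampled via Frame 2
    # The overlap is counted once: p and q share it between the two frame estimates.
    return total_a + p * total_b1 + q * total_b2 + total_c


estimate = dual_frame_total(
    a_from_1=[1, 2], b_from_1=[3],
    b_from_2=[4, 5], c_from_2=[6],
    fraction1=0.5, fraction2=0.25, p=0.5,
)
print(estimate)
```

Setting `p` to 1 uses only the Frame 1 estimate of the overlap and setting it to 0 uses only the Frame 2 estimate. Intermediate values trade off the precision of the two frame samples.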
The estimator developed by Hartley has several limitations. This estimator requires the determination of a parameter that is presumed to be the proportion of Frame 2 that is in Frame 1. However, determining such a value can be problematic and often the best that can be used is a good guess. Thus, the accuracy of the estimator is limited. Furthermore, the estimator has not been generalized to more than two sample frames.
What is desired, therefore, is an improved system and method for multi-frame sampling for statistical analysis. In particular, what is desired is a system and method that employs an improved estimator that is applicable to dual and multiple frame samples, and that is more accurate and efficient than current systems and methods employing known estimators.