This specification relates to determining the significance of test statistics. Researchers use experiments to test how subjects are changed by being exposed to various treatments. In general, a treatment is something that researchers expose subjects to in order to determine possible effects of the treatment. For example, in an experiment to determine the effect of a particular medication, a subject is exposed to the treatment if the subject takes the medication. As another example, in an experiment to determine the effect of listening to music by Mozart on student test scores, a student is exposed to the treatment if the student listens to music by Mozart. As yet another example, in an experiment to determine the effect of viewing an advertisement on user behavior, a user is exposed to the treatment if the user views the advertisement.
Different types of experiments can be used to estimate the effect of a treatment on subject behavior. Controlled experiments, in which a randomly selected proper subset of subjects is exposed to the treatment, can provide estimates of how exposure to a treatment changes both short-term and long-term subject behavior. However, such experiments are often expensive and impractical to run.
Observational studies are used as an alternative to controlled experiments. They compare exposed subjects, i.e., subjects who were exposed to a treatment, with control subjects, i.e., subjects who were not exposed to the treatment, to estimate the effect the treatment has on particular actions monitored by the study. The effect is measured by generating action-specific models and using those models to generate test statistics, each of which estimates the effect that exposure to the treatment has on a particular action of interest.
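The specification does not say what form the action-specific models take. As a deliberately simple stand-in, the sketch below compares exposed and control subjects with a two-sample z statistic for the difference in mean outcomes, where each outcome is 1 if the subject took the action and 0 otherwise. The function name `difference_in_means_z` is hypothetical, and this unadjusted comparison is an illustrative assumption, not the models described in the text.

```python
import math
from statistics import mean, variance


def difference_in_means_z(exposed: list[float], control: list[float]) -> float:
    """Two-sample z statistic for the difference in mean outcomes.

    Hypothetical stand-in for an action-specific model: it compares exposed
    and control subjects directly and does not adjust for any subject features.
    """
    diff = mean(exposed) - mean(control)
    # Unpooled (Welch-style) standard error of the difference in means.
    se = math.sqrt(variance(exposed) / len(exposed) + variance(control) / len(control))
    return diff / se


if __name__ == "__main__":
    # 1 = subject took the action of interest, 0 = subject did not.
    exposed_outcomes = [1, 0, 1, 1, 0, 1, 1, 0]
    control_outcomes = [0, 0, 1, 0, 0, 1, 0, 0]
    print(difference_in_means_z(exposed_outcomes, control_outcomes))
```

A positive statistic indicates that exposed subjects took the action more often than control subjects; whether that difference is more than chance is the separate question of significance.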
Once the test statistic for a particular action is generated, it must be determined whether the effect indicated by the test statistic is due to exposure to the treatment or merely to chance. A test statistic is statistically significant if the effect indicated by the statistic is due to more than just chance.
Statistical significance is generally determined by testing a null hypothesis, which is either rejected or not rejected. The null hypothesis states that any effect indicated by the test statistic is due to mere chance. Conventional methods for determining statistical significance decide whether to reject the null hypothesis by comparing the test statistic to a reference distribution, for example, a standard normal distribution. If the test statistic falls within an upper percentile of the distribution (for example, the top 5%), the null hypothesis is rejected, and the test statistic is determined to be statistically significant.
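A minimal sketch of the conventional one-sided test described above, assuming a standard normal reference distribution and a 5% upper tail. It uses Python's standard-library `statistics.NormalDist`; the function names `is_significant` and `one_sided_p_value` are hypothetical.

```python
from statistics import NormalDist


def is_significant(test_statistic: float, alpha: float = 0.05) -> bool:
    """One-sided test against a standard normal reference distribution.

    The null hypothesis is rejected when the statistic lies in the upper
    `alpha` tail, i.e. above the (1 - alpha) quantile of N(0, 1).
    """
    critical_value = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    return test_statistic > critical_value


def one_sided_p_value(test_statistic: float) -> float:
    """Probability under N(0, 1) of a value at least as large as the statistic."""
    return 1 - NormalDist().cdf(test_statistic)


if __name__ == "__main__":
    for z in (1.2, 2.1):
        print(z, is_significant(z), round(one_sided_p_value(z), 4))
```

With `alpha = 0.05`, a statistic of 2.1 exceeds the critical value of roughly 1.645 and is declared significant, while 1.2 is not. Note that the test only asks how unusual the statistic is under the reference distribution; it has no way to check whether the statistic itself was computed from an adequate model.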
This significance test is sufficient when the model(s) used to generate the test statistic are adequate. However, the model(s) used to generate test statistics are often incomplete: they are generated using only a proper subset of the possible features of subjects. If a feature is correlated with both exposure to the treatment and the action being tested, but is not used by the model(s) that generated the test statistic, then there can be hidden, residual selection bias in the model. This hidden, residual selection bias results in an inaccurate test statistic. Because the conventional methods for determining statistical significance do not detect this bias, a test statistic can be incorrectly determined to be statistically significant.
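The simulation below sketches how this failure can arise, under loudly stated assumptions: a hypothetical binary feature `engaged` makes a subject both more likely to be exposed and more likely to act, while exposure itself has no effect on the action. All probabilities, the sample size, and the function names are illustrative choices, not values from the specification. A comparison that ignores `engaged` yields a large statistic that passes the conventional test; recomputing the statistic within each level of the feature (simple stratification, swapped in here only as an illustration of "using the feature") shows no effect.

```python
import math
import random
from statistics import NormalDist


def z_stat(exposed: list[int], control: list[int]) -> float:
    """Two-sample z statistic for a difference in mean binary outcomes."""
    p1 = sum(exposed) / len(exposed)
    p0 = sum(control) / len(control)
    se = math.sqrt(p1 * (1 - p1) / len(exposed) + p0 * (1 - p0) / len(control))
    return (p1 - p0) / se


def simulate(n: int = 20000, seed: int = 0) -> list[tuple[bool, bool, int]]:
    """Simulate (engaged, exposed, acted) subjects; exposure has NO effect.

    `engaged` is a hypothetical confounder: it raises the probability of
    exposure and, independently, the probability of taking the action.
    """
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        engaged = rng.random() < 0.5
        exposed = rng.random() < (0.7 if engaged else 0.2)
        # The action depends only on `engaged`, never on `exposed`.
        acted = 1 if rng.random() < (0.3 if engaged else 0.1) else 0
        subjects.append((engaged, exposed, acted))
    return subjects


def naive_z(subjects: list[tuple[bool, bool, int]]) -> float:
    """Statistic from a comparison that omits the confounding feature."""
    exposed = [a for _, e, a in subjects if e]
    control = [a for _, e, a in subjects if not e]
    return z_stat(exposed, control)


def stratified_z(subjects: list[tuple[bool, bool, int]]) -> dict[bool, float]:
    """Statistics computed separately within each level of the feature."""
    return {level: naive_z([s for s in subjects if s[0] == level]) for level in (False, True)}


if __name__ == "__main__":
    data = simulate()
    critical = NormalDist().inv_cdf(0.95)
    z = naive_z(data)
    print("naive z:", round(z, 2), "declared significant:", z > critical)
    print("within-stratum z:", {k: round(v, 2) for k, v in stratified_z(data).items()})
```

In this setup the naive statistic is far above the 5% critical value even though the true treatment effect is zero, and nothing in the significance test itself reveals the problem.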