1. Field of the Invention
The present invention relates in general to the field of information processing, and more specifically to a system and method for determining confidence intervals for weighted trial data.
2. Description of the Related Art
Confidence intervals are often used to characterize the precision of statistical results obtained from sample data. A confidence interval gives an estimated range of values which will include an unknown population parameter with a confidence probability of pconf. The estimated range is calculated from a number of trials in a given data sample. The width of the confidence interval indicates the uncertainty in the estimate of the unknown parameter. As depicted in FIG. 1, the confidence interval is bounded by a high bound, hi, and a low bound, li. As the number of trials N 102 increases, the width 104 of the confidence interval decreases.
FIG. 2 depicts a conventional confidence interval determination and application process 200 that uses a computer system to determine confidence intervals. Process 200 takes in sample data 202. The N and ni determination module 204 determines from the sample data 202 the total number of trials N. Assume that on any trial there are j possible outcomes. Then, in N trials, there are n1 trials resulting in outcome 1, n2 trials resulting in outcome 2, . . . , nj trials resulting in outcome j, such that n1+n2+ . . . +nj=N. Thus, the expected probability θi of obtaining outcome i in any random trial is ni/N.
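The counting step described above can be sketched as follows. This is a minimal illustration only, not the implementation of module 204; the function name outcome_counts and the sample values are hypothetical.

```python
from collections import Counter


def outcome_counts(trials):
    """Return (N, n, theta) for a sequence of observed outcomes.

    N     -- total number of trials
    n     -- dict mapping each outcome i to its count n_i
    theta -- dict mapping each outcome i to n_i / N
    """
    n = Counter(trials)                      # n_i for every observed outcome
    total = sum(n.values())                  # N = n_1 + n_2 + ... + n_j
    theta = {outcome: count / total for outcome, count in n.items()}
    return total, dict(n), theta


# Hypothetical sample: eight rolls of a die.
sample = [1, 3, 3, 6, 2, 3, 1, 6]
N, n, theta = outcome_counts(sample)
```

For this sample, outcome 3 occurs in three of the eight trials, so its expected probability is 3/8.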
The confidence interval determination module 206 determines conventional confidence intervals that provide a confidence range (li, hi) such that the unknown actual probability Θi of outcome i occurring in a random trial, given a sample in which outcome i occurs ni times out of N trials, has a pconf probability of falling within the range li≦Θi≦hi. Once module 206 determines a confidence interval, a confidence interval application module applies the confidence interval, bounded by li and hi, to data.
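The text does not state which formula module 206 uses. As one illustration of how bounds li and hi might be obtained from ni, N, and pconf, the sketch below uses the Wilson score interval, a common approximation for a proportion. The choice of Wilson and the function name wilson_interval are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(n_i, n_total, p_conf=0.95):
    """Return (l_i, h_i) for outcome i using the Wilson score interval.

    Assumption: the Wilson interval stands in for the unspecified
    conventional method; other interval formulas could be used instead.
    """
    if n_total <= 0 or not 0 <= n_i <= n_total:
        raise ValueError("require n_total > 0 and 0 <= n_i <= n_total")
    if not 0.0 < p_conf < 1.0:
        raise ValueError("p_conf must lie strictly between 0 and 1")

    # Two-sided normal quantile for the requested confidence probability.
    z = NormalDist().inv_cdf(1.0 - (1.0 - p_conf) / 2.0)
    p_hat = n_i / n_total
    denom = 1.0 + z * z / n_total
    center = (p_hat + z * z / (2.0 * n_total)) / denom
    half = z * sqrt(p_hat * (1.0 - p_hat) / n_total
                    + z * z / (4.0 * n_total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Same observed proportion, ten times as many trials.
low_small, high_small = wilson_interval(30, 100)
low_large, high_large = wilson_interval(300, 1000)
```

With the observed proportion held at 0.3, the interval for N=1000 is narrower than the interval for N=100, consistent with the behavior depicted in FIG. 1.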
Application of confidence intervals to data can play a major role in the interpretation of data through an understanding of the precision of statistics derived from sample data. For example, for standard data samples, confidence intervals can indicate whether trial data and information derived therefrom can be relied upon with a specified confidence probability, pconf. Trial data refers to a collection of one or more individual sample data. For example, product demand data can be used to determine which products to build in order to match consumer demand. If the product demand data has a wide confidence interval, then building products in accordance with the product demand data can represent a significant commercial risk.
In most circumstances each outcome i is weighted equally with every other outcome. For example, rolling a one (1) on a die is weighted the same as rolling any other number on the die. In other words, a one (1) outcome has no more significance than any other number outcome. In some circumstances, outcomes are given slightly different weights according to their significance. For example, for political polling data, answers from members of a first group may be counted 2% more than answers from members of a second group because members of the first group are 2% more likely to vote than members of the second group. When trials are weighted, conventional processes determine ni by determining Θi from the weighted mix of the N trials and multiplying Θi by N.
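The conventional handling of weighted trials described above can be sketched as follows. This is an illustrative reading of the text, not a definitive implementation; the function name weighted_counts, the (outcome, weight) pair representation, and the polling values are hypothetical.

```python
from collections import defaultdict


def weighted_counts(trials):
    """Return (N, theta, n) for a list of (outcome, weight) pairs.

    theta[i] is the share of total weight carried by outcome i, and
    n[i] = theta[i] * N is the resulting (possibly fractional) count.
    """
    total_trials = len(trials)                       # N counts trials, not weight
    total_weight = sum(weight for _, weight in trials)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")

    weight_by_outcome = defaultdict(float)
    for outcome, weight in trials:
        weight_by_outcome[outcome] += weight

    theta = {o: w / total_weight for o, w in weight_by_outcome.items()}
    n = {o: t * total_trials for o, t in theta.items()}
    return total_trials, theta, n


# Hypothetical poll: first-group answers are counted 2% more (weight 1.02).
poll = [("yes", 1.02), ("no", 1.02), ("yes", 1.00), ("no", 1.00), ("yes", 1.00)]
N_w, theta_w, n_w = weighted_counts(poll)
```

The resulting counts still sum to N, but each ni is generally no longer an integer.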