Particular embodiments generally relate to data analysis and more specifically to reducing bias in likelihood estimation of a target feature.
Likelihood estimation involves using an estimator, called a likelihood estimator, to estimate a true probability distribution characterizing a given dataset. Likelihood estimators, also called density estimators, may facilitate answering specific questions about the data or estimating features of interest pertaining to the data.
Processes for determining an estimator of a probability distribution or likelihood estimator descriptive of a given dataset are employed in various demanding applications including analysis of experimental results from scientific studies, determination of probabilities of occurrences of certain events based on observations, and so on. Such applications often demand robust, accurate, unbiased methods for determining answers to a specific question or feature of interest given a dataset. The dataset may represent experimental results or other observations.
Probability estimation and likelihood estimation are particularly important in fields such as genetics, medicine, and communications systems, where complex problems involving multiple variables are common. Probability calculations often involve employing a probability distribution, also called a probability density function, to evaluate the probability of occurrence of a given event or another feature of the probability distribution. The true probability distribution or density function is often unknown. Therefore, one estimates the probability distribution from data and then uses this estimate to evaluate the desired probabilities pertaining to events of interest, or another particular feature of the probability distribution. When using likelihood estimation, estimates of the probability of occurrence of a given event, or of any other feature of the true probability distribution, are obtained via one or more estimates of the true probability distribution.
Conventionally, a few global parameters of a distribution, such as the standard deviation and mean of the normal distribution, are adjusted to describe the observed data as accurately as possible. Unfortunately, observed data does not always behave according to well-known distributions, and realistic knowledge about the data-generating experiment typically does not allow one to assume that the true probability distribution of the data can be accurately described with a small number of unknown parameters. Descriptions of the true probability distribution that rely on such a small set of parameters are called parametric models in the classical statistical literature, and many standard statistical software packages employ such parametric models.
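The conventional parametric approach described above can be illustrated with a minimal sketch that fits a normal distribution by maximum likelihood. The function names `fit_normal_mle` and `simulate_normal` and the simulated data are hypothetical and serve only as an illustration; they are not part of any particular embodiment.

```python
import random
import statistics


def fit_normal_mle(data):
    """Return the normal distribution whose two global parameters
    (mean and standard deviation) maximize the likelihood of `data`."""
    mu = statistics.fmean(data)
    # The maximum likelihood estimate of the spread divides by n, not n - 1,
    # so the population standard deviation is used.
    sigma = statistics.pstdev(data, mu)
    return statistics.NormalDist(mu, sigma)


def simulate_normal(n, mu, sigma, seed=0):
    """Draw n illustrative observations from a normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    sample = simulate_normal(10_000, mu=3.0, sigma=2.0)
    fitted = fit_normal_mle(sample)
    print(f"estimated mean = {fitted.mean:.3f}, "
          f"estimated std = {fitted.stdev:.3f}")
```

When the data really are normally distributed, as in this simulation, the two fitted parameters recover the true distribution well. The fit is only as good as the assumption that two parameters suffice.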
Consequently, such parametric models for the true probability distribution may yield inaccurate (maximum likelihood) estimators of the true probability distribution, and such estimators are often substantially biased with respect to a particular feature of the probability distribution, and thus yield biased answers to particular questions of interest.
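The following sketch illustrates, under stated assumptions, how a misspecified parametric model may bias the estimate of a particular feature. The data are simulated from a hypothetical two-component mixture, and the feature of interest is assumed to be the probability that an observation falls in an interval (lo, hi). All function names are illustrative only. The sketch compares the normal-model plug-in estimate with a simple empirical proportion; it does not represent the bias-reduction technique of any particular embodiment.

```python
import random
import statistics


def simulate_bimodal(n, seed=0):
    """Draw n observations from an equal mixture of N(-2, 0.5) and N(2, 0.5)."""
    rng = random.Random(seed)
    return [rng.gauss(rng.choice((-2.0, 2.0)), 0.5) for _ in range(n)]


def parametric_estimate(data, lo, hi):
    """Estimate P(lo < X < hi) by plugging in a normal model fitted by
    maximum likelihood (a misspecified model for bimodal data)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data, mu)
    model = statistics.NormalDist(mu, sigma)
    return model.cdf(hi) - model.cdf(lo)


def empirical_estimate(data, lo, hi):
    """Estimate P(lo < X < hi) by the observed proportion of the data."""
    return sum(lo < x < hi for x in data) / len(data)


def true_value(lo, hi):
    """Exact P(lo < X < hi) under the simulated mixture distribution."""
    left = statistics.NormalDist(-2.0, 0.5)
    right = statistics.NormalDist(2.0, 0.5)
    return (0.5 * (left.cdf(hi) - left.cdf(lo))
            + 0.5 * (right.cdf(hi) - right.cdf(lo)))


if __name__ == "__main__":
    data = simulate_bimodal(5_000)
    print("true value:         ", true_value(-1.0, 1.0))
    print("normal-model plug-in:", parametric_estimate(data, -1.0, 1.0))
    print("empirical proportion:", empirical_estimate(data, -1.0, 1.0))
```

Under this simulated mixture, the true probability of the interval (-1, 1) is approximately 0.023, whereas a single normal distribution fitted to the same data places roughly 0.37 of its mass there. The error comes from the model assumption rather than from sampling noise, so collecting more data does not remove it.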