Methods for estimating the distribution density of input values over a range of input values are used, for instance, in classifying samples. Histogram density estimation is a nonparametric method of this kind.
Such a method can be illustrated with an example from quality control: determining the distribution of gray values for a series of workpieces. The gray value of each workpiece is measured and assigned to a scale of gray values. In assessing the color quality of the workpieces, the criterion of interest is the distribution density over the entire range of gray values, as opposed to the exact gray values of the individual workpieces.
In determining the distribution density of the measured gray values of a series of workpieces, the range of gray values is divided into partitions, or grades of gray values, and each measured gray value is assigned to the partition within which it falls. Dividing the range of input values into partitions and defining the size of the partitions is referred to as quantization. The measured gray values assigned to each partition are counted, and the count is divided by the product of the size of the partition and the total number of measured gray values. In this manner, an average density is determined for each partition and is treated as an estimated density value. This is carried out for all partitions in the range of input values, so that the distribution density of the input values is estimated over the entire range.
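The counting and normalization steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not an implementation taken from the cited methods: the function name `histogram_density`, the half-open partition convention, and the sample gray values are all assumptions made for the example.

```python
from bisect import bisect_right

def histogram_density(values, edges):
    """Return one estimated density per partition: count / (n * partition size).

    `edges` lists the partition boundaries in increasing order (hypothetical
    convention: each partition is [lower, upper), except the last, which also
    includes its upper boundary).
    """
    if len(edges) < 2 or any(b <= a for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing, with at least two entries")
    if not values:
        raise ValueError("at least one measured value is required")

    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            raise ValueError(f"value {v} lies outside the range of input values")
        # Index of the partition containing v; the topmost boundary is
        # folded into the last partition.
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1

    n = len(values)
    # Average density of each partition: count divided by (n * partition size).
    return [c / (n * (hi - lo)) for c, lo, hi in zip(counts, edges, edges[1:])]

# Assumed example data: eight gray values on a 0..256 scale, three partitions.
gray_values = [10, 20, 30, 70, 130, 200, 250, 255]
edges = [0, 64, 128, 256]
densities = histogram_density(gray_values, edges)
```

Because each count is divided by both the partition size and the total number of values, the estimated densities multiplied by their partition sizes sum to one, whether or not the partitions are equally wide.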
The smaller the partitions, the more closely the distribution density of the histogram represents the actual distribution of the measured gray values. As the partitions are made smaller, however, more partitions are needed, more samples are required to obtain statistically significant estimates, and the cost of computing the distribution density increases.
In a known histogram method for estimating density, the range of input values is divided into partitions of a constant size. See, e.g., K. E. Willard, "Nonparametric Probability Density Estimation: Improvements to the Histogram for Laboratory Data", Computers And Biomedical Research, 25, 1992, pp. 17-28. In this method, however, the specified number of partitions is not optimally adapted to the distribution of input values over the entire range of input values, so that the resulting estimate of the distribution density is inaccurate.
In another known histogram method for estimating density, the size of the partitions is optimized with the aid of computational methods, on the condition that the size be the same for all partitions and that the number of partitions be freely selectable to obtain the most precise possible estimation of distribution density. See, e.g., D. Freedman, "On the Histogram as a Density Estimator: L2 Theory", Journal of Probability Theory and Related Fields 57, 1981, pp. 453-476. This method, however, entails a high degree of complexity and does not achieve optimal quantization for distributions of input values with abrupt value fluctuations.
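As a rough illustration of choosing one common partition size from the data, the following Python sketch uses the rule commonly known as the Freedman-Diaconis rule, in which the partition size is twice the interquartile range divided by the cube root of the number of samples. Treating this rule as representative of the optimization referred to above is an assumption, and the function names `freedman_diaconis_width` and `equal_width_edges` are hypothetical.

```python
import math
import statistics

def freedman_diaconis_width(values):
    """Common partition size: 2 * IQR / n**(1/3) (assumed rule, see lead-in)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    # Quartiles from the standard library; the default interpolation
    # method is used here, and other conventions give slightly different
    # results.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr <= 0:
        raise ValueError("interquartile range is zero; the rule yields no usable size")
    return 2.0 * iqr / n ** (1.0 / 3.0)

def equal_width_edges(values, width):
    """Partition boundaries of equal size covering min(values)..max(values)."""
    lo, hi = min(values), max(values)
    # The number of partitions follows from the chosen size.
    count = max(1, math.ceil((hi - lo) / width))
    return [lo + i * width for i in range(count + 1)]

# Assumed example data: one hundred evenly spread input values.
samples = list(range(1, 101))
width = freedman_diaconis_width(samples)
edges = equal_width_edges(samples, width)
```

Every partition produced this way has the same size, so a region in which the input values change abruptly receives the same resolution as a flat region, which is the limitation noted above.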