1. Field of the Invention
The present invention relates generally to data processing. More particularly, the present invention relates to the field of population mixture modeling and threshold segmentation of data.
2. Description of the Related Art
Optimization problems arise in nearly every area of science and engineering. For example, a chemist may want to determine the minimum energy structure of a certain polymer, a biologist may need to investigate optimal dosages and scheduling of drugs administered to a cancer patient, a civil engineer may need to decide how many elevators to place in an office building so that waiting times are minimized, or an image scientist may wish to automatically locate people in an image. A typical optimization problem takes the form: minimize ƒ(x) subject to x ∈ D, where x = (x1, x2, . . . , xn)^T is a vector of parameters and ƒ is an objective function. Stated another way: find x* such that ƒ(x*) ≤ ƒ(x) for all x ∈ D. Note that the converse problem, maximization, can be posed in this manner by simply negating ƒ(x). If D is a discrete set, the problem is referred to as a combinatorial optimization problem. On the other hand, if D is closed and ƒ(x) is continuous over its domain, the problem is a continuous optimization problem.
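By way of illustration only, the general form above can be sketched in Python for a small combinatorial case. The objective `f` and the feasible set `D` below are hypothetical choices made for this example and do not come from the prior art discussed here. The sketch finds x* by exhaustive enumeration and poses maximization as minimization of the negated objective.

```python
from itertools import product

def f(x):
    # Hypothetical objective: squared distance from the point (1, -2).
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2

# Hypothetical discrete feasible set D: integer pairs in [-3, 3] x [-3, 3].
D = list(product(range(-3, 4), repeat=2))

# Minimization: find x* such that f(x*) <= f(x) for all x in D.
x_star = min(D, key=f)

# Maximization posed as minimization of the negated objective.
x_max = min(D, key=lambda x: -f(x))
```

Exhaustive enumeration is practical only because D here is tiny; it is used to make the definition concrete, not as a recommended method.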
A variety of methods exist to determine local solutions for both combinatorial and continuous optimization problems. Here, x′ is a local solution if ƒ(x′) ≤ ƒ(x) for all x ∈ N ∩ D, where N is a neighborhood of x′. For combinatorial problems, a simple choice is the iterative improvement method, which is also properly described as a ‘deterministic descent’ method. More sophisticated methods, such as the branch and bound method, which solves a series of continuous optimization problems, are also known in the art. For continuous problems, a variety of local minimization methods also exist. Such methods require function values (simplex), function values and derivatives (conjugate gradient and quasi-Newton methods), or function values and first and second derivatives (Newton and restricted step methods). In situations where D is compact, it can be represented by the intersection of equality and inequality constraints. Such constrained optimization problems arise frequently in practice and can be solved in a variety of ways, many of which are based on the theory of Lagrange multipliers.
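The iterative improvement idea can be sketched as follows. This is a minimal illustration that assumes a best-neighbor variant of the method; the `values` table, the objective `f`, and the `neighbors` function are hypothetical and chosen only so that the function has more than one local solution.

```python
def iterative_improvement(f, neighbors, x):
    """Move to the best neighbor while it is strictly better; stop otherwise."""
    while True:
        best = min(neighbors(x), key=f, default=x)
        if f(best) >= f(x):
            return x  # no neighbor improves on x: x is a local solution
        x = best

# Hypothetical objective over the discrete set D = {0, 1, ..., 6}.
values = [5, 3, 4, 6, 2, 1, 7]

def f(i):
    return values[i]

def neighbors(i):
    # Neighborhood N of i: the adjacent indices that lie inside D.
    return [j for j in (i - 1, i + 1) if 0 <= j < len(values)]

local_from_0 = iterative_improvement(f, neighbors, 0)
local_from_3 = iterative_improvement(f, neighbors, 3)
```

The two runs end at different local solutions (index 1 and index 5), which shows that the result of a descent method depends on where it starts.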
The problem of global optimization is much more complicated. Theoretically, global minimization is as simple as finding the minimum of all local minima and boundary points. In application, this can be impractical. Most local optimization techniques require an initially feasible point x(1) ∈ D from which to begin an algorithmic process of optimization. In order to find a given local minimum, it is necessary to choose a feasible point from which the process will converge to that minimum. However, because there are often many local minima, determining such a feasible point for a particular local minimum can be as difficult as solving the optimization problem itself.
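The dependence on the initial feasible point can be illustrated with a short sketch. It assumes a hypothetical one-dimensional objective with two local minima and uses plain fixed-step gradient descent as the local method; the step size, iteration count, and starting points are arbitrary choices for this example.

```python
def f(x):
    # Hypothetical objective with two local minima (near x = -1.30 and x = 1.13).
    return x**4 - 3 * x**2 + x

def df(x):
    # First derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x, step=0.01, iterations=2000):
    """Fixed-step descent from the starting point x."""
    for _ in range(iterations):
        x -= step * df(x)
    return x

# Two initial feasible points that converge to different local minima.
starts = [2.0, -2.0]
local_minima = [gradient_descent(x0) for x0 in starts]

# The global minimum is the smallest of the local minima that were found.
x_global = min(local_minima, key=f)
```

Taking the minimum over the local minima works here only because both minima happen to be reached. In general, nothing guarantees that the chosen starting points cover every local minimum.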
Global optimization methods can be categorized as either stochastic or deterministic. Stochastic, or Monte Carlo, methods (e.g., random search, Metropolis algorithm, simulated annealing, genetic algorithms, etc.) randomly sample D in an attempt to locate feasible points that are local minima (or lie sufficiently close to local minima). Deterministic methods carry out some non-random search procedure to sift through local minima and find the global minimum. For example, branching techniques set up simple sub-problems, which are individually solved. Sub-energy tunneling techniques iteratively transform the objective function to remove local minima. Path following techniques indirectly locate local minima by solving a system of differential equations.
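As a minimal sketch of the simplest stochastic method named above, random search, the following draws uniform samples from a hypothetical domain D = [-2, 2] and keeps the best one. The objective, the sample count, and the fixed seed are assumptions made for this example.

```python
import random

def f(x):
    # Hypothetical objective with two local minima on D = [-2, 2].
    return x**4 - 3 * x**2 + x

def random_search(f, lower, upper, samples, seed=0):
    """Sample D = [lower, upper] uniformly and keep the best point seen."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = rng.uniform(lower, upper)
    for _ in range(samples - 1):
        candidate = rng.uniform(lower, upper)
        if f(candidate) < f(best):
            best = candidate
    return best

x_best = random_search(f, -2.0, 2.0, samples=5000)
```

The returned point lies close to the global minimum rather than exactly on it, so in practice such a sample would typically be used as the starting point for a local minimization.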
In the field of digital imaging, optimization techniques are useful as part of a point-based image segmentation process. Segmenting grayscale images is a useful task in imaging science. Image segmentation can be used directly for image compression or as a preprocessing step for background removal, feature detection, pattern recognition algorithms, or other applications. A wide variety of segmentation techniques have emerged in the literature over the last two decades. The simplest methods, and the most amenable to representation as a global optimization problem, are point-based, where each pixel is quantized based on thresholds determined by analysis of the histogram of the image. There also exist more sophisticated (and more time-consuming) region-based methods, which take into account local spatial characteristics of the image while quantizing each pixel. Naturally, these techniques are equally applicable to color images.
Generally, point-based image segmentation comprises several discrete operations. First, a histogram of a digital image is generated. Second, parameters for a plurality of probability density functions are determined, which correspond to a like number of sub-populations in the histogram data. Third, threshold levels are specified that segment the image data population according to the probability density functions. Fourth, the threshold levels are applied to the histogram. Optimization, or more particularly minimization, occurs with respect to the second operation.
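The operations above can be sketched for a two-population grayscale case. This is an illustration under stated assumptions, not the method of any particular prior art system: the sub-populations are assumed to be Gaussian, the parameters of the second operation are supplied by hand rather than fitted, the threshold is placed where the weighted densities cross, and the threshold is applied to the pixel values. All function names are hypothetical.

```python
import math

def histogram(pixels, levels=256):
    """First operation: count how many pixels fall on each gray level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts

def gaussian(x, mean, sigma):
    """Assumed form of each sub-population's probability density function."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def threshold_between(w1, m1, s1, w2, m2, s2):
    """Third operation: first gray level above m1 where population 2 dominates."""
    for t in range(int(m1) + 1, int(m2) + 1):
        if w2 * gaussian(t, m2, s2) >= w1 * gaussian(t, m1, s1):
            return t
    return int(m2)

def segment(pixels, thresholds):
    """Fourth operation: label each pixel by how many thresholds it reaches."""
    return [sum(p >= t for t in thresholds) for p in pixels]

pixels = [10, 12, 50, 52, 200, 205, 198, 11]
counts = histogram(pixels)

# Second operation (the fitting step) is skipped: parameters are assumed here.
t = threshold_between(0.5, 50.0, 20.0, 0.5, 200.0, 20.0)
labels = segment(pixels, [t])
```

With equal weights and equal spreads, the threshold falls midway between the two assumed means, and the three bright pixels receive label 1 while the rest receive label 0.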
A population mixture model is a probability density function that can be expressed as the weighted sum of multiple sub-populations (other probability density functions). In applications such as image segmentation or cluster analysis, algorithms exist in the prior art that require fitting a population mixture model to a given set of data (a histogram in the case of image segmentation). Current methods for determining the optimal parameters of the population mixture model require prior knowledge of the number of sub-populations appropriate for the data being analyzed. In the prior art, this knowledge is frequently obtained by manual inspection or by an automatic pre-processing step. Alternatively, optimization techniques utilizing a multi-start approach have been employed. Such methods fit the model repeatedly for different numbers of sub-populations. Multi-start approaches have two major drawbacks, however: the maximum number of sub-populations is arbitrarily chosen by some preprocessing step, and the computational effort required to repeatedly fit the model is immense.
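The definition of a population mixture model can be made concrete with a short sketch. It assumes Gaussian sub-populations, which the text above does not require, and the component weights, means, and standard deviations are hypothetical values. The numerical integration simply checks that a weighted sum of densities, with weights summing to one, is itself a density.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of one sub-population (assumed Gaussian for this example)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, components):
    """Weighted sum of sub-population densities.

    components is a list of (weight, mean, sigma); weights should sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

# Hypothetical two-population model, e.g. dark and bright gray levels.
components = [(0.3, 60.0, 10.0), (0.7, 180.0, 20.0)]

# Riemann-sum check over [-100, 400] that the mixture integrates to about 1.
step = 0.5
area = sum(mixture_pdf(-100 + i * step, components) * step for i in range(1001))
```

Note that the length of `components` is exactly the quantity the prior art methods must know in advance: the number of sub-populations fixes how many parameters the fitting step has to determine.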
Therefore, there is a need in the art for an approach to population mixture modeling that eliminates manual inspection, preprocessing, and multi-start techniques, which either require a priori knowledge of the number of sub-populations appropriate to a given data set before an optimization process is conducted or require immense computational effort.