Stochastically optimized classifiers are widespread in remote sensing applications. Whether researchers design their own automatic classification algorithms or use commercially available packages such as ENVI (ENVI Users Guide, 2000), stochastic optimization is at the core of many of the algorithms involved, which include, for example, vector quantization algorithms and neural networks. Many of these algorithms have long optimization times or are prone to becoming trapped in local minima. Some of these problems can be overcome by replacing completely stochastic sampling with a more active sampling strategy, known as “active learning.”
The basic premise of active learning is that training in stochastic optimization routines is inherently inefficient, and that selective presentation of patterns within the context of random sampling can lead to faster convergence and, in many instances, better solutions. In (Park and Hu, 1996), it was shown that choosing samples that lie within a particular distance of the current decision boundaries guarantees convergence to the optimal solution, and simple low-dimensional examples were used to demonstrate that this approach could accelerate the rate of convergence for a popular vector quantization algorithm known as Learning Vector Quantization (LVQ) (Kohonen, 1997). In the later stages of optimization, their approach is intuitively appealing because the majority of errors occur near decision boundaries, and that is where most refinements of the decision boundaries should be concentrated. Nevertheless, although this is better than naive stochastic sampling, it may be too restrictive in the early stages of optimization, when the decision boundaries are not necessarily close to their optimal positions.
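The idea of presenting only near-boundary samples can be illustrated with a minimal Python sketch. It is not the procedure of (Park and Hu, 1996): it assumes a plain LVQ1 update, measures the distance from a sample to the hyperplane that bisects its nearest prototype and the nearest prototype of a different class, and skips randomly drawn samples farther than a fixed threshold. The function names and the parameters `window`, `lr`, `steps` and `seed` are hypothetical and chosen only for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def boundary_distance(x, prototypes, labels):
    """Return (index of nearest prototype, distance from x to the local boundary).

    The local boundary is taken to be the hyperplane bisecting the nearest
    prototype and the nearest prototype carrying a different class label.
    Assumes the prototypes cover at least two classes.
    """
    order = sorted(range(len(prototypes)), key=lambda i: sq_dist(x, prototypes[i]))
    nearest = order[0]
    rival = next(i for i in order if labels[i] != labels[nearest])
    d1 = sq_dist(x, prototypes[nearest])
    d2 = sq_dist(x, prototypes[rival])
    gap = math.sqrt(sq_dist(prototypes[nearest], prototypes[rival]))
    if gap == 0.0:
        return nearest, 0.0  # coincident prototypes: treat x as on the boundary
    return nearest, (d2 - d1) / (2.0 * gap)


def train(samples, targets, prototypes, labels, window, lr=0.05, steps=2000, seed=0):
    """LVQ1 with random sampling restricted to a band around the boundaries.

    Returns the updated prototypes and the number of samples actually used.
    """
    rng = random.Random(seed)
    protos = [list(p) for p in prototypes]  # work on a copy
    presented = 0
    for _ in range(steps):
        i = rng.randrange(len(samples))
        x, y = samples[i], targets[i]
        nearest, dist = boundary_distance(x, protos, labels)
        if dist > window:
            continue  # assumed selection rule: ignore samples far from a boundary
        presented += 1
        # LVQ1: attract the winner if its label is correct, repel it otherwise.
        sign = 1.0 if labels[nearest] == y else -1.0
        protos[nearest] = [p + sign * lr * (xi - p) for p, xi in zip(protos[nearest], x)]
    return protos, presented
```

Setting `window` to `float("inf")` recovers naive stochastic sampling, while a small `window` concentrates updates near the current boundaries. A small `window` applied from the first iteration also reproduces the limitation noted above: if the initial boundaries are far from their optimal positions, few informative samples fall inside the band.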
In many of the prior art approaches, the illustrative data are low-dimensional and, in some cases, artificially constructed. A number of the previous approaches to active learning are also slow, especially for high-dimensional applications such as hyperspectral imagery. For example, the multi-point search method described in (Fukumizu, 2000) requires an integral over expressions involving second-order derivatives (Fisher Information matrices) and is, therefore, significantly more complicated and computationally expensive than the expressions which govern the approach that I have developed. Likewise, some approaches have been designed around a specific algorithm, such as that in (Yamauchi, Yamaguchi and Ishii, 1996), in which a complicated scheme of potential pattern interference must also be estimated as the model complexity grows. Similarly, the approach defined in (Hwang et al., 1991) involves significant computational overhead, primarily from an inversion process which is itself implemented as a stochastic gradient descent algorithm; their approach also uses conjugate pairs of pseudo-patterns, which must likewise be estimated. Moreover, their entire approach is specific to a particular neural network algorithm.
There is, therefore, a need for a classifier system requiring fewer computational resources, having greater computational speed, and having greater efficiency than prior art systems.