The main objective of a learning machine is to learn, from a set of training examples, to recognize unknown patterns as accurately as possible. Of particular importance in the learning process is the method of deriving a decision function from the training examples, in order to distinguish representatives of one class from representatives of another class. The derived decision function can then be used to recognize unknown patterns.
The ability of a decision system to perform a particular task well depends on several factors, such as the number of examples in the training set, the number of errors in the training set, and the complexity of the classifier. Prior art decision systems are predicated on two major approaches, each with particular limitations and drawbacks.
In an approach called the optimal margin approach, a learning machine determines a decision function which is a weighted sum of the components of the input patterns or a weighted sum of arbitrary predefined functions of the input patterns (e.g., Perceptrons and polynomial classifiers). The weights are determined such that a) the number of training patterns that are misclassified by an optimal margin decision system is minimized, and b) the margin between the correctly classified training patterns and the decision boundary is maximized. The margin is defined as the distance from the decision boundary to the closest training patterns. The weights that optimize the margin are a function of only a subset of all training patterns, called the "supporting patterns". The supporting patterns are the patterns that are closest to the decision boundary.
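The quantities named above can be illustrated with a minimal sketch, assuming a decision function that is a plain weighted sum of the input components plus a bias term. The function names, the bias term, and the tolerance parameter are hypothetical and chosen only for illustration. The sketch evaluates a given set of weights; it does not perform the optimization that selects the weights.

```python
import math

def decision(weights, bias, pattern):
    # Weighted sum of the components of the input pattern, plus a bias.
    return sum(w * x for w, x in zip(weights, pattern)) + bias

def distances_to_boundary(weights, bias, patterns):
    # Euclidean distance of each pattern to the boundary decision(...) == 0.
    norm = math.sqrt(sum(w * w for w in weights))
    return [abs(decision(weights, bias, p)) / norm for p in patterns]

def margin(weights, bias, patterns):
    # The margin is the distance of the closest training patterns to the boundary.
    return min(distances_to_boundary(weights, bias, patterns))

def supporting_patterns(weights, bias, patterns, tol=1e-9):
    # Patterns whose distance equals the margin (within a small tolerance).
    dists = distances_to_boundary(weights, bias, patterns)
    smallest = min(dists)
    return [p for p, d in zip(patterns, dists) if d - smallest <= tol]

# Hypothetical two-class example: boundary is the line x1 = 0.
weights, bias = (1.0, 0.0), 0.0
patterns = [(2.0, 0.0), (3.0, 1.0), (-2.0, 5.0), (-4.0, 0.0)]
print(margin(weights, bias, patterns))
print(supporting_patterns(weights, bias, patterns))
```

In this example only the two patterns nearest the boundary determine the margin; moving any of the other patterns farther away leaves the margin unchanged.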
The capacity of a decision system using the optimal margin approach or, more specifically, the ability of such a system to solve complex problems, is restricted by the number of predefined functions. Because the number of predefined functions had to be limited for computational reasons, the range of applications of decision systems using the optimal margin approach was limited as well.
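The computational concern can be illustrated with a short sketch, under the assumption that the predefined functions of a polynomial classifier are all monomials of the input components up to a given total degree. The text does not specify this choice, and the function name is hypothetical. Under that assumption, the number of predefined functions, and therefore the number of weights, is a binomial coefficient that grows rapidly with the input dimension and the degree.

```python
from math import comb

def num_monomials(num_inputs, max_degree):
    # Number of monomials of total degree <= max_degree in num_inputs
    # variables, counting the constant term: C(num_inputs + max_degree, max_degree).
    return comb(num_inputs + max_degree, max_degree)

# Hypothetical input sizes, shown for a few polynomial degrees.
for degree in (1, 2, 3, 4):
    print(degree, num_monomials(256, degree))
```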
In another approach, called the "memory-based decision approach", prior art decision systems were designed to allow decision functions (e.g., Radial Basis functions, Potential functions) to be represented by a weighted sum of kernel functions, each kernel function being itself a function of two patterns, namely, an input pattern and a memorized pattern (also called "memory" or "prototype"). In this approach, however, prototype patterns and weights are determined in a suboptimal way, thereby limiting the ability of the decision system to perform complex tasks.
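A decision function of this kind can be sketched as follows, assuming a Gaussian radial basis kernel. The kernel form, the width parameter, the bias term, and the example prototypes and weights are assumptions made for illustration and are not taken from the text. The sketch only evaluates the weighted sum; it does not address how prototypes and weights are chosen.

```python
import math

def rbf_kernel(pattern, prototype, width=1.0):
    # Kernel as a function of two patterns: an input pattern and a prototype.
    sq_dist = sum((a - b) ** 2 for a, b in zip(pattern, prototype))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def kernel_decision(pattern, prototypes, weights, bias=0.0, kernel=rbf_kernel):
    # Decision function represented as a weighted sum of kernel functions.
    return sum(w * kernel(pattern, p) for w, p in zip(weights, prototypes)) + bias

# Hypothetical memory: one prototype per class, with weights of opposite sign.
prototypes = [(0.0, 0.0), (4.0, 4.0)]
weights = [1.0, -1.0]
print(kernel_decision((0.5, 0.5), prototypes, weights))  # positive: nearer the first prototype
print(kernel_decision((3.5, 3.5), prototypes, weights))  # negative: nearer the second prototype
```

The sign of the result assigns the input pattern to one of the two classes, and each stored prototype contributes according to its weight and its similarity to the input.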