1. Field of the Invention
The present invention relates to machine learning techniques and, more particularly, to techniques for constructing and training a machine using a training set that consists of ordered pairs of given queries and answers. Upon completion of the training, the machine should be able to provide an answer to any query from the space spanned, in some sense, by the set of training queries.
2. Description of the Related Art
The techniques generally referred to as ‘Learning machines’ include, among others, Neural Networks, Evolutionary Methods (including Genetic Algorithms) and Support Vector Machines.
The applications of learning machines include, to list a few, Speech-, Image-, Character- and Pattern-Recognition and Data Mining. Various new applications of machine learning may emerge as more efficient learning machines appear.
Here we present examples of prior art and first quote the definition of the neural network from p.2 of “Neural Networks” by Haykin (Prentice Hall, 1999), the entire content of which is herein incorporated by reference:
‘A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:
1. Knowledge is acquired by the network from its environment through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the acquired knowledge.’
The definition of the evolutionary methods, quoted from p.373 of “Pattern Classification”, by Stork, Duda and Hart (J.Wiley & Sons, 2000), the entire content of which is herein incorporated by reference, is:
‘Inspired by the process of biological evolution, evolutionary methods of classifier design employ stochastic search for an optimal classifier . . .’.
Both of these families of techniques utilize differential or statistical optimizers to implement the respective learning machine. Implicit in these definitions is the fact that prior art learning machines have a fixed internal structure containing a set of free parameters. Learning is implemented by a procedure that updates these parameters.
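By way of illustration only, the notion of a fixed internal structure with free parameters can be sketched as follows. The sketch assumes a deliberately trivial structure, y = w*x + b, and a plain gradient-descent update; the function name, learning rate, epoch count and training pairs are hypothetical choices made for this example and are not taken from the cited references.

```python
# Hypothetical illustration: the structure y = w*x + b is fixed in advance;
# learning only updates the two free parameters w and b.
def train_linear(pairs, lr=0.05, epochs=2000):
    w, b = 0.0, 0.0                      # free parameters, arbitrary start
    n = len(pairs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in pairs) / n
        # Differential (gradient-based) parameter update.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Training set of ordered (query, answer) pairs, assumed to follow y = 2x + 1.
training_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear(training_set)
print(w, b)
```

In this sketch the machine can only ever represent a straight line, whatever the training set contains, which is the sense in which the structure is fixed.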
Shortcomings of such a fixed internal structure derive from the fact that it is impossible to know in advance the most appropriate internal structure for the learning problem. Examples of such shortcomings are under-fitting and over-fitting. Shortcomings of the parameter update procedures include the local minima phenomena and long learning times.
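The local minima phenomenon mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical one-parameter loss, w^4 - 3w^2 + w, chosen only because it has two minima of different depth; the step size and starting points are likewise assumptions made for the example.

```python
# Hypothetical non-convex loss with a global minimum near w = -1.30
# and a shallower local minimum near w = 1.13.
def loss(w):
    return w**4 - 3 * w**2 + w

def loss_gradient(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent(w_start, lr=0.01, steps=2000):
    w = w_start
    for _ in range(steps):
        w -= lr * loss_gradient(w)       # same update rule for every start
    return w

w_left = gradient_descent(-2.0)          # reaches the global minimum
w_right = gradient_descent(2.0)          # is trapped in the local minimum
print(w_left, loss(w_left))
print(w_right, loss(w_right))
```

The same update procedure therefore returns a different, and here inferior, answer depending only on where the parameter happened to start.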
We wish to express the opinion that many of the shortcomings of learning machines from the families of Neural Networks and Evolutionary Methods are owed to the fact that the architecture, as well as the parameter update procedure, of these learning machines was inspired by a desire to mimic biological mechanisms. However, these need not necessarily be the most appropriate for implementation in an inanimate machine.
The idea of Support Vector Machines (SVM), quoted from p.421 of “Statistical Learning Theory” by Vapnik (J.Wiley & Sons, 1998), the entire content of which is herein incorporated by reference, is:
‘It (SVM) maps the input vector f into the high-dimensional ‘feature space’ Z through some nonlinear mapping, chosen a priori. In this space, an optimal separating hyperplane is constructed’.
Some of the shortcomings of SVM are the following. First, it is impossible to guarantee that the nonlinear mapping chosen a priori will make the classes linearly separable by a hyperplane. Second, the computational complexity of finding an optimal separating hyperplane can be high. Third, the class label that results from the separation carries only one bit of information, which, in many cases, is insufficient.
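The role of the a priori mapping can be illustrated with a minimal sketch on the XOR problem, whose two classes are not linearly separable in the original two-dimensional input space. The mapping (x1, x2) -> (x1, x2, x1*x2) and the hyperplane coefficients below are hand-picked assumptions for this example; they are not obtained by SVM optimization and are not taken from the cited reference.

```python
# Hypothetical a priori mapping into a three-dimensional feature space.
def feature_map(x):
    x1, x2 = x
    return (x1, x2, x1 * x2)

# Hand-picked separating hyperplane in the feature space (an assumption,
# not the optimal hyperplane an SVM would construct).
def classify(x, weights=(1.0, 1.0, -2.0), bias=-0.5):
    z = feature_map(x)
    score = sum(w_i * z_i for w_i, z_i in zip(weights, z)) + bias
    return 1 if score > 0 else -1        # the output is a single-bit label

# XOR training pairs: (query, class label).
xor_set = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]
predictions = [classify(x) for x, _ in xor_set]
print(predictions)
```

Had a mapping without the product term been chosen instead, no hyperplane would separate these four points, and nothing in the method reveals this before the mapping is fixed.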
The state of the prior art briefly introduced above is described in much more detail in Haykin; Stork, Duda and Hart; and Vapnik (see above for reference details).
The problem of machine learning is also referred to as the problem of recovering or approximating a multivariate function from sparse data, namely the training set mentioned above. However, such a problem is recognized to be ill-posed, and regularization theory and variational analysis are employed in order to solve it; see “Solutions of Ill-Posed Problems” by Tichonov and Arsenin (W. H.Winston, 1977), the entire content of which is herein incorporated by reference. The shortcomings of these approaches are local minima phenomena and unduly large computational complexity. Recent papers on these topics are: “A unified framework for regularization networks and support vector machines” by Evgeniou, Pontil and Poggio (Technical Report AI Memo No. 1654, MIT, 1999); and “Data Mining with Sparse Grids” by Gabriel, Garcke and Thess (Bonn University & Computing, 2000), the entire contents of which are herein incorporated by reference. In particular, the treatment in the paper by Evgeniou, Pontil and Poggio, using the regularization technique, translates the problem into an SVM solution. The approach of Gabriel, Garcke and Thess, using the variational technique, is thwarted by complexity even with relatively small dimensions of the feature space.
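For orientation, the regularization approach referred to above is commonly written as the minimization of a functional of the following general form. The notation is an assumption introduced here for illustration: (x_i, y_i) are the N training pairs, f is the function being recovered, \lambda > 0 is a regularization parameter, and \|f\|_K is a norm that penalizes non-smooth solutions.

```latex
\min_{f} \; H[f] \;=\; \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - f(x_i) \bigr)^{2} \;+\; \lambda \, \| f \|_{K}^{2}
```

The first term measures the fit to the sparse data; the second term restricts the set of admissible functions, which is what renders the otherwise ill-posed problem solvable.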