The invention relates to a method for a self-organizing, error-free neural network for depicting multidimensional non-linear imaging functions.
Neural networks serve for imaging multi-dimensional feature vectors in simpler concepts or simpler contexts. A characteristic feature of a neural network is a two-stage procedure:
In a first phase, the learning phase, the feature space is divided into regions.
In a second phase, the readout phase, an allocation to the required concept or context is determined through appropriate linkage of these regions.
The two best-known neural networks are the multilayer perceptron (MLP) and the two-dimensional Kohonen neural network (KNN). In MLP, the learning, that is, the separation of the feature space, occurs by non-linear functions (mostly Gaussian or step functions). In KNN, the division of the feature space is achieved through centers of gravity (cluster centers). In the readout phase in MLP, the position of a required feature is defined through additive superposition of the corresponding non-linear functions; in KNN, as a rule, the nearest cluster center is simply output.
The terms "supervised" and "unsupervised" learning, likewise frequently used, play no role here. MLP is commonly equated with "supervised learning," since the target result of the learning process is used in the determination of the non-linear functions in the feature space, while in KNN, the allocation of the centers of gravity of the clusters to the target result is frequently not done until after the completion of the learning phase, so the learning itself occurs "unsupervised". On the other hand, the division "supervised/unsupervised" has little relevance in KNN, since the target result can be viewed as simply another dimension of the feature space. The structuring of the feature space then occurs in consideration of the target result. The achievement of an imaging function is in any case a "supervised" process, since the trainer of the neural network has the achievement of a desired purpose in mind. The term "self-organizing" is likewise confusing. KNN is frequently characterized as self-organizing. But the self-organization there relates merely to the independent expansion in the feature space, wherein the number of neurons is strictly prescribed.
The following intends to examine how errors originate in neural networks, and how these errors can be reduced, and even completely avoided, through knowledge about their origination. As a rule, neural networks are already burdened with errors in their learning data record. This means that the structural formation of the feature space in the learning phase occurs so unfavorably that the learning data record itself is also imaged incorrectly. The demand for freedom from error can of course only refer to the learning data record itself, as the "real" structure of the feature space is by definition unknown. If it were known, this definition could be utilized directly for the imaging. A useful quality of the neural network is, as a rule, defined in that following the completion of the learning phase, a feature vector until then unknown (a feature vector not contained in the learning random sample) is correctly allocated. The term "generalization" of a neural network is used for this as a rule, this concept characterizing the discovery of the simpler rules underlying the neural network.
The description or the structuring of the feature space is more easily understood in KNN. KNN should thus serve as the basis for this examination. With KNN, it is also easier to define a dimension for a generalization of the neural network or to analogously derive this from the basic ideas of KNN.
The functioning of KNN can be explained using a two-dimensional basic test. The transition to an arbitrarily high number of dimensions of the feature space is defined solely by two functions:
1) According to Euclid, the distance between two points in n-dimensional space is the square root of the sum of the squares of the distances in the respective individual dimensions.
2) The function MIN determines that pair of points of a set of n-dimensional points which have the smallest distance to each other, measured in the n-dimensional coordinate system.
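The two functions named above can be sketched as follows. This is only a minimal illustration: the names `euclidean_distance` and `min_pair` are hypothetical, and the brute-force search over all pairs is an assumption chosen for clarity, not a procedure taken from the text.

```python
import math
from itertools import combinations


def euclidean_distance(p, q):
    """Square root of the sum of squared per-dimension differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def min_pair(points):
    """Return (i, j, distance) for the pair of points with the smallest distance.

    Brute force over all pairs; an assumption made for readability.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    best = None
    for (i, p), (j, q) in combinations(enumerate(points), 2):
        d = euclidean_distance(p, q)
        if best is None or d < best[2]:
            best = (i, j, d)
    return best
```

Because both functions operate on coordinate tuples of any length, the same code applies unchanged to two dimensions or to an arbitrarily high number of dimensions.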
Let a "feature space" be given by the set of all the numbers in the two dimensions x and y. In a computer simulation, this space is further limited to the numbers between 0 and 1, wherein this limitation has no significance for the conclusions to be reached. A "concept space" is further given, which is described in that a connected region B.sub.i in the feature space is allocated to each concept. In the computer simulation, three regions are introduced, which are depicted in FIG. 1:
B.sub.1 : the region within a circle 1 with radius r.sub.1 about the center x1, y1
B.sub.2 : the region within a circle 2 with radius r.sub.2 about the center x2, y2
B.sub.3 : the remainder 3 of the region outside B.sub.1 and B.sub.2.
That the regions B.sub.1 and B.sub.2 are circles is of course unknown to the neural network to be trained. These circle characteristics are merely utilized in the computer simulation to create the feature vectors. The object of the neural network will be to find these regions in the learning phase using a finite number of examples and later, in the readout phase, to correctly allocate new feature combinations which were unavailable in the learning phase.
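A simulation of this kind might create its labelled feature vectors roughly as sketched below. The circle centers and radii are purely hypothetical values, since the text gives none, and the names `concept` and `make_learning_data` are likewise assumptions introduced for illustration.

```python
import random

# Hypothetical circle parameters (x, y, r); the text specifies no numbers.
CIRCLE_1 = (0.3, 0.3, 0.2)
CIRCLE_2 = (0.7, 0.7, 0.15)


def concept(x, y):
    """Return 1, 2 or 3 for the regions B1, B2 and the remainder B3."""
    for label, (cx, cy, r) in ((1, CIRCLE_1), (2, CIRCLE_2)):
        if (x - cx) ** 2 + (y - cy) ** 2 < r ** 2:
            return label
    return 3


def make_learning_data(count, seed=0):
    """Draw `count` points uniformly from [0, 1) x [0, 1) and label them."""
    rng = random.Random(seed)
    data = []
    for _ in range(count):
        x, y = rng.random(), rng.random()
        data.append((x, y, concept(x, y)))
    return data
```

Only the triples returned by `make_learning_data` would be handed to the network; the circle parameters stay hidden inside the simulation.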
Using this example, a demand for the solvability of a problem with the aid of a neural network can already be established. Conversely, a neural network can reject a problem as unsolvable in some cases. For example, should the regions B.sub.1 and B.sub.2 overlap, then feature vectors would exist which allocate not only the concept B.sub.1, but also the concept B.sub.2 to a coordinate x, y. Such a problem is only solvable if the dimension of the feature vector is raised by at least one, i.e., if additional information is consulted for the solution of the problem.
Should the regions B.sub.1 and B.sub.2 mutually interlace in a very complicated fashion without overlapping, then the demand is placed on the universal set of the learning data that, analogously to the sampling theorem of communications technology, they describe the regions B.sub.1 and B.sub.2 in a sufficiently precise manner. The demand on the neural network to be realized therein is that, given an arbitrarily complicated form of the region B.sub.1, the neural network proposes a model which images at least the learning data record in an error-free fashion. For this error-free imaging, there exists a trivial solution in KNN; it is formed simply of the set of all learning data. However, the utilization of all the learning data would correspond exactly to zero generalization. The neural network would have "understood exactly nothing". Thus, generalization must be related to how many feature vectors can be discarded without compromising the reliability of a statement. One can now show that KNN as previously described works error-free only in the trivial solution and in a few special cases.
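The trivial solution can be illustrated with a short sketch in which every learning vector is kept as its own cell and the readout returns the label of the nearest stored vector. The single-circle labelling rule and all function names here are hypothetical, chosen only to keep the example self-contained.

```python
import math
import random


def label_of(x, y):
    # Hypothetical concept regions: inside one circle is 1, the remainder is 2.
    return 1 if (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.3 ** 2 else 2


def make_data(count, seed=0):
    """Uniformly distributed points in the unit square with their labels."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(count)]
    return [(x, y, label_of(x, y)) for x, y in points]


def nearest_label(cells, x, y):
    """Readout by minimal distance: label of the stored cell closest to (x, y)."""
    return min(cells, key=lambda c: math.hypot(c[0] - x, c[1] - y))[2]


def learning_errors(cells, data):
    """Number of learning vectors that the stored cells allocate incorrectly."""
    return sum(1 for x, y, label in data if nearest_label(cells, x, y) != label)


if __name__ == "__main__":
    data = make_data(200)
    print("all vectors kept:", learning_errors(data, data))
    print("first 16 kept:   ", learning_errors(data[:16], data))
```

Keeping every vector reproduces the learning data without error, because each vector is its own nearest cell. Discarding vectors, as in the second call, gives up that guarantee, which is the trade-off between freedom from error and generalization discussed above.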
Let there be a real learning data record of L elements (vectors), wherein each element is represented by two coordinates x.sub.i and y.sub.i, as well as the information as to which concept B.sub.i applies at the respective coordinate. x.sub.i and y.sub.i are uniformly distributed random numbers. The training begins with the choice of the necessary number K of what are known as Kohonen cells. There is extensive literature with suggestions for this choice. FIG. 2 shows the learning result for L=2000, B=3, and K=16. The choice K=16 is itself immaterial for the basic result. The precise learning coefficients are likewise immaterial, since what is of interest here is the principle of error, and not the absolute quantity of error.
FIG. 2 depicts sixteen regions, what are known as cells 4, which arise through the choice of K. The cells 4 are represented by the respective cell core 5, which possesses precisely the coordinate x.sub.j, y.sub.j learned through appropriate mean value formations in KNN. Because of the readout specification "minimal distance", all future coordinates are allocated to the respective surfaces of the Kohonen cells. Using the circle B.sub.1, one can clearly see that, as a rule, errors are unavoidable therein. The quality of the KNN is essentially dependent on the number of Kohonen cells; the more cells are utilized, the smaller the cell regions are and thus the better the image accuracy is. Freedom from error can only be guaranteed in the current learning algorithm if all learning data are utilized. This holds completely independently of the amount of learning data.
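The effect described for FIG. 2 can be reproduced in spirit with the sketch below. It is a deliberate simplification and not the learning algorithm of the text: it uses plain winner-take-all competitive learning without the neighborhood function of a full Kohonen map, assigns each cell a concept by majority vote after training, and relies on hypothetical circle parameters, learning rate and epoch count.

```python
import math
import random
from collections import Counter


def label_of(x, y):
    # Hypothetical regions: two non-overlapping circles and the remainder.
    if (x - 0.3) ** 2 + (y - 0.3) ** 2 < 0.2 ** 2:
        return 1
    if (x - 0.7) ** 2 + (y - 0.7) ** 2 < 0.15 ** 2:
        return 2
    return 3


def make_data(count, seed=0):
    """Uniformly distributed learning vectors (x, y, concept)."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(count)]
    return [(x, y, label_of(x, y)) for x, y in points]


def winner(cores, x, y):
    """Index of the cell core with minimal Euclidean distance to (x, y)."""
    return min(range(len(cores)),
               key=lambda j: math.hypot(cores[j][0] - x, cores[j][1] - y))


def train(data, k=16, epochs=5, rate=0.05, seed=1):
    """Simplified training: move only the winning core toward each sample."""
    rng = random.Random(seed)
    cores = [[x, y] for x, y, _ in rng.sample(data, k)]
    for _ in range(epochs):
        for x, y, _ in data:
            j = winner(cores, x, y)
            cores[j][0] += rate * (x - cores[j][0])
            cores[j][1] += rate * (y - cores[j][1])
    # Allocate a concept to each cell only after learning, by majority vote.
    votes = [Counter() for _ in cores]
    for x, y, label in data:
        votes[winner(cores, x, y)][label] += 1
    # Assumption: a cell that wins no sample is given the remainder concept 3.
    labels = [v.most_common(1)[0][0] if v else 3 for v in votes]
    return cores, labels


def count_errors(cores, labels, data):
    """Learning vectors whose nearest cell carries a different concept."""
    return sum(1 for x, y, label in data
               if labels[winner(cores, x, y)] != label)


if __name__ == "__main__":
    data = make_data(2000)
    cores, labels = train(data)
    print("errors on learning data:", count_errors(cores, labels, data))
```

With only sixteen cells, a cell whose surface straddles a circle boundary carries a single concept, so some learning vectors inside it would generally be misallocated. Raising `k` shrinks the cells and would be expected to reduce, but not necessarily eliminate, the error count.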