The present invention relates to a recognition system for pattern recognition and classification and is particularly, though not necessarily, concerned with a self-organising artificial neural network capable of unsupervised learning.
A neural network is a network of interconnected processing elements in which information is stored by setting different interconnection strengths or weights. Each element of the network provides an output which is a function of its weighted inputs and the network learns categories for example by adjusting the interconnection weights in response to a series of training patterns. It is known to make use of such artificial neural networks which have been trained to classify input data according to the stored categories which have been determined by the training.
One particular training arrangement is known as competitive learning. On presentation of each input pattern of a training set each existing category competes to represent that input. The one or more categories that best represent the input by virtue of being most similar to the input are identified and then modified such that the input is better represented by those modified categories [Rosenblatt, F. (1962) "Principles of Neurodynamics" New York: Spartan]. The amount of modification is known as the training rate.
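The basic rule can be sketched as follows. This is a minimal illustrative form only, assuming Euclidean distance as the measure of dissimilarity and a single winning category; the function name and the choice of distance are assumptions, not taken from the text above.

```python
import numpy as np

def competitive_step(weights, x, rate=0.1):
    """One competitive-learning step: the category whose stored
    vector is most similar to the input x (smallest Euclidean
    distance here, an illustrative choice) is moved toward x by
    the training rate."""
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    weights[winner] += rate * (x - weights[winner])
    return winner
```

Only the winning category's stored vector changes; all other categories are left untouched, which is precisely what gives rise to the initialisation problem discussed below.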
A problem with this approach is that the categories identified at the end of the training are dependent on the initial internal representations of each potential category. Poor selection of the initial internal representations results, during subsequent use of the system, in some categories (which will correspond to physical or effective resources within the system) being over-used, under-used, or possibly not used at all. Several techniques have been proposed to circumvent this problem, in each case with an associated cost:
(i) The initial category representations are pre-set with 'representative examples' of the training data. This ensures that the categories defined are in the appropriate user domain, but requires detailed knowledge on the part of the system user of the range and distribution of data within the domain. The technique assumes that the data is available prior to training. Neither the requirement nor the assumption is realistic for many real problems of interest.
(ii) An alternative approach [Rumelhart, D. E. and Zipser, D. (1985) "Feature Discovery by Competitive Learning" Cognitive Science 9 pp. 75-112] involves the updating of all category representations following the presentation of each training data input. Each category representation is updated according to its win/lose state such that the loser or losers of a pattern presented to the network have their category representations modified using a fraction of the training rate used to modify the winning category or categories. This technique has the advantage that all categories will be utilised. However, to prevent instability and associated loss of learning, the rate at which categories are modified must be kept very low. This results in very long training times which are unacceptable for many practical applications.
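A minimal sketch of this "leaky" variant, again assuming Euclidean distance and a single winner; the `leak` parameter is an assumed name for the small fraction of the training rate applied to losing categories:

```python
import numpy as np

def leaky_step(weights, x, rate=0.1, leak=0.01):
    """Leaky competitive learning as in approach (ii): the winner
    moves toward the input at the full rate, every loser at a
    small fraction of it, so that no category is left unused."""
    distances = np.linalg.norm(weights - x, axis=1)
    winner = int(np.argmin(distances))
    for i in range(len(weights)):
        r = rate if i == winner else rate * leak
        weights[i] += r * (x - weights[i])
    return winner
```

Because every losing category drifts toward every input, `leak` must be kept very small to avoid the instability noted above, which is the source of the long training times.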
(iii) The competitive units may be arranged such that a topological mapping exists between categories [Kohonen, T. (1989) "Self Organisation and Associative Memory [3rd edition]" Berlin: Springer-Verlag]. Groups of categories are then modified on presentation of each training input.
Typically a winning category will have a radius of influence determining the group of categories to be updated. This radius of influence decreases as training progresses. The rate at which the radius of influence should decrease is problem dependent and long training times are required.
(iv) As an alternative to separate initialisation of each category, systems have been reported [Hecht-Nielsen, R. (1987) "Counterpropagation Networks" Applied Optics 26 pp. 4979-4984] which initialise each category representation to the same representative pattern V. As training proceeds each input pattern denoted X on which training is based is modified to take the form [α.X + (1 − α).V] where α is a control parameter which is zero initially but which tends to 1 as training proceeds. This technique is claimed to allow the category representations to adjust to cover the complete range of input patterns. However, the adjustment is data dependent (both in data distribution and in order of presentation) and is also slow. The use of all available categories is not guaranteed.
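The blended training input of approach (iv) can be written directly; the function name is an assumption, and the schedule by which α rises from 0 toward 1 is application dependent and not specified here:

```python
def blended_input(x, v, alpha):
    """Training input actually presented to the network under
    approach (iv): alpha*X + (1 - alpha)*V, where alpha is zero
    initially and tends to 1 as training proceeds."""
    return [alpha * xi + (1.0 - alpha) * vi for xi, vi in zip(x, v)]
```

At α = 0 every category sees only the common pattern V; at α = 1 the network sees the raw inputs, so the categories must spread out gradually as α grows.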
(v) Investigations have been carried out [De Sieno, D. (1988) "Adding a Conscience to Competitive Learning" Proc. IEEE Int. Conf. on Neural Networks San Diego, I pp. 117-124] into the use of a bias term to implement dynamic thresholds for each category such that under-utilised categories may learn more easily. However the bias is difficult to control and instability usually results. As a consequence the training rate must be kept low and long training times are unavoidable. Additionally the rate at which the bias must be varied is highly data dependent making practical implementations difficult.
(vi) Noise can be added to the input patterns, decreasing in magnitude as training proceeds. It is further possible [Hueter, G. J. (1988) "Solution to the Travelling Salesman Problem with an Adaptive Ring" Proc. IEEE Int. Conf. on Neural Networks San Diego, I pp. 85-92] to structure the noise so that all categories will be used during training. To achieve a reasonable distribution of categories the training rate must be kept very small and long training times are required.
Another known problem is that there are two conflicting requirements during training: namely plasticity (the ability to learn new patterns) and stability (the ability to retain responses to previously learnt patterns). This gives rise to what has been described as the Stability-Plasticity dilemma. If the training rate is high, allowing large modifications in the categories formed, then new patterns will rapidly be learnt but at the expense of previously formed categories; i.e. the system will be unstable. With a very low training rate, whilst the categories are stable with consistent responses to similar data, it is necessary to present a new pattern many times before the system adapts to recognise it. It is general practice to arrange for the training rate to be high in the early stages of training and to have it tend towards zero as training proceeds, so guaranteeing stability. In order to achieve the desired combination of appropriately categorised data with stability the training data must be carefully ordered so that the evolution of categories proceeds in an orderly fashion.
To deal with the Stability-Plasticity dilemma 'Memory-Based Learning' has been proposed as an addition to competitive learning and entails the explicit storage of training patterns to form new categories (whilst also maintaining established categories). A popular version of this approach is Adaptive Resonance Theory (ART) [Carpenter, G. A. and Grossberg, S. (1987) "A Massively Parallel Architecture for a Self-Organising Neural Pattern Recognition Machine" Computer Vision, Graphics and Image Processing 37 pp. 54-115]. ART assumes the existence of an unlimited number of available categories, each initially empty. The first pattern presented is stored explicitly as the representation for the first category. Subsequent training patterns generate measures of similarity with the category or categories already formed and if sufficiently similar to one of the existing categories will be used to modify that category. If not sufficiently similar to an existing category the training pattern is used to initialise a new category. To ensure stability a category once formed may only slowly be modified. Whilst training is fast and stable under this regime it is for the user to determine the criteria which decide whether new categories should be established, a decision which is not always obvious even with a good understanding of the application domain. Furthermore, the approach is susceptible to noise, is not guaranteed to well represent the training set and makes prediction of the functionality of the final system problematic since the final number of categories is not known.
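The memory-based scheme just described can be sketched as follows. This is not the published ART algorithm: the similarity measure, the `vigilance` threshold and the slow-modification rate are illustrative assumptions standing in for the user-determined criteria discussed above.

```python
import numpy as np

def memory_based_assign(patterns, vigilance=0.8, rate=0.05):
    """Sketch of memory-based learning: each pattern either
    slowly refines the most similar existing category (if the
    similarity exceeds the user-set threshold) or founds a new
    category by storing the pattern explicitly."""
    categories = []
    for x in patterns:
        x = np.asarray(x, dtype=float)
        best, best_sim = None, -1.0
        for i, c in enumerate(categories):
            sim = 1.0 / (1.0 + np.linalg.norm(x - c))
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= vigilance:
            categories[best] += rate * (x - categories[best])
        else:
            categories.append(x.copy())
    return categories
```

The sketch makes the drawback visible: the final number of categories depends entirely on the threshold and on the data, so the size of the trained system cannot be predicted in advance.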
A group of competitive units employing competitive learning may be arranged so that each unit receives the same input and each unit competes with all others to categorize the same data. Such a group is defined as a competitive neighbourhood. Any number of competitive neighbourhoods may be combined into a single layer where each neighbourhood may receive the same, overlapping or different input patterns but cannot receive as an input the output of any competitive unit within the layer.
For many simple applications a single layer of competitive units is sufficient. In such simple cases, the problems of the need for knowledge of potential categories, of instability and of long training times can be overcome. However there is a limit to the information that a single layer can handle and as the size and complexity of categorization tasks increases, so additional layers are required. In multi-layer systems each layer may receive as an input the output from another layer. As most competitive neighbourhoods receive inputs not from the outside world but rather from other competitive neighbourhoods from other layers, it is not practical for a user to pre-set category representations for these layers as good categorizations are extremely difficult to determine.
Stability is critical when training a multi-layered network. If categorizations within the initial layers constantly change, higher layers of the network will not have consistent inputs on which to base categorization. Without careful selection of the training rate and/or achievement of stability within each layer, the information conveyed will be of little value. Yet during training a network must be capable of learning and responding to pattern classes presented independently of their order of presentation, i.e. to deal with the stability-plasticity dilemma. For multi-layered training, the ART approach to stability cannot be practical as the number of categories formed is dynamic.
In practice multi-layer competitive networks are implemented and trained a layer at a time. Such networks require very careful design and significant effort on the part of the designer in determining suitable data with which to train each layer. Once each layer has been trained, further training is not possible as it will invalidate the inputs to subsequent layers, preventing further training to improve performance and/or learn new classes.
It is an object of the present invention to provide a recognition system in which, independently of an input data set, a competitive learning process categorizes the input data such that the distribution of that data is reflected by the categories formed with each category equally utilized so that the information stored by the network is maximized.
According to a first aspect of the present invention there is provided a system for classifying data vectors into at least one of a plurality of categories respectively defined by stored vector representations which have been determined by a training process, the system comprising a plurality of interconnected processing elements which, for implementing the training process, include:
Input means for receiving input training vectors;
Storage means for storing vector representations of the categories;
Calculating means for calculating a measure of similarity between an input vector and the stored representations for each category; and
Means for selecting and then modifying the selected stored vector representations so as to re-define the categories on an iterative basis during the training process;
wherein the selecting and modifying means comprises for the processing elements of each category:
means for recording a plurality of different measures of learning for the category relative to the input vectors which have been received;
means for combining said recorded measures of learning to form an activation value for the category;
means for evaluating a category strength value from said activation value and the respective measure of similarity; and
means for receiving the category strength values from the processing elements of other categories in the system and deciding, based upon comparative category strengths, whether or not the recipient category representation should be modified.
According to a second aspect of the present invention there is provided a process for training a classification system wherein each of the input vectors in a sequence of input training vectors is classified into at least one of a plurality of categories, respectively defined by stored vector representations, by comparing each input vector with the plurality of stored vector representations and selectively re-defining the categories by modifying at least some of the stored vector representations after the comparison and prior to comparison of the next input vector in the sequence,
wherein in order to effect said modification of stored vector representation, the system
(a) records a plurality of different measures of learning for each category relating to the input vectors which have been received;
(b) quantifies the similarity in the current comparison for each category;
(c) combines the measures from steps (a) and (b) into a strength factor for each category;
(d) correlates and compares the numerous strength factors for all the categories;
(e) identifies those categories having a strength factor which falls into a pre-determined class; and
(f) modifies the categories identified by step (e).
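Steps (a) to (f) can be sketched as a single training iteration. The particular formulae below are assumptions chosen only to illustrate the flow of the process: `usage` is an assumed long-term measure (how often each category has been modified), `idle` an assumed short-term measure (inputs since it was last modified), and the "pre-determined class" of step (e) is taken here to be the single strongest category.

```python
import numpy as np

def training_step(reps, usage, idle, x, rate=0.2):
    """One illustrative pass over steps (a)-(f) of the process."""
    # (b) quantify similarity between the input and each category
    similarity = 1.0 / (1.0 + np.linalg.norm(reps - x, axis=1))
    # (a)+(c) combine recorded measures of learning into a
    # strength factor; under-utilised categories are boosted
    activation = (1.0 / (1.0 + usage)) * np.log1p(idle)
    strength = similarity * (1.0 + activation)
    # (d)+(e) compare strengths; pre-determined class = strongest
    winner = int(np.argmax(strength))
    # (f) modify the identified category toward the input
    reps[winner] += rate * (x - reps[winner])
    usage[winner] += 1
    idle += 1
    idle[winner] = 0
    return winner
```

A long-idle, rarely-used category acquires a growing activation boost, so it will eventually capture inputs even if its stored representation is a poor initial match.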
By virtue of the present invention the general competitive training rule is modified such that the modification of categories is influenced both by category usage and by the similarity between internal representative patterns and network training inputs. More particularly, use is made of a plurality of measures of category utilization in the training process, based on the proportion of all inputs received over a representative period which are used to modify that category. Typically two utilization measures are used which respectively monitor long and short term category learning. The long term measure records how frequently the category has previously had its stored vector representation modified whereas the short term measure records how recently the category has had its stored vector representation modified. The short term measure preferably augments non-linearly with each unused input vector. The long term utilization criterion, herein referred to as "maturity", ensures that all available categories will be used during training. The short term utilization criterion, herein referred to as "potential", improves stability by ensuring that the product combination of maturity and potential, herein referred to as "activation", never dominates "similarity" in category modification.
The predetermined class of category strengths which is used to select the categories to be modified may for example be formed by all those categories whose strengths are above the mean or average strength value, or may be formed by the single category of greatest strength value, or may be formed by any arbitrary number, e.g. 5, of categories having greatest strength values.
In the prior art systems the amount by which categories may be modified is determined by a training rate which is global to all categories so that the rate at which categories form an ideal representation varies according to the distribution of the input data, the order in which it is presented, and the initial representations within each category. To ensure stability the prior art keeps the global training rate low; hence if large changes in a category representation are required, training time will be long. To overcome this problem, the global training rate may be replaced in accordance with a preferred feature of the present invention by a training rate local to each category where the value of this local training rate is based both on the maturity of that category and on the similarities between the internal representative patterns and training inputs. An under-utilized category or a category learning inputs that have a low similarity to its internal representation (as is common at the start of training) will have a high local training rate and can rapidly modify its internal representation. A well-utilized category or a category learning patterns that have a high similarity to its internal representation can have a very low local training rate so ensuring stability.
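A hypothetical form of such a local training rate is sketched below, assuming `maturity` and `similarity` are both normalised to [0, 1]; the linear combination and the `base` parameter are illustrative assumptions capturing only the qualitative behaviour described above.

```python
def local_rate(maturity, similarity, base=0.5):
    """Hypothetical local training rate: high for an immature
    category, or one learning inputs of low similarity to its
    stored representation; near zero once the category is mature
    and well matched, which keeps that category stable."""
    return base * (1.0 - maturity) + base * (1.0 - similarity)
```

An immature, poorly-matched category thus learns at the maximum rate, while a mature, well-matched category is left essentially frozen without any global rate schedule.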
By virtue of using a local training rate in combination with a plurality of measures of category utilization it becomes possible to undertake augmentative retraining of the system without the need to repeat the previous training process. Thus only the additional training vectors require to be presented to the system in the secondary training process.
Within a multi-layered network, stability of categories used to generate inputs for other layers is critical if layers are to learn simultaneously. However plasticity must be maintained to ensure the reassignment of categories as the need arises. A further preferred feature of the present invention is to introduce a "suggestive learning" input to each category. In one mode of operation this suggestive learning input is selectively operable on important categories and stabilizes formation of these categories by influencing both training and activation. In another mode of operation the suggestive learning input guides the formation of new categories such that a user can suggest potential output categories that are desirable. Such guiding signals can propagate down through the layers of an appropriately implemented network to encourage recognition of desired classes of data. Further, the suggestive learning inputs can be used to identify and suppress unimportant categories, leaving them free to categorize new sets of patterns as they arise. The suggestive learning inputs can also be used to enable secondary training to proceed during classification of input data using that data as the training patterns.
A further preferred feature of the present invention, to improve resistance to noise and to further improve category stability utilising local training rates, is that each category may be represented by multiple reference patterns. Each such reference pattern has attached to it an importance value corresponding to the degree to which it can be regarded as typical of the category. When determining the match between a category and an input, the importance value of each stored reference pattern is taken into consideration. The computed degree of compatibility between an input and a particular category will then be a function of the similarity between the input and each internal reference pattern, adjusted in each case to take account of the importance value of the pattern.
A consequence of the use of multiple reference patterns is that infrequent or currently non-typical inputs can be used to modify only internal reference patterns of low importance value, with little effect on the overall category response. If however such inputs continue to modify an internal reference pattern of low importance value, for instance by becoming more frequent, then the importance value of the internal reference pattern will rise, possibly allowing it ultimately to dominate the overall category. All of the multiple reference patterns may have the same local training rate for that category but it is preferred to allow each internal reference pattern to have a local training rate based both on its own importance value and on the maturity of the category as a whole. Thus a high importance value internal reference pattern may be made resistant to modification, keeping the category stable, though during training the importance values of internal reference patterns may shift such that there are always low importance value reference patterns that can respond to inputs that do not match the existing category.
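The importance-weighted match between an input and a multi-reference category can be sketched as follows; the per-pattern similarity measure and the normalised-weight combination are illustrative assumptions, as the invention leaves the precise function open.

```python
import numpy as np

def category_match(refs, importances, x):
    """Compatibility between an input and a category holding
    several reference patterns: an importance-weighted average
    of the per-pattern similarities, so typical (high-importance)
    patterns dominate the overall category response."""
    sims = 1.0 / (1.0 + np.linalg.norm(refs - x, axis=1))
    w = importances / importances.sum()
    return float(np.dot(w, sims))
```

Raising the importance of the reference pattern closest to an input raises the overall match, which is the mechanism by which an initially low-importance pattern can come to dominate the category as its inputs become frequent.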
It will be apparent that the inputs and the internal representations may be in either analogue or digital form, presented either serially or in parallel. Input data may represent, for example, samples of the intensity, or other properties, of audio sequences, measures of intensity, colour, or other properties of a video image, or signals generated by an industrial control process.
Embodiments of the present invention will be described which either overcome or obviate the disadvantages inherent in prior art pattern recognition systems of the neural network type. In particular, classification categories may be learnt rapidly and with only a relatively small training set. Further embodiments are also capable of stably retaining previously learnt categories which are known to be of value to a user whilst remaining sufficiently plastic to learn new categories. For ease of understanding the embodiments are based on two distinct processing elements herein referred to as the master and slave components. Each category within a neighbourhood is represented by a master-slave pair. The master, by using activation, is responsible for category identification and for initiating training. The slave is responsible for the storage and modification of internal reference patterns. The master-slave pair form part of an artificial neural network.
For a better understanding of the present invention and in order to show how the same may be carried into effect, reference will now be made by way of example to the accompanying drawings in which: