1. Field of the Invention
The present invention relates generally to n-tuple or RAM based neural network classification systems and, more particularly, to n-tuple or RAM based classification systems having weight vectors with element values being determined during a training process.
2. Description of the Prior Art
A known way of classifying objects or patterns represented by electric signals or binary codes and, more precisely, by vectors of signals applied to the inputs of neural network classification systems lies in the implementation of a so-called learning or training phase. This phase generally consists of configuring a classification network so that it performs the envisaged classification as efficiently as possible, using one or more sets of signals, called learning or training sets, for which the membership of each signal in one of the classes into which it is desired to classify them is known. This method is known as supervised learning or learning with a teacher.
One subclass of classification networks using supervised learning consists of networks using memory-based learning. Here, one of the oldest memory-based networks is the “n-tuple network” proposed by Bledsoe and Browning (Bledsoe, W. W. and Browning, I., 1959, “Pattern recognition and reading by machine”, Proceedings of the Eastern Joint Computer Conference, pp. 225-232) and more recently described by Morciniec and Rohwer (Morciniec, M. and Rohwer, R., 1996, “A theoretical and experimental account of n-tuple classifier performance”, Neural Comp., pp. 629-642).
One of the benefits of such a memory-based system is a very fast computation time, both during the learning phase and during classification. For the known types of n-tuple networks, which are also known as “RAM networks” or “weightless neural networks”, learning may be accomplished by recording features of patterns in a random-access memory (RAM), which requires just one presentation of the training set(s) to the system.
The training procedure for a conventional RAM based neural network is described by Jørgensen (co-inventor of this invention) et al. (Jørgensen, T. M., Christensen, S. S. and Liisberg, C., 1995, “Cross-validation and information measures for RAM based neural networks”, Proceedings of the Weightless Neural Network Workshop WNNW95 (Kent at Canterbury, UK) ed. D. Bisset, pp. 76-81), where it is described how the RAM based neural network may be considered as comprising a number of Look Up Tables (LUTs). Each LUT may probe a subset of a binary input data vector. In the conventional scheme the bits to be used are selected at random. The sampled bit sequence is used to construct an address. This address corresponds to a specific entry (column) in the LUT. The number of rows in the LUT corresponds to the number of possible classes. For each class the output can take on the values 0 or 1. A value of 1 corresponds to a vote for that specific class. When performing a classification, an input vector is sampled, the output vectors from all LUTs are added, and subsequently a winner-takes-all decision is made to classify the input vector. In order to perform a simple training of the network, the output values may initially be set to 0. For each example in the training set, the following steps should then be carried out:
Present the input vector and the target class to the network, calculate the corresponding column entry for every LUT, and set the output value of the target class to 1 in all the “active” columns.
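The conventional training step and the winner-takes-all classification described above can be sketched as follows. This is a minimal illustration under assumed conventions (random bit selection with a fixed seed, each LUT stored as a dictionary holding only the columns that have been written to, class labels as plain strings); the class and method names are hypothetical.

```python
import random


class RamNet:
    """Conventional n-tuple (RAM) classifier with 0/1 cell values (illustrative sketch)."""

    def __init__(self, n_bits, n_tuple, n_luts, seed=0):
        rng = random.Random(seed)
        # Each LUT probes a randomly selected subset of n_tuple input bits.
        self.connections = [rng.sample(range(n_bits), n_tuple) for _ in range(n_luts)]
        # Each LUT maps a column address to the set of classes whose cell holds 1.
        self.luts = [{} for _ in range(n_luts)]

    @staticmethod
    def _address(x, conn):
        # Concatenate the sampled bits into an integer column address.
        addr = 0
        for pos in conn:
            addr = (addr << 1) | x[pos]
        return addr

    def train(self, x, target):
        # Set the target-class cell to 1 in every active (addressed) column.
        for conn, lut in zip(self.connections, self.luts):
            lut.setdefault(self._address(x, conn), set()).add(target)

    def scores(self, x, classes):
        # Add up the 0/1 outputs of all LUTs for each class.
        votes = dict.fromkeys(classes, 0)
        for conn, lut in zip(self.connections, self.luts):
            for c in lut.get(self._address(x, conn), ()):
                if c in votes:
                    votes[c] += 1
        return votes

    def classify(self, x, classes):
        # Winner takes all; more than one returned class means an ambiguous decision.
        votes = self.scores(x, classes)
        best = max(votes.values())
        return [c for c in classes if votes[c] == best]
```

Every training pattern receives the maximum of `n_luts` votes for its own class, which is why the training set is never misclassified, although ties between classes remain possible.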
By use of such a training strategy it may be guaranteed that each training pattern always obtains the maximum number of votes. As a result, such a network makes no misclassifications on the training set, but ambiguous decisions may occur. Here, the generalisation capability of the network is directly related to the number of input bits for each LUT. If a LUT samples all input bits, it will act as a pure memory device and no generalisation will be provided. As the number of input bits is reduced, the generalisation is increased at the expense of an increasing number of ambiguous decisions. Furthermore, the classification and generalisation performance of a LUT is highly dependent on the actual subset of input bits probed. The purpose of an “intelligent” training procedure is thus to select the most appropriate subsets of input data.
Jørgensen et al. further describe what is named a “cross validation test”, which suggests a method for selecting an optimal number of input connections per LUT in order to obtain a low classification error rate with a short overall computation time. In order to perform such a cross validation test, it is necessary to know the actual number of training examples that have visited or addressed the cell or element corresponding to the addressed column and class. It is therefore suggested that these numbers are stored in the LUTs. Jørgensen et al. also suggest how the LUTs in the network can be selected more optimally by successively training new sets of LUTs and performing a cross validation test on each LUT. Thus, it is known to have a RAM network in which the LUTs are selected by presenting the training set to the system several times.
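Storing visit counts is what makes such a cross validation test cheap: a training example's own contribution can be subtracted instead of retraining the network. The sketch below shows one leave-one-out variant under assumed conventions (counts kept per LUT as nested dictionaries, ties counted as errors); it is an illustration only, not necessarily the exact procedure of Jørgensen et al., and all function names are hypothetical.

```python
from collections import defaultdict


def address(x, conn):
    # Concatenate the sampled 0/1 bits into an integer column address.
    return int("".join(str(x[p]) for p in conn), 2)


def build_counts(examples, connections):
    # counts[j][addr][cls] = number of training examples of class cls
    # that addressed column addr of LUT j.
    counts = [defaultdict(lambda: defaultdict(int)) for _ in connections]
    for x, cls in examples:
        for j, conn in enumerate(connections):
            counts[j][address(x, conn)][cls] += 1
    return counts


def leave_one_out_error(examples, connections, classes):
    counts = build_counts(examples, connections)
    errors = 0
    for x, target in examples:
        votes = dict.fromkeys(classes, 0)
        for j, conn in enumerate(connections):
            column = counts[j].get(address(x, conn), {})
            for c in classes:
                # Remove this example's own visit before deciding whether the cell votes.
                n = column.get(c, 0) - (1 if c == target else 0)
                if n > 0:
                    votes[c] += 1
        best = max(votes.values())
        if [c for c in classes if votes[c] == best] != [target]:
            errors += 1  # wrong or ambiguous decision
    return errors / len(examples)
```

Comparing this error rate for candidate LUT configurations (for example, different numbers of input connections per LUT) requires only one pass over the stored counts per configuration.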
In an article by Jørgensen (co-inventor of this invention) (Jørgensen, T. M., “Classification of handwritten digits using a RAM neural net architecture”, February 1997, International Journal of Neural Systems, Vol. 8, No. 1, pp. 17-25), it is suggested how the class recognition of a RAM based network can be further improved by extending the traditional RAM architecture to include what is named “inhibition”. This method deals with the problem that in many situations two different classes might differ in only a few of their features. In such a case, an example outside the training set has a high risk of sharing most of its features with an incorrect class. To deal with this problem, it becomes necessary to weight different features differently for a given class. Thus, a method is suggested where the network includes inhibition factors for some classes of the addressed columns. Here, a confidence measure is introduced, and the inhibition factors are calculated so that the confidence after inhibition corresponds to a desired level.
The result of the preferred inhibition scheme is that all addressed LUT cells or elements that would be set to 1 in the simple system are also set to 1 in the modified version, but in the modified version, column cells set to 1 may further contain information on the number of times the cell has been visited by the training set. However, some of the cells containing 0's in the simple system will have their contents changed to negative values in the modified network. In other words, the conventional network is extended so that inhibition from one class to another is allowed.
In order to encode negative values into the LUT cells, one bit per cell or element, as in a traditional RAM network, is not sufficient. Thus, it is preferred to use one byte per cell, with values below 128 being used to represent different negative values, whereas values above 128 are used for storing information concerning the number of training examples that have visited or addressed the cell. When classifying an object, the addressed cells having values greater than or equal to 1 may then be counted as having the value 1.
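The one-byte cell format can be illustrated as below. The text only states that byte values below 128 encode negative values and values above 128 encode visit counts; the offset mapping used here (byte 128 meaning 0, so stored counts saturate at 127) is an assumption, and the function names are hypothetical.

```python
OFFSET = 128  # assumed: the byte value 128 encodes a cell value of 0


def encode(value):
    # Store a signed cell value (negative inhibition or positive visit count) in one byte.
    if not -OFFSET <= value <= 255 - OFFSET:
        raise ValueError("cell value does not fit in one byte")
    return value + OFFSET


def decode(byte):
    # Recover the signed cell value from its stored byte.
    return byte - OFFSET


def vote(byte):
    # During classification any visited cell (value >= 1) counts as 1,
    # while negative (inhibition) values are used as they are.
    value = decode(byte)
    return 1 if value >= 1 else value
```

Because every positive count collapses to a vote of 1, only the negative cells change the outcome relative to the simple 0/1 network, which is the limitation discussed next.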
By using inhibition, the cells of the LUTs are given different values, which might be considered a sort of “weighting”. However, only cells which have not been visited by the training set are allowed to be suppressed by having their values changed from 0 to a negative value. There is no boosting of cells having positive values when performing classification of input data. Thus, very well performing LUTs or columns of LUTs might easily be drowned out by the remaining network.
Thus, there is a need for a RAM classification network which allows a very fast training or learning phase and subsequent classification, but which at the same time allows real weights to both boost and suppress cell values of LUT columns in order to obtain a proper generalisation ability of the sampled number of input bits based on access information of the training set. Such a RAM based classification system is provided according to the present invention.
According to a first aspect of the present invention there is provided a method for training a computer classification system which can be defined by a network comprising a number of n-tuples or Look Up Tables (LUTs), with each n-tuple or LUT comprising a number of rows corresponding to at least a subset of possible classes and further comprising a number of columns being addressed by signals or elements of sampled training input data examples, each column being defined by a vector having cells with values, said method comprising determining the column vector cell values based on one or more training sets of input data examples for different classes so that at least part of the cells comprise or point to information based on the number of times the corresponding cell address is sampled from one or more sets of training input examples, and determining weight cell values corresponding to one or more column vector cells being addressed or sampled by the training examples.
According to a second aspect of the present invention there is provided a method of determining weight cell values in a computer classification system which can be defined by a network comprising a number of n-tuples or Look Up Tables (LUTs), with each n-tuple or LUT comprising a number of rows corresponding to at least a subset of possible classes and further comprising a number of column vectors with at least part of said column vectors having corresponding weight vectors, each column vector being addressed by signals or elements of a sampled training input data example and each column vector and weight vector having cells with values being determined based on one or more training sets of input data examples for different classes, said method comprising determining the column vector cell values based on the training set(s) of input examples so that at least part of said values comprise or point to information based on the number of times the corresponding cell address is sampled from the set(s) of training input examples, and determining weight vector cell values corresponding to one or more column vector cells.
Preferably, the weight cell values are determined based on the information of at least part of the determined column vector cell values and by use of at least part of the training set(s) of input examples. According to the present invention the training input data examples may preferably be presented to the network as input signal vectors.
It is preferred that determination of the weight cell values is performed so as to allow weighting of one or more column vector cells of positive value and/or to allow boosting of one or more column vector cells during a classification process. Furthermore, or alternatively, the weight cell values may be determined so as to allow suppressing of one or more column vector cells during a classification process.
The present invention also provides a method wherein the determination of the weight cell values allows weighting of one or more column vector cells having a positive value (greater than 0) and one or more column vector cells having a non-positive value (less than or equal to 0). Preferably, the determination of the weight cells allows weighting of any column vector cell.
In order to determine or calculate the weight cell values, the determination of these values may comprise initialising one or more sets of weight cells corresponding to at least part of the column cells, and adjusting at least part of the weight cell values based on the information of at least part of the determined column cell values and by use of at least part of the training set(s) of input examples. When determining the weight cell values it is preferred that these are arranged in weight vectors corresponding to at least part of the column vectors.
In order to determine or adjust the weight cell values according to the present invention, the column cell values should be determined. Here, it is preferred that at least part of the column cell values are determined as a function of the number of times the corresponding cell address is sampled from the set(s) of training input examples. Alternatively, the information of the column cells may be determined so that the maximum column cell value is 1, but at least part of the cells have an associated value being a function of the number of times the corresponding cell address is sampled from the training set(s) of input examples. Preferably, the column vector cell values are determined and stored in storing means before the adjustment of the weight vector cell values.
According to the present invention, a preferred way of determining the column vector cell values may comprise the training steps of
a) applying a training input data example of a known class to the classification network, thereby addressing one or more column vectors,
b) incrementing, preferably by one, the value or vote of the cells of the addressed column vector(s) corresponding to the row(s) of the known class, and
c) repeating steps (a)-(b) until all training examples have been applied to the network.
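Steps (a)-(c) can be sketched as below, assuming binary input lists, classes numbered 0 to `n_classes - 1`, and each LUT stored as a dictionary from column address to a column vector of per-class counts. The function names are hypothetical.

```python
from collections import defaultdict


def address(x, conn):
    # Concatenate the sampled 0/1 bits into an integer column address.
    return int("".join(str(x[p]) for p in conn), 2)


def train_column_vectors(examples, connections, n_classes):
    """Return one dict per LUT mapping column address -> per-class visit counts."""
    luts = [defaultdict(lambda: [0] * n_classes) for _ in connections]
    for x, cls in examples:                      # (a) apply a training example of known class
        for lut, conn in zip(luts, connections):
            lut[address(x, conn)][cls] += 1      # (b) increment the known class's row by one
    return luts                                  # (c) all examples have been applied
```

A single pass over the training set is enough, which preserves the fast training that motivates RAM networks in the first place.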
However, it should be understood that the present invention also covers embodiments where the information of the column cells is determined by alternative functions of the number of times the cell has been addressed by the input training set(s). Thus, the cell information does not need to comprise a count of all the times the cell has been addressed, but may, for example, comprise an indication of whether the cell has been visited zero times, once, or more than once, or of whether it has been visited twice or more than twice, and so on.
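One such alternative function is a saturating count, sketched below under the assumption that a column vector is a plain list of per-class counts; the function name and the `cap` parameter are hypothetical. With `cap=1` the result reduces to the conventional 0/1 cell values.

```python
def saturate_column(column, cap=2):
    # 0 = never visited, 1 = visited once, ..., cap = visited cap or more times.
    return [min(n, cap) for n in column]
```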
So far it has been mentioned that weight cell values may be determined for one or more column cells, but in a preferred embodiment all column vectors have corresponding weight vectors.
When initialising weight cell values according to embodiments of the present invention, the initialisation may comprise setting each weight cell value to a predetermined specific cell value. These values may be different for different cells, but all weight cell values may also be set to a predetermined constant value. Such a value may be 0 or 1, but other values may be preferred.
In order to determine the weight cell values, it is preferred to adjust these values, which adjustment process may comprise one or more iteration steps. The adjustment of the weight cell values may comprise the steps of determining a global quality value based on at least part of the weight and column vector cell values, determining if the global quality value fulfils a required quality criterion, and adjusting at least part of the weight cell values until the global quality criterion is fulfilled.
The adjustment process may also include determination of a local quality value for each sampled training input example, with one or more weight cell adjustments being performed if the local quality value does not fulfil a specified or required local quality criterion for the selected input example. As an example the adjustment of the weight cell values may comprise the steps of
a) selecting an input example from the training set(s),
b) determining a local quality value corresponding to the sampled training input example, the local quality value being a function of at least part of the addressed weight and column cell values,
c) determining if the local quality value fulfils a required local quality criterion and, if not, adjusting one or more of the addressed weight vector cell values,
d) selecting a new input example from a predetermined number of examples of the training set(s),
e) repeating the local quality test steps (b)-(d) for all the predetermined training input examples,
f) determining a global quality value based on at least part of the weight and column vectors being addressed during the local quality test,
g) determining if the global quality value fulfils a required global quality criterion, and,
h) repeating steps (a)-(g) until the global quality criterion is fulfilled. Preferably, steps (b)-(d) of the above mentioned adjustment process may be carried out for all examples of the training set(s).
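The adjustment steps (a)-(h) can be sketched as below. The text leaves the quality functions and the update rule open, so every concrete choice here is an assumption: weights start at 1 for visited cells and 0 otherwise (reproducing the conventional network), the output for a class is the sum of addressed weights, the local criterion is that the target class wins strictly, the update is a perceptron-style step that boosts the target class and suppresses the strongest rival, and the global quality is training accuracy. All names are hypothetical.

```python
from collections import defaultdict


def address(x, conn):
    # Concatenate the sampled 0/1 bits into an integer column address.
    return int("".join(str(x[p]) for p in conn), 2)


def outputs(x, connections, weights, n_classes):
    # Output per class: sum of the addressed weight cells (unwritten cells count as 0).
    out = [0.0] * n_classes
    for conn, w in zip(connections, weights):
        cell = w.get(address(x, conn))
        if cell is not None:
            for c in range(n_classes):
                out[c] += cell[c]
    return out


def wins(out, target):
    # Local quality criterion (assumed): the target class must win strictly.
    return all(out[target] > o for c, o in enumerate(out) if c != target)


def adjust_weights(examples, connections, n_classes, step=0.5,
                   required_accuracy=1.0, max_epochs=20):
    # Column vectors: per-class visit counts, determined before any weight adjustment.
    counts = [defaultdict(lambda: [0] * n_classes) for _ in connections]
    for x, cls in examples:
        for cnt, conn in zip(counts, connections):
            cnt[address(x, conn)][cls] += 1
    # Initialise weights to 1 for visited cells and 0 otherwise (conventional net).
    weights = [defaultdict(lambda: [0.0] * n_classes) for _ in connections]
    for w, cnt in zip(weights, counts):
        for addr, col in cnt.items():
            w[addr] = [1.0 if n > 0 else 0.0 for n in col]
    for epoch in range(1, max_epochs + 1):
        for x, target in examples:                             # steps (a), (d), (e)
            out = outputs(x, connections, weights, n_classes)  # step (b)
            if not wins(out, target):                          # step (c)
                rival = max((c for c in range(n_classes) if c != target),
                            key=lambda c: out[c])
                for w, conn in zip(weights, connections):
                    cell = w[address(x, conn)]  # creates the cell if it was never visited
                    cell[target] += step        # boost the target class
                    cell[rival] -= step         # suppress the strongest rival
        accuracy = sum(wins(outputs(x, connections, weights, n_classes), t)
                       for x, t in examples) / len(examples)   # step (f)
        if accuracy >= required_accuracy:                        # step (g)
            return weights, epoch
    return weights, max_epochs  # stop after a given number of iterations
```

For example, with 1-tuples `[[0], [1]]` and the training set `([0, 0], 0)`, `([0, 1], 1)`, `([1, 1], 0)`, the conventional 0/1 weights leave `[0, 1]` tied between the two classes; after two passes every training pattern is won outright, using both boosted and negative (suppressing) weights.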
The local and/or global quality value may be defined as functions of at least part of the weight and/or column cells. Correspondingly, the global and/or the local quality criterion may also be functions of the weight and/or column cells. Thus, the quality criterion or criteria need not be a predetermined constant threshold value, but may be changed during the adjustment iteration process. However, the present invention also covers embodiments in which the quality criterion or criteria is/are given by constant threshold values.
It should be understood that when adjusting the weight cell values by use of one or more quality values each with a corresponding quality criterion, it may be preferred to stop the adjustment iteration process if a quality criterion is not fulfilled after a given number of iterations.
It should also be understood that during the adjustment process the adjusted weight cell values are preferably stored after each adjustment, and when the adjustment process includes the determination of a global quality value, the step of determination of the global quality value may further be followed by separately storing the hereby obtained weight cell values or classification system configuration values if the determined global quality value is closer to fulfilling the global quality criterion than the global quality value corresponding to previously separately stored weight cell values or configuration values.
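The iteration limit and the separate "best so far" copy can be sketched generically as below. It assumes a larger global quality value means the configuration is closer to fulfilling the criterion; `adjust_once`, `global_quality` and `criterion` are hypothetical caller-supplied functions, and the state can be any deep-copyable configuration (for instance, the weight tables).

```python
import copy


def adjust_with_best_so_far(state, adjust_once, global_quality, criterion, max_iterations):
    """Run adjustment passes, keeping a separate copy of the best configuration so far."""
    best_state = copy.deepcopy(state)
    best_quality = global_quality(state)
    for _ in range(max_iterations):
        adjust_once(state)                 # weights are stored (in place) after each adjustment
        quality = global_quality(state)
        if quality > best_quality:         # closer to the criterion: store a separate copy
            best_state, best_quality = copy.deepcopy(state), quality
        if criterion(quality):
            break
    return best_state, best_quality
```

Even if the criterion is never met within `max_iterations`, the returned configuration is the best one encountered rather than the last one.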
A main reason for training a classification system according to an embodiment of the present invention is to obtain a high confidence in a subsequent classification process of an input example of an unknown class.
Thus, according to a further aspect of the present invention, there is also provided a method of classifying input data examples into at least one of a plurality of classes using a computer classification system configured according to any of the above described methods of the present invention, whereby the column cell values and the corresponding weight cell values are determined for each n-tuple or LUT based on one or more training sets of input data examples, said method comprising
a) applying an input data example to be classified to the configured classification network thereby addressing column vectors and corresponding weight vectors in the set of n-tuples or LUTs,
b) selecting a class thereby addressing specific rows in the set of n-tuples or LUTs,
c) determining an output value as a function of values of addressed weight cells,
d) repeating steps (b)-(c) until an output has been determined for all classes,
e) comparing the calculated output values, and
f) selecting the class or classes having maximum output value.
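Classification steps (a)-(f) can be sketched as below, taking the simplest output function (the plain sum of the addressed weight cells). The storage layout (one dictionary per LUT from column address to weight vector) and the treatment of never-written cells as 0 are assumptions; the function names are hypothetical.

```python
def address(x, conn):
    # Concatenate the sampled 0/1 bits into an integer column address.
    return int("".join(str(x[p]) for p in conn), 2)


def classify(x, connections, weights, n_classes):
    """weights[j] maps a column address of LUT j to a weight vector (one cell per class)."""
    # (a) sample the input once and look up the addressed weight vector in every LUT;
    # cells never written during training are treated as 0 here (an assumption).
    addressed = [weights[j].get(address(x, conn)) for j, conn in enumerate(connections)]
    outputs = []
    for c in range(n_classes):                   # (b), (d) select each class in turn
        outputs.append(sum(w[c] for w in addressed if w is not None))  # (c)
    best = max(outputs)                          # (e) compare the outputs
    return [c for c, o in enumerate(outputs) if o == best], outputs   # (f)
```

Returning a list of winners keeps ambiguous decisions visible; an input that addresses no written cells yields all-zero outputs and therefore every class.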
When classifying an unknown input example, several functions may be used for determining the output values from the addressed weight cells. However, it is preferred that the parameters used for determining the output value include both values of addressed weight cells and addressed column cells. Thus, as an example, the output value may be determined as a first summation of all the addressed weight cell values corresponding to column cell values greater than or equal to a predetermined value. In another preferred embodiment, the step of determining an output value comprises determining a first summation of all the addressed weight cell values corresponding to column cell values greater than or equal to a predetermined value, determining a second summation of all the addressed weight cell values, and determining the output value by dividing the first summation by the second summation. The predetermined value may preferably be set to 1.
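Both output functions for a single class can be sketched as below, assuming the addressed cells are supplied as one (column vector, weight vector) pair per LUT. The function name, the `normalise` switch and the zero-denominator guard are additions for the sketch and are not specified in the text.

```python
def class_output(addressed, cls, threshold=1, normalise=True):
    """addressed: one (column_vector, weight_vector) pair per LUT for the current input."""
    # First summation: weights whose column cell value is at least the threshold.
    first = sum(w[cls] for col, w in addressed if col[cls] >= threshold)
    if not normalise:
        return first
    # Second summation: all addressed weights of this class.
    second = sum(w[cls] for _, w in addressed)
    return first / second if second != 0 else 0.0  # guard added for the sketch
```

With the ratio form, the output measures the fraction of a class's addressed weight that sits on visited cells. If weights may be negative, the denominator can approach zero, so a practical system would need some policy for that case.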
The present invention also provides training and classification systems according to the above described methods of training and classification.
Thus, according to the present invention there is provided a system for training a computer classification system which can be defined by a network comprising a stored number of n-tuples or Look Up Tables (LUTs), with each n-tuple or LUT comprising a number of rows corresponding to at least a subset of possible classes and further comprising a number of columns being addressed by signals or elements of sampled training input data examples, each column being defined by a vector having cells with values, said system comprising input means for receiving training input data examples of known classes, means for sampling the received input data examples and addressing column vectors in the stored set of n-tuples or LUTs, means for addressing specific rows in the set of n-tuples or LUTs, said rows corresponding to a known class, storage means for storing determined n-tuples or LUTs, means for determining column vector cell values so as to comprise or point to information based on the number of times the corresponding cell address is sampled from the training set(s) of input examples, and means for determining weight cell values corresponding to one or more column vector cells being addressed or sampled by the training examples.
The present invention also provides a system for determining weight cell values of a classification network which can be defined by a stored number of n-tuples or Look Up Tables (LUTs), with each n-tuple or LUT comprising a number of rows corresponding to at least a subset of the number of possible classes and further comprising a number of column vectors with at least part of said column vectors having corresponding weight vectors, each column vector being addressed by signals or elements of a sampled training input data example and each column vector and weight vector having cell values being determined during a training process based on one or more sets of training input data examples, said system comprising: input means for receiving training input data examples of known classes, means for sampling the received input data examples and addressing column vectors and corresponding weight vectors in the stored set of n-tuples or LUTs, means for addressing specific rows in the set of n-tuples or LUTs, said rows corresponding to a known class, storage means for storing determined n-tuples or LUTs, means for determining column vector cell values so as to comprise or point to information based on the number of times the corresponding cell address is sampled from the training set(s) of input examples, and means for determining weight vector cell values corresponding to one or more column vector cells.
Here, it is preferred that the means for determining the weight cell values is adapted to determine these values based on the information of at least part of the determined column vector cell values and by use of at least part of the training set(s) of input examples.
Preferably, the means for determining the weight cell values is adapted to determine these values so as to allow weighting of one or more column cells of positive value and/or to allow boosting of one or more column cells during a classification process. The determining means may furthermore, or alternatively, be adapted to determine the weight cell values so as to allow suppressing of one or more column vector cells during a classification process.
According to an embodiment of the present invention the weight determining means may be adapted to determine the weight cell values so as to allow weighting of one or more column vector cells having a positive value (greater than 0) and one or more column vector cells having a non-positive value (less than or equal to 0). Preferably, the means may further be adapted to determine the weight cell values so as to allow weighting of any column cell. It is also preferred that the means for determining the weight cell values is adapted to determine these values so that the weight cell values are arranged in weight vectors corresponding to at least part of the column vectors.
In order to determine the weight cell values according to a preferred embodiment of the present invention, the means for determining the weight cell values may comprise means for initialising one or more sets of weight vectors corresponding to at least part of the column vectors, and means for adjusting weight vector cell values of at least part of the weight vectors based on the information of at least part of the determined column vector cell values and by use of at least part of the training set(s) of input examples.
As already discussed above the column cell values should be determined in order to determine the weight cell values. Here, it is preferred that the means for determining the column vector cell values is adapted to determine these values as a function of the number of times the corresponding cell address is sampled from the set(s) of training input examples. Alternatively, the means for determining the column vector cell values may be adapted to determine these cell values so that the maximum value is 1, but at least part of the cells have an associated value being a function of the number of times the corresponding cell address is sampled from the training set(s) of input examples.
According to an embodiment of the present invention it is preferred that when a training input data example belonging to a known class is applied to the classification network thereby addressing one or more column vectors, the means for determining the column vector cell values is adapted to increment the value or vote of the cells of the addressed column vector(s) corresponding to the row(s) of the known class, said value preferably being incremented by one.
In order to initialise the weight cells according to an embodiment of the invention, it is preferred that the means for initialising the weight vectors is adapted to set the weight cell values to one or more predetermined values.
For the adjustment process of the weight cells it is preferred that the means for adjusting the weight vector cell values is adapted to determine a global quality value based on at least part of the weight and column vector cell values, determine if the global quality value fulfils a required global quality criterion, and adjust at least part of the weight cell values until the global quality criterion is fulfilled.
As an example of a preferred embodiment according to the present invention, the means for adjusting the weight vector cell values may be adapted to
a) determine a local quality value corresponding to a sampled training input example, the local quality value being a function of at least part of the addressed weight and column vector cell values,
b) determine if the local quality value fulfils a required local quality criterion,
c) adjust one or more of the addressed weight vector cell values if the local quality criterion is not fulfilled,
d) repeat the local quality test for a predetermined number of training input examples,
e) determine a global quality value based on at least part of the weight and column vectors being addressed during the local quality test,
f) determine if the global quality value fulfils a required global quality criterion, and,
g) repeat the local and the global quality test until the global quality criterion is fulfilled.
The means for adjusting the weight vector cell values may further be adapted to stop the iteration process if the global quality criterion is not fulfilled after a given number of iterations. In a preferred embodiment, the means for storing n-tuples or LUTs comprises means for storing adjusted weight cell values and separate means for storing best so far weight cell values or best so far classification system configuration values. Here, the means for adjusting the weight vector cell values may further be adapted to replace previously separately stored best so far weight cell values with obtained adjusted weight cell values if the determined global quality value is closer to fulfilling the global quality criterion than the global quality value corresponding to previously separately stored best so far weight values. Thus, even if the system should not be able to fulfil the global quality criterion within a given number of iterations, the system may always comprise the “best so far” system configuration.
According to a further aspect of the present invention there is also provided a system for classifying input data examples of unknown classes into at least one of a plurality of classes, said system comprising: storage means for storing a number or set of n-tuples or Look Up Tables (LUTs) with each n-tuple or LUT comprising a number of rows corresponding to at least a subset of the number of possible classes and further comprising a number of column vectors with corresponding weight vectors, each column vector being addressed by signals or elements of a sampled input data example and each column vector and weight vector having cells with values being determined during a training process based on one or more sets of training input data examples, said system further comprising: input means for receiving an input data example to be classified, means for sampling the received input data example and addressing columns and corresponding weight vectors in the stored set of n-tuples or LUTs, means for addressing specific rows in the set of n-tuples or LUTs, said rows corresponding to a specific class, means for determining an output value as a function of addressed weight cells, and means for comparing calculated output values corresponding to all classes and selecting the class or classes having maximum output value.
According to a preferred embodiment of the classification system of the present invention, the output determining means comprises means for producing a first summation of all the addressed weight vector cell values corresponding to a specific class and corresponding to column vector cell values greater than or equal to a predetermined value. It is also preferred that the output determining means further comprises means for producing a second summation of all the addressed weight vector cell values corresponding to a specific class, and means for determining the output value by dividing the first summation by the second summation.
It should be understood that it is preferred that the cell values of the column and weight vectors of the classification system according to the present invention are determined by use of a training system according to any of the above described systems. Accordingly, these cell values may be determined during a training process according to any of the above described methods.