The present invention relates to neural networks, and to banknote authentication systems using such networks.
Automatic machines which accept banknotes are coming into increasing use. These machines recognize banknotes fed to them; that is, they identify the design or value of the banknotes. It is extremely important for such machines to authenticate the banknotes; that is, to distinguish between real and counterfeit notes. In general, authentication is more difficult than recognition, since the different designs or values are deliberately designed to be readily distinguished, while forgeries are deliberately intended to be indistinguishable from genuine banknotes.
The mechanical techniques used for coin authentication are generally inapplicable to banknote authentication, for which different techniques, primarily optical, have therefore been developed. These techniques generally look at a number of features of the note being inspected, and produce a set of signals which are then matched against a standard set.
All notes start off in good condition, when they are first issued. As they circulate in use, they will tend to become worn in various ways; for example, they can be creased, their corners can become dog-eared, they can be written on and they can become dirty and stained in various ways. The features which are used by the techniques for note authentication will therefore tend to vary slightly from the ideal values. The authentication techniques should therefore incorporate a moderate degree of tolerance, otherwise the rejection rate for valid notes will be too high and customer dissatisfaction will become unacceptable. On the other hand, it is clearly extremely important that the authentication techniques should detect and reject forgeries with a high degree of reliability.
Banknotes are not designed primarily for use with automatic identification techniques. The features which are used for identification by such techniques therefore have to be chosen on an empirical basis. This means that there is generally no simple algorithm by which these features can be combined to determine whether or not a note is valid. In these circumstances, one suitable technique for determining whether or not a note is valid is to use some form of neural network.
Essentially, a neural network is a network of cells or nodes, arranged in a number of layers. The nodes of each layer are fed from the nodes of the previous layer, with the nodes of the first layer being fed with the raw input signals. In each layer, all the nodes perform broadly the same function on their input signals, but the function may be subject to variation in response to various parameters, and there is often a unique set of input signals to each node. The parameters may be different for the different nodes, and may be adjustable in various possible ways to "train" the network.
A probabilistic neural network (PNN) is disclosed in articles by Donald F Specht. The theory underlying the PNN network is based on Bayes probability theory and decision strategy, hence the term "probabilistic"; the network itself is deterministic. The above-mentioned articles are:
"Probabilistic Neural Networks", Donald F. Specht, Neural Networks, Vol 3, 1990, pp 109-118; and "Probabilistic Neural Networks and the Polynomial Adaline as Complementary Techniques for Classification", Donald F. Specht, IEEE Transactions on Neural Networks, Vol 1, No. 1, March 1990, pp 111-121.
For present purposes, a PNN network, as described by Specht, can be summarized as follows. This PNN network includes first, second and third layers. The first layer consists merely of source signal distributors; each node in this layer is fed with a different input signal, and merely passes that signal on to all the nodes in the second layer. The second layer consists of pattern nodes; these are divided into groups, one group for each category or class into which the system classifies the patterns. Each pattern node performs a weighted summation of the input signals and generates an exponential function of the weighted sum. The third layer consists of summation nodes; each summation node is fed with the outputs of a different group of pattern nodes, and simply sums those outputs.
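By way of illustration only, the three layers summarized above might be sketched as follows. This is a minimal sketch, not the Specht implementation: the function name `pnn_class_sums`, the dictionary layout, the two-element vectors and the value of `s` are all assumptions, and the exponential takes the form exp((y-1)/s^2) given elsewhere in this description.

```python
import math

def pnn_class_sums(x, exemplars_by_class, s=0.5):
    """Return one summed pattern-node output per class for input vector x."""
    sums = {}
    for label, exemplars in exemplars_by_class.items():
        total = 0.0
        for w in exemplars:
            # Pattern node: weighted sum of the distributed input signals...
            y = sum(wi * xi for wi, xi in zip(w, x))
            # ...followed by an exponential function of that weighted sum.
            total += math.exp((y - 1.0) / s**2)
        # Summation node: adds the outputs of one group of pattern nodes.
        sums[label] = total
    return sums

# Hypothetical unit-length exemplars, two per class.
exemplars_by_class = {
    "A": [(1.0, 0.0), (0.8, 0.6)],
    "B": [(0.0, 1.0), (-0.6, 0.8)],
}
sums = pnn_class_sums((1.0, 0.0), exemplars_by_class)
```

The first layer needs no code of its own here: passing the same `x` to every pattern node plays the part of the source signal distributors.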
The outputs of the third layer are a set of signals, one signal from each summation node, each of which can be regarded as the probability that the set of input signals belongs to the class for that summation node. These signals will generally be subjected to further processing, in a fourth layer. The simplest form of this fourth layer merely determines and selects the largest of these signals, but more elaborate arrangements, such as selecting the largest signal only if that exceeds the next largest signal by some suitable margin, may also be used.
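The two forms of fourth layer just mentioned can be sketched together. The sketch assumes that the margin is an absolute difference between the largest and next largest signals, and that `None` stands for "no class selected"; neither detail is specified above, and the name `decide` is hypothetical.

```python
def decide(class_sums, margin=0.0):
    """Select the class with the largest sum, or None if its lead is below margin."""
    ranked = sorted(class_sums.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_value = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_value - runner_up < margin:
        return None  # the largest signal does not lead by a sufficient margin
    return best_label

clear_winner = decide({"A": 1.4, "B": 0.02})
too_close = decide({"A": 0.50, "B": 0.45}, margin=0.1)
```

With `margin=0.0` the function reduces to the simplest form, which merely selects the largest signal.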
It should be noted that the Specht output layer is slightly different from this. In the basic Specht circuit, the final layer consists of a single output node fed from two summation nodes and forming a weighted sum of its two inputs (one weight being negative), and generates a 0 or a 1 depending on the sign of the weighted sum. This PNN circuit makes a single binary decision, whether or not the input pattern belongs to a particular type. Specht extends this to include additional pairs of sum nodes, each pair with its output node; the sum nodes of all pairs are fed from the same pattern nodes (in different combinations, of course). Each of these output nodes thus determines whether the input signal belongs to a particular type, independent of the types defined by the other output nodes.
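A minimal sketch of such a binary output node is given below. The weight values, the choice that a positive weighted sum yields 1, and the function names are assumptions made for illustration; the description above states only that one weight is negative and that the output depends on the sign.

```python
def specht_output_node(sum_a, sum_b, weight_a=1.0, weight_b=-1.0):
    """Return 1 if the weighted sum of two summation-node outputs is positive, else 0."""
    return 1 if weight_a * sum_a + weight_b * sum_b > 0 else 0

def specht_multi_output(sum_pairs):
    """One independent binary decision per pair of summation nodes."""
    return [specht_output_node(a, b) for a, b in sum_pairs]

decisions = specht_multi_output([(1.4, 0.02), (0.1, 0.5)])
```

Each entry of `decisions` is made without reference to the others, which reflects the independence of the output nodes noted above.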
The pattern node layer may be regarded as divided into two sublayers, a weighted sum sublayer and an exponentiation sublayer. The PNN then consists of four or five layers, which can conveniently be termed the input layer, the exemplar (or weighted sum) layer, the Parzen (or exponentiation) layer, the sum (or class) layer, and (if present) the output layer. The Parzen layer is formed of a plurality of Parzen nodes. By a Parzen node herein is meant a node which has a single input and a single output and which effects a non-linear transformation on an input value applied on the input, such that the node provides a maximum value on the output when its input value is zero, the output decreasing monotonically with increasing input. An example of a suitable non-linear transformation is an exponential function, as will be explained in more detail hereinafter.
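A Parzen node in the sense just defined can be sketched as a single-input function. The sketch assumes a non-negative input `u` (a hypothetical name for a dissimilarity value) and an exponential transformation with a parameter `s`; other monotonically decreasing functions would equally satisfy the definition.

```python
import math

def parzen_node(u, s=0.5):
    """Single input, single output: maximal at u == 0, decreasing as u increases."""
    return math.exp(-u / s**2)

outputs = [parzen_node(u) for u in (0.0, 0.1, 0.5, 1.0)]
```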
The critical feature of the PNN network is the pattern node layer, ie the exemplar and Parzen layers. The exemplar layer can be described in terms of vectors; if the set of input signals is regarded as an input vector and the set of weights is regarded as a weights vector, each node in the exemplar layer forms the dot product of these two vectors. As will be seen later, the weights vector can also be termed an exemplar vector. If, as is convenient, the vectors are both taken as column vectors, then the transpose of the first must be taken to obtain the dot product. In the Parzen layer, each node forms an exponential function of the output of the corresponding node in the exemplar layer.
The exponentiation function of the Parzen layer is known as a Parzen kernel or window, and also as a Parzen or activation function. This is formulated in such a way that the input signal is a measure of the similarity of the input and exemplar vectors, and decreases from a maximum as the dissimilarity increases, so that the output of the exponentiation node decreases as the dissimilarity increases. The Specht articles noted above give several possible Parzen functions.
A neural network must of course have its parameters set appropriately so that it will recognize the desired patterns. This is often referred to as "training" the network. In the PNN network, there are adjustable parameters in the exemplar, exponentiation, and sum (class) layers. In some types of neural network, training involves applying suitable training inputs and adjusting the parameters in dependence on the resulting network outputs; it should be noted that with the possible exception of the class layer, the parameters of the PNN network are set without reference to the outputs.
Neural networks are sometimes described in analog terms; the signals are then regarded as continuously variable, and the nodes are described in terms of devices which add, multiply, and so on. It will however be realized that neural networks can be implemented by digital technology, with the variables being represented as multi-bit numbers and being manipulated by digital adders, multipliers, etc.
The PNN network is designed to assign an unknown input vector to one of a set of classes, and each class is defined by means of a set of "ideal" vectors or exemplars (ie exemplar vectors). There are preferably at least several exemplars for each class.
If applied to banknote identification, there will be a separate class for each denomination of note, and for each different design of note with the same denomination. It may also be convenient to regard each different denomination as consisting of four distinct designs, corresponding to the four orientations in which a note may be inserted into a note accepting machine. The exemplars for a given class will, subject to possible normalization, consist of the vectors obtained from notes of the same denomination and design with different kinds and degrees of wear and dirtiness.
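The resulting set of classes can be enumerated as one class per combination of denomination and orientation. In the sketch below the denominations and the orientation names are purely hypothetical placeholders; only the count of four orientations comes from the description above.

```python
from itertools import product

# Hypothetical denominations and orientation labels, for illustration only.
denominations = ["5", "10", "20"]
orientations = [
    "face-up/head-first",
    "face-up/tail-first",
    "face-down/head-first",
    "face-down/tail-first",
]

# One class per (denomination, orientation) pair.
classes = [f"{d}:{o}" for d, o in product(denominations, orientations)]
```

Different designs of the same denomination would add further entries in the same way.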
In the exemplar layer, each node is adjusted to recognize a respective exemplar, and its parameters are set in dependence only on the exemplar which it is to recognize; its parameters are independent of any other patterns (for the same or different classes) which the network is to recognize.
If the number of inputs to the network is n, then that is the number of inputs to each exemplar node, and that is also the number of weights in each exemplar node. In other words, the input and weights vectors each have n elements. The choice of the components of the weights vector for each exemplar node is extremely simple; for each node, the weights vector is set to be the same as an exemplar, ie the input vector for the "ideal" note which that node is to recognize. The exemplars can therefore be regarded as a training set of vectors. Each Parzen node can implement the function $z = \exp((y-1)/s^2)$, where y is the input signal to the node, z is the output of the node, and $s^2$ (or $s$) is the parameter of the node.
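The simplicity of this form of "training" can be seen in a short sketch, in which the weights of a node are obtained merely by copying an exemplar. The helper names `normalize` and `make_pattern_node`, the normalization to unit length, and the numerical values are assumptions made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def make_pattern_node(exemplar, s):
    """Build an exemplar node and its Parzen node from a single exemplar."""
    w = normalize(exemplar)  # "training": the weights vector is the exemplar itself
    def node(x):
        y = sum(wi * xi for wi, xi in zip(w, normalize(x)))  # exemplar node
        return math.exp((y - 1.0) / s**2)                    # Parzen node
    return node

node = make_pattern_node((3.0, 4.0), s=0.5)
z_match = node((3.0, 4.0))  # input identical to the exemplar
z_other = node((4.0, 3.0))  # input some distance from the exemplar
```

No network output is consulted when the weights are set, in keeping with the remark that the PNN parameters are set without reference to the outputs.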
If we assume that the exemplar and the input vector are both normalized to unit length, then $2(1-y) = \|W-X\|^2$, where W is the exemplar and X is the input vector. That means that the operand of the Parzen node (ie y-1) is the negative of half the square of the distance between the ends of the exemplar and the input vector. The output of the exemplar node, y, is at its maximum, 1, if the input vector matches the exemplar exactly; it decreases as the end of the input vector moves away from the end of the exemplar, at an increasing rate as the distance increases.
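The relation follows from expanding the squared distance: (W-X).(W-X) = W.W - 2 W.X + X.X = 1 - 2y + 1 = 2(1-y), since both vectors have unit length and y is their dot product. A short numerical check, using two arbitrarily chosen unit vectors in two dimensions (the angles are assumptions for illustration), is as follows.

```python
import math

def unit_vector(angle):
    """Unit-length vector in two dimensions at the given angle (radians)."""
    return (math.cos(angle), math.sin(angle))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

w = unit_vector(0.3)  # exemplar W
x = unit_vector(1.1)  # input vector X
y = dot(w, x)         # output of the exemplar node
lhs = 2.0 * (1.0 - y)
rhs = squared_distance(w, x)
```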
The quantity 1-y is simply half the square of the distance between the ends of the exemplar and the input vector. The Parzen node forms the exponential of -(1-y), scaled by the node parameter s; the negative sign means that the output of the Parzen node is at a maximum when the input vector coincides with the exemplar, and decreases as the input vector moves away from the exemplar over the surface of a hypersphere, ie an n-dimensional sphere. The Parzen node output can therefore be regarded as a bell-shaped function (a Gaussian function of that distance) projecting from the surface of the hypersphere, with the surface of the hypersphere being the zero or reference surface.
There are typically several exemplars for a given class of pattern, forming a cluster. The ends of these vectors may be arranged roughly symmetrically, but are more likely to form a somewhat irregular shape on the surface of the hypersphere, and can be split into two or more distinct and separate subclusters. For each of these exemplars, the corresponding Parzen node therefore produces a function which has its peak at the end of the exemplar and decreases symmetrically around that peak. The outputs of the Parzen nodes for all the exemplars of a class are summed by a summing node.
The parameter s is a smoothing parameter, which determines the "spread" of the Parzen node output, ie how fast it falls as the angle between the input vector and the exemplar increases. This parameter is preferably chosen so that the output of the summing node for the pattern, ie the sum of the Parzen node outputs for the cluster, is reasonably smooth and flat over the cluster, but falls off reasonably fast beyond the boundary of the cluster.
If the smoothing parameter s is too small, the cluster will tend to break up into separate peaks, with the sum of the Parzen node outputs being small between the peaks; in that case, a pattern which is in the interior of the cluster but is not close to any individual exemplar will produce a small output sum which may not be sufficient to identify the input as belonging to that cluster, ie in that class. If the smoothing parameter is too large, then the sum of the Parzen node outputs will only fall off gradually as the distance from the cluster increases, and input vectors which are a considerable distance from the cluster will be identified as belonging to that cluster (class).
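The effect of the smoothing parameter can be illustrated numerically. The sketch below assumes a two-dimensional case in which unit vectors are represented by their angles, a cluster of two exemplars, and three trial values of s; all of these values, and the name `class_sum`, are hypothetical.

```python
import math

def class_sum(angle, exemplar_angles, s):
    """Sum of Parzen node outputs for a unit input vector at the given angle."""
    total = 0.0
    for a in exemplar_angles:
        y = math.cos(angle - a)  # dot product of two unit vectors in 2-D
        total += math.exp((y - 1.0) / s**2)
    return total

cluster = [0.0, 0.4]         # exemplar angles in radians
inside, outside = 0.2, 1.5   # a point within the cluster and one well beyond it

# For each s: (sum at an exemplar, sum inside the cluster, sum far outside).
profile = {
    s: (class_sum(0.0, cluster, s),
        class_sum(inside, cluster, s),
        class_sum(outside, cluster, s))
    for s in (0.05, 0.3, 2.0)
}
```

With s = 0.05 the sum between the two exemplars is a small fraction of the sum at an exemplar, so the cluster breaks up into separate peaks. With s = 2.0 the sum far outside the cluster remains more than half of the sum inside it. The intermediate value s = 0.3 gives a sum that stays high across the cluster and is negligible at the distant point.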
With banknote identification, it is important to detect forged banknotes, as discussed above. This requirement poses a particular difficulty if a PNN network is used, because for the PNN network to detect forged notes, a class would have to be assigned to the forged notes and a set of exemplars provided to define that class. Alternatively, it may be more convenient to assign several classes to different forms of forgery.
The basic problem is that forged notes are not readily available, which makes it difficult to provide a set of exemplars. Even if a particular type of forgery becomes known, so that a set of exemplars for it can be incorporated in the network, that would only cope with that particular known type of forgery. If another type of forgery became current, then the network would not be able to recognize it. So the network would require updating each time a new type of forgery became known, and it would never be able to cope with new types of forgeries. A technique for defining an "unclassified" or null class is therefore desirable. Note that the classes which the network is designed to recognize are designated the design classes, to distinguish them from the null class.
This null class component density can be thought of as representing the expectation of encountering an input vector in the null class. This null class component density will be flat if the actual distribution of input vectors in the null class is either unknown or irrelevant; but the expectation can be made to depend on the position in the null domain by using a non-uniform density.
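One possible reading of a flat null class density, offered purely as an assumed illustration and not as the technique of the present description, is a constant value which competes with the sums of the design classes. The function name `classify_with_null`, the value of the constant, and the use of `None` to denote the null class are all hypothetical.

```python
def classify_with_null(class_sums, null_density):
    """Return the best design class, or None (the null class) if the flat
    null density is at least as large as every design class sum."""
    best = max(class_sums, key=class_sums.get)
    return best if class_sums[best] > null_density else None

accepted = classify_with_null({"A": 1.4, "B": 0.02}, null_density=0.1)
rejected = classify_with_null({"A": 0.002, "B": 0.001}, null_density=0.1)
```

A non-uniform null density would correspond to replacing the constant by a function of the input vector.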