The present invention relates to neural network systems and more particularly to circuits and a method for shaping the influence fields of neurons. These circuits, which are placed in front of any conventional neural network based upon a mapping of the input space, significantly increase the number of permitted influence field shapes, thereby giving considerable flexibility to the design of neural networks. The present invention also encompasses the neural networks that result from that combination, which are well adapted for classification and identification purposes. In particular, the neural networks of the present invention find extensive applications in the field of image recognition/processing.
Artificial neural networks mimic biological nervous systems to solve problems which are difficult to model. Neural networks are able to learn from examples and in that sense, they are said to be "adaptive". Depending on the construction of neural networks, the learning phase can be supervised or not. Another important property of neural networks is their ability to show some tolerance for the imprecision and uncertainty which naturally exist in real-world problems to achieve tractability and robustness. In essence, these properties result from their massively parallel arrangement in which input data are distributed. In a typical representation, the hardware implementation of neural networks generally consists of elementary processing units, called neurons, that are connected in parallel and are driven via weighted links, called synapses. There are a great number of application fields for neural networks. They are extensively used in process control, image recognition/processing, time series prediction, optimization, and the like.
Since the 1940s, many different types of neural networks have been developed and described in the literature. The key distinguishing factors are the way they learn, their generalization capability, their response time and their robustness, i.e. their tolerance to noise and faults. Among the different types of conventional neural networks, those based upon a mapping of the input space appear to be the most promising to date. This type of network is particularly interesting because of its simplicity of use and its short response time. Several hardware implementations of algorithms exploiting this approach are available in the industry.
Neural networks that are based upon a mapping of the input space require some kind of classification. According to the so-called "Region Of Influence (ROI)" technique, for each area of this input space, one category (or several categories) is (are) associated thereto to allow classification of new inputs. Another conventional classification technique is the so-called K-Nearest Neighbors (KNN). These classification techniques as well as others are widely described in literature.
According to the ROI technique, which is the classification technique most commonly used to date, each area is determined in an n-dimensional space by a center and an influence field, which is nothing other than a threshold, thereby defining a hypersphere. For instance, in a three-dimensional space, the influence field can be represented by a sphere. The term "influence field" is also referred to as the "decision surface", the "region of influence" or the "identification area" in technical literature; all these terms may be considered as equivalent, at least in some respects. RCE (RCE stands for Restricted Coulomb Energy) neural networks belong to this class of neural networks based upon a mapping of the input space.
In the KNN technique, each area is determined in an n-dimensional space by a prototype which defines a Voronoï domain. All the points enclosed in this domain have this prototype as the closest neighbor.
Let us consider, for the sake of simplicity, ROI neural networks. Basically, the ROI neural network comprises three interconnected layers: an input layer, an internal (or hidden) layer and an output layer. The internal layer consists of a plurality of nodes or neurons. Each node is a processing unit which computes the distance between the input data applied to the input layer and a weight stored therein. Then, this distance is compared with a threshold. In this particular implementation, the input layer simply consists of a plurality of input terminals. The role of the output layer is to provide the user with the category (or categories) corresponding to the class of the input data. FIG. 1 schematically summarizes the above description of a conventional ROI neural network referenced 10.
Now turning to FIG. 1, the input data is applied to the input layer 11 of ROI network 10. The input data is represented by a vector A whose components are labelled A1 to An. Each component is applied to an input terminal, then to each of the m neurons of the internal layer 12. Each neuron memorizes a prototype, labelled P1 to Pm, representing an n-component vector which corresponds to the above mentioned weight. The components of prototype vector P1 are labelled P1,1, . . . , P1,n. The output layer 13 is designed to provide the categories. In FIG. 1, only three categories A, B and C have been represented. In the implementation of FIG. 1, the output layer 13 consists of three output terminals, that can be materialized by LEDs. The components of the prototype vector are dynamically established during the learning phase.
Therefore, the ROI algorithm consists in mapping out an n-dimensional space by prototypes, to each of which a category and a threshold are assigned. The role of this threshold is to determine whether or not the associated neuron is activated. On the other hand, the aim of the learning phase is to map out the input space so that parts of this space belong to one or several categories. This is performed by associating a category to each prototype and computing the influence field for each neuron in the ROI approach. The influence field is a way of defining a subspace demarcated by the threshold in which the neuron is active. When a new prototype is memorized, thresholds of neighbor neurons can be adjusted to reduce any category conflict between the influence fields of these neurons.
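The learning step described above can be sketched in software as follows. This is only an illustrative sketch, not the circuitry of any cited reference: the names `Neuron`, `l1` and `learn`, the default `max_threshold` value, the strict "distance less than threshold" firing rule and the exact threshold-reduction rule are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Neuron:
    """Hypothetical software model of one neuron: prototype, threshold, category."""
    prototype: List[int]
    threshold: int
    category: str


def l1(a: Sequence[int], p: Sequence[int]) -> int:
    """L1 (Manhattan) distance between an input vector and a prototype."""
    return sum(abs(x - y) for x, y in zip(a, p))


def learn(neurons: List[Neuron], vector: Sequence[int], category: str,
          max_threshold: int = 10) -> None:
    """Present one labelled vector to the network (assumed learning rule)."""
    recognized = False
    for n in neurons:
        d = l1(vector, n.prototype)
        if d < n.threshold:  # assumption: the neuron fires when d < threshold
            if n.category == category:
                recognized = True
            else:
                # Category conflict: shrink the influence field so that this
                # neuron no longer fires on the presented vector.
                n.threshold = d
    if not recognized:
        # Memorize a new prototype; its influence field is limited by the
        # nearest prototype belonging to another category.
        others = [l1(vector, n.prototype) for n in neurons
                  if n.category != category]
        neurons.append(Neuron(list(vector), min([max_threshold] + others), category))


net: List[Neuron] = []
learn(net, (0, 0), "A")   # first prototype, threshold 10
learn(net, (4, 0), "B")   # conflict: A shrinks to 4, B is created with 4
learn(net, (1, 0), "A")   # recognized by A; B shrinks to 3
```

In this sketch, memorizing the second prototype reduces the threshold of the first neuron, which corresponds to the adjustment of neighbor neurons mentioned above.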
The classification of input data is the essential task of neural networks based upon a mapping of the input space. It first consists of computing distances between the input data and prototypes stored in the different neurons of the neural network. The distances are compared with the associated thresholds, and zero, one or several categories are assigned to the input data in the ROI technique. In the KNN approach mentioned above, the output layer is used to determine the k shortest distances.
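The two classification modes just described can be illustrated by the following sketch. The function names `classify_roi` and `classify_knn`, the tuple layout of a neuron, the use of the L1 distance and the strict "less than" comparison with the threshold are assumptions made for the example only.

```python
from typing import List, Sequence, Set, Tuple

# A neuron is modelled here as (prototype, threshold, category).
NeuronT = Tuple[Sequence[int], int, str]


def l1(a: Sequence[int], p: Sequence[int]) -> int:
    """L1 (Manhattan) distance between an input vector and a prototype."""
    return sum(abs(x - y) for x, y in zip(a, p))


def classify_roi(a: Sequence[int], neurons: List[NeuronT]) -> Set[str]:
    """ROI mode: categories of all neurons whose influence field contains a.

    The result may hold zero, one or several categories.
    """
    return {cat for proto, thr, cat in neurons if l1(a, proto) < thr}


def classify_knn(a: Sequence[int], neurons: List[NeuronT], k: int) -> List[str]:
    """KNN mode: categories of the k prototypes at the shortest distances."""
    ranked = sorted(neurons, key=lambda n: l1(a, n[0]))
    return [cat for _, _, cat in ranked[:k]]


neurons: List[NeuronT] = [
    ((0, 0), 3, "A"),
    ((4, 0), 3, "B"),
    ((10, 10), 2, "C"),
]

both = classify_roi((2, 0), neurons)       # inside the fields of A and B
none = classify_roi((20, 20), neurons)     # outside every influence field
nearest = classify_knn((0, 1), neurons, 2)
```

Note that the thresholds are ignored in the KNN mode of this sketch; only the ordering of the distances matters.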
To compute a distance between the input vector A (components A1, . . . , Ai, . . . , An) and the memorized prototype vector Pj (components Pj,1, . . . , Pj,i, . . . , Pj,n) in the n-dimensional space, several kinds of norms can be used. An easy way is to determine the Euclidean distance, also referred to as the L2 norm, i.e. (DistjE)² = Σ (Ai − Pj,i)², that will produce a hyperspherical influence field. However, this approach is not currently used because it is difficult to efficiently implement with dedicated circuits (squaring circuits consume too much room). The most extensively used norms to date are the so-called "L1" (or "Manhattan") norm and the "Lsup" norm. According to norm L1, the distance is given by: Distj = Σ |Ai − Pj,i| while according to Lsup, the distance is given by: Distj = Max |Ai − Pj,i|. In a two-dimensional space, the iso-distances are represented by lozenges with the L1 norm, and by squares with the Lsup norm. The following explanation will give more details.
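For illustration, the three norms can be written as the short functions below. The function names are hypothetical, and the sketch only restates the formulas in software; it says nothing about how the dedicated circuits are built.

```python
from typing import Sequence


def dist_l1(a: Sequence[int], p: Sequence[int]) -> int:
    """L1 (Manhattan) norm: sum of the absolute component differences."""
    return sum(abs(ai - pi) for ai, pi in zip(a, p))


def dist_lsup(a: Sequence[int], p: Sequence[int]) -> int:
    """Lsup norm: largest absolute component difference."""
    return max(abs(ai - pi) for ai, pi in zip(a, p))


def dist_l2_squared(a: Sequence[int], p: Sequence[int]) -> int:
    """Squared Euclidean (L2) distance: sum of the squared differences."""
    return sum((ai - pi) ** 2 for ai, pi in zip(a, p))


a, p = (3, 7), (1, 4)
d1 = dist_l1(a, p)            # 2 + 3
dsup = dist_lsup(a, p)        # max(2, 3)
d2sq = dist_l2_squared(a, p)  # 4 + 9

# Points at distance 2 from the origin: the L1 points lie on a lozenge,
# the Lsup points lie on a square.
on_lozenge = [(2, 0), (1, 1), (0, 2), (-1, 1)]
on_square = [(2, 0), (2, 2), (0, 2), (-2, 1)]
```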
The respective shapes of the influence field regions that are obtained when the distance between an input vector A (components A1 and A2) and a prototype vector P1 (components P1,1 and P1,2) is computed with norms L1 and Lsup in a two-dimensional space are shown in FIGS. 2A and 2B, where they are referenced 14 and 15 respectively.
Now, FIG. 3 shows a simple example of mapping input vectors in a two-dimensional space in accordance with the neural network 10 in FIG. 1, when using six neurons and norm L1. In this mapping 16, prototype P1 (components P1,1 and P1,2) referenced 17 which belongs to category B, is emphasized. Prototype 17 is fully characterized by its influence field 18 (which determines the threshold) and its center 19.
FIG. 4 shows the classic distance vs input data (vector) diagram for prototype 17 of FIG. 3 in a one-dimensional space (i.e. along the horizontal dotted line visible in FIG. 3), which illustrates the typical V shape of the transmission characteristics. A similar diagram would be obtained, should the Lsup norm be used.
Typical implementations of ROI and KNN neural networks based upon a mapping of the input space can be found in a family of silicon chips developed and commercialized by the IBM Corporation under the reference ZISC036 (ZISC is a registered trademark of IBM Corp.). The reader may refer to U.S. Pat. No. 5,621,863 to Boulet et al, assigned to the assignee of the present invention, for more details on this introductory part. To each neuron is associated a RAM memory that is adapted to store 64 components of 8 bits each. The distances are computed using either one of the two norms L1 or Lsup, at the user's choice.
Unfortunately, even with the innovative neuron architectures described in Boulet et al, it is impossible to encode complex images such as the picture shown in FIG. 5A with a few neurons. As a matter of fact, FIG. 5B shows that at least 14 neurons are necessary to satisfactorily cover the picture. The reason is that the distance evaluator circuit does not allow the distance to remain constant over a defined range of values, which limits the permitted shapes to the lozenge and the square mentioned above.
U.S. Pat. No. 5,166,539 to Uchimura et al describes an attempt to minimize in some respect this limitation, although this reference appears to be more particularly concerned with the reduction of the chip area and its electric consumption when integrated in silicon using a VLSI technology. The improved neuron described in FIG. 3 of Uchimura et al is now shown in FIG. 6 of the instant application. Now turning to FIG. 6, the neuron referenced 20 includes: n input terminals 21-1 to 21-n (where n is an integer greater than 1) receiving n input signals X1 to Xn (corresponding to the components of an input vector X). To each input terminal, e.g. 21-1, is associated a link, e.g. 22-1, that is comprised of two branches, e.g. 22-11 and 22-12. In each branch of the link, a subtraction circuit having a determined threshold value (or weight) associated thereto is connected in series with a rectification circuit. For instance, in branch 22-11, subtraction circuit 23-11 (weight: WL1) is connected in series with rectification circuit 24-11. On the other hand, in branch 22-12, subtraction circuit 23-12 (weight WH1) is connected in series with rectification circuit 24-12. All the signals output by the rectification circuits of FIG. 6 are summed in addition circuit 25. The presence of this addition circuit 25 reveals that the distance is computed with norm L1. Finally, a threshold value circuit 26 connects the output of addition circuit 25 to an output terminal 27 where output signal Y is available.
The subtraction circuits (e.g. 23-11) determine the difference between input signals and weight coefficients, i.e. they determine the results of (input signal-WH) and (WL-input signal), where WH is the greater value of the two weight values WH and WL. Rectification circuits are used to pass only positive values. The addition circuit 25 accumulates all the results of the intermediate absolute value calculations. The threshold circuit 26, into which the accumulation results are inputted, is used to generate the output signal Y of the neuron 20 at output terminal 27.
As a matter of fact, in the neuron 20 shown in FIG. 6, the subtraction and rectification circuits could be viewed as distance evaluator circuits using partial absolute values. FIG. 7 is comprised of FIGS. 7A and 7B which show respectively the distance vs input diagram corresponding to the transmission characteristics of branches 22-11 and 22-12 just before the signals are applied to the addition circuit 25. FIG. 8 shows the transmission characteristic which results after summing these transmission characteristics at the output of addition circuit 25. In other words, the distance vs input diagram shown in FIG. 8 corresponds to the summation of the signals appearing on branches 22-11 and 22-12 in addition circuit 25 and thus represents the distance evaluation (made according to norm L1) between the input signal (e.g. X1) and a prototype (e.g. P1) memorized in the neuron 20 according to the two associated weights (e.g. WH1 and WL1). Note that if all the pairs of weights are equal, i.e. WHi=WLi, the distance evaluation is the same as with conventional neurons using norm L1 and will produce lozenge shaped influence fields.
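The distance evaluation attributed above to the neuron of FIG. 6 can be mimicked in software as follows. This is a sketch under stated assumptions: the function names are hypothetical, the rectification is modelled as max(v, 0), and the weights are taken such that WL is not greater than WH.

```python
from typing import Sequence


def rectify(v: int) -> int:
    """Model of a rectification circuit: only positive values are passed."""
    return max(v, 0)


def link_distance(x: int, wl: int, wh: int) -> int:
    """Contribution of one link: rect(x - WH) + rect(WL - x).

    The result is zero for WL <= x <= WH and grows linearly outside
    that range.
    """
    return rectify(x - wh) + rectify(wl - x)


def neuron_distance(xs: Sequence[int], wls: Sequence[int],
                    whs: Sequence[int]) -> int:
    """Model of the addition circuit: sum of all link contributions."""
    return sum(link_distance(x, wl, wh) for x, wl, wh in zip(xs, wls, whs))


inside = link_distance(5, 2, 6)     # within [WL, WH]: flat part
above = link_distance(8, 2, 6)      # 8 - 6
below = link_distance(0, 2, 6)      # 2 - 0
classic = link_distance(7, 4, 4)    # WL = WH: plain |x - P|
total = neuron_distance([5, 9], [2, 3], [6, 7])
```

With WL = WH the contribution reduces to the ordinary absolute value, which matches the remark that equal weight pairs give back the conventional L1 distance.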
Finally, the influence field that is obtained by Uchimura et al in a two-dimensional space, as a result of the transmission characteristic of FIG. 8, has the general shape of a rectangle illustrated in FIG. 9. It is worthwhile to note that according to the teachings of Uchimura et al, a new shape, referred to as a quasi-rectangle, is added to the conventional lozenge, the only shape available thus far with the L1 norm.
The technique described in Uchimura et al exhibits some drawbacks.
The first one lies in the size of memory and circuitry needed to store all the weights. Many applications do not need a rectangular-shaped influence field, so the conventional lozenge shape is quite adequate. In such applications, the neuron of Uchimura et al must be loaded with WH=WL for every component, so that with a neural network having 64 weights, only 32 components instead of 64 could be used. For such applications, it should not be necessary to store the two values when a simple absolute value circuit would be sufficient to give satisfactory results, thereby saving significant silicon area (memory and circuitry) and power consumption.
Another significant drawback relates to the particular influence field shape that is obtained according to Uchimura et al which is limited to a quasi-rectangle (in a 2-dimensional space) having a specific orientation. However, very often, other orientations and other shapes are required, for instance to produce freely selectable relatively complex shapes, without necessitating an extremely large number of neurons. On the other hand, the FIG. 6 circuit produces isomorphic or non-isomorphic but regular shapes, which is a further limitation.
According to Uchimura et al, adding the form of a quasi-rectangle to the conventional lozenge-shaped influence field can be analyzed as providing a partial "don't care" value in the memorized prototype. By "don't care", it should be understood that whatsoever the input data is, the distance will be constant in the range defined by weights WL and WH. However, another essential need when dealing with neural networks to solve real world problems, is the use of incomplete (noisy) and unknown input data. Let us consider a neural network that has correctly learned, so that an accurate mapping of the input space has been obtained. In the recognition phase of input data that includes noisy data, a partial "don't care" on the noisy data becomes quite mandatory. Such a partial "don't care" on the input data is not possible according to Uchimura et al.
Finally, the circuit described in Uchimura et al does not have the desired flexibility in the sense that the content of the neuron's internal circuitry is totally dedicated to construct the quasi-rectangle shape mentioned above. As a consequence, it would be highly desirable to overcome all the above mentioned limitations or drawbacks.
It is therefore a primary object of the present invention to provide circuits and a method for shaping the influence field of neurons that offer new shapes to the circuit engineer for greater flexibility in the circuit design.
It is another object of the present invention to provide circuits and a method for shaping the influence field of neurons according to isomorphic or non-isomorphic, regular or irregular, shapes.
It is another object of the present invention to provide circuits and a method for shaping the influence field of neurons which allow the use of partial or total "don't care" values on the memorized prototypes.
It is another object of the present invention to provide circuits and a method for shaping the influence field of neurons which allow the use of partial or total "don't care" values on the input data.
It is another object of the present invention to provide circuits and a method for shaping the influence field of neurons which allow several input data to be grouped into a single input data when "don't care" values are applied to the input data, to improve the global response time of the neural network incorporating such neurons.
It is another object of the present invention to provide circuits and a method for shaping the influence field of neurons which allow several prototypes to be grouped into a single prototype when "don't care" values are applied to the memorized prototypes, to reduce the number of neurons in the neural network.
It is still another object of the present invention to provide circuits and a method for shaping the influence field of neurons that are placed in front of any neural network based upon a mapping of the input space and that do not require any change therein.
It is still another further object of the present invention to provide circuits and a method for shaping the influence field of neurons that improves the classification and identification capabilities of any neural network based upon a mapping of the input space.
It is still another further object of the present invention to provide circuits and a method for shaping the influence field of neurons that improve the response time of conventional neural networks based upon a mapping of the input space by recognizing a single input data pattern which groups several patterns.
The improved neural network of the present invention results from the combination of a dedicated logic block with a conventional neural network based upon a mapping of the input space, usually employed to classify an input data by computing the distance between said input data and prototypes memorized therein. The improved neural network is able to classify an input data represented by a vector A even when some of its components are noisy or unknown during either the learning or the recognition phase. To that end, influence fields of various and different shapes are created for each neuron of the conventional neural network through this combination. The logic block transforms at least some of the n components (A1, . . . , An) of the input vector A into the m components (V1, . . . , Vm) of a network input vector V according to a linear or non-linear transform function F. In turn, vector V is applied as the input data to said conventional neural network. The transform function F is such that certain components of vector V are not modified, e.g. Vk=Aj, while other components are transformed as mentioned above, e.g. Vi=Fi(A1, . . . , An). In addition, one (or more) component of vector V can be used to compensate an offset that is present in the distance evaluation of vector V. Because the logic block is placed in front of said conventional neural network, any modification thereof is avoided.
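The arrangement summarized above, in which a logic block is placed in front of an unmodified distance evaluation, may be pictured with the following sketch. The transform functions chosen here are purely hypothetical examples of F, not transforms taken from this description; `make_logic_block` and `l1` are likewise names invented for the illustration.

```python
from typing import Callable, List, Sequence

Transform = Callable[[Sequence[int]], int]


def make_logic_block(transforms: List[Transform]) -> Callable[[Sequence[int]], List[int]]:
    """Build a logic block mapping vector A (n components) to V (m components)."""
    def block(a: Sequence[int]) -> List[int]:
        return [f(a) for f in transforms]
    return block


def l1(v: Sequence[int], p: Sequence[int]) -> int:
    """Unmodified L1 distance, standing in for the conventional network."""
    return sum(abs(x - y) for x, y in zip(v, p))


# Hypothetical transform F with n = 2 inputs and m = 3 outputs:
#   V1 = A1 and V2 = A2 are passed through unchanged,
#   V3 = |A1 - A2| is a non-linear function of several components.
logic_block = make_logic_block([
    lambda a: a[0],
    lambda a: a[1],
    lambda a: abs(a[0] - a[1]),
])

v = logic_block([3, 7])          # network input vector V
distance = l1(v, [1, 4, 3])      # evaluated by the unchanged network
```

The point of the sketch is structural: only the vector presented to the network changes, while the distance evaluation itself is left as it was.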
The novel features believed to be characteristic of this invention are set forth in the appended claims. The invention itself, however, as well as other objects and advantages thereof, may be best understood by reference to the following detailed description of an illustrated preferred embodiment to be read in conjunction with the accompanying drawings.