Table IV, appended hereto, is a cross reference of symbol mnemonics, notations and conventions used herein with their corresponding definitions.
In the field of Optical Character Recognition (OCR), typically, a state of the art recognition technique comprises the use of a photohead essentially consisting of a matrix of photosensitive elements. When an OCR system is in its READ mode, the elements are scanned successively to generate signals that are representative of a certain parameter of the scanned character. Subsequently, the signals that are obtained during each scan are sampled and processed in a processing unit to identify the scanned character. The identification process is generally based upon a comparison between the scanned character and a prototype character model stored in a memory.
For example, FIG. 1 illustrates the capital letter "E" embedded in a 9×8 matrix of pixels. A first parameter representative of letter "E" could consist in counting the vertical and horizontal numbers of dark pixels. A feature vector F' can be defined representing letter "E" comprised of 17 components (or entries): F'1, F'2, . . . , F'16, F'17 wherein F'1=6, F'2=1, . . . , F'16=2, and F'17=0. A category (or class) C can also be defined by the user that is associated with this feature vector F' as representing the letter "E". The category C could be, for example, the letter's order number in the alphabet, which is, therefore, 5. A second parameter that could be used as well is the number of dark pixels above and below line aa shown in FIG. 1. In this simpler case, the new feature vector F only has two components F1=13 and F2=9 instead of seventeen, but it still has the same category C=5. Also, although the capital letter "E" and the lower case letter "e" are represented by two different feature vectors (even more, if these letters are printed in various type fonts), they are both considered as belonging to the same category C. Thus, a certain relation or link is established between a feature vector F and a determined category C.
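By way of illustration only, the two parameters described above may be sketched as follows. The pixel layout, the assumption of 9 rows by 8 columns, the position of the dividing line and the function names are hypothetical; the bitmap does not reproduce FIG. 1, so the resulting values differ from those given above.

```python
# Hypothetical 9-row by 8-column bitmap of a capital "E" (1 = dark pixel).
# The layout is an assumption for illustration; it is not the bitmap of FIG. 1.
BITMAP = [
    "01111110",
    "01000000",
    "01000000",
    "01000000",
    "01111100",
    "01000000",
    "01000000",
    "01000000",
    "01111110",
]

CATEGORY_E = 5  # order number of "E" in the alphabet, as in the text


def to_rows(bitmap):
    """Convert the strings of the bitmap into rows of 0/1 integers."""
    return [[int(ch) for ch in line] for line in bitmap]


def projection_features(bitmap):
    """First parameter: dark-pixel counts per column, then per row (8 + 9 = 17)."""
    rows = to_rows(bitmap)
    column_counts = [sum(col) for col in zip(*rows)]
    row_counts = [sum(row) for row in rows]
    return column_counts + row_counts


def split_features(bitmap, split_row):
    """Second parameter: dark-pixel counts above and below a horizontal line."""
    rows = to_rows(bitmap)
    above = sum(sum(row) for row in rows[:split_row])
    below = sum(sum(row) for row in rows[split_row:])
    return [above, below]


f_prime = projection_features(BITMAP)   # 17-component feature vector
f = split_features(BITMAP, split_row=4)  # 2-component feature vector
```

Both vectors would be associated with the same category (here CATEGORY_E), which is the link between feature vector and category referred to above.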
If the representation of capital letter "E" shown in FIG. 1 is taken as the ideal model of this letter, then during a preliminary learning phase, the two-component feature vector F of FIG. 1 is presented as an input vector to a conventional character recognition system and its two components stored therein in a memory. As soon as the components of the input vector F have been stored and a category C associated thereto (in the present instance C=5), the stored input vector F is thereafter referred to as a prototype vector P.
In FIG. 2(A), the prototype vector P is represented by point P with its two components P1 and P2 in a two dimensional space. This two-dimensional space is usually referred to as the feature (or characteristic) space. A defined zone Z (or domain) encompasses point P representing prototype vector P and may be used as a discriminating criterion by the OCR system. During the character recognition phase, the OCR system compares the stored prototype vector P with any input (or incoming) vector A (or pattern) representing the character presented to it, in order to determine their degree of similarity. This degree of similarity may be determined in a variety of manners, classically by distance.
In determining the distance in the two-dimensional space of FIG. 2(A), an input vector A has two components A1 and A2, for consistency with the prototype vector P described above. The distance comparison between A and P can be made, for example, by determining the Euclidean distance AP, i.e. AP² = (P1-A1)² + (P2-A2)². Other distance calculation methods may be used, each producing a zone shape other than a circle. In the two dimensional feature space of FIG. 2(A), the so-called Manhattan or city block distance (L1 norm) results in a lozenge shaped zone, while the square distance (Lsup norm) results in a square shaped zone.
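The three distance calculation methods mentioned above may be sketched as follows for vectors of any number of components. The function names are hypothetical and are chosen only for illustration.

```python
import math


def euclidean(a, p):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((pi - ai) ** 2 for ai, pi in zip(a, p)))


def manhattan(a, p):
    """Manhattan (city block, L1 norm) distance: sum of absolute differences."""
    return sum(abs(pi - ai) for ai, pi in zip(a, p))


def lsup(a, p):
    """Square (Lsup norm) distance: largest absolute difference."""
    return max(abs(pi - ai) for ai, pi in zip(a, p))


# Illustrative values only: input vector A and prototype vector P.
A = (10, 5)
P = (13, 9)
distances = (euclidean(A, P), manhattan(A, P), lsup(A, P))
```

For a given threshold r, the set of points at distance r from P is a circle in the first case, a lozenge in the second and a square in the third, which is why the zone shape depends on the method selected.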
Continuing the example of FIG. 2(A), the zone Z is represented simply by a circle centered at P with radius r. Radius r is commonly referred to as the influence field (or threshold) value of the prototype vector P. During the preliminary learning phase, the initial influence field value r is given, generally, by a default value r0 referred to as the Maximum Influence Field (MaxIF) as illustrated in FIG. 2(A). Normally, MaxIF is defined arbitrarily and empirically.
So, having stored the prototype vector P of FIG. 2(A), an input vector A, the same capital letter "E" but printed with a different type font, may be presented to the OCR system for recognition. If input vector A falls within circle Z, it is thus considered as "similar" to prototype vector P, and in turn, will be labelled with the same category C. (Prior art OCR systems assigned the category to the input vector during the recognition phase.) If, however, the input vector A falls outside the circle Z, then it is considered as "not similar" to the prototype vector P. Therefore, the category C cannot be assigned (or associated) to it by the OCR system. Instead, the input vector A is stored by the user as a new prototype vector with the same category C. Thus, the system stores input vector A as a new prototype vector P' with the category C assigned thereto, providing the extended zone (the shaded areas in FIG. 2(B)), circles Z and Z', which then define the category C.
A third input vector A, a capital letter "F", may be presented to the system and fall within circle Z of prototype vector P. However, letter "F" obviously belongs to another category. The category C of prototype vector P cannot be assigned to the third input vector A by the OCR system. As a consequence, circle Z, as originally drawn, must be shrunk to exclude this third input vector A. In other words, the radius r0 of the circle encompassing prototype vector P must be shortened, once the user decides that this third input vector A must be stored as a new prototype vector P" in FIG. 2(C). This shrinking step is part of the so-called "reduction process" and is an essential aspect of prior art character recognition systems. After the input vector A has been stored as prototype vector P", the shortened radius of circle Z is obtained by reducing the initial radius r0=MaxIF to a value r less than or equal to distance PP". This reduced value r also constitutes the radius of circle Z" (r"=r). The actual (reduced) radius value r of prototype vector P is commonly referred to as the Actual Influence Field (AIF).
The two prototype vectors P and P" with their respective associated categories C and C" and influence fields r and r" are illustrated in FIG. 2(C). There is also a minimum value of the influence field, referred to as the Minimum Influence Field (MinIF). Under no circumstances may the AIF of a prototype vector have a value lower than MinIF.
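The reduction step and the MinIF lower bound may be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the choice of setting the reduced radius exactly to the distance PP", and the clamping to MinIF with a flag are all hypothetical and are not taken from any particular prior art system.

```python
def reduce_influence_field(current_radius, distance_to_new_prototype, min_if):
    """Shrink an influence field so that a newly stored prototype is excluded.

    Returns the reduced radius and a flag indicating that the radius had to
    be clamped to MinIF (an assumed way of handling the lower bound).
    """
    # The radius is never enlarged by the reduction process.
    new_radius = min(current_radius, distance_to_new_prototype)
    clamped = new_radius < min_if
    if clamped:
        new_radius = min_if  # the AIF may not be lower than MinIF
    return new_radius, clamped


# Illustrative values only: r0 = MaxIF = 10, distance PP" = 6, MinIF = 2.
aif, clamped = reduce_influence_field(10.0, 6.0, 2.0)
```

In this sketch the same reduced value would also be given to the new prototype vector P", in keeping with r"=r above.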
FIG. 2(D) shows a two dimensional feature space with three prototype vectors P, P' and P" with their respective influence fields r, r' and r" and associated categories C, C' and C". When an input vector A is presented to the OCR system for recognition, the system calculates the distances AP=D, AP'=D' and AP"=D" and, then, determines the minimum distance (Dmin) therefrom. If input vector A falls within one circle, e.g. circle Z (Dmin<r), it is recognized by the system and the category C is associated with it. However, if input vector A does not fall into any of the circles Z, Z' and Z", the input vector is not recognized and a category is not associated with it. If the user decides that this input vector A must be stored as a new prototype vector, then the user presents the input vector again to the OCR system, this time with a category, during a subsequent learning phase. The user decides which category is assigned to the new prototype vector, i.e., any of categories C, C' or C", or a new category. If the user decides to assign the category of the closest prototype vector (based on the calculation of the minimum distance Dmin), then the influence field of the new stored prototype vector is equal to Dmin if Dmin<MaxIF, or to MaxIF (a value less than or equal to Dmin) if Dmin>=MaxIF. In the example of FIG. 2(D), this minimum distance Dmin corresponds to distance D=AP. Finally, if input vector A falls within an overlapping zone, i.e. a common zone between two circles (not shown), the user not only determines the category assigned to the new prototype vector, but may also reduce the two overlapping influence fields. Thus, the user ensures that one prototype vector (or the two prototype vectors P' and P") is (are) excluded from subsequent recognition in the vicinity of the new prototype vector.
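The recognition and learning steps just described may be sketched as follows, assuming the Euclidean distance. The class name, function names and numerical values are hypothetical, and the handling of an empty prototype list is an assumption added so that the sketch is self-contained.

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    components: tuple
    category: int
    radius: float  # actual influence field (AIF)


def recognize(prototypes, a):
    """Return (Dmin, set of categories whose influence field contains A)."""
    distances = [math.dist(a, p.components) for p in prototypes]
    d_min = min(distances)
    fired = {p.category for p, d in zip(prototypes, distances) if d < p.radius}
    return d_min, fired


def learn(prototypes, a, category, max_if):
    """Store A as a new prototype with influence field min(Dmin, MaxIF)."""
    if prototypes:
        d_min = min(math.dist(a, p.components) for p in prototypes)
    else:
        d_min = max_if  # assumption: the first prototype receives MaxIF
    radius = min(d_min, max_if)
    prototypes.append(Prototype(tuple(a), category, radius))
    return radius


# Illustrative values only.
protos = [Prototype((13.0, 9.0), 5, 4.0)]
d_min, fired = recognize(protos, (20.0, 9.0))       # outside the circle
new_radius = learn(protos, (20.0, 9.0), 5, 10.0)    # stored by the user
```

An empty set of fired categories corresponds to an input vector that is not recognized; the user may then present it again with a category, as in the call to learn above.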
Although FIGS. 2(A) to 2(D) show an input vector A with two components A1 and A2, it is understood that, generally, an input (or prototype) vector has n components, where n is an integer greater than 0. Thus, components A1, A2, . . . , An are a general representation of input vector A. Therefore, in n dimensional feature space, the circle Z in FIG. 2(A) is a hypersphere. So, the computed distance is the distance separating the center of the hypersphere representing the stored prototype vector and the point representing the input vector. The MaxIF value corresponds to the largest allowed radius of a hypersphere at initialization. Similarly, the MinIF value corresponds to the smallest radius allowed for a hypersphere in the course of the reduction process. For distance calculation methods other than the Euclidean method mentioned above, the equidistant points are not on a hypersphere but, instead, on a polyhedral volume. However, a polyhedral volume is referred to as a hypersphere hereinafter for simplicity. Each input vector component, which represents a certain analog value, is coded in binary on m bits, and may, therefore, be represented by an m bit binary word a0 . . . am-1.
For example, referring again to the two component input vector A representing capital letter "E" of FIG. 1, the first vector component A1 is equal to 13. With m=4, A1 is then represented by the binary word consisting of a0=1, a1=1, a2=0, and a3=1, i.e. A1=1101.
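The coding of a component on m bits may be sketched as follows. The function name is hypothetical; the sketch assumes that a0 is the most significant bit, which is consistent with the example A1=1101 above.

```python
def encode_component(value, m):
    """Return the m bit binary word [a0, ..., am-1] of a component value.

    a0 is taken as the most significant bit (an assumption inferred from
    the example in the text).
    """
    if not 0 <= value < 2 ** m:
        raise ValueError("value does not fit on m bits")
    return [(value >> (m - 1 - i)) & 1 for i in range(m)]


bits = encode_component(13, 4)  # a0=1, a1=1, a2=0, a3=1
```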
Prior art computer-based character recognition systems, after being presented with an input vector, can automatically compare the input vector with previously learned prototype vectors of the feature space to determine the input vector's category or class. Such systems have been implemented on Von Neumann processor based computers using neural algorithms (software emulation). These neural algorithms attempt to emulate neurons such as those found in the brain, for improved pattern recognition. However, because the calculation process in prior art neural networks is sequential, in accordance with the instructions of a software program, the processing time is long.
A biological neural network utilizes nerve cells or synapses as the units thereof. A biological neural network has an extremely high number of these interconnected synapses. The synapses in the network execute calculations in parallel, so that the overall processing time is very short. Furthermore, the functions of biological neural networks are learned by changing the behavior of synapses and connection states therebetween during learning. Neural computers use neural networks constructed by assembling a limited number of electronic neuron circuits to mimic the nervous systems of living bodies.
Neural computers are capable of pattern processing, useful for operations such as character recognition, voice recognition, process optimization, robot control and the like. Neural computers are most suited to realizing functions with processing procedures that are difficult to state as formal rules. When such neural computers are taught, i.e., operated while conducting learning, even if the taught functions change over time, the neural computer is capable of adapting to such changes.
In addition, neural computers are inherently reliable because neural networks in such neural computers are constructed by interconnecting identical base neuron circuits, so that a failure in one neuron is easily repaired. The failed neuron circuit is simply replaced with another, normally functioning neuron in the neural network. As a result, it is possible to create neural networks with a near immunity to defective neurons or neuron failures. This immunity is very important for VLSI semiconductor chips.
Different neural network architectures, such as the standard Radial Basis Function (RBF) technique, are known. The RBF technique is described in the article "A high performance adaptive classifier using radial basis functions" by M. Holler, et al., Microcircuit Applications Conference Nov. 9-12, 1992, Las Vegas, NV. An RBF neural network has a three layer structure. The first layer, which includes the input terminals, is called the input layer or input neuron layer. The second or hidden layer is formed by the neuron circuits themselves. The third layer or neuron output layer receives the second layer neuron circuits' outputs as inputs. Each neuron circuit has weight coefficients (known as synaptic weights) that are related to the components of the neuron's stored prototype vector. Input signals on the input terminals of the first layer are applied in parallel to all the neuron circuits of the second layer for processing. Recognition processing, as described hereinbefore, includes determining the distances between the input vector and all of the prototype vectors of the neural network so that certain neuron circuits react if there is a match (fire) or do not fire if there is no match. Each neuron circuit of the second layer generates a signal that is an input to only one output neuron of a determined category.
FIG. 3(A) shows such a conventional three layer neural network 2 comprised of ten RBF type neuron circuits N1 to N10. The first layer consists of two input neurons I1 and I2 adapted to receive an input vector A comprised of two components A1 and A2. This first layer totally interconnects with each second layer neuron circuit N1 to N10. Each second layer neuron circuit N1 to N10 can be potentially related to only one third layer output neuron O1, O2 or O3. During the learning phase, the prototype vectors are stored in the second layer neuron circuits N1 to N10 (one prototype vector stored per neuron circuit) in a R/W memory usually referred to as the weight memory. Prior to the learning phase, the weight memories are initialized with random weights and the neuron circuits are "free". As soon as a prototype vector is stored in a second layer neuron circuit N1 to N10 and a connection is established between that second layer neuron circuit and a third layer output neuron, i.e. a determined category has been assigned to that prototype vector, this second layer neuron circuit, having thus "learned", is designated "engaged" and is no longer considered free. For example, neuron circuits N2, N5 and N8 (which are associated with the same category C2 through single output neuron O2) are engaged. Similarly, other neuron circuits are associated with categories C1 and C3. Neuron circuit N10 is still free. No category has been associated with N10 because it has not learned. The feature space depicted in FIG. 3(B) represents that of the neural network 2 of FIG. 3(A) (only free neuron circuit N10 is not illustrated). The nine circles illustrate the influence fields of the nine prototype vectors stored in neuron circuits N1 to N9. They are organized in three groups of 2, 3 and 4 neuron circuits, respectively, pertaining to categories C1, C2 and C3.
As indicated above, the value of the influence field of a determined neuron circuit may be reduced in the reduction process during a learning phase. However, under no circumstances is the influence field value allowed to reach a value equal to or less than the MinIF value. Should the influence field value fall below MinIF during the reduction process, the neuron circuit is said to be "degenerated". So, in a neural network, every neuron circuit is either free or engaged. In addition, the actual influence fields associated with the prototype vectors of a same category may be different. A determined category may be represented by one or by several prototype vectors, that may or may not be adjacent, and may or may not overlap. Depending upon how the input vector is mapped in the two-dimensional feature space of FIG. 3(B), the comparison with all the stored prototype vectors, during a recognition phase, may provide ambiguous results. An input vector, presented to the neural network 2, is compared with all the prototype vectors in the feature space. Each second layer neuron circuit calculates the distance between the input vector and the neuron's stored prototype vector. If the input vector falls within the influence field of a prototype vector, the category attached to the prototype vector is assigned to the input vector. If the input vector falls within the influence fields of several prototype vectors with the same category, then again, that common category is assigned to the input vector. In both cases, an input vector has been recognized by the neural network as being in a single category and, so, is "identified". However, should the input vector fall within the influence fields of at least two prototype vectors belonging to different categories but with overlapping influence fields, the network response is ambiguous. The input vector is recognized (at least twice) but not identified because a single category cannot be assigned to it (or associated with it). The input vector is, therefore, "undefined" or "uncertain".
In all the above cases, the corresponding neuron circuits which have recognized the input vector are said to have "fired" or "triggered." When a neuron fires, a fire signal F is set active (F=1). If during recognition, an input vector does not fall within the influence field of one neuron circuit of the neural network, every neuron's fire signal F remains inactive (F=0).
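The three possible outcomes of the recognition phase described above may be sketched as follows. The function name, the status strings and the representation of each neuron circuit as a pair (fire signal F, category) are hypothetical.

```python
def network_response(neurons):
    """Classify the response from (F, category) pairs of engaged neuron circuits.

    Returns ("identified", category), ("uncertain", sorted categories) or
    ("not recognized", None).
    """
    fired_categories = {category for f, category in neurons if f == 1}
    if not fired_categories:
        return "not recognized", None  # every fire signal F remained at 0
    if len(fired_categories) == 1:
        return "identified", next(iter(fired_categories))
    return "uncertain", sorted(fired_categories)


# Illustrative values only: two neuron circuits of category 2 have fired.
status, category = network_response([(1, 2), (0, 1), (1, 2), (0, 3)])
```

Several fired neuron circuits of a same category still yield an identified input vector, whereas fired neuron circuits of different categories yield an uncertain one.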
Responses that are generated at the neuron circuit level are known as "local" or "neuron" responses. The neural network's responses are known as "global" or "neural" responses. Local responses first include local result information (e.g. a neuron circuit generates a local result fire signal) and local status (e.g. whether a neuron circuit is in a degenerate status) referred to hereinbelow as local results. Local information responses (e.g. distance or category data) are referred to hereinbelow as local data. Likewise, global responses include global results (e.g., as a neural network identifies an input vector, in response, a global result signal is generated) and global output data (e.g. the minimum of all local distances Dmin). Therefore, local data, representative of the local reaction of an individual neuron circuit to the presentation of the input vector, are "consolidated" to produce global data.
Prior art neural networks of the type illustrated in FIG. 3(A) have been extensively used in the industry so far. However, prior art neural network architectures and the prior art neuron circuits employed therein have many limitations and disadvantages.
First, conventional prior art neural networks are limited in complexity because, without extra logic, the number of cascaded neuron layers is limited. Limited network complexity means limited network function. However, to overcome this limitation and increase the number of neuron layers in a neural network, extra circuitry must be added. This additional circuitry consumes space and adds processing delays that slow the neural network's performance. Further, circuits external to the network hinder the network speed, flexibility and learning capacity. Thus, the neural network size that can be implemented on a single Very Large Scale Integration (VLSI) chip is limited. Therefore, there is a need for increasing the number of layers of neuron circuits that may be included in a neural network. There is also a need to reduce or eliminate any requirement for external circuits in neural networks or in the expansion thereof.
Another limitation of prior art conventional neural networks is that they are not autonomous. A digital computer, typically a micro-controller or a dedicated micro-processor, must supervise the neural network in order to formulate any global results. See, for example, U.S. Pat. No. 5,165,010, to Masuda, et al., entitled "Information Processing System" and, especially, FIG. 23 therein for an example of a micro-controller supervising a neural network formed from a plurality of neuron circuits. The neural computer system described therein is organized with the same parallel architecture as in a conventional micro-controller. Data is exchanged on a data bus between the neuron circuits and the micro-controller, with addresses on a standard address bus. In conventional prior art neural networks, the neuron circuits are totally passive and communicate only with the micro-controller. There is no direct data communication or exchange between individual neuron circuits in the neural network. In addition, because these prior art neural network computers are software controlled, the recognition phase and the learning phase may each be lengthy, complex operations.
Another disadvantage of conventional neural network chips is that the number of input/output pads is dependent on the number of neuron circuits integrated therein. Increasing the number of neurons requires increased address capacity. However, increased address capacity requires more chip input/output (I/O) pads. So, since there is a maximum number of I/O pads for any chip, the number of I/O pads available for addresses is limited. This limitation limits the number of neurons per chip. For the same reason, the number of I/O pins of the electronic neural modules incorporating multiple such neural network chips is determined by neuron addressing requirements.
The number of categories that are available in such conventional neural networks also is limited. For example, see U.S. Pat. No. 4,326,259 to Cooper, et al., entitled "Self-Organizing General Pattern Class Separator and Identifier" which teaches a neural network wherein the neuron circuits are arranged in a column. The neuron circuit outputs feed the horizontal input lines of a PLA, with vertical PLA lines providing the categories. From FIG. 8 of Cooper, it is clear that the number of categories is limited, for several reasons. In particular, the number of categories is limited because the result must be interpreted by the user. Also, the global information relating to the formal identification of the input vector by the neural network is not generated directly. The user has to interpret the results, whether one neuron fires or several neuron circuits fire.
Another limitation of prior art neuron circuit architecture is that a category, such as C1, C2 or C3, attached to each output neuron of the neural network 2 of FIG. 3(A), cannot be attached at the neuron circuit interconnection level. Particular neuron circuits cannot be selectively blocked from participating in the recognition phase for a determined family of input vectors. This prior art approach is inflexible. It does not permit organizing the neural network either as a single network or as subsets thereof, as the user might desire.
Finally, for these prior art neural networks, recognition and learning must be done at different times. Generally, prototype vector weights are determined separately, by the micro-controller and, subsequently, loaded into neuron circuits, until the micro-controller decides that the learning phase is completed. As a consequence, the recognition and the learning phases cannot be done concurrently and are clearly distinct operations. In conventional neural networks, training a neuron involves adjusting the weights, which, usually, are set randomly at initialization. Once the weights are adjusted, input vectors are supplied to the neural network and outputs (responses) are observed. If an output signal is erroneous, then a mathematical computation is done to determine how the weights should be adjusted. After adjusting the weights, the input vectors are resupplied and the neural network's response to each is re-evaluated until it is correct. In the prior art systems, such as in U.S. Pat. No. 5,222,193 to Shaefer entitled "Optimization Techniques Using Genetic Algorithms", training a neural network requires a Personal Computer Personal Programmer (PCPP) connected to a host computer through a Generic Universal Programmer Interface (GUPI).
These disadvantages may be better understood in light of the neural network 2 of FIG. 3(A), for example, with respect to the determination of the minimum distance Dmin between an input vector and the prototype vectors stored in neuron circuits N1 to N9. Typically, the micro-controller interrogates the first neuron circuit for the distance it has computed; then, the micro-controller interrogates the second neuron circuit for the distance it has computed; and, finally, the micro-controller compares the two distances, selecting the lowest value. This process is continued, in sequence, by successive iterations until the last neuron circuit has been interrogated. The minimum distance value between the input vector and all the prototype vectors is determined only at the end of the process. So, the above-described reduction process is delayed until after the last neuron is interrogated.
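The sequential interrogation described above may be sketched as follows. The function name and the distance values are hypothetical; the sketch only shows that the number of interrogations grows with the number of neuron circuits and that Dmin is known only after the last one.

```python
def sequential_minimum(local_distances):
    """Emulate a micro-controller polling neuron circuits one at a time.

    Returns (Dmin, number of interrogations performed).
    """
    d_min = None
    interrogations = 0
    for distance in local_distances:  # one bus transaction per neuron circuit
        interrogations += 1
        if d_min is None or distance < d_min:
            d_min = distance  # keep the lowest value seen so far
    return d_min, interrogations


# Illustrative distances computed by neuron circuits N1 to N9.
d_min, steps = sequential_minimum([7, 3, 9, 4, 8, 6, 5, 2, 10])
```

With nine engaged neuron circuits, nine interrogations are needed before Dmin is available, whatever the position of the closest prototype vector.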
A similar technique is applied during the reduction process. Successive iterations are conducted to exclude any neuron circuits that have wrongly fired until only the neuron circuit with the correct category remains. This prior art method requires a dedicated software program, based upon a complex sorting algorithm. The sorting algorithm, typically, requires a significant number of lines of instructions for the interrogation and comparison steps. So, the sort process is very time consuming. Further, because intercommunication between the neuron circuits of the neural network 2 is restricted, potential correlations between the local result signals and between the global result signals cannot be fully exploited. As a consequence, the conventional neural network of FIG. 3(A) only provides limited global information data to the user. In addition, the number of categories that are available at the output neuron level is also limited by neuron fan-out (electrical) limitations.