(1) Field of the Invention
The present invention relates generally to the field of electronic neural networks, and more particularly to a new architecture for neural networks having a plurality of hidden layers, or multi-layer neural networks, and further to a new neural network processor for classifying patterns in optical image data, or other arrays of input data having one or more input dimensions.
(2) Description of the Prior Art
Electronic neural networks have been developed to rapidly identify patterns in certain types of input data, or to accurately classify the input patterns into one of a plurality of predetermined classifications. For example, neural networks have been developed which can recognize and identify patterns, such as the identification of hand-written alphanumeric characters, in response to input data constituting the pattern of on/off picture elements, or “pixels,” representing the images of the characters to be identified. In such a neural network, the pixel pattern is represented by, for example, electrical signals coupled to a plurality of input terminals, which, in turn, are connected to a number of processing nodes, or neurons, each of which is associated with one of the alphanumeric characters which the neural network can identify. The input signals from the input terminals are coupled to the processing nodes through certain weighting functions, and each processing node generates an output signal which represents a value that is a non-linear function of the pattern of weighted input signals applied thereto. Based on the values of the weighted pattern of input signals from the input terminals, if the input signals represent a character that can be identified by the neural network, the processing node that is associated with that character will generate a positive output signal, and the others will not. On the other hand, if the input signals do not represent a character that can be identified by the neural network, none of the processing nodes will generate a positive output signal. Neural networks have been developed which can perform similar pattern recognition in a number of diverse areas.
The particular patterns that the neural network can identify depend on the weighting functions and the particular connections of the input terminals to the processing nodes, or elements. As an example, the weighting functions in the above-described character recognition neural network essentially will represent the pixel patterns that define each particular character. Typically, each processing node will perform a summation operation in connection with the weight values, also referred to as connection values or weighting values, representing the weighted input signals provided thereto, to generate a sum that represents the likelihood that the character to be identified is the character associated with that processing node. The processing node then applies the non-linear function to that sum to generate a positive output signal if the sum is, for example, above a predetermined threshold value. The non-linear functions, which the processing nodes may use in connection with the sum of weighted input signals, are generally conventional functions, such as step functions, threshold functions, or sigmoids. In all cases the output signal from the processing node will approach the same positive output signal asymptotically.
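The summation and non-linear function described above can be illustrated with a minimal sketch. The 3x3 pixel patterns, the +1/-1 template weights, the threshold of 4.0, and the names `node_output`, `template_weights`, and `classify` are all hypothetical choices made for illustration; they are not taken from any particular prior-art network.

```python
import math

def node_output(inputs, weights, threshold=0.0, kind="step"):
    """Weighted sum of the inputs followed by a non-linear function."""
    total = sum(w * x for w, x in zip(weights, inputs))
    if kind == "step":
        # Step function: positive output only above the threshold.
        return 1.0 if total > threshold else 0.0
    # Sigmoid centred on the threshold; approaches 1.0 asymptotically.
    return 1.0 / (1.0 + math.exp(-(total - threshold)))

# Hypothetical 3x3 on/off pixel patterns, flattened row by row.
PATTERN_T = [1, 1, 1,
             0, 1, 0,
             0, 1, 0]
PATTERN_L = [1, 0, 0,
             1, 0, 0,
             1, 1, 1]

def template_weights(pattern):
    """Assumed weighting: +1 where a pixel should be on, -1 where off."""
    return [1.0 if p else -1.0 for p in pattern]

# One processing node per character the network can identify.
NODES = {"T": template_weights(PATTERN_T), "L": template_weights(PATTERN_L)}

def classify(pixels, threshold=4.0):
    """Return the step-function output of every processing node."""
    return {name: node_output(pixels, w, threshold) for name, w in NODES.items()}

print(classify(PATTERN_T))   # only the "T" node fires
print(classify([0] * 9))     # blank input: no node fires
```

With these assumed weights, the node associated with the presented character generates the positive output signal and the other node does not, while a blank input leaves every node inactive.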
Before a neural network can be useful, the weighting functions for a set of the respective input signals must be established. In special cases, the weighting functions can be established a priori. Normally, however, a neural network goes through a training phase, in which input signals representing a number of training patterns for the types of items to be classified (e.g., the pixel patterns of the various hand-written characters in the character-recognition example) are applied to the input terminals, and the output signals from the processing nodes are tested. Based on the pattern of output signals from the processing nodes for each training example, the weighting functions are adjusted over a number of trials. Once trained, a neural network can generally accurately recognize patterns during an operational phase. The degree of success is based in part on the number of training patterns applied to the neural network during the training stage and the degree of dissimilarity between patterns to be identified. Such a neural network can also typically identify patterns that are similar to the training patterns.
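The training phase described above can be sketched for a single processing node. The sketch below uses the simple perceptron learning rule as a stand-in; it is not back-propagation and is not attributed to any specific prior-art system. The training patterns (a logical AND of two inputs), the learning rate, the trial limit, and the function names are assumptions chosen only to keep the example small.

```python
def predict(weights, bias, inputs):
    """Step-function node: 1 if the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(patterns, targets, rate=1.0, max_trials=50):
    """Adjust the weights over repeated trials until every output is correct."""
    weights = [0.0] * len(patterns[0])
    bias = 0.0
    for _ in range(max_trials):
        errors = 0
        for inputs, target in zip(patterns, targets):
            # Test the output signal against the desired classification.
            error = target - predict(weights, bias, inputs)
            if error != 0:
                errors += 1
                # Perceptron rule: move the weights toward the correct answer.
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if errors == 0:
            break  # every training pattern is classified correctly
    return weights, bias

# Hypothetical training set: the node should fire only when both inputs are on.
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]
TARGETS = [0, 0, 0, 1]

weights, bias = train(PATTERNS, TARGETS)
outputs = [predict(weights, bias, p) for p in PATTERNS]
print(outputs)
```

Even on this four-pattern problem the rule needs several passes over the training set before the outputs settle, which hints at why training cost grows quickly for realistic pattern sets.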
One of the problems with conventional neural network architectures as described above is that the training methodology, generally known as the “back-propagation” method, is often extremely slow in a number of important applications. Also, under the back-propagation method, the neural network may provide erroneous results, which may require restarting the training. In addition, even after a neural network has been through a training phase, confidence that the best training has been accomplished may sometimes be poor. If a new classification is to be added to a trained neural network, the complete neural network must be retrained. Further, the weighting functions generated during the training phase often cannot be interpreted in ways that readily provide understanding of what they particularly represent.
In my related patent application entitled “NEURAL DIRECTORS” (Ser. No. 09/436,957, which is now U.S. Pat. No. 6,618,713), incorporated herein in its entirety by reference, a new neural network architecture, or neural director, was described in which the weighting functions may be determined a priori, i.e., the new neural network architecture is constructed rather than trained. The neural director has an input processing node layer, which receives the input vector X, and an output processing node layer, which generates the output vector Y. In a type 1 neural director containing linear neurons, the connections between the input and output processing node layers are a unique weighting set w(i,j) that contains an internal representation of a uniform spatial distribution of “J” unit vectors throughout a unit sphere of “I” dimensions. Thus the cosine value between any two adjacent unit vectors is a constant everywhere in the unit sphere. A type 1 neural director is thus described as linear in both its neural circuit, i.e., classically linear, and in its space, i.e., spatially linear. A type 2 neural director is generally classically linear but spatially nonlinear, though it will be understood that either classic or spatial nonlinearity will result in a neural director type 2. A spatial nonlinearity causes an input vector pair to diverge in direction in the output space and is analogous to a system nonlinearity in chaos theory where two similar initial condition points diverge over time. In the case of spatial nonlinearity, the system divergence occurs as the input data flows through repetitious stages of nonlinearity versus a chaotic system recursion over time. One of the many important characteristics of a constructed neural network is that a classification of an input pattern is greatly defined by a vector's direction in a multidimensional space.
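The constructed weighting set of a type 1 neural director can be illustrated for the simplest case, I = 2 input dimensions, where a uniform distribution of J unit vectors is simply J equally spaced directions on the unit circle. This is only a sketch under that assumption; the function names and the choice J = 8 are hypothetical, and the construction for higher dimensions in the referenced patent is not reproduced here.

```python
import math

def director_weights(num_outputs):
    """J unit vectors spread evenly around the unit circle (the I = 2 case)."""
    step = 2.0 * math.pi / num_outputs
    return [(math.cos(j * step), math.sin(j * step)) for j in range(num_outputs)]

def neural_director(x, weights):
    """Linear neurons: output j is the dot product of x with unit vector j."""
    return [w0 * x[0] + w1 * x[1] for (w0, w1) in weights]

J = 8
W = director_weights(J)

# The cosine between adjacent unit vectors is the same everywhere.
adjacent_cosines = [
    W[j][0] * W[(j + 1) % J][0] + W[j][1] * W[(j + 1) % J][1]
    for j in range(J)
]

# The weights are constructed, not trained: the largest output marks
# the direction of the input vector, regardless of its length.
y = neural_director((1.0, 1.0), W)
strongest = max(range(J), key=lambda j: y[j])
y_scaled = neural_director((3.0, 3.0), W)
strongest_scaled = max(range(J), key=lambda j: y_scaled[j])
print(strongest, strongest_scaled)
```

In this sketch an input pointing at 45 degrees excites the output whose unit vector lies at 45 degrees most strongly, and scaling the input leaves that result unchanged, which is consistent with the statement that classification is largely defined by a vector's direction.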
Reduced to its most basic concept, a constructed neural network senses features from a specific input pattern to provide a deterministic direction through a connecting circuit as a feature vector. This deterministic direction in a multidimensional space is the information that is used for the recognition and classification of the pattern. When compared to a neural director type 1 of the same input and output dimensions, a neural director type 2 nonlinearly shifts an input vector away from the output direction which one would anticipate using the neural director type 1. A neural director type 2 produces a nonlinear gradient between two poles in its multidimensional output space, one pole lying in the center of a subspace that is directed by all positive elements and the other pole being the opposite polarity. The spatial nonlinearities of the type 2 neural director provide a process that allows the discrimination of finer details in the recognition of an input pattern. Depending on the resolution chosen for the internal representation of the uniform spatial distribution, a neural director type 1 may be called a “near” ideal neural director type 1. A near ideal neural director type 1 remains linear in its neural circuit but it is slightly nonlinear in space because the position of a vector in the neural director's output space will be altered relative to the vector's ideal position in a linear space. Used in a multilayer neural director, the near ideal neural director type 1, without other nonlinearities, increases the recognition resolution of similar patterns.
My related patent application “NEURAL SENSORS” (Ser. No. 09/436,956, which is now U.S. Pat. No. 6,594,382), incorporated herein in its entirety by reference, described the use of neural directors, in combination with other constructed neural network components, to provide a neural sensor. The neural sensor receives raw input data defining a pattern, such as image or sound data, and generates a classification identifier for the pattern. The neural sensor has a pattern array former that organizes the raw input data into the proper array format. A first order processing section receives the pattern array and generates a first order feature vector illustrative of first order features of the input data. A second order processing section also receives the pattern array and generates at least one second order feature vector illustrative of gradients in the input data. A vector fusion section receives the feature vectors from the first and second order processing sections and generates a single fused feature vector, which is provided to a pattern classifier network, or memory processor.
The memory processor, embodiments of which are described in my related patent applications “DYNAMIC MEMORY PROCESSOR” (Ser. No. 09/477,653, which is now U.S. Pat. No. 6,560,582) and “STATIC MEMORY PROCESSOR” (Ser. No. 09/477,638, which is now abandoned), incorporated herein in their entirety by reference, receives the fused feature vector and, in turn, generates a pattern classification for the input data. Generally, the neural sensor increases input data dimensionality for improved pattern sensitivity, while the memory processor reduces the data dimensionality into a specific class. The dynamic memory processor provides for recognition of a time variant input pattern and is particularly suited for speech recognition. The static memory processor provides for recognition of a non-time varying input image, or pattern, and provides a class identifier for the dominant image.
Accordingly, it is an object of the present invention to provide a new and improved neural network architecture for use in pattern recognition in which the input image contains one or more whole or partially hidden patterns.
Other objects and advantages of the present invention will become more obvious hereinafter in the specification and drawings.
In accordance with the present invention, a new neural network architecture, referred to hereinafter as a Multimode Invariant Processor (MIP), is provided. The MIP utilizes one or more constructed neural network modules, such as neural directors, Positional King Of the Mountain (PKOM) circuits, a static memory processor and others, to provide unique invariant processes producing classifications of the input data. The multimode invariant processor contains an architecture to process one, two, or higher dimensional arrays of input data. One embodiment of the MIP architecture, a two-dimensional architecture, produces a process similar to human peripheral vision. This embodiment will be described herein to provide a full understanding of the invention and an understanding for developing MIP architectures of other dimensionalities.
In brief summary, an image MIP, i.e., a two dimensional MIP, is provided to simultaneously classify one or more whole or partially hidden patterns in real world optical image data. The classification processing is invariant to combinatorial changes in photonic input image translation, scale size, rotation and partial image input data. The photonic input image or input image defines two-dimensional spatial data from an array of photo transducers or pixels each represented by a pixel value. The multimode invariant processor comprises a retina portion, a spatial nonlinear portion, a convergence processing portion and a classifier portion. The retina portion receives the input image and transforms the input image into image data and generates in response a vector of local image gradient information for each pixel. The spatial nonlinear portion includes a neural director array (harmonic neural network) associated with each respective pixel, which generates respective feature vectors. The feature vectors can have a greater dimensionality than the image data, to aid in discrimination between similar patterns of the input image. The spatial nonlinear portion processes image data to further increase the discrimination between similar patterns of the input image and to generate image feature information representing at least one image primitive of the input image. An image primitive is defined as a smallest part of an image that can be distinguished from another image primitive of said image, with respect to a specific MIP input resolution. The convergence processing portion further increases the discrimination between similar patterns of the input image and generates and converges local common image feature information from any pixel position through a common feature space into a portion of a memory vector space. Each independent input image generates a set of primitive activations in the memory vector space. 
The classifier portion receives all primitive activations, or information, and generates in response a classification indicating the likelihood that one or more independent images are present in the image input data.
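As an illustration of the retina portion summarized above, the following minimal sketch generates one vector of local gradient information for each pixel. The specific definition used here (the differences between a pixel and its eight neighbors, with edge pixels clamped to the image border) is an assumption made for illustration, as are the names `NEIGHBOR_OFFSETS` and `local_gradients`; the summary does not specify how the gradient vector is formed.

```python
# Offsets (row, column) to the eight neighbors of a pixel.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1),           (0, 1),
                    (1, -1),  (1, 0),  (1, 1)]

def local_gradients(image):
    """Return, for every pixel, a vector of neighbor-minus-center differences.

    `image` is a list of rows of pixel values. Neighbors that would fall
    outside the image are clamped to the nearest border pixel (an assumed
    edge rule).
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(rows):
        row_vectors = []
        for c in range(cols):
            vector = []
            for dr, dc in NEIGHBOR_OFFSETS:
                nr = min(max(r + dr, 0), rows - 1)
                nc = min(max(c + dc, 0), cols - 1)
                vector.append(image[nr][nc] - image[r][c])
            row_vectors.append(vector)
        result.append(row_vectors)
    return result

# Hypothetical 3x3 image with a single bright pixel in the center.
IMAGE = [[0, 0, 0],
         [0, 5, 0],
         [0, 0, 0]]

gradients = local_gradients(IMAGE)
print(gradients[1][1])  # center pixel: every neighbor is darker
print(gradients[0][0])  # corner pixel: only the diagonal neighbor differs
```

Because each vector depends only on differences between nearby pixel values, adding a constant brightness to the whole image leaves the vectors unchanged, and a uniform image yields all-zero vectors.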