(1) Field of the Invention
The present invention relates generally to the field of electronic neural networks, and more particularly to a new architecture for neural networks having a plurality of hidden layers, or multi-layer neural networks, and further to new methodologies for providing supervised and unsupervised training of neural networks constructed according to the new architecture.
(2) Description of the Prior Art
Electronic neural networks have been developed to rapidly identify patterns in certain types of input data, or to accurately classify the input patterns into one of a plurality of predetermined classifications. For example, neural networks have been developed which can recognize and identify patterns, such as the identification of hand-written alphanumeric characters, in response to input data constituting the pattern of on/off picture elements, or "pixels," representing the images of the characters to be identified. In such a neural network, the pixel pattern is represented by, for example, electrical signals coupled to a plurality of input terminals, which, in turn, are connected to a number of processing nodes, or neurons, each of which is associated with one of the alphanumeric characters which the neural network can identify. The input signals from the input terminals are coupled to the processing nodes through certain weighting functions, and each processing node generates an output signal which represents a value that is a non-linear function of the pattern of weighted input signals applied thereto. Based on the values of the weighted pattern of input signals from the input terminals, if the input signals represent a character which can be identified by the neural network, one of the processing nodes which is associated with that character will generate a positive output signal, and the others will not. On the other hand, if the input signals do not represent a character which can be identified by the neural network, none of the processing nodes will generate a positive output signal. Neural networks have been developed which can perform similar pattern recognition in a number of diverse areas.
The particular patterns which the neural network can identify depend on the weighting functions and the particular connections of the input terminals to the processing nodes, or elements. As an example, the weighting functions in the above-described character recognition neural network essentially will represent the pixel patterns which define each particular character. Typically, each processing node will perform a summation operation in connection with the weight values, also referred to as connection values or weighting values, representing the weighted input signals provided thereto, to generate a sum that represents the likelihood that the character to be identified is the character associated with that processing node. The processing node then applies the non-linear function to that sum to generate a positive output signal if the sum is, for example, above a predetermined threshold value. The non-linear functions which the processing nodes may use in connection with the sum of weighted input signals are generally conventional functions, such as step functions, threshold functions, or sigmoids. In all cases the output signal from the processing node will approach the same positive output signal asymptotically.
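The summation and output function described above can be illustrated with a minimal sketch. The four-pixel input, the weight values, the threshold of 1.0, and the function names are all assumptions chosen for illustration; they are not taken from any particular network.

```python
import math

def weighted_sum(inputs, weights):
    """Sum of the input signals, each scaled by its weight value."""
    return sum(x * w for x, w in zip(inputs, weights))

def step_output(total, threshold=0.0):
    """Step (threshold) function: positive output only above the threshold."""
    return 1.0 if total > threshold else 0.0

def sigmoid_output(total):
    """Logistic sigmoid: approaches 1.0 asymptotically as the sum grows."""
    return 1.0 / (1.0 + math.exp(-total))

# Hypothetical 4-pixel on/off input pattern.
pixels = [1, 0, 1, 1]

# Hypothetical weight values for two processing nodes, one per character.
node_weights = {
    "A": [0.9, -0.5, 0.8, 0.7],
    "B": [-0.6, 0.9, -0.4, 0.1],
}

threshold = 1.0  # assumed threshold value

# Each node sums its weighted inputs, then applies the non-linear function.
outputs = {
    name: step_output(weighted_sum(pixels, weights), threshold)
    for name, weights in node_weights.items()
}
print(outputs)
```

With these assumed values only the node associated with "A" produces a positive output, which corresponds to the case in which the input pattern matches one identifiable character.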
Before a neural network can be useful, the weighting functions for a set of the respective input signals must be established. In special cases, the weighting functions can be established a priori. Normally, however, a neural network goes through a training phase, in which input signals representing a number of training patterns for the types of items to be classified (e.g., the pixel patterns of the various hand-written characters in the character-recognition example) are applied to the input terminals, and the output signals from the processing nodes are tested. Based on the pattern of output signals from the processing nodes for each training example, the weighting functions are adjusted over a number of trials. Once trained, a neural network can generally accurately recognize patterns during an operational phase. The degree of success is based in part on the number of training patterns applied to the neural network during the training stage and the degree of dissimilarity between patterns to be identified. Such a neural network can also typically identify patterns which are similar to the training patterns.
One of the problems with conventional neural network architectures as described above is that the training methodology, generally known as the "back-propagation" method, is often extremely slow in a number of important applications. Also, under the back-propagation method, the neural network may provide erroneous results which may require restarting the training. In addition, even after a neural network has been through a training phase, confidence that the best training has been accomplished may sometimes be poor. If a new classification is to be added to a trained neural network, the complete neural network must be retrained. Further, the weighting functions generated during the training phase often cannot be interpreted in ways that readily provide understanding of what they particularly represent.
Accordingly, it is an object of the present invention to provide a new and improved neural network architecture for use in pattern recognition in which the weighting functions may be determined a priori.
Another object of the present invention is to provide a neural network architecture which can be trained with a single application of an input data set.
A further object of the present invention is to provide a neural network architecture which can be used in time varying pattern recognition.
Other objects and advantages of the present invention will become more obvious hereinafter in the specification and drawings.
In accordance with the present invention, a new neural network architecture, referred to hereinafter as a dynamic memory processor, is provided. The dynamic memory processor receives inputs from a sensor and provides low dimensional pattern recognition or classification identifiers for the inputs. The dynamic memory processor provides pattern recognition for time variant inputs, such as sound signal inputs. The dynamic memory processor is part of a new neural network technology that is constructed rather than trained. Since the words "neural networks" often connote a totally trainable neural network, it is noted that a constructed neural network is a connectionist neural network device that is assembled using common neural network components to perform a specific process. The constructed neural network assembly is analogous to the construction of an electronic assembly using resistors, transistors, integrated circuits and other simple electronic parts. A constructed neural network is fabricated using common neural network components such as processing elements (neurons), output functions, gain elements, neural network connections of certain types or of specific values and other artificial neural network parts. As in electronics, the design goal and the laws of nature such as mathematics, physics, chemistry, mechanics, and "rules of thumb" are used to govern the assembly and architecture of a constructed neural network. A constructed neural network, which is assembled for a specific process without the use of training, can be considered equivalent to a trained neural network having accomplished an output error of zero after an infinite training sequence.
Although there are some existing connective circuits that meet the design criteria of a constructed neural network, the term "constructed neural network" is used herein to differentiate this new neural technology which does not require training from the common neural network technology requiring training. A constructed neural network can consist of a neural network module, such as a neural director, a neural sensor, or a vector decoupler, and may contain one or more modules to develop a specific neural architecture. The construction and functioning of some or all of these modules will be further explained hereinafter. Combinations of neural network modules in the present invention lead to a unique neural architecture producing a distributed connectionist computational process. It is noted that one embodiment of the present invention requires training of two neural network modules with a single application of an input stimulus, but does not require retraining of previously learned input stimuli when being trained for a new input stimulus.
Constructed neural networks can be embodied in analog or digital technologies, or in software. Today one can find a blurring of the boundaries between analog and digital technologies. Some classic analog processing is now found in the realm of digital signal processing, and classic digital processing is found in analog charge-coupled devices and sample-and-hold circuits, especially in the area of discrete time signals and delay lines.
One of the differences between a classic neural network device and a constructed neural network device is in the internal processing of recognition information. The neural network processing of a multilayer constructed neural network device, or of a subsection of a constructed neural network device, lies within a multiplicity of concentric multidimensional spherical spaces. This can simply be envisioned as an "onion" where each spherical onion layer represents a spherical multidimensional layer populated with neural network processors. A multilayer constructed neural network device processes an input vector, which may be modified as it propagates through each layer by an alteration of the vector's direction in its multidimensional space. The vector's direction in its multidimensional space is used in the recognition of the vector's representation. The vector's altered direction in its spherical multidimensional space compared to the vector's unaltered direction can be seen as a nonlinearity in a constructed neural network. The classic neural network device primarily uses nonlinear neurons for its nonlinear processing in an undefined subspace. In addition to the possible use of nonlinear neurons for its nonlinear processing, the constructed neural network device in a multidimensional spherical subspace primarily uses a second form of nonlinearity as will be discussed further below.
One of the components utilized in constructing a dynamic memory processor is a neural director. In brief, a neural director receives an input vector X comprising "I" input components Xi and generates in response thereto, an output vector Y comprising "J" output components Yj, where "I" and "J" are the neural director's input and output dimensions. The neural director has an input processing node layer comprised of "I" processing nodes and an output processing node layer comprised of "J" processing nodes. Each output processing node receives the outputs from the input processing nodes to which a "j" subset of weighting values w(i,j) has been applied and generates one of said output components Yj representing a linear function in connection therewith. The weighting values w(i,j) contain a unique internal representation of a uniform spatial distribution.
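The linear relation between X, w(i,j) and Y described above can be sketched as follows. This is only an illustrative sketch: the weight set shown (three unit vectors spaced 120 degrees apart in a two-dimensional input space) is an assumed stand-in for a uniform spatial distribution, and the function name `neural_director` is hypothetical. The text above does not specify how the weighting values are actually derived.

```python
import math

def neural_director(x, w):
    """Linear map with no output nonlinearity: Y[j] = sum_i w[i][j] * X[i]."""
    num_inputs = len(w)
    num_outputs = len(w[0])
    return [
        sum(w[i][j] * x[i] for i in range(num_inputs))
        for j in range(num_outputs)
    ]

# Assumed example with I = 2 inputs and J = 3 outputs.
# Column j of w is the weight vector of output node j; here the three
# weight vectors are unit vectors at 0, 120 and 240 degrees (an assumption).
num_out = 3
angles = [2 * math.pi * j / num_out for j in range(num_out)]
w = [
    [math.cos(a) for a in angles],  # weights applied to input X[0]
    [math.sin(a) for a in angles],  # weights applied to input X[1]
]

x = [1.0, 0.0]
y = neural_director(x, w)
print(y)

# Linearity check: scaling the input scales the output by the same factor.
y_scaled = neural_director([2.0 * v for v in x], w)
```

In this sketch each output component is a weighted sum of all input components, so every output element carries a contribution from the whole input vector.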
A neural director can be constructed to be one of two types, designated type 1 or type 2. The two types differ in what may be termed "spatial linearity". In addition to classic linearity, i.e., the use of non-linear output functions in the neural circuit, spatial linearity includes a "linearity in space". In a fully populated single layer neural network which has "I" input processing nodes and "J" output processing nodes, each of the output processing nodes will contain "I" weight values. The "I" weight values of each processing node can be considered a vector of "I" components in an "I" dimensional space. One of the many important characteristics of a constructed neural network is that a classification of an input pattern is greatly defined by a vector's direction in a multidimensional space. Thus, spatial linearity/nonlinearity affects the internal process of a dynamic memory processor. An angular relationship between input and output vector pairs can be used to define spatial linearity. A network is linear in space when the angles between all different vector pairs are the same in the output space as they are in the input space regardless of the dimensionalities of the spaces. A network is nonlinear if it is either classically and/or spatially nonlinear. A spatial nonlinearity causes an input vector pair to diverge in direction in the output space and is analogous to a system nonlinearity in chaos theory where two similar initial condition points diverge over time for each cycle through the nonlinear system. A neural director type 1 is linear in both its neural circuit, i.e., classically linear, and in its space, i.e., spatially linear. Generally, a neural director type 2 is classically linear but spatially nonlinear, though it will be understood that either classic or spatial nonlinearity will result in a neural director type 2.
When compared to a neural director type 1 of the same input and output dimensions, a neural director type 2 nonlinearly shifts an input vector away from the output direction which one would anticipate using the neural director type 1. One embodiment of a neural director type 2 produces a nonlinear gradient between two poles in its multidimensional output space, one pole lying in the center of a sub space that is directed by all positive elements and the other pole being the opposite polarity.
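The angular definition of spatial linearity given above lends itself to a simple numerical check: apply the weights to several input vectors and compare the angle between each pair before and after. The sketch below is an illustration under stated assumptions. The angle-preserving weight set (three unit vectors 120 degrees apart) and the altered weight set (the same set with one weight forced to zero) are hypothetical examples, and the altered set is not a described construction of a neural director type 2; it merely shows a map that fails the angle test.

```python
import math

def apply_weights(x, w):
    """Linear map: output[j] = sum_i w[i][j] * x[i]."""
    return [
        sum(w[i][j] * x[i] for i in range(len(w)))
        for j in range(len(w[0]))
    ]

def angle(u, v):
    """Angle in radians between two vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

def preserves_angles(w, vectors, tol=1e-9):
    """True if every pair of vectors keeps its angle after the map."""
    for a in range(len(vectors)):
        for b in range(a + 1, len(vectors)):
            before = angle(vectors[a], vectors[b])
            after = angle(apply_weights(vectors[a], w),
                          apply_weights(vectors[b], w))
            if abs(before - after) > tol:
                return False
    return True

# Assumed angle-preserving weights: 2 inputs, 3 outputs.
angles = [2 * math.pi * j / 3 for j in range(3)]
uniform_w = [[math.cos(a) for a in angles], [math.sin(a) for a in angles]]

# Assumed altered weights: one weight set to zero to distort the map.
altered_w = [row[:] for row in uniform_w]
altered_w[1][2] = 0.0

test_vectors = [[1.0, 0.0], [1.0, 1.0], [0.2, 0.9]]

print(preserves_angles(uniform_w, test_vectors))
print(preserves_angles(altered_w, test_vectors))
```

A finite set of test vectors can only show that a map fails the angle test; passing it for a few vectors is evidence of spatial linearity, not proof for all vector pairs.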
Spatial nonlinearity is a parameter for a constructed neural network connectionist device which affects the recognition differentiation between similar input patterns. Reduced to its most basic concept, a constructed neural network senses features from a specific input pattern to provide a deterministic direction through a connecting circuit as a feature vector. This deterministic direction in a multidimensional space is the information that is used for the recognition and classification of the pattern. The spatial nonlinearities of the type 2 neural director provide a process that allows the discrimination of finer details in the recognition of an input pattern. Spatial nonlinearity is the result of a deterministic change in a vector's direction in its multidimensional space relative to its intended direction in a linear space. Spatial nonlinearities are caused by the partial restriction of linear coupling of vector data between multidimensional spaces. The dimensionalities between these spaces may be different or the same. While most conventional neural networks demonstrate a spatial nonlinearity, the spatial nonlinearity is primarily caused by the use of nonlinear neurons.
The neural director type 1 has several advantages in performing different operations depending upon its application in a network. A neural director type 1 has the ability to linearly transform a vector from one set of dimensions to the same or to a different set of dimensions. The type 1 neural director can fuse separate data paths into a single vector as each output element of the vector contains a composition of all input elements of the input data, e.g., a neural director with equal input and output dimensions. The type 1 neural director may also distribute input data into different layers of like data and can expand its input data into higher dimensions, where the input data can be sensed at a higher resolution than it can in its lower dimension. Although the dimensions are not totally independent, the dimensional independency can be increased when the type 1 neural director is coupled with a spatially nonlinear device. The neural director type 1 can represent a generalized matched filter which contains all possible combinations of input patterns due to its distributed connection set. The type 1 neural director can linearly expand input data or can use nonlinear output functions, which when applied to a conventional neural network with the original data, or with the original data as the neural director input in lieu of the original data alone, will make the conventional network learn faster. Depending on the resolution chosen for the internal representation of the uniform spatial distribution, a neural director type 1 may be called a "near" ideal neural director type 1. A near ideal neural director type 1 remains linear in its neural circuit but it is slightly nonlinear in space because the position of a vector in the neural director's output space will be altered relative to the vector's ideal position in a linear space.
Used in a multilayer neural director, the near ideal neural director type 1, without other nonlinearities, increases the recognition resolution of similar input patterns. Unless indicated otherwise, the neural directors used herein are type 1 neural directors.
The dynamic memory processor utilizes combinations of neural directors to form a multi-layer harmonic neural network, a classifier network, an integration neural network and a word map neural network. The pattern to be recognized is input into, or is sensed by, a neural sensor which generates a fused feature vector. The fused feature vector is then received by a multi-layer harmonic neural network which generates output vectors aiding in the discrimination between similar patterns. The fused feature vector and each output vector from its corresponding multi-layer harmonic neural network are separately provided to corresponding positional king of the mountain (PKOM) circuits within the classifier network. Each PKOM circuit generates a positional output vector with only one of its elements having a value corresponding to an active output of one, this element corresponding to the element of the fused feature vector, or output vector, having the highest contribution in its respective vector. All other elements of the positional output vector have a value corresponding to zero (no output). The positional output vectors are mapped into a multidimensional memory space, thus the memory space is sparsely populated with active outputs corresponding to the highest contributing elements. A set of active outputs is called a memory vector. As time progresses, a multiplicity of memory vectors are produced by temporal variations in the input pattern. An array of recognition vectors read the memory vector space, activating probability connections and their "integrating" neuron to generate one of a class of likelihood outputs to a class PKOM which outputs classification identifiers for the input pattern. For each potential class, or set of features of the input pattern, the class PKOM provides output values corresponding to the respective likelihood of the features, with the class whose likelihood is the highest having the maximum asserted output value.
Thus the classification identifiers provide the desired recognition of the input pattern. In the case of a sound pattern, two neural sensors are utilized, one to provide a fused feature vector for the envelope signal of the sound pattern and the other providing a fused feature vector for the tonal signal of the sound pattern. The envelope fused feature vector and the tonal fused feature vector are then each separately received by a corresponding memory processor. The classification identifiers for the envelope and tonal signals are then provided to a plosive and tonal neural network. The class likelihood outputs of the plosive and tonal neural network are then provided to a word map neural network which outputs classification identifiers for the input pattern, or, in the case of a sound input pattern, the input word.
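The behavior of a PKOM circuit and the formation of a memory vector, as described above, can be sketched in a few lines. The sketch rests on assumptions that the description does not settle: "highest contribution" is taken to mean the largest element value, ties are resolved in favor of the first such element, and a memory vector is represented as a set of (vector index, active position) pairs. The function names and the numeric vectors are hypothetical.

```python
def pkom(vector):
    """Positional output: 1 at the position of the largest element, else 0.

    Assumption: ties are resolved in favor of the first largest element.
    """
    winner = max(range(len(vector)), key=lambda k: vector[k])
    return [1 if k == winner else 0 for k in range(len(vector))]

def memory_vector(vectors):
    """Sparse set of active outputs: one (vector index, position) per vector."""
    return {(index, pkom(v).index(1)) for index, v in enumerate(vectors)}

# Hypothetical fused feature vector and two hypothetical output vectors
# standing in for the layers of the multi-layer harmonic neural network.
fused_feature = [0.2, 0.9, 0.4]
harmonic_outputs = [
    [0.1, 0.3, 0.8, 0.2],
    [0.7, 0.1, 0.1, 0.6],
]

positional = pkom(fused_feature)
memory = memory_vector([fused_feature] + harmonic_outputs)

print(positional)
print(sorted(memory))
```

Because each PKOM contributes exactly one active output, the memory space stays sparsely populated regardless of the dimension of the vectors feeding it.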