The invention herein disclosed comprises artificial neural networks (ANNs) and systems and methods based thereon for most applications of ANNs, such as clustering, detecting or recognizing spatial, hierarchical and/or temporal patterns of objects or causes; understanding images or videos; recognizing speech, handwriting and text; and generating representations of probability distributions of labels of patterns of objects and causes, where the data may contain erasure, smear, noise, occlusion, distortion, alteration, rotation, translation and/or scaling. The ANNs are based on a low-order model of biological neural networks and have applications in a large number of fields such as computer vision, signal processing, financial engineering, telecommunications, data clustering, and data mining. Example applications are handwritten character/word classification, face recognition, fingerprint identification, DNA sequence identification, speech recognition, machine fault detection, baggage/container examination, video monitoring/understanding, image understanding, scene analysis, text/speech understanding, automatic target recognition, medical diagnosis, prosthesis control, robotic arm control, and vehicle navigation.
A good introduction to the prior art in ANNs and their applications can be found in Simon Haykin, Neural Networks and Learning Machines, Third Edition, Pearson Education, New Jersey, 2009, and in Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer Science, New York, 2006.
An ANN, which is a functional model of biological neural networks, was recently reported in James Ting-Ho Lo, Functional Model of Biological Neural Networks, Cognitive Neurodynamics, Vol. 4, Issue 4, pp. 295-313, November 2010, where the ANN is called the temporal hierarchical probabilistic associative memory (THPAM), and in James Ting-Ho Lo, A Cortex-Like Learning Machine for Temporal and Hierarchical Pattern Recognition, U.S. patent application Ser. No. 12/471,341, filed May 22, 2009; Publication No. US-2009-0290800-A1, Publication Date Nov. 26, 2009, where the ANN is called the probabilistic associative memory (PAM). The ANN is hereinafter referred to as the THPAM. The goal in constructing the THPAM was to develop an ANN that performs Hebbian-type unsupervised and supervised learning without differentiation, optimization or iteration; retrieves easily; and recognizes corrupted, distorted and occluded temporal and spatial information. In pursuing this goal, mathematical necessity took precedence over biological plausibility. This mathematical approach focused first on the minimum mathematical structures and operations required for an effective learning machine with the properties mentioned.
The THPAM turned out to be a functional model of biological neural networks with many unique features that well-known models such as the recurrent multilayer perceptron, associative memories, spiking neural networks, and cortical circuit models do not have. However, the THPAM has been found to have shortcomings. Among them the most serious one is the inability of its unsupervised correlation rule to prevent clusters from overgrowing under certain circumstances. These shortcomings motivated further research to improve the THPAM. At the same time, the unique features of the THPAM indicated that it might contain clues for understanding the structures and operations of biological neural networks. To achieve this understanding and eliminate the mentioned shortcomings, the components of the THPAM were examined from the biological point of view with the purpose of constructing a biologically plausible model of biological neural networks. More specifically, the components of the THPAM were identified with those of biological neural networks and reconstructed, if necessary, into biologically plausible models of the same.
This effort resulted in a low-order model (LOM) of biological neural networks and an improved functional model called the Clustering Interpreting Probabilistic Associative Memory (CIPAM). They were respectively reported in the articles James Ting-Ho Lo, A Low-Order Model of Biological Neural Networks, Neural Computation, Vol. 23, No. 10, pp. 2626-2682, October 2011; and James Ting-Ho Lo, A Cortex-Like Learning Machine for Temporal Hierarchical Pattern Clustering, Detection, and Recognition, Neurocomputing, Vol. 78, pp. 89-103, 2012, which are both incorporated into the present invention disclosure by reference. Note that “dendritic and axonal encoders”, “dendritic and axonal trees” and “dendritic and axonal expansions” in them are collectively called “neuronal encoders”, “neuronal trees” and “neuronal codes” respectively in the present invention disclosure, and that “C-neuron”, “D-neuron” and “expansion covariance matrix” are called “nonspiking neuron”, “spiking neuron” and “code covariance matrix” respectively in the present invention disclosure.
It was subsequently discovered that the LOM and the CIPAM are equivalent in the sense that their corresponding components can mathematically be transformed into each other. In fact, generalizing the mathematical transformation that transforms the LOM and the CIPAM into each other, we can transform the LOM and the CIPAM into infinitely many equivalent models.
The LOM, the CIPAM and their equivalent models are each a network of models of the biological neuronal node or encoder (which is a biological dendritic or axonal node or encoder), synapse, spiking/nonspiking neuron, means for learning, feedback connection, maximal generalization scheme, etc. For simplicity, these component models are sometimes referred to without the word “model”. For example, the model neuronal node, model neuronal encoder, model neuronal tree, model synapse, model spiking/nonspiking neuron, etc. will be referred to as the neuronal node/encoder/tree, synapse, spiking/nonspiking neuron, etc. respectively. The LOM, the CIPAM and all their equivalent models can be used as artificial neural networks (ANNs). To emphasize that their components are artificial components in these artificial neural networks, they are referred to as the artificial neuronal node/encoder/tree, artificial synapse, artificial spiking/nonspiking neuron, etc. respectively.
If there is a possibility of confusion, the real components in the brain are referred to with the adjective “biological”, for example, the biological neuronal node, biological neuronal encoder, biological neuronal tree, biological spiking/nonspiking neuron, and biological synapse. The components of equivalent models (or equivalent ANNs) that can be obtained by transforming a component model of the LOM are given the same name as the said component of the LOM. In other words, all the model components (of the equivalent ANNs) that are equivalent to one another are given the same component name.
All models that are equivalent to the LOM, including the LOM and the CIPAM, use the “unsupervised covariance rule” instead of the “unsupervised correlation rule” used in the THPAM and can prevent the clusters of patterns or causes formed in synapses from overgrowing. Moreover, all models that are equivalent to the LOM, including the LOM and the CIPAM, use the “supervised covariance rule” instead of the “supervised correlation rule” used in the THPAM. These are two of the main improvements in the LOM, the CIPAM and other equivalent models over the THPAM.
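The general difference between a Hebbian correlation rule and a Hebbian covariance rule can be sketched as follows. This is only an illustration using generic textbook forms of the two rules on scalar signals; the function names, the scalar setting, the unit learning rate and the precomputed means are assumptions made for the example and are not the update formulas of the LOM, the CIPAM or the THPAM.

```python
# Illustrative sketch only: generic textbook Hebbian rules on scalars,
# not the update formulas of the disclosure.

def correlation_update(weight, x, y, rate=1.0):
    """Correlation rule: add the raw product of input x and output y."""
    return weight + rate * y * x


def covariance_update(weight, x, y, x_mean, y_mean, rate=1.0):
    """Covariance rule: add the product of the deviations from the means."""
    return weight + rate * (y - y_mean) * (x - x_mean)


def train(xs, ys):
    """Apply both rules to the same data, one update per sample.

    The means are precomputed here for simplicity (an assumption of this
    sketch); no gradients or repeated sweeps are used for the updates.
    """
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    w_corr = 0.0
    w_cov = 0.0
    for x, y in zip(xs, ys):
        w_corr = correlation_update(w_corr, x, y)
        w_cov = covariance_update(w_cov, x, y, x_mean, y_mean)
    return w_corr, w_cov


# Binary signals with no statistical relationship, presented 10 times.
xs = [1, 1, 0, 0] * 10
ys = [1, 0, 1, 0] * 10
w_corr, w_cov = train(xs, ys)
print(w_corr, w_cov)
```

In this toy data the correlation-rule weight keeps growing with every repetition even though the two signals are unrelated, whereas the covariance-rule weight stays at zero. This only hints at why subtracting means can keep accumulated weights from growing without cause; how the covariance rules act on neuronal codes and clusters in the LOM is not shown by this sketch.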
From the application viewpoint, as an ANN, the LOM (or a mathematical equivalent thereof) has the following advantages:
1. No label of the learning data from outside the ANN is needed for the UPUs (unsupervised processing units) in the LOM to learn.
2. The unsupervised learning by a processing unit clusters data without involving selecting a fixed number of prototypes, cycling through the data, using prototypes as cluster labels, or minimizing a non-convex criterion.
3. Both the unsupervised and supervised covariance rules are of the Hebbian type, involving no differentiation, backpropagation, optimization, iteration, or cycling through the data. They learn virtually with “photographic memories”, and are suited for online adaptive learning. Large numbers of large temporal and spatial data such as photographs, radiographs, videos, speech/language, text/knowledge, etc. are learned easily. The “decision boundaries” are not determined by exemplary patterns from each and every pattern and “confuser” class, but by those from pattern classes. In many applications such as target and face recognition, there are a great many pattern and “confuser” classes and usually no or not enough exemplary patterns for some “confuser” classes.
4. Only a small number of algorithmic steps are needed for retrieving or estimating labels. Detection and recognition of multiple/hierarchical temporal/spatial causes are easily performed. The ANN is suitable for massive parallelization at the bit level by VLSI implementation.
5. Empirical probability distributions and membership functions of labels are easily obtained by supervised processing units (SPUs) and unsupervised processing units (UPUs).
6. The ANN generalizes not by a single holistic similarity criterion for the entire input exogenous feature vector, which noise, erasure, distortion and occlusion can easily defeat, but by a large number of similarity criteria for feature subvectors input to a large number of UPUs in different layers. These criteria contribute individually and collectively to generalization for single and multiple causes. Example 1: smiling, putting on a hat, growing or shaving a beard, or wearing a wig can upset a single similarity criterion used for recognizing a face in a mug-shot photograph. However, a face can be recognized by each of a large number of feature subvectors of the face. If one of them is recognized to belong to a certain face, the face is recognized. Example 2: a typical kitchen contains a refrigerator, a counter top, sinks, faucets, stoves, fruit and vegetables on a table, etc. The kitchen is still a kitchen if a couple of items, say the stoves and the table with fruit and vegetables, are removed.
7. Masking matrices in a PU (processing unit) eliminate effects of corrupted, distorted and occluded components of the feature subvector input to the PU, and thereby enable maximal generalization capability of the PU, and in turn that of the ANN.
8. The ANN is no longer a black box with “fully connected” layers, much criticized by opponents of such neural networks as multilayer perceptrons (MLPs) or recurrent MLPs. In a PU of the ANN, synaptic weights are covariances between neuronal codes and labels of the vector input to the PU. Each PU has a receptive field in the exogenous feature vector input to the ANN and recognizes the pattern(s) or cause(s) appearing within the receptive field. Such properties can be used to help select the architecture (i.e., layers, PUs, connections, feedback structures, etc.) of the ANN for the application.
9. The ANN (or a mathematical equivalent thereof) may have some capability of recognizing rotated, translated and scaled patterns. Moreover, easy learning and retrieving by an ANN allow it to learn translated, rotated and scaled versions of an input image with ease.
10. The hierarchical architecture of the clusterer stores models of the hierarchical temporal and spatial worlds (e.g., letters, words and sentences).
11. Ambiguity and uncertainty are represented and resolved with empirical probabilities and membership degrees in the sense of fuzzy logic.
12. Noise and interference in inputs self-destruct like random walks, with residues eliminated gradually by forgetting factors in the synapses, leaving essential information that has been learned by repetition and emphasis.
13. The architecture of the ANN can be adjusted without discarding learned knowledge in the ANN. This allows enlargement of the feature subvectors, increase of the number of layers, and even increase of feedback connections.
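The effect of a forgetting factor described in item 12 above can be pictured with a small numerical sketch. The recursion below (multiply the stored value by a forgetting factor, then add the new contribution) is an assumed generic form chosen for illustration, as are the name `accumulate`, the factor 0.9, and the alternating sequence used as a deterministic stand-in for zero-mean noise; none of these come from the disclosure.

```python
# Illustrative sketch only: a generic leaky accumulator, assumed for this
# example, not the synaptic update of the disclosure.

def accumulate(samples, forgetting=0.9):
    """Leaky accumulation: w <- forgetting * w + sample, one step per sample."""
    w = 0.0
    for s in samples:
        w = forgetting * w + s
    return w


# A contribution that is repeated consistently builds up to a large value.
repeated = accumulate([1.0] * 200)

# A zero-mean disturbance (here alternating +1, -1) largely cancels itself,
# and whatever residue remains is bounded by the forgetting factor.
alternating = accumulate([1.0, -1.0] * 100)

print(repeated, alternating)
```

With these assumed values the repeated contribution settles near 1 / (1 - 0.9) = 10, while the alternating disturbance stays below 1 in magnitude, which is the qualitative behavior the item describes: what is repeated persists, and what fluctuates fades.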
For simplicity and clarity, the present invention disclosure mainly describes the LOM and also shows how the LOM is transformed into the CIPAM and other ANNs that are mathematically equivalent to it by the use of affine functions and their inverses.
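As a minimal sketch of what transforming a representation with an affine function and its inverse can look like, the example below maps component values through f(x) = a*x + b and back. The helper `make_affine` and the particular choice a = 2, b = -1 (which maps the values 0 and 1 to -1 and 1) are assumptions made for illustration; this passage does not state which affine functions relate the LOM to the CIPAM.

```python
# Illustrative sketch only: a generic invertible affine map on scalars.
# The coefficients used below are hypothetical, not taken from the disclosure.

def make_affine(a, b):
    """Return the affine map f(x) = a*x + b and its inverse (needs a != 0)."""
    if a == 0:
        raise ValueError("a must be nonzero for the map to be invertible")

    def forward(x):
        return a * x + b

    def inverse(y):
        return (y - b) / a

    return forward, inverse


# Hypothetical choice: send the values 0 and 1 to -1 and 1, and back again.
to_bipolar, to_binary = make_affine(2, -1)

binary_code = [0, 1, 1, 0]
bipolar_code = [to_bipolar(v) for v in binary_code]
restored = [to_binary(v) for v in bipolar_code]

print(bipolar_code, restored)
```

Because any nonzero `a` yields an invertible map, a family of such functions produces many representations that carry the same information, which is the sense in which one model can be converted into another and back without loss in this sketch.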