The computer-implemented Joint Proximity Association Template (JPAT) neural network described in this patent implements a bi-directional neural network framework and is constructed from a combination of two distinct types of machine learning methods: a convolutional neural network and an associative memory matrix. By recognizing that a neural network is a logic-based device and by interpreting associative memory as an intuitive-based device, this invention can be said to emulate the intra-action and the inter-action of the cognitive processes of the left brain and right brain. The invention is a software-based computer processing implementation that (1) reduces the long training times by a full order of magnitude, (2) reduces the execution time to reach a decision by a full order of magnitude, and (3) produces beneficial intralayer and interlayer connections.
This computer-implemented joint processing architecture is designed to take an existing hierarchy of proximal layers of feed-forward convolutional neural network processes, add next to it a parallel hierarchy of proximal associative memory processes, and then connect the two hierarchies by another set of associative memory processes. FIG. 1 gives the visual outline of the joint processing architecture, which has the appearance of a ladder. The part of the figure enclosed by the dashed-line box indicates the main part of the invention. It is the purpose of this paper to describe how the device was built and how it can be implemented as a machine learning enhancement tool that may be used to replace or complement any existing convolutional neural network programmed for image classification.
The present technical solution can be a computer program method, a system, or a product at any technical detail of integration. The computer program product may include a non-volatile, material-based, computer-readable storage medium having computer-readable instructions therein for invoking a processor to carry out instructions of the present technical solution in response to an instruction execution device.
Several references are given at the end of this paper in the form of patents and published papers. These references are (roughly) split into two groups: one group for neural networks (NN) and one group for associative memory (AM). There is a third group called Both.
A diagram for a basic neural network (NN) is given in FIG. 2. There is an input U, two sets of weights and biases, and an output Y. The weights and biases weave the input to the output. Two equations are given that show the updating process as the process iterates. The equation for Y is shown and serves to produce a number. This number is compared to a desired number, usually in the form of a classification vector. In general, there are several inputs matched with several desired outputs. The error curve tracks the trajectory towards an acceptable error threshold. There is no deterministic approach that reaches the threshold immediately; it ‘just happens’ with the aid of Paul Werbos and his insightful thesis, whose subject was the application of backpropagation to artificial NNs. The point of FIG. 2 is to show that, in order to get from Layer U to Layer Y, a set of weights is trained to carry out that task. The purpose of the weights is to add a greater capacity for correlation than simply comparing U to Y provides. The idea of adding layers between U and Y is rooted in the desire to emulate human cognitive processes.
A diagram for a basic associative memory (AM) matrix is given in FIG. 3. The diagram shows an example with two input/output (I/O) pairs, A1/B1 and A2/B2. These two pairs start out in binary form but are transformed into polar form as the corresponding pairs X1/Y1 and X2/Y2. An associative memory matrix (AMM) is formed by summing the outer products of the two polar pairs. The matrix M is given on the right. This matrix takes the place of the weights and biases of FIG. 2. The only training, so to speak, is the forming of the outer products. The idea is to apply A1, or a noisy version thereof, to the matrix M, whereupon the strong association embedded within M would draw the output to B1. The term Bidirectional Associative Memory (BAM) comes from the fact that this output can in turn be applied to the transpose of M, whereupon the new output should approximate A1. The back-and-forth process can be repeated until a so-called resonant pair forms.
In general, the idea of forming associations between I/O pairs is well founded; only the way in which the M matrix is formed has issues with stability, which has led to other approaches such as the Adaline method of Bernard Widrow and Ted Hoff. The point of FIG. 3 is to show that the I/O pairs themselves form the AM connection matrix, in contrast to the derived weights and biases of the NN.
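The AM construction and bidirectional recall described above can be illustrated with a minimal sketch. The two I/O pairs, the function names, and the convention of mapping a zero sum to +1 are assumptions chosen for illustration only; they are not taken from FIG. 3. The sketch converts binary pairs to polar form, sums the outer products to form M, and then passes a noisy input back and forth through M and its transpose until the pair stops changing.

```python
def to_polar(bits):
    """Map binary {0, 1} to polar {-1, +1}."""
    return [2 * b - 1 for b in bits]

def to_binary(polar):
    """Map polar {-1, +1} back to binary {0, 1}."""
    return [(p + 1) // 2 for p in polar]

def build_amm(polar_pairs):
    """Sum the outer products X^T Y over all polar pairs (X, Y)."""
    rows, cols = len(polar_pairs[0][0]), len(polar_pairs[0][1])
    m = [[0] * cols for _ in range(rows)]
    for x, y in polar_pairs:
        for i in range(rows):
            for j in range(cols):
                m[i][j] += x[i] * y[j]
    return m

def sign(v):
    # Assumption: a zero sum is mapped to +1.
    return 1 if v >= 0 else -1

def forward(m, x):
    """Apply X to M and threshold, giving an estimate of Y."""
    return [sign(sum(x[i] * m[i][j] for i in range(len(x))))
            for j in range(len(m[0]))]

def backward(m, y):
    """Apply Y to the transpose of M and threshold, giving an estimate of X."""
    return [sign(sum(m[i][j] * y[j] for j in range(len(y))))
            for i in range(len(m))]

def recall(m, x, max_iters=10):
    """Iterate back and forth until the (X, Y) pair stops changing."""
    y = forward(m, x)
    for _ in range(max_iters):
        x_new = backward(m, y)
        y_new = forward(m, x_new)
        if x_new == x and y_new == y:
            break
        x, y = x_new, y_new
    return x, y

# Illustrative I/O pairs (hypothetical values).
A1, B1 = [1, 0, 1, 0, 1, 0], [1, 1, 0, 0]
A2, B2 = [1, 1, 1, 0, 0, 0], [1, 0, 1, 0]
M = build_amm([(to_polar(A1), to_polar(B1)), (to_polar(A2), to_polar(B2))])

noisy_A1 = [1, 0, 1, 0, 1, 1]          # A1 with its last bit flipped
x_res, y_res = recall(M, to_polar(noisy_A1))
print(to_binary(x_res))                # [1, 0, 1, 0, 1, 0], i.e. A1
print(to_binary(y_res))                # [1, 1, 0, 0], i.e. B1
```

In this small example the noisy input is drawn to B1 on the first forward pass, and the backward pass restores A1, so the pair settles after one round trip. With more stored pairs or more noise, recall of this kind is not guaranteed to settle on a stored pair, which is the stability issue noted above.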
The NN process is used to demonstrate a necessary component of the overall process and make-up of J. Patrick's Ladder. In essence, the NN is only an auxiliary part of the invention, which is the reason the dashed-line box in FIG. 1 does not include the NN. Rather, the main part of the invention is the novel two-fold implementation of the AM construct as a process parallel to the NN, one that may be described as demonstrating an intuitive sense that facilitates (1) faster learning, (2) faster execution, and (3) intra-layer information sharing.
For the remainder of this paper, and in order to provide a clear example of what this invention is capable of doing, the number of I/O object pairs to be learned is set to ten.
The specific neural network to be used for demonstration is the convolutional neural network (CNN), and the specific type of associative memory will follow the additive model, which will be referred to as the Associative Memory Matrix (AMM). Note that another common name for AMM is Bidirectional Associative Memory (BAM). This paper will use AMM.
A generic outline of a computer-implemented CNN process stored in memory is given in this background section rather than in the summary of the invention section. The outline below fills in the process steps for the left rail of FIG. 1. The CNN processes are unidirectional, as indicated by the pairs of downward-pointing arrows along the left rail.
The Generic CNN Process outline lists 21 steps. In each step, a computer executes pre-programmed instructions that transform an input image through multiple stages and stores the numerical outputs of the computer-implemented transforms in memory, which the computer uses to classify the input image. We want to emphasize that the following steps form only a generic outline of a typical convolutional neural network with its typical functions, including but not limited to downsampling, biasing, and application of an activation function (tanh). The point of this part of the description is to show the context of the layering in any general convolutional neural network in order to direct the comparison between layers of the CNN and associative memory matrices in the JPAT; the instructions in the JPAT description under the SUMMARY OF THE INVENTION, however, are detailed.
Given Layer 1
        1. The initial input Object/Image is a 28×28 matrix of numbers called L1
Compute Layer 2 with the following process (labeled P12)
        2. Apply 2D Convolution of L1 with twelve 15×15 filters
        3. Apply Tanh
        4. Downsample by four
        5. Multiply by weights
        6. Add biases
        7. Tanh compress
Layer 2 output consists of twelve 14×14 images and is called L2
Compute Layer 3 with the following process (labeled P23)
        8. Apply 2D Convolution of L2 with select sets of 5×5 filters
        9. Apply Tanh
        10. Downsample by four
        11. Multiply by weights
        12. Add biases
        13. Tanh compress
Layer 3 output consists of sixteen 5×5 images and is called L3
Compute Layer 4 with the following process (labeled P34)
        14. Apply 2D Convolution of L3 with select sets of 5×5 filters
        15. Apply Tanh
        16. Output one hundred twenty 1×1 images
Layer 4 output is a 120×1 vector and is called L4
Compute Layer 5 with the following process (labeled P45)
        17. Multiply L4 output with weight matrix
        18. Apply Tanh
Layer 5 output is a 200×1 vector and is called L5
Compute Layer 6 with the following process (labeled P56)
        19. Multiply L5 output with weight matrix
        20. Apply Tanh
Layer 6 output is a 10×1 vector and is called L6
        21. From L6 output, the classification decision is based on the position of maximum value
This concludes the background to this invention. The invention will show how the CNN and the AMM processes are brought into a dependent relationship, one that appears to be very beneficial. Although the immediate application is to pattern recognition with respect to the MNIST database of handwritten numerals 0-9, it is envisioned that it will also apply to other hierarchical systems that (1) classify human behavioral performance via EKGs, EEGs, and EOGs, (2) classify large clusters of data sets, i.e., Big Data, (3) carry out iterative hierarchical algorithms in domains such as RF, Acoustics, and Geophysics, and (4) monitor network traffic using Open Systems Interconnection (OSI) architecture. In short, it is envisioned that the JPAT construct will apply to any hierarchical and logic-based machine learning class process.
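As an illustration of the final stages of the generic CNN outline (processes P45 and P56, steps 17 through 21), the following minimal sketch multiplies a 120×1 vector by a weight matrix, applies tanh to obtain a 200×1 vector, repeats the operation to obtain a 10×1 vector, and selects the position of the maximum value. The function names are hypothetical, and the random weights and random L4 vector are stand-ins for trained weights and a real convolutional output; only the vector sizes are taken from the outline.

```python
import math
import random

def dense_tanh(weights, vector):
    """Multiply a weight matrix (a list of rows) by a vector, then apply tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector)))
            for row in weights]

def classify(l4, w45, w56):
    """Run P45 and P56, then pick the position of the maximum of L6."""
    l5 = dense_tanh(w45, l4)            # steps 17-18: 120x1 -> 200x1
    l6 = dense_tanh(w56, l5)            # steps 19-20: 200x1 -> 10x1
    decision = l6.index(max(l6))        # step 21: position of maximum value
    return decision, l5, l6

# Assumption: random placeholder weights and input, seeded for repeatability.
rng = random.Random(0)

def random_matrix(rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

w45 = random_matrix(200, 120)           # weight matrix of P45
w56 = random_matrix(10, 200)            # weight matrix of P56
l4 = [rng.uniform(-1.0, 1.0) for _ in range(120)]

decision, l5, l6 = classify(l4, w45, w56)
print(len(l5), len(l6), decision)
```

With untrained weights the printed decision carries no meaning; the sketch only shows how the layer sizes chain together and how the classification decision is read from L6.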