A common calculation performed in neural networks and machine learning algorithms is a weighted sum. This calculation takes a set of inputs (x_i), multiplies each input by a corresponding weight (w_i), and sums the products to create a final result (z). This can be written as:
$$ z = \sum_{i} x_i w_i $$

Although this is a relatively simple equation, it is the basic computation step for most neural network designs, and the inputs can number in the thousands, while the weights can number in the millions. Current software-based neural network designs are limited by, among other things, their ability to perform this calculation. A “complete” neuron will perform this calculation, then apply a function to z to create the final neuron output (y). Typically, the function applied to z is trivial compared to the weighted sum. Common examples include the rectified linear, binary threshold, and logistic neuron functions:
$$ \text{Rectified Linear:} \quad y = \begin{cases} z & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases} $$

$$ \text{Binary Threshold:} \quad y = \begin{cases} 1 & \text{if } z > 0 \\ 0 & \text{otherwise} \end{cases} $$

$$ \text{Logistic:} \quad y = \frac{1}{1 + e^{-z}} $$
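As a non-limiting illustration, the weighted sum and the three neuron functions above can be sketched in software as follows. The function names (`weighted_sum`, `rectified_linear`, `binary_threshold`, `logistic`, `neuron`) and the sample values are assumptions chosen for this example only; they do not describe any particular implementation.

```python
import math


def weighted_sum(inputs, weights):
    """Return z = sum_i x_i * w_i for two equal-length sequences."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def rectified_linear(z):
    """y = z if z > 0, otherwise 0."""
    return z if z > 0 else 0.0


def binary_threshold(z):
    """y = 1 if z > 0, otherwise 0."""
    return 1 if z > 0 else 0


def logistic(z):
    """y = 1 / (1 + e^(-z)).

    For negative z the algebraically equivalent form e^z / (1 + e^z)
    is used so that math.exp is never called on a large positive value.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neuron(inputs, weights, activation):
    """A "complete" neuron: weighted sum followed by a function on z."""
    return activation(weighted_sum(inputs, weights))


# Hypothetical sample data: z = 1.0*0.5 + 2.0*(-1.0) + 3.0*0.25 = -0.75
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
z = weighted_sum(x, w)
y_relu = neuron(x, w, rectified_linear)   # 0.0, because z is not positive
y_step = neuron(x, w, binary_threshold)   # 0, because z is not positive
y_logistic = neuron(x, w, logistic)       # a value between 0 and 0.5
```

In this sketch the function on z costs a single comparison or exponential, whereas the weighted sum costs one multiplication and one addition per input, which is consistent with the weighted sum dominating the computation.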
Neural networks and other machine learning algorithms typically apply multiple sets of weights to the same inputs. These weight sets are often called “filters”, and function to detect a pattern in the input data. Many neural networks and machine learning algorithms function by searching the inputs for patterns and then providing the results to another stage of processing.
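By way of example only, applying several weight sets to the same inputs can be sketched as one weighted sum per filter. The function name `apply_filters` and the sample inputs and filters are hypothetical and are used only to illustrate the idea.

```python
def apply_filters(inputs, filters):
    """Apply each weight set (filter) to the same inputs.

    Returns one weighted sum z per filter, in the order the filters
    are given. Every filter must have one weight per input.
    """
    results = []
    for weights in filters:
        if len(weights) != len(inputs):
            raise ValueError("each filter must have one weight per input")
        results.append(sum(x * w for x, w in zip(inputs, weights)))
    return results


# Hypothetical data: three filters applied to the same three inputs.
inputs = [1.0, 0.0, -1.0]
filters = [
    [1.0, 0.0, -1.0],  # responds strongly to this input pattern
    [1.0, 1.0, 1.0],   # sums the inputs
    [0.0, 1.0, 0.0],   # looks only at the middle input
]
outputs = apply_filters(inputs, filters)

# Number of multiply-accumulate operations: one per weight.
mac_count = len(filters) * len(inputs)
```

The first filter produces the largest output because its weights match the input pattern. The operation count grows as the number of filters multiplied by the number of inputs, and every one of those operations requires a stored weight.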
Because of the large number of weights that must be stored in these systems, memory management is a key technical challenge. Millions or even billions of weights may be needed to process the data. Continuously loading and reloading these weights becomes a bottleneck in terms of power, physical area of implementation, and performance. Previous systems that used flash memory to store the weights relied on structures that could use only one or a few weights at a time, used inefficient architectures, performed very slowly, and/or had limited capability. Therefore, it is desirable to develop an integrated circuit design that mitigates these issues.
This section provides background information related to the present disclosure which is not necessarily prior art.