Computer artificial intelligence (AI) has been built on machine learning, particularly using deep learning techniques. With deep learning, a computing system organized as a neural network computes a statistical likelihood of a match of input data with prior computed data. A neural network refers to a plurality of interconnected processing nodes that enable the analysis of data to compare an input to “trained” data. Trained data refers to computational analysis of properties of known data to develop models to use to compare input data. An example of an application of AI and data training is found in object recognition, where a system analyzes the properties of many (e.g., thousands or more) of images to determine patterns that can be used to perform statistical analysis to identify an input object.
Neural networks compute “weights” to perform computation on new data (an input data “word”). Neural networks use multiple layers of computational nodes, where deeper layers perform computations based on results of computations performed by higher layers. Machine learning currently relies on the computation of dot-products and absolute difference of vectors, typically computed with multiply and accumulate (MAC) operations performed on the parameters, input data and weights. The computation of large and deep neural networks typically involves so many data elements it is not practical to store them in processor cache, and thus they are usually stored in a memory.
Thus, machine learning is very computationally intensive with the computation and comparison of many different data elements. The computation of operations within a processor is orders of magnitude faster than the transfer of data between the processor and main memory resources. Placing all the data closer to the processor in caches is prohibitively expensive for the great majority of practical systems due to the memory sizes needed to store the data. Thus, the transfer of data becomes a major bottleneck for AI computations. As the data sets increase, the time and power/energy a computing system uses for moving data around can end up being multiples of the time and power used to actually perform computations.
Descriptions of certain details and implementations follow, including non-limiting descriptions of the figures, which may depict some or all examples, and well as other potential implementations.