Artificial intelligence (AI) applications involve machines or software that are made to exhibit intelligent behavior such as learning, communication, perception, motion and manipulation, and even creativity. The machines or software can achieve this intelligent behavior through a variety of methodologies such as search and optimization, logic, probabilistic methods, statistical learning, and neural networks. Along these lines, various deep learning architectures such as deep neural networks (deep NN) including deep multi-layer perceptrons (MLPs) (often referred to as a DNN), convolutional deep neural networks, deep belief networks, recurrent neural networks (RNN), and long-short-term memory (LSTM) RNNs, have gained interest for their application to fields like computer vision, image processing/recognition, speech processing/recognition, natural language processing, audio recognition, and bioinformatics.
A deep NN generally consists of an input layer, an arbitrary number of hidden layers, and an output layer. Each layer contains a certain amount of units, which may follow the neuron model, and each unit corresponds to an element in a feature vector (such as an observation vector of an input dataset). Each unit typically uses a weighted function (e.g., a logistic function) to map its total input from the layer below to a scalar state that is sent to the layer above. The layers of the neural network are trained (usually via unsupervised machine learning) and the units of that layer assigned weights. Depending on the depth of the neural network layers, the total number of weights used in the system can be massive.
Many computer vision, image processing/recognition, speech processing/recognition, natural language processing, audio recognition, and bioinformatics are executed and managed at data centers supporting services available to large numbers of consumer and enterprise clients. Data centers are designed to run and operate computer systems (servers, storage devices, and other computers), communication equipment, and power systems in a modular and flexible manner. Data center workloads demand high computational capabilities, flexibility, power efficiency, and low costs. Being able to accelerate at least some portions of large-scale software services can achieve desired throughputs and enable these data centers to meet the demands of their resource consumers. However, the increasing complexity and scalability of deep learning applications can aggravate problems with memory bandwidth.