Deep Neural Networks (DNNs) have recently achieved state-of-the-art levels of accuracy. For example, DNNs have provided impressive results when applied to the interpretation of audio information, image information, text information, etc. A DNN is composed of multiple layers, where each layer, z, includes multiple neurons. Each neuron, in turn, provides an output result (referred to as an output activation) that is computed as a function of the inputs provided by the neurons in the preceding layer, (z−1). A DNN model collectively refers to all of the parameters (e.g., weighting and biasing values) that are used to compute the activations.
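The layer-by-layer computation described above can be sketched as follows. This is a minimal illustration, assuming a sigmoid nonlinearity and small illustrative layer sizes; the function name and values are hypothetical, not taken from the source.

```python
import math

def layer_activations(prev_activations, weights, biases):
    """Compute the output activations of layer z from the activations of
    layer (z-1). Each neuron's activation is a nonlinear function (here a
    sigmoid, chosen for illustration) of a weighted sum of the preceding
    layer's outputs plus a bias term."""
    outputs = []
    for w_row, b in zip(weights, biases):
        pre = sum(w * x for w, x in zip(w_row, prev_activations)) + b
        outputs.append(1.0 / (1.0 + math.exp(-pre)))  # sigmoid
    return outputs

# Example: a layer of 3 neurons fed by 4 neurons in the preceding layer.
x = [0.5, -0.2, 0.1, 0.8]            # activations of layer (z-1)
W = [[0.0] * 4 for _ in range(3)]    # weighting values (part of the model)
b = [0.0] * 3                        # biasing values (part of the model)
a = layer_activations(x, W, b)       # activations of layer z
```

With all-zero weights and biases, each neuron outputs sigmoid(0) = 0.5; the weighting and biasing values here are exactly the parameters that, collectively, constitute the DNN model.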
A training system produces a DNN model based on a corpus of labeled input data, such as a corpus of images having labels that identify the objects in the images. In one case, the training system may produce the DNN model using a gradient descent technique, which entails successive forward and back propagation phases of analysis. Such a training task typically involves the processing of a large number of input examples (e.g., corresponding to terabytes of data), and the learning of a very large number of parameters (e.g., corresponding to billions of parameter values). Hence, the task of training a DNN can be expected to consume an enormous amount of computing resources, and can take a considerable amount of time to perform.
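The forward and backward phases of gradient descent can be sketched for the simplest possible case, a single linear neuron with a squared-error loss. The training pairs, learning rate, and epoch count below are illustrative assumptions, not details from the source.

```python
def train_neuron(examples, lr=0.1, epochs=100):
    """Gradient-descent sketch: repeatedly run a forward phase (compute the
    output) and a backward phase (compute the loss gradient and adjust the
    parameters) over the labeled examples."""
    w, b = 0.0, 0.0  # the model's parameters (weight and bias)
    for _ in range(epochs):
        for x, target in examples:
            # Forward phase: compute the neuron's output for input x.
            y = w * x + b
            # Backward phase: gradient of (y - target)^2 w.r.t. parameters.
            grad = 2.0 * (y - target)
            w -= lr * grad * x
            b -= lr * grad
    return w, b

# Two labeled examples, both consistent with the line y = x + 1.
w, b = train_neuron([(1.0, 2.0), (2.0, 3.0)])
```

A real training system applies the same forward/backward pattern, but to millions of examples and billions of parameters, which is what makes the task so resource-intensive.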
The research community has proposed various distributed processing systems to train DNNs in an expedited manner. A distributed processing system is composed of a cluster of computing units which train the DNN in a parallel manner, e.g., by breaking the training task into sub-tasks and performing the sub-tasks in parallel.
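One common way to break the training task into sub-tasks is data parallelism: each computing unit holds a shard of the input examples, computes a partial gradient, and the partial results are aggregated into a single parameter update. The sketch below assumes this scheme, with a single scalar parameter and threads standing in for separate computing units; all names and values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_gradient(w, shard):
    """Gradient of the squared error over one shard of labeled examples."""
    return sum(2.0 * (w * x - t) * x for x, t in shard)

def parallel_step(w, shards, lr=0.01):
    """One training step split into sub-tasks, one per shard, run in
    parallel; the partial gradients are then summed (aggregated)."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        grads = list(pool.map(lambda s: partial_gradient(w, s), shards))
    return w - lr * sum(grads)

# Examples split across two "computing units"; both fit the line y = 2x.
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]
w = 0.0
for _ in range(200):
    w = parallel_step(w, shards)
```

In a real cluster the shards would live on different machines and the aggregation step would involve network communication, but the division into parallel sub-tasks is the same.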
More generally stated, the operation of training a DNN model using a distributed processing system is an example of a graph processing task. In the case of a DNN, the neurons of the DNN model constitute the nodes of a graph, and the connections between the neurons constitute the links of the graph. Other distributed graph processing tasks (that is, tasks other than training a DNN model) are also resource-intensive in nature.
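The graph view of a DNN can be made concrete by enumerating neurons as nodes and inter-layer connections as links. The layer sizes below are illustrative assumptions.

```python
# Sketch: a small fully connected DNN viewed as a graph. Each neuron is a
# node identified by (layer, index); each connection between a neuron and
# a neuron in the preceding layer is a link.
layer_sizes = [3, 2, 1]  # illustrative neurons-per-layer counts

nodes = []
links = []
for z, size in enumerate(layer_sizes):
    nodes.extend((z, i) for i in range(size))
for z in range(1, len(layer_sizes)):
    for i in range(layer_sizes[z]):
        for j in range(layer_sizes[z - 1]):
            links.append(((z - 1, j), (z, i)))  # one link per weight
```

Here the graph has 3 + 2 + 1 = 6 nodes and 3×2 + 2×1 = 8 links; each link corresponds to one weighting value of the model, which is why distributed graph processing and distributed DNN training face the same scaling pressures.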