The recent surge in the performance of machine intelligence systems is not due to the development of revolutionary new algorithms. Indeed, the core algorithms used in machine intelligence applications today stem from a body of work that is now over half a century old. Instead, it has been improvements in the hardware and software that implement machine intelligence algorithms in an efficient manner that has fueled the recent surge. Algorithms that were once too computationally intensive to implement in a useful manner with even the most sophisticated of computers can now be executed with specialized hardware on an individual user's smart phone. The improvements in hardware and software take various forms. For example, graphical processing units traditionally used to process the vectors used to render polygons for computer graphics have been repurposed in an efficient manner to manipulate the data elements used in machine intelligence processes. As another example, certain classes of hardware have been designed from the ground-up to implement machine intelligence algorithms by using specialized processing elements such as systolic arrays. Further advances have centered around using collections of transistors and memory elements to mimic, directly in hardware, the behavior of neurons in a traditional artificial neural network (ANN). There is no question that the field of machine intelligence has benefited greatly from these improvements. However, despite the intense interest directed to these approaches, machine intelligence systems still represent one of the most computationally and energy intensive computing applications of the modern age, and present a field that is ripe for further advances.
The reason machine intelligence applications are so resource hungry is that the data structures being operated on are generally very large, and the number of discrete primitive computations that must be executed on each of the data structures are likewise immense. A traditional ANN takes in an input vector, conducts calculations using the input vector and a set of weight vectors, and produces an output vector. Each weight vector in the set of weight vectors is often referred to as a layer of the network, and the output of each layer serves as the input to the next layer. In a traditional network, the layers are fully connected, which requires every element of the input vector to be involved in a calculation with every element of the weight vector.
The enormous size of the data structures is problematic from the perspective of storage and memory management. Tautologically, large data structures require a large amount of space to store. However, traditional ANNs present a somewhat unique data management problem in that the data structures can be sparse, but the sparsity is difficult to utilize for purposes of data storage resource mitigation because the sparse data can encode important “spatial information” regarding the relative locations of non-sparse data elements within the data structure. Furthermore, since traditional ANNs are difficult to parallelize, intermediate outputs in the execution of an ANN need to be stored, at least temporarily, between computations.
Methods for efficiently conducting matrix computations with sparse matrices and for compressing sparse matrices are known in art and can be applied to combat the above-mentioned issues. Known methods include specific formats for sparse matrixes to be stored so that the non-sparse values of the matrix can be readily retrieved and applied to a calculation using the matrix. One such format is known as compressed sparse row (CSR) format in which data representing a matrix is more efficiently stored in three separate matrices. One of those matrices stores the non-sparse values, one stores the column indices of the non-sparse values in the original matrix, and one stores a pointer for the row indices of the original matrix with reference to the other two newly-generated matrices.
FIG. 1 illustrates a directed graph 100 for the computation of a modern machine intelligence system. The input to directed graph 100 is an input tensor X. The output of directed graph 100 is an output tensor Y. The input could be an encoding for a picture, such as an image of a cat 101. In this example, execution of directed graph 100 involves the graph providing an encoding of a textual guess as to what the content of the encoded image contained. The graph output can be referred to as an inference generated by the directed graph because the machine intelligence system is effectively inferring what the picture shows from the encoding of the picture. As such, if directed graph 100 represented a properly trained machine intelligence system, execution of graph 100 with input tensor X would produce an output tensor Y which encoded the word “CAT” as illustrated.
The edges of directed graph 100 represent calculations that must be conducted to execute the graph. In this example, the graph is broken into two sections—a convolutional section 102 and a fully connected section 103. The convolutional portion can be referred to as a convolutional neural network (CNN). The vertices in the directed graph of CNN 102 form a set of layers which includes layers 106, 107, and 108. The layers each include sets of tensors such as tensors 109, 110, and 111. The vertices in the directed graph of fully connected section 103 also form a set of layers which includes layers 112 and 113. Each edge in directed graph 100 represents a calculation involving the origin vertex of the edge. In CNN 102, the calculations are convolutions between the origin vertex and a filter. Each edge in CNN 102 is associated with a different filter F11, Fn1, F12, Fn2 etc. As illustrated, filter F12 and tensor 109 are subjected to a full convolution to generate one element of tensor 111. Filter F12 is “slid around” tensor 109 until a convolution operation has been conducted between the filter and the origin vertex. In other approaches, filter F12 and a portion of tensor 109 are multiplied to generate one element of tensor 111 and the full convolution is used to generate multiple elements of tensor 111. In fully connected section 103, the calculations are multiplications between a set of weights and the values from the prior layer. In fully connected section 103, each edge is associated with a unique weight value that will be used in the calculation. For example, edge 114 represents a multiplication between weight wn and input value 115. The value of element 116 is the sum of a set of identical operations involving all the elements of layer 112 and a set of weight values that uniquely correspond to the origin vertex of each edge that leads to element 116.
Execution of directed graph 100 involves many calculations. In the illustration, dots are used in the vertical directions to indicate the large degree of repetition involved in the directed graph. Furthermore, directed graph 100 represents a relatively simply ANN, as modern ANNs can include far more layers with far more complex interrelationships between the layers. Although not illustrated by directed graph 100, the outputs of one layer can loop back to be the inputs of a prior layer to form what is often referred to as a recursive neural network (RNN). The high degree of flexibility afforded to a machine intelligence system by having numerous elements, along with an increase in the number of layers and complexity of their interrelationships, makes it unlikely that machine intelligence systems will decrease in complexity in the future. Therefore, the computational complexity of machine intelligence systems is likely to increase in the future rather than diminish.
All of the data involved in the training and execution of directed graph 100 needs to be stored and retrieved as the calculations mentioned in the paragraph above are executed. The weights in fully directed section 103, the filters in CNN section 102, and the intermediate results of the execution of the graph from input tensor X all the way to the output tensor Y, all need to be stored to and retrieved from memory as the algorithmic logic units and other computational hardware executes or trains the directed graph. Given that the execution of even a basic ANN can involve hundreds of billions of computations with outputs that need to be stored for the next round of computation, methods that address the space required to store all this data, and the speed at which the data can be compressed for storage and decompressed to facilitate further computation, will greatly improve the field of machine intelligence.