The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
It is difficult to program a computer to solve a wide array of different problems without first providing the algorithms for solving each problem type. For example, classifying large amounts of data traditionally would require initially identifying every category and identifying each algorithm that can be used to prioritize categories based on input information.
Manually programming every algorithm can be extremely cumbersome. For instance, in categorizing data, a rule would need to be programmed into the computer for each type of priority. Additionally, manually programming solutions to complex problems is only useful if the problem has been seen before. Manually programmed algorithms are unable to categorize information that none of their rules anticipates.
Neural networks provide an elegant solution to the difficulties of solving complex problems. A neural network generally comprises a fixed set of operations and a plurality of weight values. The weight values are combined with the input data, typically through multiplication, to generate a result. As a neural network is trained, the weights are adjusted so that the computations performed with training data yield the correct results. Neural networks have found utility in various kinds of data analysis systems, voice recognition, document recognition, image recognition, recommendation systems and other applications of artificial intelligence.
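The weight adjustment described above can be illustrated with a minimal sketch. It assumes a single linear unit trained by gradient descent on squared error; the function names `predict` and `train`, the learning rate, the epoch count, and the sample data are all hypothetical and chosen only for illustration, not drawn from any particular neural network.

```python
# Minimal sketch (assumption: one linear unit, squared-error loss).
# All names and values here are hypothetical illustrations.

def predict(weights, inputs):
    """Multiply each weight by its input and sum the products."""
    return sum(w * x for w, x in zip(weights, inputs))

def train(weights, samples, learning_rate=0.1, epochs=200):
    """Adjust the weights so predictions on the training data approach the targets."""
    weights = list(weights)
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, inputs) - target
            # Gradient of 0.5 * error**2 with respect to weight i is error * inputs[i].
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
    return weights

# Hypothetical training data generated by the rule: target = 2*x0 - 1*x1.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
trained = train([0.0, 0.0], samples)
```

In this sketch the rule relating inputs to targets is never programmed explicitly; the weights move toward values that reproduce the training targets.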
A major benefit of neural networks is that they learn inherent patterns in different datasets. Thus, the programmer does not have to know the correct algorithm prior to programming the neural network for the neural network to provide an accurate result. Additionally, the fact that the neural network picks up patterns in existing data allows the neural network to provide solutions for problems that had not been considered prior to programming of the neural network.
While neural networks provide a large benefit in solving complex problems, they do so at a high computational cost. Both the training of a neural network and the processing of data through a neural network require a large amount of power and memory, primarily due to the multiplication of large matrices. Each time a training dataset or an input dataset is processed, a large weight matrix must be retrieved from memory and multiplied by other large matrices.
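To give a rough sense of scale, the following sketch estimates the storage and arithmetic cost of one fully connected layer. It assumes a dense weight matrix stored as 32-bit values (4 bytes per weight) and counts one multiply-add per weight per input vector; the helper `dense_layer_cost` and the layer dimensions are hypothetical.

```python
# Back-of-the-envelope sketch (assumptions: dense layer, 4 bytes per weight).
# The function name and layer sizes are hypothetical illustrations.

def dense_layer_cost(num_inputs, num_outputs, batch=1, bytes_per_weight=4):
    """Estimate weight count, weight storage, and multiply-adds for one layer."""
    weights = num_inputs * num_outputs
    return {
        "weights": weights,
        "weight_bytes": weights * bytes_per_weight,
        # Each input vector in the batch touches every weight once.
        "multiply_adds": batch * weights,
    }

cost = dense_layer_cost(4096, 4096)
# cost["weights"]       -> 16777216
# cost["weight_bytes"]  -> 67108864 (64 MiB)
# cost["multiply_adds"] -> 16777216 per input vector
```

Under these assumptions a single layer of this size occupies 64 MiB, and a network typically stacks several such layers.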
Due to the high computation and storage costs of neural network use, most neural networks are stored on and utilized by server computers. The storage, retrieval, transmission, and multiplication of large matrices tend to require better graphics cards and memory than are generally available on low power devices, such as smartphones and tablet computers. If a user of a low power device wishes to utilize a neural network, the user generally must send the input data to a server computer and wait for the server computer to produce an output. In some cases, this interchange occurs using background messaging processes over networks so that the user of a client computer or mobile computing device may be unaware that transfers to the server are occurring.
However, the restriction of neural networks to server computers greatly decreases their usefulness. Neural networks on low power devices can be extremely useful for image recognition, speech-to-text, providing recommendations to users and other applications that are not yet conceived or developed. Yet if the load of processing the input data with the neural network is too high, then a client computing device must be capable of interfacing with a server computer to use the neural network and must depend on the server computer to provide the requested outputs. Ample processing power, an active network connection to a server, or both typically are required.
Thus, there is a need for a technique that reduces the memory usage of storing a neural network on a client computing device and reduces the computational cost of processing data using the neural network. Reducing these costs would benefit the client computing device by allowing the client computing device to run the neural network without being dependent on a server computer. There is an acute need for techniques that offer improved computational efficiency to make executing neural networks, and related applications, using mobile computing devices a reality.