Today, implementations of artificial intelligence are driving innovation in many fields of technology. Artificial intelligence (AI) systems and algorithms include many models that enable the learning (e.g., deep learning), reasoning, and data processing capabilities of a machine (e.g., a computer). These AI systems and models are often trained intensively to perform one or more specific tasks, such as natural language processing, image recognition, planning, decision-making, and the like. Neural network training, for example, may in many cases take thousands of hours across the training cycle and many terabytes of training data to fine-tune an associated algorithm before use.
However, once trained, a neural network model or algorithm may be deployed quickly to make inferences, based on datasets that are relatively smaller than the training datasets, to accomplish specific tasks (e.g., recognizing speech from speech input data, etc.). An inference made by the neural network model or algorithm based on such a dataset may be a prediction about what the neural network calculates to be a correct answer or indication.
Still, while neural network models or algorithms may not require the same amount of compute resources as is required in a training phase, deploying a neural network model or algorithm in the field continues to require significant energy and compute power to classify data and infer or predict a result. This is because many of the traditional computers and systems that implement neural network models or algorithms tend to be larger to accommodate the great amount of circuitry needed for computing power and increased data processing speeds when implementing the neural network model. Due to the large size of the circuitry, more energy is required to enable the compute power of the many circuits.
These traditional computers and systems for implementing artificial intelligence models and, namely, neural network models may be suitable for remote computing, such as in distributed computing systems (e.g., the cloud), or when using many onsite computing servers and the like. However, latency problems arise when these remote artificial intelligence processing systems are used to compute inferences and the like for remote edge computing devices or in-field devices. That is, when these traditional remote systems seek to implement a neural network model for generating inferences to be used in remote field devices, there are unavoidable delays in receiving input data from the remote field devices because the input data must often be transmitted over a network with varying bandwidth. Subsequently, the inferences generated by the remote computing system must be transmitted back via the same or a similar network.
Implementing AI processing systems at the field level may be proposed as a solution to some of the latency issues. However, attempts to implement some of these traditional computers and systems at an edge device (or in-field-of-use device) may result in a bulky system with many circuits, as mentioned above, that consumes significant amounts of energy due to the architecture of the computing system used in generating inferences. Thus, such a proposal may not be feasible and/or sustainable.
Accordingly, there is a need for a deployable system for implementing artificial intelligence models in the field, preferably for use in edge devices, that does not result in large, bulky (edge) devices and that has the necessary compute power to make predictions or inferences while also being energy efficient.
The below-described embodiments of the present application provide such advanced and improved integrated circuits and implementation techniques capable of addressing the deficiencies of traditional systems.