Today, the various implementations of artificial intelligence are driving innovation in many fields of technology. Artificial intelligence (AI) systems and artificial intelligence models (including algorithms) are defined by many system architectures and models that enable machine learning (deep learning), reasoning, inferential capacities, and large data processing capabilities of a machine (e.g., a computer and/or a computing server). These AI systems and models are often trained intensively to perform one or more specific tasks, such as natural language processing, image recognition, planning, decision-making, and the like. For example, a subset of these AI systems and models include artificial neural network models. The training of an artificial neural network model may, in many cases, require thousands of hours across the training cycle and many terabytes of training data to fine tune associated neural network algorithm(s) of the model before use.
However, once trained, a neural network model or algorithm may be deployed quickly to make inferences to accomplish specific tasks (e.g., recognizing speech from speech input data, etc.) based on relatively smaller datasets when compared to the larger training datasets used during the training cycle. The inferences made by the neural network model or algorithm based on the smaller datasets may be a prediction about what the neural network calculates to be a correct answer or indication about a circumstance.
Still, while neural network models or algorithms may not require a same amount of compute resources, as required in a training phase, deploying a neural network model or algorithm in the field continues to require significant circuitry area, energy, and compute power to classify data and infer or predict a result. This is because many of the traditional computers and systems that implement neural network models or algorithms tend to be larger to accommodate a great amount of circuitry needed for computing power and increased data processing speeds when implementing the neural network model and due to the large size of the circuitry, more energy is required to enable the compute power of the many circuits.
These traditional computers and systems for implementing artificial intelligence models and, namely, neural network models may be suitable for remote computing, such as in distributed computing systems (e.g., the cloud), or when using many onsite computing servers and the like. However, latency problems are manifest when these remote artificial intelligence processing systems are used in computing inferences and the like for remote, edge computing devices or in field devices. That is, when these traditional remote systems seek to implement a neural network model for generating inferences to be used in remote field devices, there are unavoidable delays in receiving input data from the remote field devices because the input data must often be transmitted over a network with varying bandwidth and subsequently, inferences generated by the remote computing system must be transmitted back to the remote field devices via a same or similar network.
Implementing AI processing systems at the field level (e.g., locally at the remote field device) may be a proposed solution to resolve some of the latency issues. However, attempts to implement some of these traditional AI computers and systems at an edge device (e.g. remote field device) may result in a bulky system with many circuits, as mentioned above, that consumes significant amounts of energy due to the required complex architecture of the computing system used in processing data and generating inferences. Thus, such a proposal without more may not be feasible and/or sustainable with current technology.
Accordingly, there is a need for a deployable system for implementing artificial intelligence models locally in the field (e.g., local AI), and preferably to be used in edge devices, that do not result in large, bulky (edge) devices, that reduces latency, and that have necessary compute power to make predictions or inferences, in real-time or substantially real-time, while also being energy efficient.
The below-described embodiments of the present application provide such advanced and improved integrated circuits and implementation techniques capable of addressing the deficiencies of traditional systems and integrated circuit architectures for implementing AI and machine learning.