Artificial intelligence (AI) can enable computers to perform various complicated tasks, such as those related to cognitive functions that are typically associated with humans. These functions often involve making predictions or assessments based on real-world inputs. Several approaches to AI are prevalent, including machine learning techniques. Machine learning systems, in at least some examples, may be trained using known data sets rather than employing a specific predetermined algorithm to perform a task.
One machine learning model, referred to as an artificial neural network (ANN), is inspired by the interconnections of neurons in a biological brain. Typically, ANNs include multiple computational nodes arranged in interconnected layers, with each node modeling a neuron that may receive one or more inputs, process the inputs, and pass an output to the next layer, with the final layer producing a desired output. In some examples, each node may assign a weight to each of its inputs and then combine (e.g., sum) the weighted inputs to produce a result from that node. For example, if a task involves identifying a particular object in an image, the weights may be trained such that the output of the final layer corresponds to a probability that the input image includes the object.
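The weighted combination performed by a node, as described above, may be illustrated with a minimal Python sketch. The function names `node_output` and `layer_output` are hypothetical and chosen only for illustration; the sketch omits bias terms and activation functions, which practical ANNs commonly include.

```python
def node_output(inputs, weights):
    """Combine (sum) the weighted inputs of a single node."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(w * x for w, x in zip(weights, inputs))


def layer_output(inputs, weight_rows):
    """Evaluate one layer: every node receives the same inputs but applies
    its own row of weights. The resulting list feeds the next layer."""
    return [node_output(inputs, row) for row in weight_rows]


# Two nodes, each with three weighted inputs.
example = layer_output([1.0, 2.0, 3.0], [[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]])
```

In this sketch, the first node produces 0.5 - 2.0 + 6.0 = 4.5 and the second produces 6.0.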
Some operations of ANNs may require a large amount of computing resources, which may limit the computing hardware devices that may effectively utilize such ANNs (e.g., to make inferences regarding data using a particular ANN). In recent years, methods have been developed that may modify specific aspects of ANNs such that the ANNs may be utilized by computing hardware devices with fewer and/or more specific computing capabilities. For example, quantization processes may apply techniques to store numbers and/or perform calculations associated with an ANN in more compact and/or more efficient formats.
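As one illustration of storing numbers in a more compact format, the following Python sketch maps floating-point values to 8-bit signed integers. It assumes a symmetric scheme with a single scale factor shared by all values, which is only one of several possible quantization schemes and is not drawn from this disclosure; the names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor.

    Assumption: symmetric quantization with no zero-point offset.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floating-point values from the stored integers."""
    return [q * scale for q in quantized]


weights = [1.27, -0.5, 0.0]
q_weights, q_scale = quantize_int8(weights)
approx = dequantize(q_weights, q_scale)
```

Each recovered value in `approx` differs from the original by at most half of the scale factor, which reflects the precision traded for the smaller storage format.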
Unfortunately, conventional quantization methods may include computationally intensive and/or expensive computing operations, such as scaling of quantized integers (e.g., 32-bit quantized integers) to lower bit depth integers (e.g., 8-bit quantized integers) via conventional floating-point multiplication operations. These inefficient scaling operations may increase the cost and/or the complexity of quantization of ANNs. The instant disclosure, therefore, identifies and addresses a need for additional systems and methods for efficient scaling of quantized integers.
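The conventional scaling operation referred to above may be sketched in Python as follows. The function name `requantize_float`, the example scale factor, and the clamping range of a signed 8-bit integer are assumptions made for illustration only; the sketch shows the floating-point multiplication that such conventional methods may rely on, not the approach of the instant disclosure.

```python
INT8_MIN, INT8_MAX = -128, 127


def requantize_float(acc_int32, scale):
    """Scale a 32-bit quantized integer down to the 8-bit range.

    The multiplication below is a floating-point operation, which is the
    potentially expensive step on hardware with limited floating-point
    capability.
    """
    scaled = acc_int32 * scale        # floating-point multiply
    rounded = round(scaled)           # round to nearest integer
    return max(INT8_MIN, min(INT8_MAX, rounded))  # saturate to 8 bits


# Hypothetical 32-bit accumulator values and an assumed scale factor.
accumulators = [12345, 100000, -100000]
outputs = [requantize_float(a, 0.005) for a in accumulators]
```

Here the first accumulator scales to approximately 61.7 and rounds to 62, while the other two fall outside the 8-bit range and saturate to 127 and -128. One floating-point multiplication is required per scaled value, which may become costly when repeated across every output of every layer.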