The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
With the growth of artificial intelligence, machine learning technologies have found their way into a wide variety of applications. Training a machine learning model is generally very resource intensive and thus usually requires dedicated computer systems. However, as the applications of machine learning expand, there is a growing need to train machine learning models in a shared computing resource environment without sacrificing accuracy.
To improve performance, reduced-precision numerical representations may be used in training machine learning models. For example, the weights in a neural network may be stored in a reduced-precision format and thus require fewer computational resources to process. However, some operations may still (albeit temporarily) produce wider-precision numerical representations.
One way to reduce wider-precision numerical representations back to reduced-precision ones is to simply truncate them. Truncation of the extra bits is trivial to implement (and is usually the default), but it can lead to training errors or lower accuracy by systematically biasing values (such as weights) in one direction.
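The bias introduced by truncation can be illustrated with a minimal sketch. It assumes, purely for illustration, a two's-complement fixed-point value whose low 8 bits are dropped; the names FRACTION_BITS, truncate_fixed_point, and mean_truncation_error are hypothetical and do not come from any particular system.

```python
FRACTION_BITS = 8  # hypothetical: number of low-order bits that are dropped


def truncate_fixed_point(wide: int) -> int:
    """Drop the low FRACTION_BITS bits of a two's-complement fixed-point value."""
    return wide >> FRACTION_BITS  # arithmetic shift: always rounds downward


def mean_truncation_error(samples) -> float:
    """Average of (truncated - exact), in units of one reduced-precision step."""
    scale = 1 << FRACTION_BITS
    errors = [truncate_fixed_point(w) - w / scale for w in samples]
    return sum(errors) / len(errors)


# Every wide value from -4.0 up to (but not including) 4.0, in steps of 1/256.
samples = range(-1024, 1024)
bias = mean_truncation_error(samples)
print(bias)  # -0.498046875: the error never averages out, it is always downward
```

In this sketch every individual error is zero or negative, so repeated truncation (for example, across many weight updates) would drift values downward by roughly half a step on average rather than cancelling out.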
To utilize reduced-precision numerical representations without sacrificing accuracy, stochastic rounding is performed instead of trivial truncation. Stochastic rounding of wider-precision numerical representations avoids introducing a bias and therefore improves the accuracy of the resultant machine learning models. For example, stochastic rounding of a wider-precision decimal rounds the value up with a probability proportional to the value of the least-significant digits that are to be dropped, and rounds it down otherwise. Accordingly, the value of 37.25 would be rounded up to 38 with a 25% probability, and rounded down to 37 with a 75% probability, so that the expected value of the rounded result remains 37.25.
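The 37.25 example can be sketched as follows. This is a minimal illustration that assumes rounding to the nearest integers and uses Python's standard pseudo-random generator; the function name stochastic_round is hypothetical.

```python
import math
import random


def stochastic_round(value: float, rng: random.Random) -> int:
    """Round up with probability equal to the dropped fraction, else round down."""
    lower = math.floor(value)
    fraction = value - lower  # the part that is to be dropped, in [0, 1)
    # rng.random() is uniform in [0, 1), so this test succeeds with
    # probability `fraction` (and never when the fraction is zero).
    return lower + 1 if rng.random() < fraction else lower


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
samples = [stochastic_round(37.25, rng) for _ in range(100_000)]
share_rounded_up = samples.count(38) / len(samples)  # expected to be close to 0.25
mean_value = sum(samples) / len(samples)             # expected to be close to 37.25
```

Over many trials the share of results equal to 38 approaches 25%, and the average of the rounded results approaches the original 37.25, which is the sense in which the rounding is unbiased.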
One approach for implementing stochastic rounding is to execute multiple instructions that together yield the result of the rounding. A software program may contain the appropriate command(s) for stochastic rounding, which, when the program is compiled, yield multiple instructions to be executed by a hardware processor. These multiple instructions incur high overhead when processed: multiple processor cycles, potentially multiple memory lookups, and pipeline stalls, among others.
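To make the multi-step nature of a software implementation concrete, the following sketch shows one commonly used formulation, not necessarily the one contemplated here. It assumes a 32-bit IEEE 754 input that is reduced to a 16-bit format keeping the upper 16 bits (a bfloat16-style layout); the function names are hypothetical, and special values such as NaN are deliberately not handled.

```python
import random
import struct


def float32_bits(value: float) -> int:
    """Reinterpret a float32 value as its 32-bit unsigned integer pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def stochastic_round_to_bfloat16(value: float, rng: random.Random) -> float:
    """Assumed sketch: each numbered step maps to at least one instruction."""
    bits = float32_bits(value)          # 1. move/reinterpret the wide value
    noise = rng.getrandbits(16)         # 2. obtain 16 random bits
    bits = (bits + noise) & 0xFFFFFFFF  # 3. add; a carry into the kept bits occurs
                                        #    with probability (dropped bits) / 2**16
    bits &= 0xFFFF0000                  # 4. mask off the 16 dropped bits
    return bits_to_float32(bits)        # 5. reinterpret the result


rng = random.Random(0)     # fixed seed only to make the sketch repeatable
value = 1.0 + 2.0 ** -9    # lies one quarter of the way from 1.0 to 1.0078125
results = [stochastic_round_to_bfloat16(value, rng) for _ in range(100_000)]
share_rounded_up = results.count(1.0078125) / len(results)  # expected near 0.25
```

Even in this simplified form, a single rounding requires a random-number fetch, an addition, and a mask in addition to the data movement, which illustrates why a sequence of general-purpose instructions costs more than truncation alone.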