The power and performance of machine-learning and deep-neural-network algorithms are dictated by the matrix-multiply kernel. Specialized hardware typically implements multiply-adder trees to increase the energy efficiency of the dot-product operations that underlie matrix multiplication. To support the many different training and inference workloads, a dot-product circuit must handle multiple precisions with maximum area and energy efficiency at each precision; the required precisions range from 16-bit/32-bit on some architectures to 1-bit through 8-bit on others. Support for multiple-precision dot products, however, tends to degrade the area and energy efficiency of the underlying dot-product circuit.
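To make the precision trade-off concrete, the following is a minimal software sketch (not the paper's circuit) of a multiply-accumulate dot product whose operands are clamped to an n-bit signed range before multiplication, mimicking a fixed-precision multiplier array feeding an adder tree; the function names and the simple clamping scheme are illustrative assumptions.

```python
def quantize(x, bits):
    """Clamp a signed integer to the n-bit two's-complement range."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, x))

def dot_product(a, b, bits):
    """Multiply-accumulate two vectors whose elements are first
    clamped to a `bits`-bit signed range, as a fixed-precision
    multiplier stage would see them."""
    acc = 0
    for x, y in zip(a, b):
        acc += quantize(x, bits) * quantize(y, bits)
    return acc

# The same inputs give different results at different precisions:
print(dot_product([100, -50], [3, 2], 8))  # full 8-bit operands
print(dot_product([100, -50], [3, 2], 4))  # operands clipped to 4 bits
```

Running the two calls at 8-bit and 4-bit operand precision shows why a single hardwired width is insufficient: narrow precisions save multiplier area but change the arithmetic, so a circuit serving both inference and training workloads must expose several operand widths.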