Machine learning applications running on computing devices can compute large numbers of Vector-Vector Dot-Products (VVDP). Such large numbers of VVDP computations can incur a correspondingly large number of memory accesses to store inputs, store intermediate values, calculate reductions, store reduction values, and the like. These memory accesses can incur substantial memory access delays and/or consume substantial amounts of power transferring data between processing units and memory, which can place a high computing load on the system.
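The relationship between VVDP computation and memory traffic can be illustrated with a minimal sketch. It is not taken from any particular implementation: the function name `dot_with_access_count` is hypothetical, and the access model is a deliberately pessimistic assumption in which every input element and every partial sum travels to or from memory, with no registers or caches absorbing any of the traffic.

```python
def dot_with_access_count(a, b):
    """Naive dot product that tallies modeled memory accesses.

    Assumed model (illustrative only): each input element is loaded
    from memory, and the running partial sum is loaded and stored
    again on every multiply-accumulate step.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    reads = 0
    writes = 0

    acc = 0.0
    writes += 1  # store the initial accumulator value

    for x, y in zip(a, b):
        reads += 2  # load one element from each input vector
        reads += 1  # load the running partial sum
        acc = acc + x * y
        writes += 1  # store the updated partial sum

    return acc, reads, writes


if __name__ == "__main__":
    result, reads, writes = dot_with_access_count([1, 2, 3, 4], [5, 6, 7, 8])
    print(result, reads, writes)
```

Under this assumed model, a single dot product of two 4-element vectors performs 4 multiply-accumulate steps but 12 reads and 5 writes, so the modeled memory traffic grows linearly with vector length and is multiplied again by the number of VVDP computations performed.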