In a typical neural network, a standard computation is a dot product between input values (activations) and weight values. A typical way for an integrated circuit to compute these dot products is to use a multiply-accumulate (MAC) circuit that repeatedly multiplies an input value by a weight value, adds the product to the existing partial dot product, and stores the new partial dot product. However, this approach requires numerous clock cycles, because each term in the dot product uses a separate cycle, so a dot product of N terms takes at least N cycles on a single MAC circuit. Accordingly, techniques for parallelizing the computation without massively expanding the surface area of the circuit are required.
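The serial behavior of a MAC circuit can be illustrated with a short software sketch. This is only a behavioral model under simplifying assumptions, not a description of any particular hardware: the function name `mac_dot_product` and the `cycles` counter are hypothetical, and each loop iteration is assumed to stand in for one clock cycle.

```python
def mac_dot_product(inputs, weights):
    """Model a single MAC unit computing a dot product one term at a time.

    Returns the dot product and the number of simulated clock cycles,
    assuming (for illustration) one multiply-accumulate per cycle.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")

    accumulator = 0  # holds the partial dot product between cycles
    cycles = 0
    for x, w in zip(inputs, weights):
        accumulator += x * w  # multiply, add to the partial sum, store it
        cycles += 1           # each term costs one simulated cycle
    return accumulator, cycles


result, cycles = mac_dot_product([1, 2, 3, 4], [5, 6, 7, 8])
print(result, cycles)  # 70 4
```

In this model the cycle count grows linearly with the number of terms, which is the cost that motivates computing several terms in parallel.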