Efficient computation of signed vector inner products (SVIPs), in terms of both computing time and power consumption, is of significant value in numerous Digital Signal Processing (DSP) algorithms and applications that rely heavily on vector matrix multiplications. SVIP-intensive applications involve, for example, Neural Net coefficient matrices, transition matrices of Hidden Markov Models and graph matrices of Factor Graphs. Since the datapaths of traditional DSP and microprocessor cores are ill-suited for efficient SVIP computations, it is highly beneficial to add a customized hardware datapath for this purpose, such as the present vector matrix product accelerator, to reduce the computational load on the datapath of the DSP core or microprocessor core.
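To make the workload concrete, the following is a minimal software sketch of how a vector matrix product decomposes into one SVIP per matrix row. It is an illustration only, not the accelerator's datapath; the function names `svip` and `matrix_vector_product` and the sample values are assumptions introduced here.

```python
def svip(a, b):
    """Signed vector inner product of two equal-length integer sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0  # accumulator; a hardware datapath would use a wide signed register
    for x, y in zip(a, b):
        acc += x * y  # one signed multiply-accumulate per element pair
    return acc


def matrix_vector_product(matrix, vector):
    """Multiply an M x N matrix (list of rows) by an N-element vector.

    Each output element is the SVIP of one matrix row with the vector.
    """
    return [svip(row, vector) for row in matrix]


# Hypothetical sample data: a 2 x 3 signed matrix and a 3-element signed vector.
matrix = [[1, -2, 3], [-4, 5, -6]]
vector = [7, -8, 9]
result = matrix_vector_product(matrix, vector)
print(result)  # [50, -122]
```

An M x N matrix therefore requires M x N signed multiply-accumulate operations per product, which is the load a dedicated datapath would take off the DSP or microprocessor core.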
Furthermore, the interface between the present vector matrix product accelerator and the data bus of a single-ported data memory system or area with limited data transfer bandwidth is important for efficient integration of the vector matrix product accelerator with an existing data memory bus of the DSP or microprocessor core. This interface must transfer a large number of input matrix elements and/or input vector elements to the vector matrix product accelerator in a minimum of bus cycles of the single-ported data memory. It is likewise important to enable efficient parallel or simultaneous operation of the present vector matrix product accelerator and the existing DSP or microprocessor datapaths to dramatically improve the multiplication throughput of the total microprocessor circuit.
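One common way to reduce the number of bus cycles is to pack several narrow elements into each bus word. The sketch below illustrates the idea under assumed parameters that the text does not specify: 8-bit signed elements, a 32-bit data bus, and the first element placed in the least significant bits. The helper names `pack_elements` and `unpack_word` are hypothetical.

```python
def pack_elements(elements, element_bits=8, bus_bits=32):
    """Pack signed elements into bus words, first element in the lowest bits."""
    per_word = bus_bits // element_bits
    mask = (1 << element_bits) - 1
    words = []
    for i in range(0, len(elements), per_word):
        word = 0
        for lane, value in enumerate(elements[i:i + per_word]):
            # Masking yields the two's complement bit pattern of the element.
            word |= (value & mask) << (lane * element_bits)
        words.append(word)
    return words


def unpack_word(word, element_bits=8, bus_bits=32):
    """Recover the signed elements carried by one bus word."""
    per_word = bus_bits // element_bits
    mask = (1 << element_bits) - 1
    sign_bit = 1 << (element_bits - 1)
    elements = []
    for lane in range(per_word):
        raw = (word >> (lane * element_bits)) & mask
        # Sign-extend: patterns with the top bit set represent negative values.
        elements.append(raw - (1 << element_bits) if raw & sign_bit else raw)
    return elements


# Hypothetical sample data: eight 8-bit elements fit in two 32-bit bus words.
elements = [1, -2, 3, -4, 5, -6, 7, -8]
words = pack_elements(elements)
print(len(words))  # 2
```

Under these assumptions, eight elements cross the bus in two cycles instead of eight, which is the kind of saving an interface to a single-ported memory would aim for.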