Demand is increasing for solutions to complex linear systems. For example, solving a system of linear equations may be used to estimate a camera location and angle (e.g., camera pose estimation) or to provide vehicular simultaneous localization and mapping (SLAM) calculations, which may be used in augmented reality (AR) or virtual reality (VR) applications. Many applications solve such a system by representing the linear equations as matrices and then solving for a solution matrix (e.g., a matrix-solve operation).
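As an illustration of representing linear equations as matrices and solving for a solution matrix, the following is a minimal sketch in Python. The function name `matrix_solve`, the use of Gauss-Jordan elimination, and the two-equation example are assumptions chosen for illustration; the text above does not specify which solution method an application uses.

```python
from fractions import Fraction


def matrix_solve(a, b):
    """Solve A X = B for X (hypothetical helper, illustration only).

    a: n x n coefficient matrix as a list of rows.
    b: n x k right-hand-side matrix as a list of rows.
    Uses Gauss-Jordan elimination with partial pivoting and exact
    rational arithmetic so the small example has no rounding error.
    """
    n = len(a)
    # Build the augmented matrix [A | B].
    aug = [[Fraction(v) for v in row_a + row_b] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("coefficient matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    # The right-hand block now holds the solution matrix X.
    return [row[n:] for row in aug]


# Example system: 2x + y = 5 and x + 3y = 10, written as A X = B.
A = [[2, 1], [1, 3]]
B = [[5], [10]]
X = matrix_solve(A, B)
```

Here the rows of `A` hold the coefficients of each equation, `B` holds the right-hand sides, and `X` holds the unknowns x and y.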
The matrix-solve operations are often performed by a software kernel running on a generic processor, such as a central processing unit (CPU). Matrix-solve operations are computationally intensive. For example, for a matrix K with M rows and N columns (i.e., size M×N), the matrix-solve operations are of complexity O(M²N). The matrix-solve operations also require substantial memory bandwidth, resulting in a substantial time delay in computing the solution (e.g., large latency). The large latency may significantly affect the performance of various applications, such as by slowing camera pose estimation or SLAM calculations. In an embodiment, a matrix-solve operation executed on an ARM Cortex-A9 CPU running at 1.2 GHz, using a software kernel optimized for the ARM architecture, took 1.55 ms to solve for a 128×100 output matrix. In addition to the high latency and large memory bandwidth requirements, matrix-solve operations also require substantial energy to execute the large number of memory accesses. The high latency and high energy consumption may substantially reduce the performance of time-dependent applications, such as AR or VR applications.
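To make the quadratic-in-M, linear-in-N scaling concrete, the following Python sketch counts multiply-accumulate operations in one common matrix-solve step. It assumes, purely for illustration, that the solve is a forward substitution with an M×M lower-triangular matrix and N right-hand-side columns; the text above does not state which algorithm the software kernel uses, and the name `forward_substitution` is hypothetical.

```python
def forward_substitution(l, b):
    """Solve L X = B and count multiply-accumulates (illustration only).

    l: M x M lower-triangular matrix with a nonzero diagonal.
    b: M x N right-hand-side matrix.
    Returns (x, macs), where x is the M x N solution and macs is the
    number of multiply-accumulate operations performed.
    """
    m = len(l)
    n = len(b[0])
    x = [[0.0] * n for _ in range(m)]
    macs = 0
    for i in range(m):          # each row of the solution
        for j in range(n):      # each right-hand-side column
            acc = b[i][j]
            for k in range(i):  # subtract contributions of solved rows
                acc -= l[i][k] * x[k][j]
                macs += 1
            x[i][j] = acc / l[i][i]
    return x, macs


# Small example: M = 4, N = 3, L has ones on and below the diagonal.
M, N = 4, 3
L = [[1.0 if k <= i else 0.0 for k in range(M)] for i in range(M)]
B = [[1.0] * N for _ in range(M)]
X, macs = forward_substitution(L, B)
```

Under this assumption the inner loop runs N·M(M−1)/2 times, which grows as M²N. Every one of those iterations also reads an entry of `l` and an entry of `x`, which is one way to see why the operation is demanding on memory bandwidth as well as on arithmetic.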