QR decomposition (also called QR factorization) of a matrix is a decomposition of the matrix into an orthogonal matrix Q and a right (upper) triangular matrix R. QR decomposition may be used, for example, to solve the linear least squares problem. QR decomposition is also the basis for a particular eigenvalue algorithm called the QR algorithm.
One known technique for performing QR decomposition is the modified Gram-Schmidt technique, which calculates the Q and R matrices as follows (where A is the input matrix, with m rows and n columns, having columns a_k and elements a_jk):
for k=0:n−1
    r(k,k)=norm(A(1:m, k));
    for j=k+1:n−1
        r(k, j)=dot(A(1:m, k), A(1:m, j))/r(k,k);
    end
    q(1:m, k)=A(1:m, k)/r(k,k);
    for j=k+1:n−1
        A(1:m, j)=A(1:m, j)−r(k, j)·q(1:m, k);
    end
end
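For readers who want to run the loop, the following is a minimal Python sketch of the same modified Gram-Schmidt steps. It is an illustration only, not an implementation taken from this document: the function name modified_gram_schmidt, the list-of-rows matrix layout, and the zero-based indexing are assumptions, and the input is assumed to have full column rank so that r(k,k) is never zero.

```python
import math


def modified_gram_schmidt(a):
    """Return (q, r) such that a = q * r, following the loop above.

    `a` is a list of m rows, each with n entries (hypothetical layout).
    Assumes full column rank, so r[k][k] is never zero.
    """
    m, n = len(a), len(a[0])
    # Column-major working copy: cols[k] is column k of A.
    cols = [[float(a[i][k]) for i in range(m)] for k in range(n)]
    q_cols = [[0.0] * m for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]

    for k in range(n):
        # r(k,k) = norm(A(:,k)); needs a square root.
        r[k][k] = math.sqrt(sum(x * x for x in cols[k]))

        # r(k,j) = dot(A(:,k), A(:,j)) / r(k,k); waits on r(k,k).
        for j in range(k + 1, n):
            dot = sum(x * y for x, y in zip(cols[k], cols[j]))
            r[k][j] = dot / r[k][k]

        # q(:,k) = A(:,k) / r(k,k); also waits on r(k,k).
        q_cols[k] = [x / r[k][k] for x in cols[k]]

        # A(:,j) = A(:,j) - r(k,j) * q(:,k); waits on r(k,j) and q(:,k).
        for j in range(k + 1, n):
            cols[j] = [y - r[k][j] * x for x, y in zip(q_cols[k], cols[j])]

    # Convert Q back to a list of rows.
    q = [[q_cols[k][i] for k in range(n)] for i in range(m)]
    return q, r


if __name__ == "__main__":
    q, r = modified_gram_schmidt([[3.0, 0.0], [4.0, 5.0]])
    print(q)
    print(r)
```

The comments mark where each step has to wait for an earlier result within the same iteration of k.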
As can be seen, there are two data dependencies. First, neither the r(k, j) nor the q(1:m, k) terms can be computed until r(k,k) has been computed. And while r(k,k) is nominally computed first, floating point functions such as the norm and the subsequent division may have long latencies. Second, the A(1:m, j) terms cannot be computed until the r(k, j) and q(1:m, k) terms have been computed. These dependencies may introduce stalls in the data flow.
Such data dependencies can cause delays when the computation is performed in hardware, and also may be of concern in a software implementation in a multicore processor environment, or even in a single core processor environment if the processor is deeply pipelined and the pipeline is optimized for functions more common than division.
Copending, commonly-assigned U.S. patent application Ser. No. 12/703,146, filed Feb. 9, 2010, describes a modified Gram-Schmidt orthogonalization with no dependencies between iterations, but one internal dependency remains.