Modern computer applications often involve gathering and processing large amounts of information (e.g., databases with hundreds of thousands of rows or more). Applications that process high volumes of data (sometimes referred to as “big data” applications) are very computationally intensive. Many big data applications employ common data analytics tools to perform analytical operations such as data clustering, classification, de-noising, etc. Big data applications often require large amounts of hardware resources and power to support these computationally intensive analytical operations. It would be useful to simplify the analytical operations and make them more computationally efficient in order to conserve computational resources and power. Further, certain real-time applications (e.g., image/facial recognition, service logistics, consumer product recommendation on large-scale e-commerce platforms, etc.) impose rather stringent speed requirements. Currently, these applications are often performed on large servers due to their processing requirements. It would be useful to simplify the processing operations so that they can execute on devices with less powerful processors and less memory (e.g., personal computers, mobile phones, etc.).
Many analytics tools perform singular value decomposition (SVD) as a part of the data processing operation. For example, in some search applications, keywords in documents are first arranged in a large matrix, with rows corresponding to keywords, columns corresponding to documents to be searched, and entries corresponding to the strength of the keyword-document association. To perform a search on a set of keywords, SVD is performed on the matrix, the most significant singular values and their associated singular vectors are extracted and used to create a simplified matrix with much less clutter, and the elements of “search hints” that are most relevant to the intent of the search are then narrowed down. For large-scale searches (e.g., Internet searches), the numbers of keywords and documents are huge and ever increasing, which means the size of the matrix to be computed is also huge and ever increasing. A more efficient technique to perform SVD would result in faster response times as well as savings in computing and networking resources and power.
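The keyword-document workflow above can be sketched as follows. This is a minimal illustration, not the specific search system described in the text: the matrix entries, the choice of rank k, and the example keywords are all assumptions made for demonstration.

```python
import numpy as np

# Toy keyword-document matrix: rows correspond to keywords, columns to
# documents, and entries to the strength of the keyword-document association.
A = np.array([
    [3.0, 0.0, 1.0, 0.0],   # hypothetical keyword "matrix"
    [2.0, 0.0, 2.0, 0.0],   # hypothetical keyword "singular"
    [0.0, 4.0, 0.0, 1.0],   # hypothetical keyword "recipe"
    [0.0, 3.0, 0.0, 2.0],   # hypothetical keyword "baking"
])

# Full SVD: A = U @ diag(s) @ Vt, with singular values sorted in
# descending order of significance.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k most significant singular values and their associated
# singular vectors to build a simplified (low-rank) matrix with much of
# the clutter removed.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A_k is the best rank-k approximation of A in the least-squares sense;
# search queries can now be scored against A_k instead of the full matrix.
print(np.round(A_k, 2))
```

The cost of the full decomposition grows quickly with matrix size, which is why the text argues that a more efficient SVD technique pays off for large-scale searches.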
Currently, SVD is typically computed using Cholesky decomposition. For large matrices, a large number of computations is required, and thus a more efficient way to compute SVD is desirable. Further, the typical implementation of Cholesky decomposition computes the singular values of a matrix in a sequential manner. In other words, there are usually multiple stages of computation, where each stage depends on the result of the previous stage; therefore, the computation is typically carried out on one processor at a time without the benefit of parallel processing. Thus, it would also be desirable if the process could be implemented in a way that takes advantage of modern multi-processor architectures and executes in a pipelined, parallelized fashion.
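To make the sequential dependency concrete, the following sketch computes singular values via repeated Cholesky factorizations of A^T A (the classical Cholesky-LR iteration). This is one example of a Cholesky-based approach, chosen for illustration; the text does not specify which Cholesky-based method it refers to, and the function name and iteration count are assumptions.

```python
import numpy as np

def singular_values_cholesky_lr(A, iterations=100):
    """Approximate the singular values of A by driving B = A^T A toward
    diagonal form with repeated Cholesky factorizations (Cholesky-LR).
    The singular values of A are the square roots of the eigenvalues of B,
    which accumulate on the diagonal as the iteration converges."""
    B = A.T @ A  # symmetric positive definite when A has full column rank
    for _ in range(iterations):
        # Each stage depends entirely on the result of the previous one,
        # so successive iterations cannot run in parallel.
        L = np.linalg.cholesky(B)  # B = L @ L.T
        B = L.T @ L                # similarity transform; same eigenvalues
    return np.sqrt(np.sort(np.diag(B))[::-1])

A = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [0.0, 1.0]])
print(singular_values_cholesky_lr(A))  # approximates np.linalg.svd(A, compute_uv=False)
```

Each loop iteration consumes the factor produced by the previous one, which is exactly the stage-by-stage dependency that prevents straightforward parallelization across processors.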