From the 1950s until 2012, the world enjoyed continuous improvement in high-performance numerical computing. In the 1990s, it became common to use Linpack, an implementation of Block LU Decomposition with partial pivoting, as a benchmark for supercomputer performance. LU Decomposition is a simple algorithm that achieves a significant computational result: it factors a matrix into lower- and upper-triangular factors, from which a dense linear system can be solved by substitution. Block LU Decomposition is an extension of LU Decomposition that fit naturally onto the parallel-processor computers deployed at that time. Partial pivoting is an extension to Block LU Decomposition that ensures numerical stability under some straightforward conditions. From here on, Block LU Decomposition is assumed to incorporate partial pivoting unless otherwise stated.
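For readers unfamiliar with the algorithm, the following is a minimal sketch of LU Decomposition with partial pivoting. It is the plain, unblocked form written in pure Python for illustration only; it is not Linpack's implementation, and the function name `lu_partial_pivot` and the packed return format are assumptions made for this example. A blocked variant applies the same elimination to panels of columns and updates the remaining submatrix with matrix-matrix products.

```python
def lu_partial_pivot(a):
    """Factor a square matrix so that P A = L U (illustrative sketch).

    Returns (perm, lu): row i of P A is row perm[i] of A; lu holds U on and
    above the diagonal and the multipliers of L (unit diagonal implied) below.
    """
    n = len(a)
    lu = [row[:] for row in a]      # work on a copy; the input is untouched
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest-magnitude entry in column k
        # at or below the diagonal and swap its row into position k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Eliminate column k below the pivot, storing each multiplier in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return perm, lu
```

Because the pivot is the largest entry in its column, every stored multiplier has magnitude at most one, which is what limits error growth during elimination.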
Based on the TOP500 list, performance advances of the world's supercomputers began to slow around 2010, eventually stalled about 2012, and have remained flat since 2013. While computation within an integrated circuit continues to improve, communication across these very large systems drastically limits the benefit of on-chip performance improvements and the ability to achieve exascale performance. An exascale computer is required to run a version of Linpack (Block LU Decomposition) for at least 8 hours at an average of one exaflop (a billion billion, or 10^18, floating-point operations per second).
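To give a sense of scale, the short calculation below estimates how large a dense matrix such a run would have to factor. It is a rough sketch under stated assumptions: only the leading term of the LU operation count, (2/3)n^3, is used, the matrix is stored in 8-byte double precision, and the function name `lu_problem_size` is hypothetical.

```python
def lu_problem_size(flops_per_second, hours):
    """Estimate the matrix order n whose LU factorization fills the run.

    Assumes the leading-order operation count (2/3) * n**3 and 8-byte
    entries; lower-order terms and communication time are ignored.
    """
    total_ops = flops_per_second * hours * 3600.0
    n = (1.5 * total_ops) ** (1.0 / 3.0)   # invert (2/3) * n**3 = total_ops
    matrix_bytes = 8.0 * n * n             # storage for the dense matrix
    return total_ops, n, matrix_bytes


total_ops, n, matrix_bytes = lu_problem_size(1e18, 8)
print(f"operations: {total_ops:.3g}, order n: {n:.3g}, bytes: {matrix_bytes:.3g}")
```

Under these assumptions, the matrix order is in the tens of millions and the matrix alone occupies on the order of ten petabytes, which illustrates why data movement across the system, rather than on-chip arithmetic, dominates.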