Computer power has increased by over 70% per year for the last 50 years (over 11 orders of magnitude in total), making it difficult to measure and compare the performance of computers with vastly different capabilities using a benchmarking system that does not scale. Furthermore, since a given make of parallel processor may offer a performance range of over 8000 to 1, the scaling problem exists even among computers of current vintage. Any benchmark of fixed size is soon made obsolete by hardware advances that render its time and space requirements unrepresentative of realistic use of the equipment. A common workaround is to perform a fixed-size task repetitively, but this has proven less than satisfactory.
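The growth figures above can be checked with a short calculation. The following is a minimal sketch, not part of any benchmark described here: the function names `orders_of_magnitude` and `fixed_benchmark_runtime` are hypothetical, and the one-hour starting run time is an assumption chosen only for illustration. It shows that 70% annual growth compounded over 50 years does exceed 11 orders of magnitude, and how quickly the run time of a fixed-size task shrinks under the same growth rate.

```python
import math


def orders_of_magnitude(annual_growth: float, years: int) -> float:
    """Return log10 of the cumulative speedup after compounding
    `annual_growth` (0.70 means 70% per year) for `years` years."""
    return years * math.log10(1.0 + annual_growth)


def fixed_benchmark_runtime(initial_seconds: float,
                            annual_growth: float,
                            years: int) -> float:
    """Return the run time of a fixed-size task after `years` of
    compounded speedup, assuming run time scales inversely with speed."""
    return initial_seconds / (1.0 + annual_growth) ** years


if __name__ == "__main__":
    # 70% per year for 50 years, the figures quoted in the text.
    print(f"orders of magnitude: {orders_of_magnitude(0.70, 50):.2f}")

    # Illustrative assumption: a fixed-size task that takes one hour today.
    for elapsed_years in (0, 5, 10, 20):
        seconds = fixed_benchmark_runtime(3600.0, 0.70, elapsed_years)
        print(f"after {elapsed_years:2d} years: {seconds:10.3f} s")
```

Under these assumptions the one-hour task drops to well under a minute within a decade, which illustrates why a benchmark of fixed size soon stops exercising a machine in a representative way.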
A related issue is the difficulty of scientifically comparing computers with vastly different architectures or programming environments. A benchmark designed for one architecture or programming model puts a different architecture at a disadvantage, even when nominal performance is otherwise similar. Assumptions about arithmetic precision, memory topology, and "legal" language constructs are typically wedded to the job to be timed, in the interest of controlling as many variables as possible. This "ethnocentrism" in benchmark design has hampered comparison of novel parallel computers with traditional serial computers. Examples of popular benchmarks that have some or all of the foregoing drawbacks are LINPACK, the "PERFECT™ Club", the Livermore Loops, SPEC, Whetstones, and Dhrystones.