Distributed cluster computing frameworks have become a popular way to cope with the ever-increasing volume of Big Data in the modern computing era. Hadoop and Spark, for example, are growing quickly, and many internet-service companies, such as Google, Facebook, and Amazon, are considering these cluster computing platforms as their platforms of choice for solving their many Machine Learning problems.
In addition, new startups, such as Palantir, provide such platforms and analytical applications as services. The keys to success in this business are competitive response times and high energy efficiency in providing services, because energy costs for data centers are substantial. As such, eliminating wasteful processes in computing is crucial.
Modern Big Data machine learning algorithms rely heavily on fast iterative methods. Conceptually, fast iterative methods not only provide a simple and fast-converging framework, but also appeal to a data-centric philosophy. “Data-centric” means (generally) that analyzing more data with dumber algorithms is better than analyzing less data with stronger algorithms. Such an approach is well aligned with Big Data analytics. That is, with ever-increasing data, it is important to cope with such large-scale data in a reasonable amount of time by sacrificing some degree of accuracy. In this context, fast iterative methods have become popular.
Such fast iterative methods share two characteristics. First, as the number of iterations of the algorithm increases, the solution matures. For example, Coordinate Descent (CD) repeats its search steps with progressively finer granularity in the delta values applied along each search direction. Second, parallel solvers are often useful. Because of the random and divergent nature of fast iterative methods, there are many variants that adopt parallel search techniques to enhance convergence speed and prune sub-optimal or divergent cases. These common characteristics may result in huge performance losses and wasted energy, because the hardware or system frameworks work towards the best accuracy without knowing the end-use requirements. For example, 64-bit and/or 128-bit Arithmetic Logic Units (ALUs) are commonly used throughout the entire application to find a solution, even for processes that require only 32-bit or narrower ALUs. Software solutions, such as the GNU Multiple Precision (GMP) Arithmetic Library and the NYU Core Library, may extend the precision beyond 128 bits, limited only by available memory. But using greater precision than a given iteration of the algorithm needs wastes resources and slows processing.
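The following minimal Python sketch illustrates the first characteristic and the precision mismatch that goes with it. It is not taken from any particular framework. It uses a simple derivative-free coordinate search as a stand-in for Coordinate Descent, and the names `round_to_f32`, `coordinate_search`, `quadratic`, and the `switch_step` threshold are all hypothetical. While the step size is coarse, iterates are stored at 32-bit precision (emulated here by rounding); 64-bit storage is used only once the step becomes fine.

```python
import struct


def round_to_f32(v):
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def coordinate_search(f, x0, step=1.0, min_step=1e-10, switch_step=1e-3):
    """Coordinate search whose step (delta) shrinks as the solution matures.

    Assumption for illustration: passes with step >= switch_step store the
    iterate at 32-bit precision; later passes keep full 64-bit precision.
    """
    x = [round_to_f32(v) for v in x0]
    coarse_passes = fine_passes = 0
    while step >= min_step:
        coarse = step >= switch_step
        store = round_to_f32 if coarse else float
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = store(trial[i] + direction * step)
                if f(trial) < f(x):
                    x, improved = trial, True
                    break
        if coarse:
            coarse_passes += 1
        else:
            fine_passes += 1
        if not improved:
            step *= 0.5  # no progress at this granularity: refine the delta
    return x, coarse_passes, fine_passes


def quadratic(x):
    """Toy objective with its minimum at (1.5, -0.25)."""
    return (x[0] - 1.5) ** 2 + (x[1] + 0.25) ** 2


solution, coarse_passes, fine_passes = coordinate_search(quadratic, [0.0, 0.0])
print(solution, coarse_passes, fine_passes)
```

In this toy problem the early passes never need more than 32-bit resolution, which is the kind of situation in which running every iteration on the widest available arithmetic would be wasteful.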
A need remains for a way to improve the performance of fast, inexact solution methods.