The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for providing run-ahead approximated computations.
A parallel computing system is a computing system with more than one processor for parallel processing of tasks. A parallel program is a program consisting of one or more jobs, each of which may be separated into tasks that can be executed in parallel by a plurality of processors. Parallel programs allow the tasks to be executed simultaneously on multiple processors, with some coordination between the processors, in order to obtain results faster.
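By way of illustration only, the separation of a job into tasks that are executed in parallel and then combined may be sketched as follows. The sketch is written in Python and is not part of the described mechanisms; the function names `split_into_tasks`, `run_task`, and `run_job`, as well as the sum-of-squares workload, are hypothetical and chosen solely for the example. A thread pool stands in for the plurality of processors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(data, num_tasks):
    """Split a job's input into contiguous chunks, one chunk per task."""
    # Ceiling division; max(1, ...) keeps the step non-zero for empty input.
    chunk = max(1, (len(data) + num_tasks - 1) // num_tasks)
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def run_task(chunk):
    """One task: compute a partial sum of squares over its own chunk."""
    return sum(x * x for x in chunk)


def run_job(data, num_workers=4):
    """Run the job by executing its tasks on a pool of workers."""
    tasks = split_into_tasks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Tasks are independent of one another; the only coordination
        # needed is gathering the partial results once they complete.
        partials = list(pool.map(run_task, tasks))
    return sum(partials)


if __name__ == "__main__":
    print(run_job(list(range(10))))
```

In the standard CPython interpreter, threads do not execute Python bytecode simultaneously, so this sketch shows only the structure of the decomposition. A process pool or a multi-node system would be needed for an actual speedup on compute-bound tasks.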
There are many different approaches to providing parallel computing systems. Examples of some types of parallel computing systems include multiprocessing systems, computer cluster systems, parallel supercomputer systems, distributed computing systems, grid computing systems, and the like. These parallel computing systems are typically distinguished from one another by the type of interconnection between the processors and memory. One of the most widely accepted taxonomies classifies parallel computing systems according to whether all of the processors execute the same instructions on different data, i.e., single instruction/multiple data (SIMD), or each processor executes different instructions on different data, i.e., multiple instruction/multiple data (MIMD).
Another way in which parallel computing systems are classified is by their memory architectures. In shared memory parallel computing systems, multiple processors access all available memory as a global address space. These shared memory parallel computing systems may be further classified into uniform memory access (UMA) systems, in which access times to all parts of memory are equal, and non-uniform memory access (NUMA) systems, in which access times differ depending on which part of memory is being accessed. In distributed memory parallel computing systems, multiple processors are likewise utilized, but each processor can access only its own local memory, i.e., no global memory address space exists across the processors. Still another type of parallel computing system, and the most prevalent in use today, is a combination of the above systems in which each node of the system has some amount of shared memory for a small number of processors, and many of these nodes are connected together in a distributed memory parallel system.
In some parallel computing systems, the Message Passing Interface (MPI) is used as a way of communicating and coordinating work performed in parallel by a plurality of computing or processing devices. MPI is a language-independent application programming interface (API) specification for message passing on shared memory or distributed memory parallel computing systems. With MPI, a parallel application is typically provided as one or more jobs, which are then separated into tasks that can be processed in parallel on a plurality of processors of one or more computing devices. MPI provides a communication API through which the processors communicate with one another regarding the processing of these tasks.
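The coordination pattern described above, in which tasks are distributed and results are returned by exchanging messages, may be sketched as follows. This is a simplified illustration only and does not use an MPI library: in-process queues from the Python standard library stand in for the point-to-point send and receive operations that an MPI implementation would provide, and threads stand in for processors. The names `worker` and `coordinate`, the `None` sentinel, and the summation workload are assumptions made for the example.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive tasks until a None sentinel arrives; send back (rank, result)."""
    while True:
        task = inbox.get()              # blocking receive
        if task is None:                # sentinel: no more work
            break
        outbox.put((rank, sum(task)))   # send partial result to coordinator


def coordinate(tasks, num_workers=2):
    """Distribute tasks round-robin to workers and combine their results."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(num_workers)
    ]
    for t in threads:
        t.start()

    # Send each task as a message to one worker's inbox.
    for i, task in enumerate(tasks):
        inboxes[i % num_workers].put(task)

    # Receive exactly one result message per task sent.
    total = sum(results.get()[1] for _ in tasks)

    # Tell every worker to stop, then wait for all of them to exit.
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(coordinate([[1, 2], [3, 4], [5]]))
```

In this sketch no worker reads or writes another worker's data directly. All coordination occurs through explicit messages, which is the property that allows the same pattern to be applied to distributed memory systems.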
The use of parallel computing systems to process large analytical workloads, e.g., facial recognition workloads, weather or traffic condition analysis, biological sequence analysis, Internet traffic analysis, document warehouse analytics, various data mining applications, or any other type of large analytical workload, is becoming increasingly important in today's information age. As can be appreciated, the amount of data upon which such analytics are performed is quite vast and continues to increase. Even with the speed increases made possible through parallel computing systems, the sheer size of the data that needs to be analyzed, at target cost-performance levels, makes the application of analytics to the full set of data rather impractical.