Parallel computing is a form of computation whereby multiple operations are carried out simultaneously by different processing units of a parallel computer. A parallel programming model is a model for writing parallel computer programs to be compiled and executed on a parallel computer. A parallel programming model must specify more than a sequential model: in addition to specifying how the tasks to be carried out by an individual processor are defined, it must also specify:
- available parallelism, i.e. the decomposition of a program into tasks which may be executed simultaneously from time to time;
- communication (paths and types) between simultaneous tasks;
- synchronization required to preserve program meaning, i.e. causality; and
- collective termination criteria, e.g. whether the program is to terminate by consensus or by majority.
A parallel program may also specify a schedule (partial execution order), in addition to the essential synchronization information, to optimize performance.
A simple and popular software parallel programming model is the Bulk Synchronous Parallel (BSP) model, first described in “A bridging model for parallel computation”, Leslie G. Valiant, Communications of the ACM, Volume 33 Issue 8, August 1990. FIG. 1 shows a high-level representation of the principles of BSP. Software conformant to the BSP model guarantees deadlock freedom and makes precedence explicit. In the BSP model as originally described by Valiant, computation proceeds in a number of “supersteps” 102. The supersteps 102 are separated by barrier synchronization. During each superstep 102, tasks 104 are independent (i.e. can execute in parallel); a barrier 106 is crossed to commence the next superstep when and only when all tasks have completed execution in that superstep. Tasks can post messages, represented by the arrows in FIG. 1, to themselves or to other tasks at any time during each superstep (though FIG. 1 does not reflect this explicitly).
However, those messages are not visible to receivers until the start of the next superstep. In each superstep, each task may operate on output data which was generated by that task itself in the previous superstep and/or on output data which was generated by other task(s) in the previous superstep.
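The message-visibility rule above can be illustrated with a minimal, single-process Python sketch. It is a simulation under stated assumptions, not Valiant's formulation or the API of any BSP library: the names `run_bsp`, `Task` and `ring_task` are hypothetical, tasks are run one after another rather than in parallel, and the barrier is modelled simply as the point at which the messages buffered during a superstep become the receivers' inboxes for the next one.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical task signature: (task_id, superstep, inbox) -> list of (dest_task_id, message)
Task = Callable[[int, int, List[object]], List[Tuple[int, object]]]


def run_bsp(tasks: Dict[int, Task], num_supersteps: int) -> Dict[int, List[object]]:
    """Sequentially simulate BSP supersteps.

    Messages posted during superstep s are buffered and only become visible
    to their receivers at the start of superstep s + 1. Returns the inboxes
    holding the messages posted during the last superstep.
    """
    inboxes: Dict[int, List[object]] = {tid: [] for tid in tasks}
    for step in range(num_supersteps):
        outboxes: Dict[int, List[object]] = defaultdict(list)
        # Compute phase: every task reads only messages from the previous superstep.
        for tid, task in tasks.items():
            for dest, msg in task(tid, step, inboxes[tid]):
                outboxes[dest].append(msg)
        # Barrier: once all tasks have finished, the buffered messages are delivered.
        # Messages addressed to unknown task ids are dropped in this sketch.
        inboxes = {tid: outboxes.get(tid, []) for tid in tasks}
    return inboxes


N = 4


def ring_task(tid: int, step: int, inbox: List[object]) -> List[Tuple[int, object]]:
    right = (tid + 1) % N
    if step == 0:
        return [(right, tid)]              # post own id to the right-hand neighbour
    return [(right, m) for m in inbox]     # forward what arrived in the previous superstep


final = run_bsp({tid: ring_task for tid in range(N)}, num_supersteps=3)
print(final)  # {0: [1], 1: [2], 2: [3], 3: [0]}
```

Although task 0 posts to task 1 before task 1 runs within superstep 0, task 1 still sees an empty inbox in that superstep. Each message therefore advances exactly one hop per superstep, so after three supersteps every task holds the id of the task three places to its left.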
Typically, a (possibly large) number of tasks will execute on each processor in each superstep. That is, there are typically (possibly many) more tasks than physical processors. During each superstep, each processor may perform computation on data in its local memory or registers, which may include data received as messages from other processors in the previous superstep (i.e. output data that was generated by tasks running on different processors in the previous superstep) and/or output data computed by that processor itself in the previous superstep (i.e. output data that was generated by tasks running on that same processor in the previous superstep).
According to this BSP model, there is a single synchronization of all processors once per superstep, and a superstep comprises both computation and the exchange of messages.
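Many tasks sharing a few processors, with a single synchronization per superstep, can be sketched with Python's `threading.Barrier`, where each worker thread stands in for a processor. The names `run_bsp_threads` and `pass_right` are hypothetical, and the round-robin placement of tasks on processors is an arbitrary assumption. The barrier's `action` callable is invoked by one thread once all parties have arrived and (in CPython) before the waiting threads proceed, so it is used here to perform the message exchange. Because of CPython's global interpreter lock, these threads do not speed up CPU-bound work; the sketch illustrates structure, not performance.

```python
import threading
from collections import defaultdict


def run_bsp_threads(tasks, num_procs, num_supersteps):
    """Run BSP supersteps on num_procs worker threads ("processors").

    tasks maps task_id -> fn(task_id, step, inbox) -> list of (dest, msg).
    Tasks are assigned round-robin, so each processor usually hosts many tasks.
    """
    task_ids = sorted(tasks)
    assignment = [task_ids[p::num_procs] for p in range(num_procs)]
    state = {"inboxes": {tid: [] for tid in task_ids}}
    # One outbox per processor, so no locking is needed during computation.
    outboxes = [defaultdict(list) for _ in range(num_procs)]

    def exchange():
        # Called once, by one thread, after every processor reaches the barrier:
        # deliver the messages posted during this superstep.
        merged = {tid: [] for tid in task_ids}
        for box in outboxes:
            for dest, msgs in box.items():
                if dest in merged:  # messages to unknown task ids are dropped
                    merged[dest].extend(msgs)
            box.clear()
        state["inboxes"] = merged

    barrier = threading.Barrier(num_procs, action=exchange)

    def processor(p):
        for step in range(num_supersteps):
            inboxes = state["inboxes"]  # read-only during the compute phase
            for tid in assignment[p]:
                for dest, msg in tasks[tid](tid, step, inboxes[tid]):
                    outboxes[p][dest].append(msg)
            barrier.wait()  # the single synchronization point of this superstep

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["inboxes"]


N = 8


def pass_right(tid, step, inbox):
    right = (tid + 1) % N
    return [(right, tid)] if step == 0 else [(right, m) for m in inbox]


result = run_bsp_threads({t: pass_right for t in range(N)}, num_procs=2, num_supersteps=3)
print(result[0])  # [5]
```

Here eight tasks run on two "processors", and each superstep ends with exactly one `barrier.wait()` per processor, combining the end of computation with the exchange of messages.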
Parallel computing has useful applications in the context of machine learning. To date, efforts have focused on implementing machine learning algorithms with BSP in distributed, cloud-based computer systems.