Parallel computing is the use of two or more processors (computers) in combination to solve a single problem. Parallel computing involves writing concurrent programs. In writing a concurrent program, the programmer has to figure out how to break the problem into pieces, and has to figure out how the pieces relate to each other.
There are different ways of running a concurrent program on an execution platform. The program may be executed on a uniprocessor machine, for example by using a threading system, or on a parallel computer comprising a plurality of processors. While concurrency is a semantic property of a program, parallelism pertains to its implementation as determined by the compiler, libraries and other systems software.
Distributed computing is a specialized form of parallel computing in which the processing nodes (computers) are physically distributed and are interconnected. These interconnections may or may not be reliable. However, the computers must cooperate in order to maintain some shared state to work on the given problem. Distributed computing harnesses the idle processing cycles of the workstations on the network and thus makes them available for working on computationally intensive problems that would otherwise require a supercomputer or workstation/server cluster to solve.
The development of parallel programs is a tedious task and involves numerous skills other than the general programming skills on the part of the programmer. The development involves division of the problem into parallel executable fragments and synchronizing the parallel executing processes with each other in order to produce a proper result. The programmer must also handle transfer of data from one process to another. Furthermore, distributed systems used to run the parallel programs are unreliable and prone to system shutdowns and network failures. In order to make a system fault tolerant, a programmer has to encode the necessary complex instructions in the system to recover from a failure, which takes a lot of extra effort.
Debugging a concurrent program is even more tedious than building it. In one method of debugging, the concurrent program is serialized and the programmer is provided with tools to debug it as a sequential program [U.S. Pat. No. 5,860,009, Naoshi Uchihira, Shinichi Honiden, Toshibumi Seki, “Hypersequential Programming: A New Way to Develop Concurrent Programs”]. After the program has been debugged, its concurrency is restored and it is executed as parallel processes.
There are many approaches for achieving parallelism. In one approach, called data parallelism, parallelism is added to a programming language by extending the language, that is, the compiler is extended to recognize new language constructs. While such extended languages provide enhanced performance, they are limited by a lack of portability between operating systems. Moreover, the programmer needs to learn the new language constructs. Parallel compilers are usually based on the data parallel programming model. High Performance FORTRAN (HPF) and Data Parallel C Extensions (DPCE) support data parallel programming. In this model, the distribution of data is specified at a very high level using parallel variables. This approach is also limited in the types of tasks that can be parallelized and cannot be used for general purpose parallel computation.
Another approach to designing and implementing a parallel program, rather than using a new extended compiler, is to use Message Passing Libraries (MPL). In this model, processes communicate by sending and receiving messages. Data transfer requires cooperative operations to be performed by each process (a send operation must have a matching receive). Programming with message passing is done by linking with and making calls to libraries which manage the data exchange between processors. MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) are standard message passing libraries providing concurrency among processes [Message Passing Interface Forum, “MPI: A message-passing interface standard”]. In these libraries, it is the programmer's responsibility to resolve data dependencies and avoid deadlocks and race conditions.
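The requirement that every send have a matching receive can be sketched as follows. This is only an illustration in Python, not the MPI or PVM interface: threads and standard-library queues stand in for processes and communication channels, and the names `worker` and `run_exchange` are hypothetical.

```python
import queue
import threading


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Receive numbers until a None sentinel arrives, then send back their sum."""
    total = 0
    while True:
        msg = inbox.get()      # blocking receive: pairs with one send by the peer
        if msg is None:        # sentinel message: no more data will be sent
            break
        total += msg
    outbox.put(total)          # send the result; the peer must receive it


def run_exchange(values):
    """Send each value to the worker, then receive the single reply."""
    to_worker = queue.Queue()
    from_worker = queue.Queue()
    peer = threading.Thread(target=worker, args=(to_worker, from_worker))
    peer.start()
    for v in values:
        to_worker.put(v)       # one send per value
    to_worker.put(None)        # tell the worker to stop receiving
    result = from_worker.get() # matching receive for the worker's send
    peer.join()
    return result
```

If the sentinel were never sent, the worker would wait forever on its receive and the caller would wait forever on the reply, which is the kind of deadlock the programmer is responsible for avoiding.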
In another approach, called control parallelism, task parallelism or functional parallelism, work is divided into multiple threads. In this model different tasks are executed at the same time. It requires all subroutines to be thread-safe. OpenMP is based on this model [Leonardo Dagum, Ramesh Menon, “OpenMP: An Industry-Standard API for Shared-Memory Programming”]. OpenMP uses the fork-join approach of parallel execution with threads. The programmer must use routines for locking the data in order to handle synchronization. OpenMP FORTRAN implementations are not required to check for dependencies, conflicts, deadlocks, race conditions or other problems that result in incorrect program execution. TOP-C (Task Oriented Parallel C/C++) is a software library built on the master-slave model [G. Cooperman, “TOP-C: a task-oriented parallel C interface”].
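The fork-join pattern with programmer-managed locking can be sketched as below. This is an illustration using Python threads rather than OpenMP itself; `parallel_sum` and its parameters are hypothetical names. In the standard CPython interpreter the threads may not actually run simultaneously, so the sketch shows the structure of the pattern, not a speedup.

```python
import threading


def parallel_sum(data, num_threads=4):
    """Fork worker threads over slices of data, then join and return the sum."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)   # private work, no synchronization needed
        with lock:             # the programmer must guard the shared update
            total += partial

    # Strided slices split the data into num_threads disjoint pieces.
    chunks = [data[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:          # fork
        t.start()
    for t in threads:          # join: wait for every worker to finish
        t.join()
    return total
```

Omitting the lock would leave the update of `total` unsynchronized, and nothing in the threading library would report the resulting race.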
It is now well accepted that the object paradigm provides good foundations for the new challenges in concurrent and distributed computing. Object notions, rooted in the data-abstraction principle and the message-passing metaphor, are strong enough to structure and encapsulate modules of computation and flexible enough to match various granularities of software and hardware architectures. Programs structured around objects are modular, and easier to understand and modify. In addition to these advantages, integrating concurrency and synchronization with data abstraction offers benefits that are particular to parallel programming. As a result, many object-based concurrent, parallel, or distributed models, languages, or system architectures have been proposed, such as Abcl, Actel, Actor, Argus, Concurrent Smalltalk, COOL, Eiffel, Emerald, Hybrid, Nexus, Parmars, POOL-T, Presto [Jean-Pierre Briot, Rachid Guerraoui, Klaus-Peter Lohr, “Concurrency and distribution in object-oriented programming”].
Several object oriented implementations supporting basic concurrency exist. Various encapsulations providing an object oriented interface over the basic operating system services for process management have evolved. Synchronization has been simplified through the use of synchronized procedures associated with each object. The library provided as part of the JAVA software development kit is a typical example. Some modern implementations have made introducing concurrency into a program much easier through active objects. Active objects provide a view of an object as a process. Concurrency in active objects can then be viewed as the parallel execution resulting from the creation of these active objects and their interactions with one another. Calls to active objects act like message exchanges between two processes. Similar to active objects are actor-based languages. Actors are self-executing objects, each having a unique address and a mailbox. Actors communicate by sending messages asynchronously and execute concurrently.
All the above methods of achieving concurrency involve organizing the program as interacting processes that execute in parallel. They involve division of either data or tasks into parallel executing fragments. Processes have to communicate and must be synchronized with each other in order to produce correct results.
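An actor with a mailbox and asynchronous sends can be sketched as follows. This is a minimal Python illustration, not the design of any of the languages named above; the class `CounterActor`, its message names, and `demo` are hypothetical, and the object reference plays the role of the actor's address.

```python
import queue
import threading


class CounterActor:
    """A self-executing object that processes mailbox messages one at a time."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        # Only this thread touches _count, so no lock is needed.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)

    def join(self):
        self._thread.join()


def demo():
    """Send three increments, then ask for the count through a reply queue."""
    actor = CounterActor()
    replies = queue.Queue()
    for _ in range(3):
        actor.send("increment")
    actor.send("get", reply_to=replies)
    value = replies.get()      # wait for the actor's reply message
    actor.send("stop")
    actor.join()
    return value
```

Because the mailbox is drained in arrival order by a single thread, the state of the actor is never accessed concurrently, which is how this style replaces explicit locking with message exchange.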
Work has also been done on converting a sequential program to parallel executable code [U.S. Pat. Nos. 5,088,034 and 5,452,461, W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, B. Pottenger, L. Rauchwerger, and Peng Tu, “Parallel Programming with Polaris”, M. W. Hall, J. M. Anderson, S. P. Amarasinghe, B. R. Murphy, S. W. Liao, E. Bugnion, and M. S. Lam, “Maximizing multiprocessor performance with the SUIF compiler”, C. Polychronopoulos, M. Girkar, M. R. Haghighat, C.-L. Lee, B. Leung, and D. Schouten, “Parafrase-2: a new generation parallelizing compiler” and P. Banerjee, J. A. Chandy, M. Gupta, E. W. Hodges IV, J. G. Holm, A. Lain, D. J. Palermo, S. Ramaswamy, and E. Su., “The PARADIGM Compiler for Distributed-Memory Multicomputers”]. A compiler is extended to find data dependencies between parts of the program, and independent parts are made to execute in parallel. Other methods of parallelization include inter-procedural analysis and symbolic analysis. A suitable parallel code is built and put in place of the sequential code. However, these compilers face an inherent limitation in their capability to find parallel parts in a program containing operations directly on memory addresses. Dependence or non-dependence between any two parts is sometimes known only at runtime or depends on program input; these compilers, which rely on prior division of the program, cannot resolve such cases. Most of these implementations are a long way from practical use. Some other implementations require the programmer to select and link components in a dataflow graph to specify the dependencies between them [U.S. Pat. No. 5,999,729], while others resolve dependencies through questions and answers between the system and the user [U.S. Pat. No. 6,253,371].
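The case in which dependence is known only at runtime can be made concrete with a small sketch. It is written in Python purely for illustration and is not taken from any of the cited compilers; `scatter_add` and `iterations_independent` are hypothetical names.

```python
def scatter_add(a, idx, inc):
    """Sequential loop whose memory accesses depend on the input idx."""
    for i in range(len(idx)):
        a[idx[i]] += inc[i]    # which element is touched is known only at runtime
    return a


def iterations_independent(idx):
    """Return True when no two iterations update the same element of a."""
    return len(set(idx)) == len(idx)
```

With `idx = [0, 2, 1]` every iteration updates a different element, so the iterations could run in parallel. With `idx = [0, 2, 0]` the first and third iterations update the same element and must be ordered or synchronized. A compiler that sees only the loop text, and not the contents of `idx`, cannot tell these two cases apart.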
Another method of bringing parallelism to a sequential program is to allocate instructions to the processors depending on the memory address contained in the operand [U.S. Pat. No. 5,619,680]. However, the parallelism achieved is at the instruction level and its scalability is highly limited.
One method of executing a sequential program in parallel is to serially label the steps that access or modify data variables in accordance with the intended sequence of the whole sequential program, and then to execute parts in parallel under a control system such that all memory accessing and modifying operations are executed only in the sequential order [U.S. Pat. No. 5,832,272]. The system, however, incurs huge overheads, requires special hardware, and is not applicable to programs with complex memory accesses.
In some object oriented systems, future objects are used for parallelization of a sequential program [U.S. Pat. No. 5,404,521, Neelakantan Sundaresan, “Extending the Standard Template Library for Parallelism in Coir<Futures>”, Rohit Chandra, Anoop Gupta, John L. Hennessy, “COOL: An Object-Based Language for Parallel Programming”]. In such systems, computation intensive subroutines are called asynchronously. An asynchronous call means that the called procedure executes in parallel with the rest of the program. The result of the subroutine is stored in a ‘future’ object after the subroutine returns. The main program continues to run concurrently with the subroutine. If the main program accesses the future object before the subroutine returns, the program simply blocks to await the desired result. The mechanism does not allow access, in the main program, to a future object holding the return value of a remote procedure, or passing it as an argument to another procedure. Thus, by using futures, parts of a sequential program are executed in parallel, with implicit synchronization. The programmer does not handle the synchronization in the future object; it is built in. Communication is also implicit, in the form of arguments (from main program to subroutine) and futures (from subroutine to main program).
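The behaviour described above can be sketched with the futures in Python's standard `concurrent.futures` module, which are only loosely analogous to the future objects of the cited systems. The functions `expensive_square` and `main_program` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def expensive_square(n):
    """Stand-in for a computation intensive subroutine."""
    return n * n


def main_program():
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Asynchronous call: returns a future at once, without waiting.
        future = pool.submit(expensive_square, 12)
        # The main program keeps running while the subroutine executes.
        other = sum(range(10))
        # Accessing the result blocks until the subroutine has returned.
        return other + future.result()
```

The argument `12` carries data from the main program to the subroutine and the future carries the result back, so no explicit synchronization code appears in `main_program`.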
However, many limitations remain in the usage of future objects and in the level of parallelism achievable through them.
First, futures can only be used for return values. In normal practice, however, arguments are also used for returning data values. Synchronization of these values is not handled by the futures. This limits the procedures that can be executed in parallel.
Second, a future object that is not yet available blocks the main process when it is passed as an argument to a subroutine. This limits the level of parallelism achievable (futures are used in the subroutine, not at the point of calling the subroutine). An object oriented language, Actel, does provide for passing futures as arguments to other procedures [Zair Abdelouahab and Slimane Hammoudi, “Concurrency in Object Oriented Language Actel”], but it has its own limitations; it is confined to shared memory architectures only.
Third, futures do not support partial returning; a value can become available from a parallel subroutine only after all the return values have been evaluated, reducing the level of parallelism achieved.
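The limitation concerning values returned through arguments can be illustrated with a short sketch. It uses Python's standard `concurrent.futures` module as a stand-in for the future mechanisms discussed here, and `fill_and_count` and `run` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def fill_and_count(out):
    """Return data two ways: through the argument out and as a return value."""
    out.extend([1, 2, 3])      # output delivered through an argument
    return len(out)            # output delivered through the return value


def run():
    out = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(fill_and_count, out)
        # Reading out at this point would be unsynchronized: the future
        # guards only the return value, not data written through arguments.
        n = fut.result()       # blocks until the subroutine has returned
    # Only now is it safe to read out, because the call has finished.
    return n, out
```

The future makes the wait for `n` implicit, but nothing comparable protects `out`; the caller has to know that it must not be read until the future has been resolved.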
Also, futures are incompatible with references, especially on a distributed memory system. References play an important role in any programming language system. Nonconformity of any architecture to references severely limits its capability to be used in various complex systems. In distributed systems, support for references involves the complex task of not only the synchronization over referred data, but also of maintaining the linkage structure (how data are connected), which is subject to changes during program execution, together with providing parallelism.
In all, the use of futures does not bring true parallelism to a sequential program. Futures can be used to execute in parallel only simple procedures, namely those which take only [in] arguments ([inout] is not supported, i.e. C++ pointer or reference arguments are unsupported) and whose arguments do not themselves contain references.
Another popular mechanism for distributed computing is the remote procedure call (RPC). The semantics of RPC are identical to the semantics of the traditional procedure call, except that while a normal procedure call takes place between procedures of a single process in the same memory space, RPC takes place between a client and a server process on different systems connected through a network. Like a normal procedure call, RPC is a synchronous operation, i.e., the client process is blocked until processing by the server is complete. To gain parallelism, RPC has also been extended to asynchronous calls. Futures can be employed for synchronization in a limited manner [U.S. Pat. No. 5,999,987, Murat Karaorman, John Bruno, “Introducing concurrency to a sequential language”].