Parallel computing uses multiple processing elements simultaneously to solve a problem. Common forms of parallelism range from bit-level and instruction-level parallelism to task-level parallelism. Parallelism is achieved by breaking the problem into independent parts so that each processing element can execute its part of the algorithm simultaneously with the others. The processing elements can be diverse and include resources such as a single computer with multiple processors, several networked computers, specialized hardware, or any combination of the above.
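As a minimal sketch of the decomposition described above, the Python example below splits a sum of squares into independent chunks, computes each chunk in a separate worker, and combines the partial results. The function names and the chunking scheme are illustrative assumptions. Threads stand in for processing elements here; in CPython, true CPU-level parallelism would typically use processes instead because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    # Each worker ("processing element") touches only its own slice,
    # so the parts are independent and need no shared state.
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    # Hypothetical helper: split the input into roughly equal, independent parts.
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Threads are used only as stand-ins for separate processing elements.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    # Combine the independent partial results into the final answer.
    return sum(partials)


print(parallel_sum_of_squares(list(range(10))))  # 285
```

The only coordination happens at the end, when the partial sums are combined; this is what makes the parts "independent" in the sense used above.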
Among these processing elements, multicore processors, whose cores reside on the same chip, can issue multiple instructions per cycle from multiple instruction streams. Among the chips available today, field-programmable gate arrays (“FPGAs”) can be configured to implement hundreds of cores on a single device, and these cores can then be used for multicore parallel computing. However, FPGAs are traditionally programmed in hardware description languages such as Verilog or VHDL, and programming in these languages can be tedious.
Several vendors have created “C to HDL” (i.e., C programming language to hardware description language) tools that attempt to emulate the syntax and semantics of the C programming language, with which most programmers are familiar. Well-known C to HDL tools include Mitrion-C, Impulse C, DIME-C, and Handel-C. Specific subsets of SystemC, which is based on the C++ language, can also be used for this purpose. However, none of these tools can use the cores optimally or be programmed for effective performance. As a result, FPGAs today are largely used as co-processors to a general-purpose computer, solving a portion of a larger computation, such as matrix multiplication or N-body problems, rather than as general-purpose computers running full-blown applications.
Recently, many developers have programmed FPGAs as systolic arrays for data-flow computing to solve small, compute-intensive subtasks such as those mentioned above. However, they still use Verilog or VHDL, which is again very tedious and therefore unsuitable for general-purpose programming. Although systolic array computing offers very fast computation on a scalable multicore architecture and can greatly reduce the running time of many regular computations (for example, multiplying two n×n matrices in O(n) steps on an n×n array of processing elements), systolic arrays are very difficult to implement and build.
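To illustrate what data-flow computing on a systolic array looks like, the sketch below is a cycle-by-cycle software simulation (not an FPGA or HDL implementation) of one common design, assumed here: an output-stationary two-dimensional array for matrix multiplication. Each processing element PE(i, j) keeps its own accumulator, receives values only from its left and upper neighbors, and passes them right and down on the next cycle. Rows of A and columns of B are fed in with a skew so that matching operands meet at the right processing element. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    # Hypothetical simulation: A is n x p, B is p x m, result C is n x m.
    n, p, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]      # one accumulator per processing element
    a_reg = [[0] * m for _ in range(n)]  # value each PE passes to its right neighbor
    b_reg = [[0] * m for _ in range(n)]  # value each PE passes to its lower neighbor

    # The last product reaches PE(n-1, m-1) at cycle n + m + p - 3.
    for t in range(n + m + p - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                # Left edge: row i of A enters skewed by i cycles; elsewhere,
                # take the value the left neighbor held on the previous cycle.
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if 0 <= k < p else 0
                else:
                    a_in = a_reg[i][j - 1]
                # Top edge: column j of B enters skewed by j cycles; elsewhere,
                # take the value the upper neighbor held on the previous cycle.
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if 0 <= k < p else 0
                else:
                    b_in = b_reg[i - 1][j]
                # Both operands satisfy k = t - i - j, so they always match.
                C[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        # All PEs update their registers together, as on a clock edge.
        a_reg, b_reg = new_a, new_b
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

For square n×n inputs the simulation finishes in 3n - 2 cycles, which is the O(n) step count referred to above; the n² multiplications per step happen in parallel across the array, while the host loop here only emulates them sequentially.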