The present invention generally relates to computer architectures. More particularly, the present invention relates to a parallel processing computer architecture using multiple field programmable gate arrays (FPGAs) for a commercial off-the-shelf (COTS) hybrid-computing framework.
High performance computer systems that offer the flexibility of user configuration are attracting widespread interest, particularly in the defense and intelligence communities. Increasing silicon density in FPGAs is leading many users to build parallel processing architectures, such as single instruction-multiple data (SIMD) architectures, using coarse-grained processing arrays in FPGAs. Signal and image processing applications are well suited to the parallel data structures handled by multiple data architectures. Even though digital signal processors (DSPs) are maturing to use more SIMD or very long instruction word (VLIW) architecture elements within a processor, there remains a compelling argument against using DSPs for high performance computer systems because of their inflexibility and compiler-generated overhead. As a result, more and more solution developers are turning to FPGA-based high performance systems.
A major problem faced by these solution developers is how to accelerate compute-intensive functions in data-intensive processing applications (such as wavelet transformation, high performance simulation, and cryptography) by executing the functions in hardware. Many compute-intensive functions have regular data structures that are highly amenable to data parallelism and work well with traditional SIMD parallel processing techniques. With growing silicon component density in FPGAs, it is becoming more desirable to implement SIMD using FPGAs.
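As an illustration of the data parallelism described above, the following minimal sketch computes one level of an unnormalized Haar wavelet transform. It is offered only as an example of a function with a regular data structure, not as part of the invention; the function name `haar_step` and the sample values are assumptions chosen for illustration. Each output depends on a single pair of inputs, so the same operation could be applied to every pair at once by an array of processing elements.

```python
def haar_step(samples):
    """One level of an unnormalized Haar transform: pairwise averages, then pairwise differences.

    Illustrative only. Every pair is processed independently with the same
    operation, which is the property SIMD hardware exploits.
    """
    if len(samples) % 2 != 0:
        raise ValueError("expected an even number of samples")

    # Split the input into independent pairs (one pair per hypothetical SIMD lane).
    pairs = [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]

    # Same instruction, multiple data: no pair depends on any other pair.
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages + details


if __name__ == "__main__":
    print(haar_step([4, 6, 10, 12, 8, 6, 5, 5]))
```

In this sequential sketch the pairs are visited one after another; in a SIMD arrangement the loop body would instead be replicated across processing elements so that all pairs are handled in the same step.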
Another important problem faced by solution developers is making the solution independent of any particular commercial programmable hardware board vendor. Input/output (I/O) is still a bottleneck to achieving high overall system throughput. Fast data transfer is required and, most importantly, systems must interoperate across different I/O standards. Currently, there are various I/O and switch fabric standards in place (PCI, PCI-X, PCI-Express, Infiniband, and RapidIO, for example), and new standards may emerge in the future. In essence, what is needed is a means to map from the commercial standard I/O buses, such as those noted, to a single, universal bus and to build application glue to a single, universal memory port. Because requirements and technology change rapidly, a solution must be adaptable in order to protect the investment made in it. Because systems will have to interoperate with other systems in the future, a solution is needed for connecting heterogeneous high performance computing systems and smart sensors. A further consideration is that a solution should be able to adapt itself to address critical needs of defense applications running on next generation embedded distributed systems.
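The mapping described above, from several bus standards to one universal memory port, can be pictured with the following minimal software sketch. It is a hypothetical model only: the class names `UniversalMemoryPort`, `BusAdapter`, `ToyBusA`, and `ToyBusB` are invented here, and the frame layouts are toy formats that do not correspond to the actual packet formats of PCI, RapidIO, or any other standard. The point is that application logic addresses a single port while each adapter hides the details of one bus.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class UniversalMemoryPort:
    """Hypothetical single memory port that all application logic talks to."""

    def __init__(self, size: int) -> None:
        self._mem = bytearray(size)

    def write(self, addr: int, data: bytes) -> None:
        if addr < 0 or addr + len(data) > len(self._mem):
            raise IndexError("write outside memory range")
        self._mem[addr:addr + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        if addr < 0 or length < 0 or addr + length > len(self._mem):
            raise IndexError("read outside memory range")
        return bytes(self._mem[addr:addr + length])


class BusAdapter(ABC):
    """Hypothetical adapter: translates one bus's frames into port writes."""

    def __init__(self, port: UniversalMemoryPort) -> None:
        self.port = port

    @abstractmethod
    def unpack(self, frame: bytes) -> Tuple[int, bytes]:
        """Return (address, payload) extracted from a bus-specific frame."""

    def deliver(self, frame: bytes) -> None:
        addr, payload = self.unpack(frame)
        self.port.write(addr, payload)


class ToyBusA(BusAdapter):
    """Toy framing (assumed): 4-byte big-endian address, then payload."""

    def unpack(self, frame: bytes) -> Tuple[int, bytes]:
        return int.from_bytes(frame[:4], "big"), frame[4:]


class ToyBusB(BusAdapter):
    """Toy framing (assumed): 2-byte little-endian address, then payload."""

    def unpack(self, frame: bytes) -> Tuple[int, bytes]:
        return int.from_bytes(frame[:2], "little"), frame[2:]


if __name__ == "__main__":
    port = UniversalMemoryPort(16)
    ToyBusA(port).deliver((4).to_bytes(4, "big") + b"\x01\x02")
    ToyBusB(port).deliver((6).to_bytes(2, "little") + b"\x03")
    print(port.read(4, 3))
```

Under this arrangement, supporting a new bus standard would mean adding one more adapter class, while the code that reads from the memory port stays unchanged.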
As can be seen, there is a need for a solution to the technical problem of improving performance for very computation-intensive, high-data-stream applications over that of conventional high performance servers or host machines. There is also a need for a solution that can serve as a “super hardware accelerator” for servers and other host machines.