Application-Specific Processors (ASPs) have disappeared since the advent of Very Large Scale Integration (VLSI) of integrated circuits (ICs). VLSI has provided the basis for a general-purpose processor (the microprocessor) consisting of fixed circuits controlled by software programs to execute various tasks. The microprocessor takes advantage of the ability to integrate large fixed circuits while allowing flexibility of task execution through software programs. These devices can be mass-produced at low cost, which makes it difficult to build ASPs that can stay ahead of the performance of microprocessors. Traditionally, it has been much easier to gain performance by moving to the next generation of microprocessor and porting software to the newer system than to build ASPs.
To achieve higher-performance systems using microprocessors, it is necessary to connect them together to achieve greater computational parallelism. This requires a communication mechanism consisting of a physical hardware connection scheme and software protocols built on top of that hardware. There are two general approaches to building these multiprocessor systems.
The least expensive approach is to connect a large number of commodity microprocessor-based computing systems, where the hardware level of communication uses a commodity protocol, such as Ethernet, and the software is built upon a commodity protocol stack, such as TCP/IP.
This is a low-cost solution, but it suffers from the bandwidth and latency limitations of the hardware layer and the overhead of the protocol software.
The more expensive approach relies on more customized hardware. The communication hardware is either built from circuits outside the microprocessor chip, which makes the system design considerably more complex, or implemented as part of the microprocessor chip itself. In this latter case, the chip is not likely to be a commodity part, and it is therefore much more expensive to develop. This approach can ease the bandwidth and latency limitations, but it will still incur the overhead of the software protocol layer, though that overhead may be less than what exists in a commodity protocol stack.
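The trade-off between the two approaches can be made concrete with a first-order cost model in which the time to deliver one message is the sum of a fixed software overhead, a fixed hardware latency, and a size-dependent bandwidth term. The following is a minimal sketch under that assumption; the function name and all numeric figures are hypothetical and chosen only for illustration, not measured from any real system.

```python
def transfer_time(message_bytes, bandwidth_bytes_per_s, latency_s, software_overhead_s):
    """First-order estimate, in seconds, of the time to deliver one message."""
    return software_overhead_s + latency_s + message_bytes / bandwidth_bytes_per_s


# Illustrative, assumed figures only (not measurements of any real system).
commodity = dict(bandwidth_bytes_per_s=125e6, latency_s=50e-6, software_overhead_s=20e-6)
custom = dict(bandwidth_bytes_per_s=1e9, latency_s=2e-6, software_overhead_s=5e-6)

for size in (64, 1_000_000):
    t_commodity = transfer_time(size, **commodity)
    t_custom = transfer_time(size, **custom)
    print(f"{size:>9} bytes: commodity {t_commodity * 1e6:10.1f} us, "
          f"custom {t_custom * 1e6:10.1f} us")
```

In this model the fixed latency and software overhead dominate for small messages, while the bandwidth term dominates for large ones, so customized hardware that leaves the protocol overhead in place helps large transfers more than small ones.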
With the development of programmable logic, such as Field-Programmable Gate Arrays (FPGAs), and Hardware Description Languages (HDLs), it is possible to reconsider the development of ASPs. Customized computational circuits can be described using an HDL and implemented in FPGAs by compiling the HDL, a process known as synthesis. As VLSI technology improves, the circuits can be ported to the latest generation of FPGAs in a manner similar to porting software to an improved microprocessor.
Most complex computational problems require more than one processor to solve in a timely manner. Parallel computing, as known in the art, applies a divide-and-conquer strategy in which a complex problem is reduced to smaller, manageable pieces of approximately the same size, which are then solved by an array of processors.
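The decomposition described above can be sketched as follows. This is an illustrative example only: the helper names `split_evenly` and `parallel_sum_of_squares` are hypothetical, the sum-of-squares workload is an arbitrary stand-in for a real problem, and a pool of threads stands in for the array of processors, so the sketch shows the partitioning rather than any actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into `parts` contiguous pieces whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    pieces, start = [], 0
    for i in range(parts):
        # The first `extra` pieces each take one additional item.
        size = base + (1 if i < extra else 0)
        pieces.append(items[start:start + size])
        start += size
    return pieces


def parallel_sum_of_squares(values, workers=4):
    """Solve each piece on its own worker, then combine the partial results."""
    pieces = split_evenly(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda piece: sum(v * v for v in piece), pieces))
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10))))
```

Keeping the pieces within one item of each other in size means no single worker is left holding a disproportionate share of the problem.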
Massively parallel computer systems rely on connections to external devices for their input and output. Connecting each processor, or set of processors, to an external I/O device requires a multitude of connections between the processor array and the external devices, thus greatly increasing the overall size, cost and complexity of the system. Furthermore, output from multiple processors to a single output device, such as an optical display, is gathered together and funneled through a single data path to reach that device. This creates an output bottleneck that limits the usefulness of such systems for display-intensive tasks.
The trend in computing system design is to attempt to provide for the greatest degree of parallelism possible. Known designs use parallel connections between processors to provide fast data exchange. It will be appreciated that processor pin count and limited circuit board space are significant design limitations.
Despite advances in process technology and VLSI circuits, general-purpose processors are limited by chip size and, consequently, by on-chip memory size, data latency, and data bandwidth. Furthermore, general-purpose processors are not as versatile as configurable logic in optimizing for specific tasks. There continues to be a need for an interconnect architecture for configurable logic that improves data latency and bandwidth. There also exists a need to apply architectural improvements to create a system that is scalable, low in complexity, high in density, and massively parallel. It may also be advantageous to provide such systems commercially using commodity parts, significantly reducing development risk and making it easier to keep pace with improvements in technology.