Without limiting the scope of the invention, its background is described in connection with optical interconnects and parallel computing. Optical interconnections are generally divided into two categories: guided-wave and free-space optics. Guided-wave interconnection uses optical fiber or integrated optics methods. Its disadvantages include fixed interconnects and a crowded backplane. Its advantage is precision in reaching the destination. However, free-space optics can provide a similar advantage if properly arranged. Furthermore, free-space optics relax routing restrictions by exploiting the fact that photons do not interact with one another when their paths cross.
Backplane crowding becomes an important issue when submicron technology allows multi-million-transistor chips and the coexistence of sophisticated functional blocks on those chips. The implementation of the communications between the chips tends to negate the advantage of the submicron technology for reasons including the following: (1) the number of I/O pins grows with the complexity of the chip; (2) the narrower the interconnection metallization, the higher the resistance; (3) the closer the lines are to one another, the higher the stray capacitance, and hence the higher RC time constant imposes a slower I/O rate even as functionality increases; (4) sharing the I/O interconnects among multiple uses to limit their number results in the use of one or more crossbar switches, which dominate the board space as the parallelism increases; and (5) limiting the number of I/O paths between complex components without using crossbar interconnect self-organization results in I/O blocking and in performance that depends on the time-varying demand for specific I/O paths.
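Reasons (2) and (3) above can be illustrated with a first-order estimate. The following is a minimal sketch only, assuming a uniform rectangular line (resistance proportional to length over cross-section) and a parallel-plate approximation for the sidewall capacitance between adjacent lines; the function names, material constants, and dimensions are illustrative assumptions and are not taken from this description.

```python
import math

# Illustrative material constants (assumptions, not from the text).
RESISTIVITY_AL = 2.7e-8             # ohm*m, approximate value for aluminum
PERMITTIVITY_SIO2 = 3.9 * 8.854e-12  # F/m, approximate value for silicon dioxide


def wire_resistance(length, width, thickness, resistivity=RESISTIVITY_AL):
    """Resistance of a uniform rectangular line: R = rho * L / (w * t)."""
    return resistivity * length / (width * thickness)


def coupling_capacitance(length, thickness, spacing, permittivity=PERMITTIVITY_SIO2):
    """Sidewall capacitance to one neighbor, parallel-plate approximation:
    C = eps * (t * L) / s. Fringing fields are ignored."""
    return permittivity * thickness * length / spacing


def rc_time_constant(length, width, thickness, spacing):
    """First-order RC product for one line and one neighboring line."""
    return (wire_resistance(length, width, thickness)
            * coupling_capacitance(length, thickness, spacing))


# Hypothetical geometry: a 1 cm line, 1 um wide, 0.5 um thick, 1 um from its neighbor.
base = rc_time_constant(length=1e-2, width=1e-6, thickness=0.5e-6, spacing=1e-6)
# Same line with width and spacing both halved.
scaled = rc_time_constant(length=1e-2, width=0.5e-6, thickness=0.5e-6, spacing=0.5e-6)

print(f"RC grows by a factor of {scaled / base:.1f}")
```

Under these assumptions, halving both the line width and the line spacing doubles the resistance and doubles the stray capacitance, so the RC product grows roughly fourfold for the same line length.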
State-of-the-art microprocessors run above 150 MHz and are expected to achieve a clock rate of 0.5 GHz with the assistance of BiCMOS and GaAs technologies. The 25 MHz processors (e.g., TI's TMS320C40) are achieving 50 MFLOP performance; therefore, the newer technologies are expected to achieve 1 GFLOP performance. The newer technologies will require 1000 parallel processors to achieve teraflop (TFLOP) performance; note that the current technology requires more than 20000 parallel processors. In the foreseeable future, massively parallel computing systems will be required to achieve TFLOP computing capability. Such systems must therefore solve the interconnection problem for very large numbers of computing elements without diminishing the delivered performance relative to the available performance.
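The processor counts quoted above follow from dividing the target throughput by the per-processor throughput. A minimal sketch of that arithmetic is given below; the function name and the `efficiency` parameter (the fraction of available performance actually delivered) are assumptions introduced here for illustration.

```python
import math

TARGET_FLOPS = 1e12  # 1 TFLOP, the target named in the text


def processors_needed(per_processor_flops, efficiency=1.0, target_flops=TARGET_FLOPS):
    """Number of processors needed to reach target_flops.

    efficiency is a hypothetical delivered/available ratio; 1.0 means the
    interconnect costs nothing, which gives the ideal lower bound.
    """
    return math.ceil(target_flops / (per_processor_flops * efficiency))


current = processors_needed(50e6)    # 50 MFLOP per 25 MHz processor
projected = processors_needed(1e9)   # 1 GFLOP per projected processor
degraded = processors_needed(1e9, efficiency=0.5)  # half the performance lost to I/O

print(current, projected, degraded)
```

The ideal counts are 20000 and 1000 processors. Any interconnect loss raises them: in this sketch, a system that delivers only half of its available performance would need 2000 of the projected processors instead of 1000.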
Considerable study has been given to the application of fixed interconnect strategies in parallel computing architectures. These strategies result in systems with, for example, tiered-bus, two-dimensional (2D) mesh, three-dimensional (3D) mesh, multi-degree hypercube, and tiered binary crossbar architectures. In general, all of the strategies result in a system performance that is dependent on the number of independent paths provided from point A to arbitrary point B in the system. I/O contention decreases the delivered performance from the system's available capability based on the specific application's data communication requirements. Therefore, which architecture provides the best results depends on the application run on it.
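The dependence on independent paths can be made concrete by comparing basic graph properties of some of the fixed topologies named above. The following is a rough sketch using standard formulas for non-wraparound meshes and binary hypercubes; the `Topology` record and the function names are assumptions introduced for illustration. The minimum node degree is used here only as an upper bound on the number of independent (link-disjoint) paths available to the worst-placed node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    name: str
    nodes: int
    links: int
    min_degree: int  # upper bound on independent paths for the worst-placed node
    diameter: int    # longest shortest path, in hops


def mesh_2d(k):
    """k x k mesh without wraparound links (k >= 2)."""
    return Topology("2D mesh", k * k, 2 * k * (k - 1), 2, 2 * (k - 1))


def mesh_3d(k):
    """k x k x k mesh without wraparound links (k >= 2)."""
    return Topology("3D mesh", k ** 3, 3 * k * k * (k - 1), 3, 3 * (k - 1))


def hypercube(d):
    """Binary hypercube of dimension d (d >= 1)."""
    return Topology("hypercube", 2 ** d, d * 2 ** (d - 1), d, d)


# Compare two 1024-node systems and one 1000-node system.
for topo in (mesh_2d(32), mesh_3d(10), hypercube(10)):
    print(topo)
```

For 1024 nodes, the 2D mesh uses 1984 links with a diameter of 62 hops and as few as 2 links at a corner node, while the 10-dimensional hypercube uses 5120 links with a diameter of 10 hops and 10 links at every node. The additional paths are paid for in wiring, which is the backplane crowding trade-off discussed earlier.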
A nontrivial secondary attribute of these fixed interconnect strategies is the mapping of the application onto the architecture. This mapping can have a dominant impact on the system performance. The application is the set of system functions for which the parallel computing system is needed. These functions represent the perceived system solution to some problem, and that solution has some natural structure and parallelism. One must then try to optimize the mapping of this solution, which may have been very difficult to conceive in its own right, onto the parallel computing system's architectural connectivity and parallelism. This mapping of application data flow and parallelism onto hardware interconnect structure and parallelism is a problem that is essentially unsolved to date.