Computer architecture generally refers to a system designer's and programmer's view of a computer, which includes parameters such as memory, instruction sets, programmable registers, interfacing signals, and other aspects relating to the internal operation of computers. The processing power driving today's computers includes large ASICs designed for mainframe computers, as well as microprocessor and microcontroller devices housed in desktop PCs.
Advances in computer architecture have typically grown out of the recognition of computing shortcomings in the technology of the day. Where a new architecture solved one problem, it often created another. For example, memory caching, instruction pipelining, and reduced instruction set computers have all emerged to relieve a computing bottleneck of one form or another. Advances in other technologies, such as networking and telecommunications, have also inspired changes in computer architectures, while new design, fabrication and manufacturing techniques have permitted architectural improvements. At times, progress in computer architecture forges straight ahead; at other times it is diverted off course. Some of the problems facing even the most current technologies stem from the commercial need to provide comprehensive and complex computing systems capable of operating over a broad range of applications. This generality, however, can have a detrimental effect on processing performance for more specific applications.
For example, the Complex Instruction Set Computer (CISC) dominated the architectural race for years. CISC architectures were driven by the prevailing view that a large instruction set was desirable. The rationale behind this view was that adding new, more specialized instructions would accelerate program execution by reducing the number of instruction fetches. While this was true, other factors were adversely affecting program execution performance, including the inherent complexity of CISC processors, which reduces the ability to speed up the Central Processing Unit (CPU). Furthermore, many programs executed by these CPUs are produced by compilation, which imposes a certain pattern on the utilization of the instruction set. Other factors also contributed to the realization that a better way of achieving greater processing speeds was needed.
Computer architecture then took a turn in an attempt to increase computing speed and performance, and Reduced Instruction Set Computers (RISC) were born. RISC processors are equipped with a restricted number of instructions and addressing modes, and the spared CPU logic is used for additional internal registers. While RISC processing certainly helped processing speeds, the technical limitations of memory were holding the technology down, as memory could not maintain the supply of data and instructions. Further, as the speed increased, it became more difficult to supply a full 32-bit word from memory in a single cycle, since RISC processors require more instructions to perform the same job than a CISC processor requires. Additionally, the fixed instruction format of RISC processors resulted in RISC code using more memory. These problems were in part addressed by the high-speed cache, and in some designs by multiple caches, such as an instruction cache and a data cache. Again, these solutions raised new issues, such as cache coherency.
However, there are applications that are so data-intensive that use of a CISC, or even a RISC for that matter, is extremely inefficient. For applications where very large volumes of data must be processed quickly, these general architectures simply have too much overhead. Programs and program memory, program counters, memory fetching, address decoding, bus multiplexing, branch logic, and the like are advantageous in some applications, but inherently result in undesirable overhead for certain other computing needs. Consider, for example, a recent seismic processing task in the oil industry. The task involved taking 30 gigabytes of input data, subjecting it to over 240 teraoperations on a supercomputer, and producing approximately 194 megabytes of output data. This task took approximately 2 months of CPU time on a state-of-the-art 24-processor machine. The present invention would cut this time to approximately 12 days if implemented using FPGA technology, and to approximately 4 days using ASIC technology.
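The figures quoted for the seismic task can be turned into rough ratios. The following is only a back-of-the-envelope sketch, not part of the described architecture: it assumes that "2 months" means 60 days, treats "over 240 teraoperations" as a lower bound of 240e12 operations, and uses hypothetical helper names (`speedup`, `ops_per_input_byte`, `data_reduction`) chosen purely for illustration.

```python
# Figures quoted in the text for the seismic processing task.
INPUT_BYTES = 30e9     # 30 gigabytes of input data
OUTPUT_BYTES = 194e6   # approximately 194 megabytes of output data
TOTAL_OPS = 240e12     # "over 240 teraoperations", used here as a lower bound

# Assumption (not stated in the text): "2 months" is taken as 60 days.
BASELINE_DAYS = 60.0
FPGA_DAYS = 12.0       # quoted estimate for an FPGA implementation
ASIC_DAYS = 4.0        # quoted estimate for an ASIC implementation


def speedup(baseline_days: float, new_days: float) -> float:
    """Ratio of baseline turnaround time to the new turnaround time."""
    return baseline_days / new_days


def ops_per_input_byte(total_ops: float, input_bytes: float) -> float:
    """Average number of operations applied to each byte of input."""
    return total_ops / input_bytes


def data_reduction(input_bytes: float, output_bytes: float) -> float:
    """How many input bytes are consumed per byte of output produced."""
    return input_bytes / output_bytes


if __name__ == "__main__":
    print(f"FPGA speedup: {speedup(BASELINE_DAYS, FPGA_DAYS):.1f}x")
    print(f"ASIC speedup: {speedup(BASELINE_DAYS, ASIC_DAYS):.1f}x")
    print(f"Ops per input byte: {ops_per_input_byte(TOTAL_OPS, INPUT_BYTES):.0f}")
    print(f"Input/output ratio: {data_reduction(INPUT_BYTES, OUTPUT_BYTES):.1f}")
```

Under these assumptions, the quoted estimates correspond to roughly a fivefold reduction in turnaround time for the FPGA case and a fifteenfold reduction for the ASIC case, with at least several thousand operations applied per byte of input.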
Furthermore, obtaining even these lengthy processing turnaround times requires state-of-the-art computing power operating at the highest available clock rates, which means high equipment costs. The present invention, on the other hand, can be used with lower-performance host computers and still provide a substantial overall increase in processing speed. The host computers used in connection with the present invention can be "commodity" components, resulting in lower host computer costs.
Therefore, it would be desirable to provide a processing architecture having cutting-edge computing speeds for use in data-intensive applications. Accordingly, the present invention provides a computer architecture capable of sustaining peak performance by exploiting the parallelism in algorithms and eliminating the latencies involved in sequential machines. The present invention provides a solution to the aforementioned and other shortcomings of the prior art, and offers additional advantages and benefits over existing computer architecture technologies.