FIG. 1A illustrates a conventional processor's compiler. Conventional processors, such as Intel micro-processors and ARM micro-processors are well known. For example, a conceptual illustration of a conventional processor is shown in FIG. 1B. These processors are the heart of central processing units for modern computers and devices and are used to process algorithms. A problem with conventional processors is that these types of processors are general purpose and are not reconfigurable in any practical way that allows their performance to be enhanced for specific applications. Another problem is that the program execution control adds substantial overhead to processing of algorithmic functions, such as mathematical operations and logical decisions that modify the flow of processing. A higher level programming language may be used to program the conventional processor, and the compiler converts the instructions in the higher level programming language into machine code for the particular processor architecture. This machine code is provided to a memory location accessible by the processor and provides instructions for operation of the processor hardware, together with any BIOS or other calls provided by the system architecture. In most cases, mathematics and logical processing directions are directed to an arithmetic logic unit (ALU), which returns a solution to a program execution control portion of the processor, which manages overhead, such as guiding the processor through the correct order of solving mathematical algorithms, logic decisions, handling of data and the like. Machine code instructions are continuously fetched from program storage in order to control the processing of data. This overhead significantly limits machine performance.
For example, the following illustrates steps of a conventional compiler compiling a mathematical operation in a “C” programming language, which is an example of a higher level programming language that may be compiled to create machine code for a particular conventional processor. A simple mathematical operation assigns “var i1;” “var i2;” and “var s;” to define a data storage location for variable i1, i2 and result s. Then, an instruction “s=i1+i2;” may be used to sum the variables assigned in data locations i1 and i2. The compiler (a) first assigns storage locations for data (e.g. i1, i2 and s) and (b) generates source code into machine code. A conventional processor would retrieve all or a portion of the machine code from a memory location in which the code is stored. Then, it would execute the machine code. For this example, the central processing unit (CPU) would load i1 data in a memory location and send it to the ALU, load i2 data in a memory location and send it to the ALU, and instruct the ALU to add the data located in i1 and i2. Only then would the ALU perform an addition of the values located in the data locations for i1 and i2. This is the useful work step, with the setup by the CPU being overhead. Then, the CPU could get the ALU result from the data location for “s” and could send it to the input and output controller. This is a necessary step to present the result, if the result is not an intermediate step in the calculation. Conventional processors evolved out of a desire to save time in the development of computer programs, allowing higher level programming languages to be compiled for various architectures of central processing units and peripheral facilities. Also, all processes executed by the CPU can share a common ALU, by time sharing the ALU among various programs operating in the system environment.
Application specific integrated circuits (ASICs) are known that build into hardware electronic circuits capable of rapidly performing calculations for specific functions. These reduce overhead by hard wiring specific functions into the hardware.
Some field programmable gate arrays (FPGAs) are known that have a large number of logic gates and random access memory (RAM) blocks. These FGPAs are used to implement complex digital computations. Such FPGA designs may employ very fast input/output and bidirectional data buses, but it is difficult to verify correct timing of valid data within setup time and hold time. Floor planning enables resource allocations within FPGAs to meet these time constraints. FPGAs can be used to implement any logical function that an ASIC chip could perform. An ability to update the functionality after shipping, partial re-configuration of a portion of the design and a low non-recurring engineering cost relative to an ASIC design offers an advantage for some applications, even when the generally higher unit cost is taken into consideration.
However, the penetration of FPGA architectures has been limited to narrow niche products. An FPGA virtual computer for executing a sequence of program instructions by successively reconfiguring a group of FPGA in response to those instructions was patented in U.S. Pat. No. 5,684,980. FIG. 2 illustrates a structure for this FGPA architecture. This issued patent includes an array of FPGAs that changes configurations successively during performance of successive algorithms or instructions. The configuring of array of FPGAs allows an entire algorithm or set of instructions to be performed without waiting for each instruction to be downloaded in performing each computational step.
The developments in FGPAs and integration with processors gives the promise of the ability to be reprogrammed at “run time”, but in reality, reconfigurable computing or reconfigurable systems to suit the task at hand are far from being implemented in practical applications due to the difficulties in programming and configuring these architectures for this purpose.
FIG. 2 illustrates a block diagram of a virtual computer including an array of field programmable gate arrays and field programmable interconnection devices (FPIN) or cross-bar switches that relieve internal resources of the field programmable gate arrays from any external connection tasks, as disclosed in U.S. Pat. No. 5,684,980, the disclosure and drawings of which are hereby incorporated herein in their entirety for the purpose of disclosing the knowledge of a skilled artisan, familiar with FPGAs.
FIG. 2 illustrates an array of field programmable gate arrays and field programmable interconnection devices that are arranged and employed as a co-processor to enhance the performance of a host computer or within a virtual computer processor to perform successive algorithms. The successive algorithms must be programmed to correspond with a series of conventional instructions that would normally be executed in a conventional microprocessor. Then, the rate of performing the specific computational task of the successive algorithms by the FPGA/FPIN array is much less than the rate of the corresponding instructions performed by a conventional microprocessor. The virtual computer of FIG. 2 must include a reconfigurable control section that governs the reconfiguration of the FPGA/FPIN array. The configuration bit files must be generated for the reconfigurable control section using a software package designed for that purpose. Then, the configuration bit file must be transmitted to a corresponding FPGA/FPIN array in the virtual computer. FIG. 2 illustrates how the arrays and dual port random access memory (RAM) are connected by pins to the reconfigurable control section, a bus interface and computer main memory. The bus interface is connected to a system bus.
U.S. Pat. No. 5,684,980 shows how the pins provide a clock pin and a pin connecting the reconfigurable control section to the FPGA/FPIN arrays, and shows an example of a reconfigurable control section.
U.S. Pat. No. 4,291,372 discloses a microprocessor system with specialized instruction formatting which works in conjunction with an external application dependent logic module handling specific requirements for data transfer to and from a peripheral device. The microprocessor provides a program memory having a specialized instruction format. The instruction word format provides a single bit field for selecting either a program counter or a memory reference register as the source of memory address, a function field which defines the route of data transfers to be made, and a source and destination field for addressing source and destination locations. Previously, peripheral controller units burdened the system with processor and control circuits in the base module for handling the specific requirements.
Digital Signal Processing (DSP) units or arrays of DSP processors may be hardwired into parallel arrays that optimize performance for some graphic intensive tasks, such as pixel processing for generating images on output screens, such as monitors and televisions. These are custom made and include a BIOS specific to the graphical acceleration environment created for the digital signal processors to do their job.
Matrix bus switching (MBS) is known. For example, the user guide “AMBA® 4 AXI4™, AXI4-Lite™, and AXI4-Stream™ Protocol Assertions, Revision: r0p1, User Guide,” copyright 2010, 2012, referenced as ARM DUI 0534B, ID072312, teaches a system for matrix bus switching that is high speed and implementable by a person having ordinary skill in the art. The user guide is written for system designers, system integrators, and verification engineers who want to confirm that a design complies with a relevant AMBA 4 protocol. This can be AXI4, AXI4-Lite, or AXI4-Stream, for example. All of the trademarks are registered trademarks of ARM in the EU and elsewhere. Where excepted, this reference is incorporated herein in its entirety by reference. An MBS is a high speed bus for data input and output, and this reference teaches the methods and hardware for a system engineer to integrate an example of an MBS in a processor system architecture.
All of this is known in the art, but no example in the prior art eliminates almost all of the overhead generated by conventional processing systems, while maintaining the flexibility of processing a wide range of algorithms and using a standard higher level programming language, such as “C”, for software development for the processing system.