1. Field of the Invention
The present invention relates to reconfigurable computing systems.
2. State of the Art
As the cost of complex integrated circuits continues to fall, systems companies are increasingly embedding RISC processors into non-computer systems. As a result, whereas the bulk of development work used to be in hardware design, now it is in software design. Today, whole applications, such as modems, digital video decompression, and digital telephony, can be done in software if a sufficiently high-performance processor is used. Software development offers greater flexibility and faster time-to-market, helping to offset the decrease in life cycle of today's electronic products. Unfortunately, software is much slower than hardware, and as a result requires very expensive, high-end processors to meet the computational requirements of some of these applications. Field Programmable Gate Arrays (FPGAs) are also being increasingly used because they offer greater flexibility and shorter development cycles than traditional Application Specific Integrated Circuits (ASICs), while providing most of the performance advantages of a dedicated hardware solution. For this reason, companies providing field programmable or embedded processor solutions have been growing very rapidly.
It has long been known in the software industry that typically most of the computation time of any application is spent in a small section of code. A general trend in the industry has been to build software applications, standardize the interfaces to these computationally intensive sections of code, and eventually turn them into dedicated hardware. This approach is being used by many companies to provide chips that do everything from video graphics acceleration to MPEG digital video decompression. The problem with this approach is that dedicated chips generally take one or more years to create and then are good only for their specific tasks. As a result, companies have begun providing complex digital signal processing chips, or DSPs, which can be programmed to perform some of these tasks. DSPs are more flexible than application-specific hardware, but are less flexible than standard processors for purposes of writing software.
The logical extension of the foregoing trends is to create a chip which is a processor with dedicated hardware that replaces the computationally intensive sections of the application code. In fact, most complex MPEG chips already include a dedicated embedded processor, but are nevertheless not very flexible. Unfortunately, FPGAs, while they provide greater flexibility, are only 5-10% as dense as ASICs (gate arrays/standard cells) per usable function. Since there are usually many different sections of computationally intensive code that must be executed at different times within any given application, a more efficient way of using the inherently inefficient FPGA logic is to repeatedly load each specific hardware logic function as it is needed, and then replace it with the next function. This technique is referred to as reconfigurable computing, and is being pursued by university researchers as well as FPGA companies such as Xilinx and others. U.S. Pat. No. 5,652,875 describes a "selected instruction set" computer (SISC) CPU implemented in programmable hardware. A related patent is U.S. Pat. No. 5,603,043. Both of these patents are incorporated herein by reference.
It is desired to have an improved method and apparatus for reconfigurable computing.
A problem that can occur in reconfigurable computing systems that use more than one reconfigurable region concerns data coherency. In one reconfigurable computing system, multiple reconfigurable slices are used. Data is loaded from an external memory into these reconfigurable slices, and results are stored from the slices back to the external memory. A central processing unit executes the instructions that cause these transfers between the reconfigurable slices and the external memory. The problem of data coherency can occur when these instructions operate out of order. Consider an example in which a first instruction loads data blocks A and B from the external memory into a reconfigurable slice and intends to write the result to data block C. A later instruction loads data blocks C and D from the external memory and intends to write the result to data block E. If the second instruction starts before the first instruction has written data block C, the second instruction would use the old value of data block C rather than the updated version.
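The hazard in the example above can be illustrated with a short simulation. This is only a sketch under stated assumptions: the external memory is modeled as a dictionary of named blocks, each block holds a single number, and the computation performed by a slice is assumed to be a simple addition. None of these details come from the system described here.

```python
# Hypothetical model: external memory as a dict of named data blocks.
# The "+" operation stands in for whatever a reconfigurable slice computes.
def run(order):
    memory = {"A": 1, "B": 2, "C": 0, "D": 10, "E": 0}
    # Each instruction reads two source blocks and writes one destination block.
    instructions = {
        "first": ("A", "B", "C"),   # C = A + B
        "second": ("C", "D", "E"),  # E = C + D
    }
    for name in order:
        src1, src2, dst = instructions[name]
        memory[dst] = memory[src1] + memory[src2]
    return memory["E"]


in_order = run(["first", "second"])      # second reads the updated block C
out_of_order = run(["second", "first"])  # second reads the stale block C
```

With these assumed values, the two orderings leave different results in block E, which is the coherency problem the example describes.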
In order to maintain data coherency and in-order operation, the present invention uses a data dependency checking table which ensures that the instructions do not operate out of order. For example, the data dependency checking table can have an entry which stores the information concerning the data blocks A, B and C in one data entry. When another instruction that loads data blocks C and D into a reconfigurable slice and intends to write the results of a computation into block E is about to execute, the dependency checking table can detect the data dependency and the conflict is avoided. Note that in this example, the second instruction would be stalled by the dependency checking table until the first instruction completes.
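The check described above might be sketched as follows. The names `Entry` and `must_stall` are hypothetical, blocks are identified by name rather than by address, and only the dependency discussed in the text (a new instruction reading a block that a pending instruction will write) is checked. An actual table could check further cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One table entry: the blocks a pending instruction reads and writes."""
    sources: tuple
    destination: str


def must_stall(table, sources):
    """Return True if any source block is the pending destination of an entry."""
    pending = {entry.destination for entry in table}
    return any(block in pending for block in sources)


# The first instruction (reads A and B, writes C) is pending in the table.
table = [Entry(sources=("A", "B"), destination="C")]
stalled_while_pending = must_stall(table, ("C", "D"))

# Once the first instruction completes, its entry is removed.
table.clear()
stalled_after_completion = must_stall(table, ("C", "D"))
```

In this sketch the second instruction is held back only while the entry for blocks A, B and C is present.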
The data blocks loaded into and stored from the different reconfigurable slices vary in size. For this reason, an indication of the size of the data blocks is stored in the data dependency checking table. In a preferred embodiment, a mask value is stored in the data dependency checking table so that the protected regions of the external memory can be quickly computed. The masks are used to produce masked addresses that can be compared in a simple identity comparison rather than in a computationally complex function of the different addresses and the exact data block sizes.
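One way the masked comparison could work is sketched below. It assumes, beyond what the text states, that block sizes are powers of two, that each block is aligned to its own size, and that addresses are 32 bits wide. The function names are hypothetical.

```python
ADDRESS_BITS = 32  # assumed address width
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def mask_for_size(size):
    """Mask that clears the offset bits of an aligned block of `size` bytes."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    return ADDRESS_MASK & ~(size - 1)


def blocks_conflict(addr_a, mask_a, addr_b, mask_b):
    """Compare two blocks using the coarser of the two masks.

    For aligned power-of-two blocks, the blocks overlap exactly when
    their addresses are identical after masking.
    """
    combined = mask_a & mask_b
    return (addr_a & combined) == (addr_b & combined)


protected = (0x1000, mask_for_size(256))  # 256-byte block at 0x1000
inside = blocks_conflict(*protected, 0x10F0, mask_for_size(16))
outside = blocks_conflict(*protected, 0x1100, mask_for_size(16))
```

Under these assumptions the conflict test needs only two AND operations and one equality comparison, with no arithmetic on block bounds.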
In a preferred embodiment of the present invention, the dependency checking table works with extension instructions. The extension instructions include configuration extension instructions to load a configuration into the reconfigurable slices, and data block extension instructions that indicate the data blocks to be loaded from the external memory into the slices and stored from the slices back to the external memory. Each of the data block extension instructions results in an entry being placed into the dependency checking table. When such a data block extension instruction finishes, the dependency checking table entry is cleared.
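The entry lifecycle described above (an entry placed when a data block extension instruction issues, and cleared when it finishes) could be modeled as below. The fixed number of slots, the class name `DependencyTable`, and the behavior when the table is full are all assumptions made for illustration.

```python
class DependencyTable:
    """Hypothetical fixed-size table with one slot per pending instruction."""

    def __init__(self, num_slots=4):
        self.slots = [None] * num_slots

    def allocate(self, blocks):
        """Place an entry in the first free slot; return its index, or None if full."""
        for index, slot in enumerate(self.slots):
            if slot is None:
                self.slots[index] = tuple(blocks)
                return index
        return None  # assumed behavior: the instruction would have to wait

    def clear(self, index):
        """Clear the entry when its instruction finishes."""
        self.slots[index] = None

    def active(self):
        """Return the entries of instructions that have not yet finished."""
        return [slot for slot in self.slots if slot is not None]


table = DependencyTable(num_slots=2)
first = table.allocate(("A", "B", "C"))
second = table.allocate(("F", "G", "H"))
third = table.allocate(("X", "Y", "Z"))  # no free slot in this sketch
table.clear(first)                        # first instruction finishes
remaining = table.active()
```

In this sketch an entry exists only for the lifetime of its instruction, so the table never holds stale dependencies.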
Another embodiment of the present invention concerns the use of a dependency checking table in a system in which the data dependency table stores a mask value which is used to give an indication of the size of the data blocks involved. The mask values can be used in a relatively quick computation to determine whether there is a conflict between data accesses to the external memory.