The invention relates to an apparatus and method for execution of standard functions in a computer system.
Past developments in computer systems to improve the execution speed of programs have seen the emergence of floating point numeric co-processors. These evolved due to the demand for repetitive calculations, typically in scientific applications where a number of complex trigonometric or floating point operations are required. In the early days of 8-bit processors, a square root function could take in the order of 5 milliseconds to execute. Floating point co-processors could perform this operation in around 1 microsecond, i.e. around 5000 times faster.
Recent developments have been aimed at speeding up more complex numeric functions, such as discrete Fourier transforms (DFTs). To perform such a complex numeric function, the processor may be required to perform many thousands of floating point operations. Processors designed to perform such complex numeric functions are generally referred to as digital signal processors (DSPs). The goal is to reduce the execution time to such an extent that real time operation can be achieved. Real time operation is highly desirable for applications such as image processing, holographic television and mobile telephone communication, for example.
With the focus on numeric and then DSP applications, some of the more mundane data processing functions have largely been ignored by hardware designers. The lack of interest in data processing functions can be understood by the fact that these functions do not consume as much processor time as numeric functions, such as the traditional square root function, or complex functions such as DFTs.
One example of a simple data processing function is a basic string search. The task is to find if a string such as "fred" appears in any of a number of other strings such as "rolling stones", "manfred mann" and "pink floyd", and if so where. The processor performs a byte-by-byte comparison of the first character "f" of the required string against the characters of the other strings until a match is found. The second character "r" is then compared with the next character of the string being compared, and so on. To perform the calculation, data is required to be loaded from system memory into the processor registers, interleaved with the processor instructions, also held in memory. The transactions on the system bus, the loading of internal processor registers and decision making are all executed in processor clock periods and represent a significant overhead of the string search function.
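The byte-by-byte search described above can be sketched in software as follows. This is a minimal illustration only, assuming Python byte strings; the function name `find_substring` is hypothetical and does not appear in the specification.

```python
def find_substring(needle: bytes, haystack: bytes) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1."""
    n, m = len(needle), len(haystack)
    for start in range(m - n + 1):
        # Compare one byte at a time, stopping at the first mismatch.
        offset = 0
        while offset < n and haystack[start + offset] == needle[offset]:
            offset += 1
        if offset == n:
            return start
    return -1


# The three candidate strings from the example in the text.
candidates = [b"rolling stones", b"manfred mann", b"pink floyd"]
results = {c: find_substring(b"fred", c) for c in candidates}
# Only b"manfred mann" contains b"fred", at zero-based index 3.
```

Every comparison in the inner loop corresponds to a separate load and compare step on the processor, which is the overhead the paragraph above refers to.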
As databases become ever larger, repetitive applications of the simplest functions can be expected to consume increasing proportions of the total processor power, notwithstanding the fact that the functions individually are not computationally intensive. Imagine a database where many thousands of records have to be scanned for matching strings, or sorted into a different order.
It is thus an aim of the invention to provide a computer system in which standard functions, especially non-numeric functions such as those related to database applications, can be performed more efficiently than with conventional execution.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Features of the dependent claims may be combined with those of the independent claims as appropriate and in combinations other than those explicitly set out in the claims.
According to one aspect of the invention there is provided a computer system comprising a processor unit connected to a system bus, a mass storage medium including a library of functions, and a field programmable gate array (FPGA). The function library includes a number of functions stored in a pre-compiled form derived from compilation of firmware code and comprising a set of configuration data for configuring the field programmable gate array. The firmware code from which the pre-compiled form is derived may be written in a hardware description language (HDL). The FPGA has a set of configuration line connections operatively associated with the mass storage medium to allow configuration of the FPGA with the pre-compiled form of the function concerned. The FPGA also has a set of bus line connections operatively connected to communicate with the processor, for example through the system bus or a bus internal to the processor unit. The processor is operable to execute a call to one of said functions by delegating the principal data processing content of the function to the FPGA which is configured with the appropriate set of configuration data for that function. The function library may be held in system memory connected to the system bus, or held in an external device accessible through an I/O port of the computer system.
Returning to the example of a string search cited in the introduction, comparison of whole substrings can now take place in nanoseconds rather than hundreds of nanoseconds with a computer system based on current processor and FPGA technology. In the specific example from the introduction, the complete string "fred" could be compared concurrently against the first four characters "roll" in less than 10 nanoseconds. By comparison, with conventional software-based execution, four distinct load/compare cycles are required, consuming hundreds of nanoseconds of processor time.
An additional speed advantage is gained by the inherent parallelism of the hardware description languages (HDLs) used to write FPGA firmware. In this regard, most processors are essentially sequential devices and can only execute one instruction at a time. (Exceptions to this are parallel processors such as the Transputer.) As a result, conventional hardware execution of a software function will in essence be a sequential process, even if pipelining is used to allow several instructions along an instruction stream to be worked on simultaneously. By contrast, HDLs are inherently parallel, allowing more than one command level function to be performed by the FPGA at the same time.
Consider once more the string search described above. The string "fred" is searched for in the substring "manf" of "manfred mann". A byte-by-byte comparison indicates no match. At the same time, the first byte "f" of the string "fred" can be compared with every byte of "manf" to indicate where the rest of the search should begin, in this case at the fourth character. Both these operations can take place contemporaneously in the FPGA hardware. Incidentally, two redundant comparisons are also removed by this procedure, namely the search for a match between "f" (the first character of "fred") and each of "a" and "n" (the second and third characters of "manfred mann").
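The two contemporaneous operations can be modelled in software as a single step that returns both results at once. The sketch below is a sequential simulation only, since Python cannot express the hardware parallelism; the names `compare_window` and `search_with_skip` are hypothetical, and the skip rule is one plausible reading of the paragraph above rather than the specified FPGA design.

```python
def compare_window(needle: bytes, window: bytes):
    """Model one hardware step on a window as long as needle.

    Returns (full_match, next_offset). In hardware both values would be
    produced at the same time; here they are computed one after the other.
    """
    # Operation 1: compare the whole needle against the whole window.
    full_match = window == needle
    # Operation 2: compare the first needle byte with every window byte.
    # Offset 0 is excluded from the restart candidates because restarting
    # there would repeat the comparison made in operation 1.
    hits = [i for i, b in enumerate(window) if i > 0 and b == needle[0]]
    next_offset = hits[0] if hits else len(window)
    return full_match, next_offset


def search_with_skip(needle: bytes, haystack: bytes) -> int:
    """Return the index of needle in haystack, or -1, one window per step."""
    n = len(needle)
    if n == 0:
        return 0
    pos = 0
    while pos + n <= len(haystack):
        full_match, next_offset = compare_window(needle, haystack[pos:pos + n])
        if full_match:
            return pos
        pos += next_offset  # jump straight to the next candidate start
    return -1
```

For the example in the text, `compare_window(b"fred", b"manf")` reports no match and a restart offset of 3, that is, the fourth character, so `search_with_skip(b"fred", b"manfred mann")` finds the match on its second step without testing the positions of "a" and "n".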
Thus, by transferring tasks from the processor to the FPGA, a change from sequential to parallel execution can be achieved. The processor is thus not only freed up for carrying out its other tasks, but the functions are executed more efficiently with parallelism.
Another significant advantage of the FPGA approach is that it allows retention of a level of flexibility comparable with conventional software-based execution. By contrast, transfer of numeric functions to a dedicated co-processor, or to specific integrated circuit portions of a DSP, sacrifices the flexibility of conventional software-based execution.
One way in which the inherent flexibility of the firmware approach can be exploited is as follows. Taking a library of standard functions as a starting point, the most time consuming standard functions can be committed to firmware first. Attention can then be directed to developing firmware versions of those functions that previously took seemingly insignificant amounts of processor time. The new firmware versions of the functions can then be added to existing computer systems, simply by supplying a ROM or other recording medium on which is stored the firmware representation of the standard function. The firmware for each function will principally include a set of configuration data for configuring an FPGA to perform the standard function in hardware. Alternatively, the new firmware versions can be supplied via a network, such as the Internet, i.e. in the form of a transmission medium.
Engineers familiar with FPGA design only have to sift through standard function libraries, for example C libraries, to identify those functions that can be readily converted into HDL for implementation in an FPGA. The selection process can be made having regard to the inherent complexity of the underlying algorithm and the specifications of the FPGA provided in hardware. More complex algorithms can be added to the firmware library as expertise in generating firmware versions of the standard library functions develops.
Another example of the benefit of flexibility is for correcting bugs. Firmware bugs in a library function are easily removed by issuing an upgraded firmware version of the library function. Firmware patches can thus be provided, as is usual in software patches. By contrast, the equivalent bugs in conventional hardware, for example in an ASIC (Application Specific Integrated Circuit) or a dedicated co-processor, are a significant risk, since they require costly NRE (non-recurring engineering) to respin the integrated circuit and expensive field replacement of assembled computer boards.
In summary, software versions of existing library functions can be converted to FPGA firmware versions in a piecemeal fashion. A succession of firmware library upgrades can be issued to users as and when firmware versions of more of the standard functions are developed, or as bugs in existing firmware functions are detected. At any one time, development can concentrate on those standard functions which are readily convertible into firmware and those standard functions that would provide the greatest incremental improvement in system performance if their execution were taken away from the processor and delegated to the FPGA.
Further aspects of the invention relate to a storage medium including a library of functions and methods of executing an instruction stream including calls to library functions.
In one such further aspect of the invention a storage medium is provided in or on which each of the functions in said library of functions is stored in at least one of a first version and a second version. The first version is in a form obtained from compilation of software code, and the second version is in a form obtained from compilation of firmware code and comprises a set of configuration data for loading into a field programmable gate array (FPGA) to configure the FPGA to perform the function concerned.
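One way to picture a library entry that holds either or both versions is the following sketch. The class `LibraryEntry`, the function `choose_version` and the rule of preferring firmware whenever an FPGA is present are all assumptions made for illustration; the specification does not prescribe a data layout or a selection policy.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LibraryEntry:
    """One library function, stored in at least one of two versions."""
    name: str
    software: Optional[Callable] = None  # first version (a callable stands in for compiled code)
    firmware: Optional[bytes] = None     # second version (FPGA configuration data)

    def __post_init__(self):
        # The text requires at least one of the two versions to be present.
        if self.software is None and self.firmware is None:
            raise ValueError(f"{self.name}: no version stored")


def choose_version(entry: LibraryEntry, fpga_available: bool) -> str:
    """Assumed policy: use firmware when possible, otherwise software."""
    if entry.firmware is not None and fpga_available:
        return "firmware"
    if entry.software is not None:
        return "software"
    raise LookupError(f"{entry.name}: no usable version on this system")


# Placeholder bytes stand in for real configuration data.
both = LibraryEntry("find", software=lambda n, t: t.find(n), firmware=b"\x00\x01")
```

Under this assumed policy, an entry that holds both versions runs in firmware on a system with an FPGA and falls back to the software version elsewhere.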
In another such further aspect of the invention there is provided a method of executing program flow liable to include calls to any one of a plurality of standard library functions. The function library is held in a storage medium as firmware comprising configuration data for an FPGA. The method comprises: detecting a call to a library function in the program flow; determining whether the FPGA is configured to execute that library function call; if the FPGA is not configured to execute that library function call, configuring the FPGA by loading the function's firmware into the FPGA from the storage medium; and executing the call using the FPGA configured with the library function.
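The steps of this method can be sketched as a small simulation. Everything here is hypothetical: `SimulatedFPGA` is not a real device interface, and Python callables stand in for the configuration data that would be loaded from the storage medium.

```python
class SimulatedFPGA:
    """Hypothetical stand-in for an FPGA; not a real device API."""

    def __init__(self):
        self.loaded = None         # name of the function currently configured
        self.reconfigurations = 0  # number of times configuration data was loaded
        self._logic = None

    def configure(self, name, config):
        # In hardware this would load configuration data over the
        # configuration lines; here a Python callable plays that role.
        self._logic = config
        self.loaded = name
        self.reconfigurations += 1

    def execute(self, *args):
        if self._logic is None:
            raise RuntimeError("FPGA is not configured")
        return self._logic(*args)


def call_library_function(fpga, library, name, *args):
    """Execute a detected library call, configuring the FPGA first if needed."""
    if fpga.loaded != name:                  # is the FPGA configured for this call?
        fpga.configure(name, library[name])  # load the firmware from the library
    return fpga.execute(*args)               # execute the call on the FPGA


# Assumed two-function library; the callables model configured behaviour.
library = {
    "find": lambda needle, text: text.find(needle),
    "upper": lambda text: text.upper(),
}
fpga = SimulatedFPGA()
first = call_library_function(fpga, library, "find", "fred", "manfred mann")
second = call_library_function(fpga, library, "find", "fred", "pink floyd")
third = call_library_function(fpga, library, "upper", "fred")
```

In this simulation the two consecutive calls to `find` share one configuration, and the FPGA is reconfigured only when the call to `upper` arrives, so `fpga.reconfigurations` ends at 2.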