The concept of reconfigurable computing was first proposed in 1960 by Gerald Estrin. In the paper “Organization of Computer Systems—The Fixed Plus Variable Structure Computer” he conceptualized a computer composed of two cooperative components: a standard processor and an array of reconfigurable hardware. The standard processor would control the behavior of the reconfigurable hardware. The reconfigurable hardware would be customized to perform a specific task, such as image processing or pattern matching, and would perform its assigned task as quickly as a dedicated piece of hardware. When finished, the customizable hardware could be reconfigured to perform another task. Estrin thus described a hybrid computer structure combining the flexibility of software with the speed of hardware.
Application-specific integrated circuits (ASICs) are one form of configurable hardware in the spirit of Estrin's idea, but these devices normally offer little reconfigurability once manufactured. Field-programmable gate arrays (FPGAs), by contrast, can be reprogrammed by the customer after manufacturing. FPGA devices offer greater flexibility through reprogrammability, but are generally much slower than ASIC devices designed for a specific purpose. FPGAs fit nicely as the “Variable” part of Estrin's vision. Interest in FPGAs has increased dramatically with the advent of modern devices that can be reconfigured during runtime. As such, FPGAs coupled with general purpose CPUs offer the possibility of more cost-effective processing than general purpose CPUs alone. A large body of recently published work addresses specific problems by offloading processing from a general purpose CPU to a more efficient FPGA device reprogrammed for a specific purpose.
The paper “A Pattern-Matching Co-Processor for Network Intrusion Detection Systems”, Clark et al., focuses specifically on network intrusion detection systems, and in particular on efficient pattern matching in network packets using an FPGA as a co-processor. The idea is to match a large number of known patterns against a small number of data sets (packets). Software-based matching techniques are far too slow; thus an FPGA is programmed to do the matching by translating Snort Rules into FPGA circuits. However, this is just one specific example of how to employ an FPGA as an auxiliary processing device and not a general reconfigurable device management facility.
The paper “Assisting Network Intrusion Detection with Reconfigurable Hardware”, Franklin et al., shows that compiling Snort Rules into FPGA bit streams yields a vast performance advantage over software techniques with respect to pattern matching and intrusion detection. Like the paper cited above, this is another example of how FPGAs can be advantageously employed to accelerate performance. Again, however, no reconfigurable device management facility is described.
The paper “The Shunt: An FPGA Based Accelerator for Network Intrusion Prevention”, Weaver et al., like the two papers cited above, focuses on acceleration specifically with respect to network intrusion detection. It too does not describe middleware for the management of a dynamically evolving cluster of computers, each computer potentially having one or more heterogeneous acceleration devices attached, all of which are to be shared over time among a set of users according to user and/or administrator policies.
The paper “Dynamic Reconfiguration to Support Concurrent Applications”, Jean et al., discusses a resource manager that manages allocation and de-allocation of a single FPGA among a collection of individual applications. Savings are realized by avoiding reloading the FPGA when more than one application has use for the currently installed FPGA image. However, the approach has shortfalls: it provides no method for managing multiple, distributed FPGAs, it has no discovery capabilities, and it cannot add or remove FPGAs dynamically. Further, its brute-force scheduling method precludes providing FPGA services according to user- or administrator-defined policy.
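The reload-avoidance idea described above can be illustrated with a minimal sketch. This is not the implementation from Jean et al.; the class name `SingleFpgaManager`, its methods, and the image names are hypothetical, and the actual device programming step is reduced to a counter. The sketch only shows the bookkeeping: a single FPGA is granted to one application at a time, and the image is reloaded only when the requested image differs from the one already installed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SingleFpgaManager:
    """Hypothetical manager for one FPGA shared among several applications."""

    loaded_image: Optional[str] = None  # image currently installed on the FPGA
    owner: Optional[str] = None         # application currently holding the FPGA
    reload_count: int = 0               # number of (simulated) reconfigurations

    def acquire(self, app: str, image: str) -> bool:
        """Grant the FPGA to `app`; return True if a reload was needed."""
        if self.owner is not None:
            raise RuntimeError(f"FPGA is already allocated to {self.owner}")
        reloaded = self.loaded_image != image
        if reloaded:
            # Assumption: a real system would program the device here.
            self.loaded_image = image
            self.reload_count += 1
        self.owner = app
        return reloaded

    def release(self, app: str) -> None:
        """De-allocate the FPGA but keep the image installed for reuse."""
        if self.owner != app:
            raise RuntimeError(f"{app} does not hold the FPGA")
        self.owner = None
```

If two applications request the same image back to back, the second `acquire` call returns `False` and `reload_count` is unchanged, which is the saving the paragraph describes. The sketch also makes the stated shortfalls visible: it tracks exactly one device, has no discovery step, and serves requests in arrival order with no policy input.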
The paper “Reconfigurable Processor for Data-Flow Video Processing System”, Acosta et al., classifies uses of FPGAs in two broad categories: to offload bit parallel computations, and for computationally intensive program inner loops. It discusses a system named Cheops and cites prior art describing other systems, including Anyboard, SPLASH, and PRISM-II, all of which pre-date the modern stream processing era. Cheops, in particular, is a one-of-a-kind system designed for the specific purpose of processing and displaying digital video sequences. Neither it nor the prior art cited provides a general purpose stream processing acceleration method or system, and none contemplates distributed acceleration device management.
The article “FPGAs for Stream Processing: A Natural Choice”, Littlefield et al., links the use of FPGAs to stream processing. Described is a typical configuration, where a multi-computer system's input devices are connected to FPGA computing engines via dedicated links, and the various processing elements are interconnected via a switching communications fabric. Also claimed is the suitability of FPGAs for early stage stream processing. No detailed information is disclosed about management facilities provided by the communication middleware package. No disclosures are made with respect to distributed architectures, dynamic discovery or policy-driven application allocation/de-allocation of reconfigurable resources.
In U.S. Pat. No. 5,828,858, the architecture disclosed allows multiple entities (applications) to control, allocate, and utilize resources (FPGAs) from a common pool simultaneously without multitasking or time slicing. Employed is a distributed control and decentralized scheduling approach.
In U.S. Published Patent Application no. 2008/0028186A1, employment of an FPGA directly on a motherboard as an acceleration device is disclosed. This system also fails to address distribution, sharing, policies and other management issues.
In U.S. Published Patent Application no. 2005/0278680A1, “[s]cheduling refers generically to a process of time sequencing a plurality of tasks or subtasks, [and] partitioning refers generically to a process of developing a physical hardware design for implementing the task or subtask in actual hardware. As used herein, hybrid network typically refers to a collection of elements including one or more processors preferably making up the nodes of a cluster or grid that are upgraded with FPGA boards for hardware acceleration . . . ” Also disclosed is a software tool that “implements application designs onto the hybrid network, controls data flow, and schedules executions on the network using application program interfaces to generate fast and accurate results.”
In U.S. Published Patent Application no. 2005/0097305A1, an on-demand non-distributed FPGA co-processor loader is disclosed. It has no facilities for dynamic accelerator detection, nor does it do any scheduling. It is a load-and-go system: when the microprocessor needs acceleration, the FPGA is loaded accordingly and dispatched.