In today's competitive multimedia marketplace, Integrated Circuit (IC) suppliers, Original Equipment Manufacturers (OEMs) and network/service providers are faced with an array of dilemmas. Functional integration, dramatic increases in complexity, new technologies and ever-changing and competing standards, together with increased time-to-market pressures, are making the selection of the right functionality-cost mix ever more difficult. Furthermore, end customers are demanding more sophisticated feature sets, which in turn require an enormous amount of additional processing power.
The constant introduction of new standards means conventional equipment is effectively obsolete before it leaves the factory. This is a particular concern to network/service providers, such as cable, satellite and terrestrial television providers and mobile phone operators, as they significantly subsidize the cost of this equipment to the consumer. Consequently, the introduction of new equipment erodes their profits. Therefore, having equipment that could adapt to changing standards, upgrades and new applications via the Internet and/or a broadcast channel would be a significant advantage.
To further compound the issue, the introduction of new European environmental legislation in 2004 will make OEMs responsible for waste management. The Waste Electrical and Electronic Equipment (WEEE) and Restriction of the Use of Certain Hazardous Substances (RoHS) legislation will mean manufacturers of consumer goods will need to adopt a more environmentally friendly manufacturing strategy. They will also be responsible for product recycling.
At the IC device level, it is becoming increasingly difficult for designers to meet the demands outlined above with existing IC technologies and design methodologies. Several IC technologies exist, but they all have disadvantages and are not optimised for a particular application.
Application Specific Integrated Circuits (ASICs) have their circuits and hence their functionality fixed at manufacture and so can't be used for new or different applications. They have long development cycles and require huge upfront Non-Recurring Engineering (NRE) costs. This makes them prohibitively expensive, especially for lower cost applications.
Microprocessors and Digital Signal Processors (DSPs) provide a degree of flexibility with regard to reconfiguration through software. However, these devices still employ fixed or rigid hardware and, because they are general-purpose devices, they are not optimised for a particular application. This is particularly true when compared to a parallel hardware solution. A microprocessor can only process one instruction at a time and is therefore much slower and less efficient. While operating, many of its circuits are not being utilized. This is a waste of expensive silicon real estate and increases power consumption. To increase the throughput, designers can employ more than one processor. However, this just compounds the cost, power efficiency and area issues.
Current programmable logic devices, such as Field Programmable Gate Arrays (FPGAs), provide a better solution. However, FPGAs are very expensive and are general-purpose devices consisting of an array of uniform programmable elements, usually based on look-up tables (LUTs), interconnected using programmable interconnect. Consequently, they are not optimised for a particular application and hardware utilization can be poor. Though they allow reconfiguration in the field, the process is slow and cumbersome and does not allow real-time reconfiguration.
Many multimedia processes require several complex digital signal-processing algorithms. Each algorithm itself comprises many sub-functions, some of which can be executed in parallel. Some of these sub-functions or processes, such as digital filtering, convolution, Fast Fourier Transforms (FFTs) and Discrete Cosine Transforms (DCTs), require many arithmetic and logical computations per data sample. These arithmetic and logical computations tend to be the same operation executed many times, such as multiply and accumulate (MAC) operations. Consequently, the hardware to implement these different processes is very similar and can be optimised and shared for these applications. Exploiting the parallel form of certain algorithms by implementing hardware to perform the separate parallel functions simultaneously provides hardware acceleration of the algorithm, enabling it to be executed more quickly. A goal of the present invention is to provide processing resources in the reconfigurable integrated circuit that can execute functions in parallel and provide hardware acceleration.
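The observation that filtering and related processes reduce to the same repeated MAC operation can be illustrated with a short software model. The following Python sketch is illustrative only: the function names mac, fir_filter and dot_product are assumptions introduced here and do not describe any circuit of the invention. It shows a direct-form FIR filter and a dot product both built from one shared MAC primitive.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def fir_filter(samples, coeffs):
    """Direct-form FIR filter expressed purely as repeated MAC steps.

    Each output sample depends only on the input samples, so in hardware
    the outer loop iterations could be computed in parallel.
    """
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:  # samples before the start are treated as zero
                acc = mac(acc, c, samples[n - k])
        out.append(acc)
    return out


def dot_product(xs, ys):
    """A different function that reuses the same MAC primitive."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


if __name__ == "__main__":
    print(fir_filter([1, 2, 3, 4], [1, 1]))
    print(dot_product([1, 2, 3], [4, 5, 6]))
```

Because both routines call the same primitive, a single optimised MAC resource could in principle serve either of them, which is the sharing argument made above.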
FIG. 2 is a logical block diagram that outlines the processing and resource requirements for a generic multimedia system or algorithm 100. The algorithm can be partitioned into several distinct functions each having its own processing and resource requirements. The algorithm input block 101 operates at a lower rate than the core functions 103, but tends to require shared resources. Received data needs to be formatted or pre-processed 102 before being transferred to parallel algorithmic resources 103. These are dedicated resources, which operate at high frequencies that are many times the data sample rate. Data is then post processed or merged 104 before being output 105 via one or a plurality of output channels. These latter two functions require medium processing rates and shared resources.
As well as parallel processing, an algorithm may contain certain sub-functions that are performed sequentially, with each subsequent sub-function requiring data that has been processed by the previous sub-function. In an ASIC or FPGA design each sub-function will require dedicated circuitry. However, by reconfiguring the available logic resources, the reconfigurable logic can be altered in real-time to implement each of the sequential sub-functions in turn. This reduces the number of logic gates and the silicon real estate required. It is another goal of the present invention to provide a reconfigurable integrated circuit that optimises the logic resources for a particular application.
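The idea of time-sharing one block of logic between sequential sub-functions can be modelled in software. The following Python sketch is an illustration under stated assumptions: the class ReconfigurableBlock and the sub-functions scale, offset and clamp are hypothetical names introduced here, and a real device would load configuration data rather than a Python function.

```python
class ReconfigurableBlock:
    """Hypothetical stand-in for one block of reconfigurable logic."""

    def __init__(self):
        self.function = None
        self.reconfigurations = 0

    def configure(self, function):
        """Load a new function into the block (models a reconfiguration)."""
        self.function = function
        self.reconfigurations += 1

    def run(self, data):
        """Apply the currently configured function to every data item."""
        return [self.function(x) for x in data]


def run_sequential(block, sub_functions, data):
    """Run sequential sub-functions on ONE block, reconfiguring between them."""
    for fn in sub_functions:
        block.configure(fn)
        data = block.run(data)
    return data


# Illustrative sub-functions; each consumes the output of the previous one.
def scale(x):
    return x * 2


def offset(x):
    return x + 1


def clamp(x):
    return min(x, 6)


if __name__ == "__main__":
    block = ReconfigurableBlock()
    print(run_sequential(block, [scale, offset, clamp], [1, 2, 3]))
    print(block.reconfigurations)
```

In this model three sub-functions are served by a single block at the cost of three reconfigurations, whereas a dedicated implementation would need three separate blocks.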
Another problem facing integrated circuit designers is the choice of device interfaces. There are many interface standards available, several of which are constantly being upgraded. One solution is to implement several interfaces on a device to enable it to be employed in several different applications. However, this is costly and inefficient, especially when an interface requires wide address and data buses. One of the goals of the present invention is to provide reconfigurable logic resources to allow a designer to implement different interfaces using the same logic resources.
Another goal of the present invention is to provide logic resources with varying reconfiguration rates. Some reconfigurable resources only need to be configured at the start of device operation, such as interface type, clock rate and memory sizes. Other algorithmic blocks implement functions that perform operations at a rate lower than the maximum clock frequency used by a particular device. These algorithmic blocks tend to perform similar operations. Therefore, several different algorithms can be implemented by dynamically sharing common logic resources.
This concept can be extended to implementing finite state machines. FIG. 3 shows a generic block diagram of a finite state machine. The current state 906 is stored in register 901 and is clocked using clocking signal 909. Current state 906 together with inputs 904 are input into the next state generation logic 900 to determine the next state 905 and actions. At the next clock cycle the next state vector 905 is transferred to the current state register 901. Likewise, any outputs are registered in register 902. In some finite state machines variables 908 need to be updated at certain times. Variable update logic 903 is used to perform these calculations. The finite state machine can be reset using reset signal 910.
The stages of operation are shown in FIG. 4. For each state there can be several test conditions. Each of these is tested 9A. Then the appropriate one is selected 9B. Based on the selected test condition the next state, outputs and actions are selected 9C. At the start of the next clock cycle the next state, outputs and actions are updated 9D.
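The four stages 9A to 9D can be sketched as a small software model. The following Python example is a minimal sketch under stated assumptions: the transition table TABLE, the state names IDLE and SEEN_ONE, and the class StateMachine are hypothetical and introduced only for illustration. The comments map each step to the stages and registers described for FIG. 3 and FIG. 4.

```python
# Hypothetical transition table: state -> list of (test, next_state, output).
# The example machine outputs 1 when two consecutive 1 inputs are seen.
TABLE = {
    "IDLE": [
        (lambda x: x == 1, "SEEN_ONE", 0),
        (lambda x: True, "IDLE", 0),
    ],
    "SEEN_ONE": [
        (lambda x: x == 1, "SEEN_ONE", 1),
        (lambda x: True, "IDLE", 0),
    ],
}


class StateMachine:
    def __init__(self, table, reset_state):
        self.table = table
        self.reset_state = reset_state
        self.reset()

    def reset(self):
        """Models reset signal 910."""
        self.state = self.reset_state  # current state register 901
        self.output = 0                # output register 902

    def clock(self, inp):
        """One clock cycle, following stages 9A to 9D."""
        conditions = self.table[self.state]
        results = [test(inp) for test, _, _ in conditions]  # 9A: test all
        chosen = results.index(True)                         # 9B: select one
        _, next_state, next_output = conditions[chosen]      # 9C: pick next
        self.state, self.output = next_state, next_output    # 9D: update
        return self.output


if __name__ == "__main__":
    fsm = StateMachine(TABLE, "IDLE")
    print([fsm.clock(bit) for bit in [1, 1, 0, 1, 1, 1]])
```

Only the entries of the table for the current state are consulted in any given cycle; the entries for the other states are not used until the machine enters them.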
However, one of the problems of implementing finite state machines is that logic circuitry is required to perform the functions associated with each state. In an ASIC or FPGA implementation, this also means these individual circuits are dissipating power even when they are not being used. For a complex state machine with many states this requires a lot of silicon resources. A solution to this problem is to implement the logic for each state only when it is required. By dynamically reconfiguring and sharing logic resources, a finite state machine can be implemented in a smaller area with reduced power consumption.
One of the disadvantages of using Field Programmable Gate Arrays (FPGAs) is that they are not optimised for a particular application due to replication of uniform programmable logic elements. Yet another goal of the present invention is to provide a reconfigurable integrated circuit that employs a non-uniform or diverse range of rigid elements and programmable-rigid elements, which target a particular group of applications, such as audio, video and telecommunication applications. The term rigid element means a hardwired circuit dedicated to implementing a particular function or functions. The hardwired circuit can be “constructed” from one or more hardwired sub-circuits. The term programmable-rigid element means a circuit that contains hardwired circuitry, but certain parts of the circuitry can be reconfigured via memory means so the circuit can implement one of a plurality of similar functions. This includes a micro-coded controller. The term reconfigurable element refers to a block of logic that can be reconfigured to implement a wide variety of combinatorial and/or synchronous logic functions. Though synchronous logic is normally employed, there is no reason why asynchronous logic (also referred to as clockless logic or self-timed logic) cannot be employed in the hardwired circuits used in the reconfigurable integrated circuit. There are several advantages to using asynchronous logic, namely reduced power consumption, as the logic will consume zero dynamic power when there is no logic activity, and a low electromagnetic signature because of the diffuse nature of digital transitions within the chip. This makes these devices an attractive option for use in portable or battery operated applications.
Video processing tends to work on 8-bit data values, as in MPEG2. However, audio applications require a greater range of bit widths. Compact Disc (CD) data was originally set at 16 bits. However, the sample resolution for new audio systems has changed to 18 bits, 20 bits and now 24 bits. In voice data systems data is coded and transmitted serially. Consequently, fine-grained bit resolution processing is required. Therefore, a reconfigurable integrated circuit targeted at audio applications will need to implement both coarse-grained and fine-grained processing elements.
Several attempts have been made to provide an integrated circuit device solution that provides the speed of parallel hardware with the flexibility of software. However, these solutions have had many limitations. Some have provided replicated coarse-grained processing elements to target particular digital signal processing problems and therefore lack the versatility of a fully reconfigurable solution.
For example, Marshall et al. EP0858167 (priority EP 19970300562), entitled “Field Programmable Processor Arrays”, Jan. 29, 1997, describes a device in which processing units can be densely and efficiently interconnected in a flexible way. However, the processor array is made up from the same arithmetic logic units (ALUs) repeated many times. Each ALU is 4 bits wide and control functions seem limited. There are no diverse computational blocks. The device is geared to data path processing and in particular repetitive operations. The device has specific applications and does not provide functions for implementing control, interfaces, input, output, finite state machines and general reconfiguration operations, as required in a more general purpose device.
Tavana et al. U.S. Pat. No. 6,094,065, entitled “Integrated Circuit with Field Programmable and Application Specific Logic Areas”, issued Jul. 25, 2000, discloses use of a field programmable gate array in a parallel combination with a mask-defined application specific logic area. The intention is to provide post-fabrication reconfiguration logic means to enable bug fixes and error corrections. However, this approach is limited and suffers from the disadvantages associated with ASICs and FPGAs, such as low logic utilization, greater power consumption, low speed and high cost.
Master et al. U.S. Pat. No. 20020138716, entitled “Adaptive integrated circuitry with heterogeneous and reconfigurable matrices of diverse and adaptive computational units having rigid, application specific computational elements”, issued Sep. 26, 2002, describes an integrated circuit which employs rigid hardware elements which can be reconfigured in real time. However, there are several disadvantages to this approach. Firstly, each computation unit comprises several different rigid computational elements and a single computational unit controller. A plurality of computation units is used to form a matrix, which is then replicated many times to form an array of matrices. This is an inefficient use of hardware resources, as the computational unit controller will only be using one of the plurality of computational elements depending on the algorithm being implemented. Therefore, the hardware utilization can be low. Secondly, the computational unit controller can only access the computational elements in its own computation unit. There is no sharing of resources by different computational unit controllers. Again, this is inefficient. Thirdly, the same computational elements and matrices are repeated across the integrated circuit to form a large array. There is no grading of reconfigurable resources across the integrated circuit in relation to the processing and resource requirements for different functions used to implement a system, such as input interfaces, output interfaces, parallel processing, protocol processing and data formatting.
De Hon (U.S. Pat. No. 5,956,518) describes a device architecture that is based around a two dimensional array of Basic Functional Units or BFUs. All the BFUs are identical and a BFU is the smallest logic unit from which more complex processing units can be built. There are many disadvantages with this architecture. Firstly, there is a large area overhead. Each BFU must contain all the circuitry required to perform any function, on the off chance that it might be required. This is also a disadvantage of similar array processors. A BFU can be either a datapath unit or part of a control unit, and the BFUs are programmed to implement specific functions. For example, a whole BFU is programmed to implement a Program Counter (PC) of a control unit. The logic, and hence the silicon area overhead, is therefore greater than it would be if this function were implemented using dedicated logic. A Program Counter is just a basic counter. The same is true for all the other programmed versions of a BFU. If it is programmed to be a memory unit, then the rest of the BFU circuitry is wasted as it is not being used. This approach will lead to greater silicon area and silicon cost. Also, the logic that is not being used directly will probably be dissipating power. Also, a BFU is a general purpose unit and is not optimised for a particular application. Consequently, much of the logic is used to input and output signals or for other decodes, and hence adds to the number of levels of logic a signal must pass through to get between BFUs. This degrades the performance of the architecture and adds to the path delays. The output of each BFU is registered, so the BFUs cannot be used to form concatenated datapath functions.
There is also a large routing overhead associated with each BFU. For example, each 8-bit BFU requires 8×(30×8-bit) buses, that is, 240 8-bit buses per BFU.
The BFUs implement many basic logic and arithmetic functions, but they are still very limited. They take several cycles to implement a multiply. This is a serious disadvantage as most Digital Signal Processing (DSP) algorithms rely heavily on multiply-accumulate operations and the overall performance is degraded as it will take several clock cycles to implement a particular function.
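The cost of a multi-cycle multiply in a multiply-accumulate loop can be illustrated with a simple model. The following Python sketch uses a generic shift-and-add multiplier, which is an assumption chosen for illustration and is not taken from the cited patent; the function names and the one-cycle accumulate are likewise hypothetical, and the cycle counts are those of the model only.

```python
def shift_add_multiply(a, b, width=8):
    """Multiply two unsigned integers one multiplier bit per modelled cycle.

    Returns (product, cycles). Assumes b fits in `width` bits.
    """
    product = 0
    cycles = 0
    for bit in range(width):
        if (b >> bit) & 1:
            product += a << bit  # add the shifted multiplicand
        cycles += 1              # one modelled clock cycle per bit
    return product, cycles


def mac_sequence(pairs, width=8):
    """Accumulate products of (a, b) pairs, counting modelled cycles."""
    acc = 0
    total_cycles = 0
    for a, b in pairs:
        product, cycles = shift_add_multiply(a, b, width)
        acc += product
        total_cycles += cycles + 1  # plus one assumed cycle to accumulate
    return acc, total_cycles


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
    print(mac_sequence([(1, 4), (2, 5), (3, 6)]))
```

In this model each MAC takes nine cycles for 8-bit operands, whereas a dedicated single-cycle MAC resource would complete the same three operations in three cycles.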
Consequently, there is a need for a reconfigurable integrated circuit that provides the speed of parallel hardware, as employed in an ASIC device, with the reconfigurable flexibility of software for a targeted application. The reconfigurable integrated circuit will allow dynamic sharing of resources, both rigid and programmable-rigid, to maximise hardware utilization, will employ different grades of processing resources depending on the algorithmic sub-function level within a system, and will be reconfigurable in both real-time and non-real-time. These reconfigurable logic devices enable the same device to implement many different functions and standards in hardware. They effectively evolve with changing standards and so reduce obsolescence. The result is a reconfigurable integrated circuit solution with orders of magnitude functional density improvement over traditional integrated circuit solutions and one that is more efficient in terms of cost, power consumption and use of silicon real estate.