1. Field of the Invention
The present invention relates to an information processing system in which a part of the processing by, for example, an application program is executed using a programmable logic device of which the circuit can be reconfigured, to a data processing method in the information processing system, and to the programmable logic device used for the information processing system. Particularly, the present invention relates to a method of reducing processing time, including the time required for the reconfiguration of the programmable logic device.
2. Description of the Related Art
In the field of digital devices, a programmable logic device (PLD) such as a field programmable gate array (FPGA) has been used as a prototype device before an application specific integrated circuit (ASIC) is produced, or as a substitute for an ASIC, the production of which requires a long term of several weeks or months. Also, recently, a programmable logic device is used so that specifications can be changed and a circuit can be modified after a logic device is produced.
FIG. 17 shows the structure of a general programmable logic device. A programmable logic device 1 includes a circuit information input controller 2 that reads circuit information from an external device and a programmable logic circuit sector 3 that implements circuit functions according to read circuit information.
Further, the detailed structure of the programmable logic circuit sector 3 includes a circuit element 4 and a configuration memory 5 connected to the circuit element 4 as shown in FIG. 18. The circuit element 4 includes an I/O device, logic circuit cells and wiring, and the programmable logic device is classified into an FPGA type and a complex programmable logic device (CPLD) type depending upon the connection type of the circuit element 4.
For an FPGA-type programmable logic circuit sector 3A, as shown in FIG. 19A, logic circuit cells 6A arrayed in the shape of a cross grating are mutually connected via wiring 7A. Also, a signal is inputted/outputted from/to an external device via each I/O device 8A connected to the four sides of the rectangular wiring 7A as a whole.
Also, for a CPLD-type programmable logic circuit sector 3B, as shown in FIG. 19B, I/O devices 8B and logic circuit cells 6B are connected to wiring 7B in tree structure.
In both structures, circuit information read in the programmable logic device 1 is written to the configuration memory 5 by the circuit information input controller 2. According to the circuit information written to the configuration memory 5, the features and the connection state of the circuit element are determined. The operation is called reconfiguration or configuration of the programmable logic device.
In a conventional type programmable logic device, every time circuit information is read, data of the whole configuration memory is rewritten and the whole circuits configured in the programmable logic circuit sector are reconfigured.
Recently, reading only circuit information corresponding to a part of a configuration memory has been enabled. As a result, the change of a part of a circuit being operated in a programmable logic device and the addition of a new circuit to a programmable logic circuit sector without stopping the circuit being operated have been enabled. At this time, intermediate data being processed in the programmable logic device is not lost. Such a programmable logic device is called a programmable logic device that can be partially reconfigured dynamically.
-New Application of Programmable Logic Device-
As digital communication networks represented by the Internet are developed and popularized, the development and the standardization of digital communication systems and digital media systems configured on such networks are rapidly progressing. The methods of processing a digital signal on a network according to these systems can be roughly classified into two, depending upon the device that executes the processing.
One is software processing that processes using a general purpose processor according to a procedure described in a program and the other is hardware processing that processes according to a procedure described in the form of the connection of circuits using a dedicated processing circuit such as ASIC.
Software processing has a characteristic that one processor can process data of plural systems and can correspond to a new system respectively by changing a program. On the other hand, as overhead for fetching an instruction from a memory storing a program and decoding it and for writing the result of an execution to the memory is required, software processing is slower in processing speed, compared with hardware processing operated at the same clock frequency. Also, there is a defect that as a main storage for storing a program and a secondary storage are required, a processor is large-sized.
In the meantime, hardware processing has a characteristic that the description of a processing procedure is realized by the connection of circuits, the overhead of processing is smaller compared with software processing operated at the same clock frequency, as a result, processing speed is faster and also, as a memory for storing a processing procedure is not required, a processor is small-sized. On the other hand, as the connection of circuits once produced cannot be varied, hardware processing is short of flexibility, compared with software processing, plural dedicated processing circuits are required to process data of plural systems and a circuit once produced cannot correspond to a new system.
Hardware processing using a programmable logic device to solve the defect described above that hardware processing is short of flexibility is recently attracting attention. That is, the hardware processing described above is the one that corresponds to plural systems and a new system by suitably changing the circuit information of a programmable logic device while keeping the characteristic of the hardware processing that processing speed is fast and a processor is small-sized.
As described above, technique that has similar flexibility to software processing using a general purpose processor by hardware processing by a programmable logic device and implements higher-speed processing than software processing is called reconfigurable computing.
-Description of Reconfigurable Computing Technique-
In reconfigurable computing, a required circuit is realized in a programmable logic device by storing the circuit information of plural processing circuits required for application processing in an external storage beforehand and writing the circuit information read from the external storage to a configuration memory in the programmable logic device if necessary.
The above technique is also called cache logic technique from a viewpoint of saving required circuit information outside a programmable logic device and is also called virtual logic technique from a viewpoint that a larger-scale circuit than the actual scale of a programmable logic circuit sector can be realized by rewriting circuit information. In the following description, these techniques are generically called cache logic technique for simplification.
The cache logic technique means time sharing driving technique for configuring a different circuit in the same programmable logic device if necessary. As a result, a larger-scale circuit can be realized using a programmable logic device having a smaller-scale circuit, and the miniaturization and the reduction of the cost of the device are enabled.
For an example of reconfigurable computing technique, there is “Reconfigurable network computing” disclosed in Japanese Published Unexamined Patent Application No. Hei 10-78932 and it will be described below as a conventional type example 1 referring to FIG. 20.
An information processing system in the conventional type example 1 includes plural computers connected to a communication network NET, at least one of them is a computer (an application server) SB that distributes an application program and the rest includes computers (client computers) CL to be a client computer into which the application program is downloaded and which executes the downloaded application program. Extended hardware 11 of which the features can be varied by a program at any time and which can be reconfigured is mounted in a part of the plural client computers CL.
A program code (an extension code) of a part of program features executed by the extended hardware and a main processor code of a part of the program features executed by a main processor 12 of a client computer CL are included in an application program AP stored in the application server SB.
The operating system (OS) of a client computer CL is provided with a feature to judge whether the extended hardware 11 is mounted or not and is provided with a code selection function 13 to fetch a code suitable for hardware configuration from an application program AP based upon the judgment. In case the extended hardware 11 is mounted as in the upper client computer CL shown in FIG. 20, an extension code is fetched from an application program AP using the code selection function 13 and processing is executed by the extended hardware.
Also, in case the extended hardware 11 is not mounted as in the lower client computer CL shown in FIG. 20, a main processor code is fetched from an application program AP using the code selection function 13 and processing is executed by the main processor 12.
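The selection made in the two client computers CL of FIG. 20 can be sketched in Python as follows. This is a minimal illustration only: the names `ApplicationProgram`, `select_code` and `has_extended_hardware` are hypothetical and do not appear in the cited application, and the byte strings merely stand in for real codes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ApplicationProgram:
    """Hypothetical container for an application program AP."""
    extension_code: bytes       # part executed by the extended hardware 11
    main_processor_code: bytes  # the same part, executed by the main processor 12


def select_code(ap: ApplicationProgram, has_extended_hardware: bool) -> Tuple[str, bytes]:
    """Return the execution target and the code fetched from the AP."""
    if has_extended_hardware:
        return "extended hardware", ap.extension_code
    return "main processor", ap.main_processor_code


ap = ApplicationProgram(extension_code=b"EXT", main_processor_code=b"CPU")
upper_client = select_code(ap, has_extended_hardware=True)
lower_client = select_code(ap, has_extended_hardware=False)
```

In this sketch the judgment of whether the extended hardware is mounted is passed in as a flag; how the operating system actually makes that judgment is outside the scope of the illustration.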
According to the conventional type example 1 described above, the extended hardware, of which the features can be changed by a program at any time and which can be reconfigured, is mounted on the side of the client computer connected via the network, and the main processor code of the client computer and an extension code are both included in the application program stored in the server. When the application program distributed from the server is run on the side of the client computer, the code selection function judges whether the extended hardware is mounted or not and the type of the extended hardware, and the configuration of the client computer is changed so that it is suitable for the processing. As a result, the application program can be processed at high speed.
However, in the case of the conventional type example 1, the time for writing circuit information from the application server SB to the configuration memory of the programmable logic device of the client computer CL is long depending upon the scale of the circuit information (the extension code) to be written. Therefore, there is a problem that even if high-speed processing is implemented using the extended hardware, which is a dedicated hardware processing circuit, the whole processing time including circuit reconfiguration time can be longer than processing time by software.
One possible solution of this problem is device technique called multicontext technique. That is, in multicontext technique, a circuit is reconfigured in a programmable logic device by providing plural configuration memories so that plural circuit information can be stored in the programmable logic device and switching the configuration memories if necessary, and circuit reconfiguration time is greatly reduced.
-Description of Programmable Logic Device Based Upon Multicontext Technique-
FIG. 21 shows the structure of a programmable logic device based upon multicontext technique. The programmable logic device 20 based upon multicontext technique includes a circuit information input controller 21 that reads plural circuit information pieces from an external device, a circuit information selection controller 22 that selects required circuit information of the plural circuit information pieces and a programmable logic circuit sector 23 that realizes a circuit function according to the selected circuit information.
The detailed structure of the programmable logic circuit sector 23 based upon multicontext technique is shown in FIG. 22. As in the case described above, the programmable logic circuit sector 23 includes a circuit element 24, which includes an I/O device, logic circuit cells and wiring, and a configuration memory 25 connected to the circuit element 24. In the case of the programmable logic circuit sector 23 based upon multicontext technique, the configuration memory 25 includes plural memory planes.
In the case of the programmable logic circuit sector 23 based upon multicontext technique, in both structures of the FPGA type and the CPLD type (see FIG. 19), plural circuit information pieces read in the programmable logic device 20 from an external device are written in a state that one circuit information piece is written to each memory plane of the configuration memory 25 by the circuit information input controller 21.
Of plural circuit information pieces written to plural memory planes of the configuration memory 25, the function of the circuit element 24 and a connection state are determined according to circuit information written to a memory plane selected according to a selection signal from the circuit information selection controller 22 and a circuit is reconfigured in the programmable logic device 20.
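The relationship between the memory planes and the selection signal can be illustrated by a toy model in Python. The class and method names are hypothetical, and circuit information is represented by plain strings; the sketch shows only that writing a plane and selecting a plane are separate operations.

```python
class MulticontextDevice:
    """Toy model: several memory planes, one of which drives the circuit element."""

    def __init__(self, num_planes: int) -> None:
        self.planes = [None] * num_planes  # configuration memory 25
        self.selected = None               # index of the active memory plane

    def write_plane(self, index: int, circuit_info: str) -> None:
        """Role of the circuit information input controller 21."""
        self.planes[index] = circuit_info

    def select(self, index: int) -> str:
        """Role of the circuit information selection controller 22."""
        if self.planes[index] is None:
            raise ValueError("memory plane %d holds no circuit information" % index)
        self.selected = index
        return self.planes[index]


device = MulticontextDevice(num_planes=3)
for i, info in enumerate(["CD1", "CD2", "CD3"]):
    device.write_plane(i, info)
active = device.select(1)  # the circuit element now follows CD2
```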
For an example of multicontext technique, there is “A Time-Multiplexed FPGA” announced at FPGAs for Custom Computing Machines in 1997 (FCCM'97). Referring to FIG. 23, the example described above will be described below as a conventional type example 2.
FIG. 23 shows the configuration of the announced time sharing driven FPGA. The time sharing driven FPGA is an improved product of XC4000E manufactured by Xilinx in the U.S. and is provided with eight sets of configuration memories including SRAM data which determines the logic cells and internal wiring of a circuit element 31. Circuit information corresponding to different circuit configuration is stored in each of the configuration memories 32 and a circuit of FPGA can be reconfigured by time sharing by switching these configuration memories 32.
As shown in the conventional type example 2, in multicontext technique, circuit reconfiguration time can be reduced because plural circuit information pieces are stored in the configuration memories beforehand.
However, as plural planes or plural configuration memories are required inside the programmable logic device to store circuit information, the scale of the programmable logic circuit sector is enlarged. As the load capacity of the circuit element is increased when the scale of the circuit is enlarged, a problem is caused that the performance of the circuit is deteriorated and the power consumption is increased. Also, when the scale of the circuit is enlarged, a problem is caused that the manufacturing cost of the programmable logic device is increased.
In an information processing system for processing image data and others, image data is often sequentially processed in units of block including the predetermined number of groups of pixel data by plural processing circuits. For example, in case image compression coding processing is executed, image data is divided into blocks, orthogonal transformation is applied to data divided into a block in an orthogonal transformation circuit for example, quantization processing is applied to data after the orthogonal transformation in a quantizing circuit and further, variable-length coding processing is executed in a variable-length coding (an entropy coding) circuit.
In this case, generally, image data is sequentially supplied to plural processing circuits per block, in each processing circuit, processing is executed in units of block and an output signal per block is acquired. The processing is repeated by the number of blocks.
Therefore, in case processing by plural circuits is executed in the programmable logic device using cache logic technique or multicontext technique, it is general, whichever of the conventional type programmable logic devices is used, to sequentially reconfigure processing circuits such as an orthogonal transformation circuit, a quantizing circuit and a variable-length coding circuit in the programmable logic circuit sector for each block of data and to execute the processing.
However, in this method, each circuit is required to be reconfigured in the programmable logic circuit sector as many times as the number of blocks to be processed. Circuit reconfiguration time therefore has an effect upon the whole processing time, and the total processing time including circuit reconfiguration time may be longer than that of software processing.
Referring to drawings, processing time described above using the conventional type programmable logic device will be further detailedly described below.
Processing time will be described below using an application including three processing circuits C1, C2 and C3 as an example. Data to be processed includes N blocks (N: an integer which is two or more) and processing is completed by sequentially processing the data by the processing circuits C1, C2 and C3.
In case the application is the JPEG compression of an image for example, the processing circuits C1, C2 and C3 respectively correspond to a DCT circuit, a quantizing circuit and an entropy coding circuit and one block of data corresponds to 64 (8×8) pieces of pixel data in gradation that one pixel is represented by eight bits.
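The block size stated above can be checked with a short Python calculation. The 640 x 480 image size, the single-component (gray scale) image and the padding of image edges up to a multiple of eight pixels are assumptions made only for this illustration.

```python
import math

BLOCK_SIDE = 8      # 8 x 8 pixels per block
BITS_PER_PIXEL = 8  # eight-bit gradation

pixels_per_block = BLOCK_SIDE * BLOCK_SIDE                # 64 pieces of pixel data
bytes_per_block = pixels_per_block * BITS_PER_PIXEL // 8  # 64 bytes


def block_count(width: int, height: int) -> int:
    """Number N of blocks, assuming edges are padded up to a multiple of 8."""
    return math.ceil(width / BLOCK_SIDE) * math.ceil(height / BLOCK_SIDE)


n_blocks = block_count(640, 480)  # 80 * 60 = 4800 blocks
```

Under these assumptions a single image already yields several thousand blocks, which is the number N used in the description that follows.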
-Reconfiguration of Circuit Based Upon Conventional Type Cache Logic Technique-
As described referring to FIG. 17, the programmable logic device 1 in this case includes the circuit information input controller 2 and the programmable logic circuit sector 3 having the circuit element 4 and the configuration memory 5.
In this example, as shown in FIG. 24, circuit information CD1, CD2 and CD3 for respectively configuring circuits C1, C2 and C3 is sequentially read in the configuration memory 5 of the programmable logic circuit sector 3 via the circuit information input controller 2, processing circuits C1, C2 and C3 are configured in the part of the circuit element 4, and the configured processing circuits C1, C2 and C3 sequentially process N blocks of input data Din1, Din2, ..., DinN and acquire N blocks of output data Dout1, Dout2, ..., DoutN.
The processing procedure will be detailedly described below using a timing chart shown in FIG. 25 and a flowchart shown in FIG. 26.
As shown in FIG. 26, when processing is started, a value of a data counter that indicates the block number of input data is reset to 1 by an application controller not shown in FIG. 24 (a step S101).
Next, circuit information CD1 is read in the configuration memory 5 via the circuit information input controller 2 and a processing circuit C1 is configured in the programmable logic circuit sector 3 (a step S102). This is equivalent to a reading execution state shown as first “reading” in the timing chart shown in FIG. 25.
When the processing circuit C1 is configured, a block Din1 indicated by the data counter of input data is input to the processing circuit C1 (a step S103). When the data is input, a value of the data counter is incremented by one by the application controller (a step S104). The input data Din1 is processed in the processing circuit C1 (a step S105). The operation from the input of data to processing by the processing circuit C1 is equivalent to a data processing execution state shown as first “processing” in the timing chart shown in FIG. 25.
The processing circuit C1 configured based upon the circuit information CD1 includes an input data buffer 42 and a processing execution circuit 41 as shown in FIG. 27. The input data Din1 is temporarily stored in the input data buffer 42 and is sequentially processed in the processing execution circuit 41. The result of the processing is stored in the input data buffer 42 as intermediate data again. At this time, the first input data Din1 is overwritten by intermediate data and lost.
As described above, circuit information CD2 is read in the configuration memory 5 via the circuit information input controller 2 in a state in which the intermediate data acquired by processing the input data Din1 in the processing circuit C1 is stored in the input data buffer 42 in the programmable logic circuit sector 3 and a processing circuit C2 is configured in the programmable logic circuit sector 3 (a step S106). This is equivalent to a reading state shown as second “reading” in the timing chart shown in FIG. 25.
When the processing circuit C2 is configured, the intermediate data stored in the programmable logic circuit sector 3 is input to the processing circuit C2 and processed (a step S107). The operation from the input of the intermediate data to processing by the processing circuit C2 is equivalent to a data processing execution state shown as second “processing” in the timing chart shown in FIG. 25.
In this case, as shown in FIG. 28, the circuit information CD2 dynamically partially reconfigures the processing execution circuit 41 of the processing circuit C1 to be a processing execution circuit 43 with the input data buffer 42 included and configures the processing circuit C2. As a result, intermediate data is stored in the input data buffer 42 to be input data to the processing circuit C2. The result of processing by the processing execution circuit 43 is stored in the input data buffer 42 as new intermediate data again. At this time, the first intermediate data is overwritten by the new intermediate data and lost.
Circuit information CD3 is read in the configuration memory 5 via the circuit information input controller 2 in a state in which the result of the processing in the processing circuit C2 is stored in the programmable logic circuit sector 3 as the new intermediate data and a processing circuit C3 is configured in the programmable logic circuit sector 3 (a step S108). This is equivalent to a reading state shown as third “reading” in the timing chart shown in FIG. 25.
When the processing circuit C3 is configured, the intermediate data stored in the input data buffer 42 in the programmable logic circuit sector 3 is input to the processing circuit C3, is processed (a step S109) and the result of the processing is output as output data Dout1 (a step S110). The operation from the input of the intermediate data to the result of the processing by the processing circuit C3 is equivalent to a data processing execution state shown as third “processing” in the timing chart shown in FIG. 25.
In this case, as shown in FIG. 29, the circuit information CD3 dynamically partially reconfigures the processing execution circuit 43 of the processing circuit C2 to be a processing execution circuit 44 with the input data buffer 42 included, further adds an output data buffer 45 and configures the processing circuit C3. As a result, intermediate data is stored in the input data buffer 42 to be input data to the processing circuit C3. After the result of processing by the processing execution circuit 44 is temporarily stored in the output data buffer 45, it is output as output data.
In case the value of the data counter is smaller than the number N of all blocks of input data when the result of the processing by the processing circuit C3 is output, the processing cycle of the steps S102 to S110 described above, which starts with reading the circuit information CD1 again, is repeated until the processing of all input data is finished (a step S111).
As described above, one block of data is processed by reading circuit information three times and the succeeding data processing. All input data is processed by repeating this cycle by the number N of blocks of the input data.
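The cycle of FIG. 26 can be summarized by the following Python sketch, which counts how often circuit information is read. The function name and the three arithmetic stages are hypothetical stand-ins; in a real application the stages would be, for example, the DCT, quantizing and entropy coding circuits, and the data counter is represented here simply by iterating over the blocks.

```python
def run_cache_logic(blocks, circuits):
    """Process every block with every circuit, reading circuit information each time.

    circuits is a list of (circuit_information_name, function) pairs.
    Returns the output blocks and the number of circuit information reads.
    """
    outputs = []
    reads = 0
    for block in blocks:                 # S103/S104: take the next input block
        data = block
        for _name, process in circuits:  # S102, S106, S108: read CD1, CD2, CD3
            reads += 1
            data = process(data)         # S105, S107, S109: process the data
        outputs.append(data)             # S110: output Dout
    return outputs, reads


# Stand-in stages used only to make the sketch executable.
circuits = [("CD1", lambda x: x + 1), ("CD2", lambda x: x * 2), ("CD3", lambda x: x - 3)]
outputs, reads = run_cache_logic([1, 2, 3], circuits)
# reads is 9 here, that is, 3 circuits x 3 blocks
```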
Though the case is not shown in FIGS. 25 and 26, in case an error occurs in reading circuit information or in processing in a processing circuit, the occurrence of the error is reported to the application controller and processing is terminated.
In the example described above, the circuit information CD1 generates the input data buffer, the circuit information CD3 generates the output data buffer and they store the intermediate data generated by each processing circuit C1, C2, C3, however, the invention is not limited to the case described above. FIGS. 30, 31 and 32 respectively show examples of another circuit configuration.
In another example, a processing circuit C1 configured by circuit information CD1 includes a processing execution circuit 41, a left data buffer 42L and a right data buffer 42R as shown in FIG. 30. Input data is temporarily stored in the left data buffer 42L and is sequentially processed in the processing execution circuit 41. The result of the processing is stored in the right data buffer 42R as intermediate data. At this time, the first input data remains stored in the left data buffer 42L.
Circuit information CD2 dynamically partially reconfigures the processing execution circuit 41 of the processing circuit C1 to be the processing execution circuit 43 with the left data buffer 42L and the right data buffer 42R included and configures a processing circuit C2 as shown in FIG. 31. At this time, unlike the processing execution circuit 41, data is input from the right data buffer 42R to the processing execution circuit 43 and is output from the processing execution circuit 43 to the left data buffer 42L. As a result, intermediate data stored in the right data buffer 42R is input to the processing execution circuit 43. The result of processing by the processing execution circuit 43 is stored in the left data buffer 42L as new intermediate data. At this time, the first input data stored in the left data buffer 42L is overwritten by the new intermediate data and lost.
Circuit information CD3 dynamically partially reconfigures the processing execution circuit 43 of the processing circuit C2 to be the processing execution circuit 44 with the left data buffer 42L and the right data buffer 42R included and configures a processing circuit C3 as shown in FIG. 32. At this time, as in the processing execution circuit 41, data is input from the left data buffer 42L to the processing execution circuit 44 and is output from the processing execution circuit 44 to the right data buffer 42R. As a result, intermediate data stored in the left data buffer 42L is input to the processing execution circuit 44. After the result of processing by the processing execution circuit 44 is temporarily stored in the right data buffer 42R, it is output as output data.
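The alternating use of the left data buffer 42L and the right data buffer 42R in FIGS. 30 to 32 can be sketched in Python as follows. The function name and the arithmetic stages are hypothetical; the sketch models only the direction of data transfer, not the reconfiguration itself.

```python
def run_with_two_buffers(input_block, stages):
    """Alternate the transfer direction between the left and right data buffers."""
    buffers = {"left": list(input_block), "right": None}
    source, destination = "left", "right"
    for stage in stages:
        # Each stage reads one buffer and writes the other, so the source
        # data remains stored until a later stage overwrites it.
        buffers[destination] = [stage(value) for value in buffers[source]]
        source, destination = destination, source
    return source, buffers[source]  # the buffer written by the last stage


stages = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]  # stand-ins for C1, C2, C3
last_buffer, output = run_with_two_buffers([1, 2], stages)
```

With three stages the final result is left in the right data buffer, which matches the description of the processing circuit C3 above.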
-Reconfiguration of Circuit in Reconfigurable Computing Using Multicontext Technique-
As described referring to FIG. 21, the programmable logic device based upon multicontext technique includes the circuit information input controller 21 that reads plural circuit information pieces from an external device, the circuit information selection controller 22 that selects required circuit information of the plural circuit information pieces and the programmable logic circuit sector 23 that realizes a circuit function based upon the selected circuit information pieces.
For an example of an application of the programmable logic device based upon multicontext technique, as shown in FIG. 33, circuit information pieces CD1, CD2 and CD3 are sequentially read and stored in the configuration memory 25 of the programmable logic circuit sector 23 via the circuit information input controller 21.
N blocks of input data Din1, Din2, ..., DinN are sequentially processed in processing circuits C1, C2 and C3 respectively configured by the circuit information selection controller 22 according to selection signals S1, S2 and S3 and output data for N blocks Dout1, Dout2, ..., DoutN are acquired.
The processing procedure will be detailedly described below using a timing chart shown in FIG. 34 and a flowchart shown in FIG. 35.
As shown in FIG. 35, when processing is started, a value of a data counter that indicates the block number of input data is reset to 1 by an application controller not shown in FIG. 33 (a step S201).
Next, three circuit information CD1, CD2 and CD3 are sequentially read via the circuit information input controller 21 and are sequentially stored in the configuration memory 25 (steps S202, S203 and S204). This is equivalent to a reading execution state shown as “reading” in the timing chart shown in FIG. 34.
Next, according to a selection signal S1, a processing circuit C1 is configured according to a direction from the circuit information selection controller 22 (a step S205) and the block Din1 indicated by the data counter of input data is input to the processing circuit C1 (a step S206). When the data is input, a value of the data counter is incremented by one by the application controller (a step S207). The input data Din1 is processed in the processing circuit C1 (a step S208).
Next, a processing circuit C2 is configured in the programmable logic circuit sector 23 according to a direction from the circuit information selection controller 22 according to a selection signal S2 in a state in which intermediate data acquired by processing the input data Din1 in the processing circuit C1 is stored in the programmable logic circuit sector 23 (a step S209). When the processing circuit C2 is configured, the intermediate data stored in the programmable logic circuit sector 23 is input to the processing circuit C2 and processed (a step S210).
A processing circuit C3 is configured in the programmable logic circuit sector 23 according to a direction from the circuit information selection controller 22 according to a selection signal S3 in a state in which the result of the processing of the intermediate data in the processing circuit C2 is stored in the programmable logic circuit sector 23 as new intermediate data (a step S211). When the processing circuit C3 is configured, the intermediate data stored in the programmable logic circuit sector 23 is input to the processing circuit C3 and processed (a step S212). The result of the processing is output as output data Dout1 (a step S213).
In case a value of the data counter is smaller than the number N of all blocks of input data when the result of the processing by the processing circuit C3 is output, a processing cycle from the step S205 to the step S213 is repeated with a processing circuit C1 selected again according to a selection signal S1 until the end of the processing of all input data (a step S214).
As described above, one block of data is processed by selecting circuit information three times according to a selection signal and the succeeding data processing. All input data is processed by repeating this cycle by the number N of blocks of the input data.
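The procedure of FIG. 35 differs from a procedure that reads circuit information for every block in that reading occurs only once, as the following Python sketch shows. The function name and the stand-in stages are hypothetical; the sketch counts reads and selections separately.

```python
def run_multicontext(blocks, circuits):
    """Read all circuit information once, then only switch memory planes."""
    planes = []
    reads = 0
    for _name, process in circuits:  # S202 to S204: read CD1, CD2, CD3 once
        planes.append(process)
        reads += 1

    outputs = []
    selections = 0
    for block in blocks:             # S206/S207: take the next input block
        data = block
        for process in planes:       # S205, S209, S211: select by S1, S2, S3
            selections += 1
            data = process(data)     # S208, S210, S212: process the data
        outputs.append(data)         # S213: output Dout
    return outputs, reads, selections


# Stand-in stages used only to make the sketch executable.
circuits = [("CD1", lambda x: x + 1), ("CD2", lambda x: x * 2), ("CD3", lambda x: x - 3)]
outputs, reads, selections = run_multicontext([1, 2, 3], circuits)
# reads is 3, selections is 9 (3 circuits x 3 blocks)
```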
Though the case is not shown in FIGS. 34 and 35, in case an error occurs in reading circuit information or in processing in a processing circuit, the occurrence of the error is reported to the application controller and processing is terminated.
For the circuit configuration that the processing circuits C1, C2 and C3 store intermediate data in the description of this example, the same circuit configuration as the one described referring to FIGS. 27 to 29 and FIGS. 30 to 32 can be used.
As described above, in the case of a reconfiguration method using the conventional type programmable logic device, plural circuits are required to be sequentially and repeatedly reconfigured for each block, so that the number of reconfigurations is (the number of blocks × the number of circuits). Therefore, there is a problem that total processing time is long.
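The effect on total processing time can be estimated with a simple cost model in Python. The model is an assumption introduced here, not taken from the drawings: it supposes that reading, switching and processing never overlap, and the time figures are hypothetical values in arbitrary units.

```python
def cache_logic_time(n_blocks, read_times, process_times):
    """Every block pays for reading and processing every circuit."""
    return n_blocks * (sum(read_times) + sum(process_times))


def multicontext_time(n_blocks, read_times, process_times, switch_time):
    """Circuit information is read once; each block pays only for plane switches."""
    per_block = len(process_times) * switch_time + sum(process_times)
    return sum(read_times) + n_blocks * per_block


# Hypothetical figures, chosen only for illustration.
read_times = [10, 10, 10]
process_times = [1, 1, 1]
t_cache = cache_logic_time(100, read_times, process_times)      # 100 * 33 = 3300
t_multi = multicontext_time(100, read_times, process_times, 1)  # 30 + 100 * 6 = 630
reconfigurations = 100 * len(process_times)                     # 300 in both methods
```

In both methods the number of reconfigurations grows with the product of the number of blocks and the number of circuits; only the cost of each reconfiguration differs.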