The present invention relates to a control program product described with microcodes or the like, and a data processing system capable of executing the control program.
Processors (data processing systems or LSIs) incorporating an operation function, such as microprocessors (MPUs) and digital signal processors (DSPs), are known as apparatuses for conducting general-purpose processing and special digital data processing. Architectural factors that have significantly contributed to improved performance of these processors include pipelining technology, super-pipelining technology, super-scalar technology, VLIW technology, and the addition of specialized data paths (special purpose instructions). The architectural elements further include branch prediction, register banks, cache technology, and the like.
In VLIW technology, the data paths are configured in advance to allow parallel execution, and a compiler performs optimization to improve the parallel execution and generate proper VLIW instruction code. This is an extremely rational idea, because it eliminates the circuitry that a super-scalar processor needs for checking whether individual instructions can be executed in parallel. The technology is therefore considered extremely promising as a means for realizing hardware for parallel execution. However, for a processor used in an application that requires image processing or special data processing, VLIW is not an optimal solution either. This is because, particularly in applications requiring continuous or sequential processing that uses the operation results, there is a limit to executing operations or data processing while holding the data in a general-purpose register, as VLIW does. The conventional pipeline technology has the same problem.
On the other hand, it is well known from past experience that various matrix calculations, vector calculations and the like achieve higher performance when implemented in dedicated circuitry. Therefore, in the most advanced technology for achieving the highest performance, the mainstream approach is based on VLIW, with various dedicated arithmetic circuits mounted according to the purpose of the application.
However, VLIW is a technology for improving the efficiency of parallel execution near a program counter. Therefore, VLIW is not very effective in, e.g., executing two or more objects simultaneously or executing two or more functions. Moreover, mounting various dedicated arithmetic circuits increases the amount of hardware and also reduces software flexibility.
The FPGA (Field Programmable Gate Array) architecture is capable of changing the connections between transistors and can be controlled dynamically to some degree, so various dedicated arithmetic circuits may be implemented with it. However, in an FPGA-based architecture, dynamically changing the hardware takes a long time, and additional hardware is required to reduce that time. It is therefore difficult in practice to dynamically control the hardware during execution of an application, and the FPGA does not become an economical solution. It is possible to retain the reconfiguration information of the FPGA in a RAM having two or more banks that operates in the background, so that the architecture is dynamically changed in an apparently short time. However, in order to complete this reconfiguration within several clocks, it is necessary to mount RAM that stores the reconfiguration information for every possible combination. This does not fundamentally solve the economic problem caused by the long reconfiguration time of the FPGA. Moreover, the original problem of the FPGA, i.e., poor AC characteristics at the practical level, which stems from the FPGA's purpose of efficiently implementing mapping at the gate level of the hardware, is not likely to be solved for the time being.
It is therefore an object of the present invention to provide a system, such as a program product, a data processing system capable of executing the program, and a control method of the processing system, in which complicated data processing is flexibly executed at high speed without using various dedicated circuits specific to that data processing. It is another object of the present invention to provide a more economical data processing system, a control method of the processing system, and a program product that allow dynamic hardware control even during execution of an application, and that are capable of implementing software-level flexibility at the hardware level and of executing various kinds of data processing at high speed.
Therefore, the present invention provides a program product for controlling a data processing system including a plurality of processing units. The program product or program apparatus includes a data flow designation instruction for designating input and/or output interfaces of at least one of the processing units independently of the time or timing of execution of the processing unit so as to define a data path configured by the processing unit. This program can be provided in a form recorded or stored on a recording medium readable by the data processing system, such as ROM or RAM. Alternatively, the program can be provided in a form embedded in a transmission medium capable of being transmitted over a computer network or another form of communication.
The present invention also provides a data processing system comprising a plurality of processing units including changeable input and/or output interfaces; a unit for fetching the data flow designation instruction for designating the input and/or output interfaces of at least one of the processing units independently of the time or timing of execution of the processing unit; and a data flow designation unit for decoding the data flow designation instruction and setting the input and/or output interfaces of the processing unit so as to configure a data path from a plurality of the processing units. The program product of the present invention controls the processing system. Accordingly, the data path formed from a combination of a plurality of processing units is changed by the program, so that various kinds of data processing are executed with hardware, i.e., the data path or data flow, that is suitable for each kind of processing.
A method for controlling the data processing system according to the present invention includes a step of fetching a data flow designation instruction that designates the input and/or output interfaces of at least one of the processing units independently of the processing execution timing of the processing unit; and a data flow designation step of decoding the data flow designation instruction and setting the input and/or output interfaces of the processing unit so as to configure a data path from a plurality of the processing units.
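The fetch, decode and interface-setting steps described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions, not the claimed hardware: the class names `ProcessingUnit`, `DataFlowInstruction` and `DataFlowDesignationUnit`, the `"IN"` marker for the external input, and the two example operations are all hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ProcessingUnit:
    """Hypothetical processing unit with a changeable input interface."""
    name: str
    op: Callable[[int], int]
    source: Optional[str] = None  # "IN" or the name of the upstream unit


@dataclass(frozen=True)
class DataFlowInstruction:
    """Hypothetical data flow designation instruction: feed `unit` from `source`."""
    unit: str
    source: str


class DataFlowDesignationUnit:
    """Decodes data flow designation instructions and sets unit interfaces."""

    def __init__(self, units: List[ProcessingUnit]) -> None:
        self.units: Dict[str, ProcessingUnit] = {u.name: u for u in units}

    def decode_and_set(self, instr: DataFlowInstruction) -> None:
        if instr.unit not in self.units:
            raise KeyError(f"unknown unit: {instr.unit}")
        if instr.source != "IN" and instr.source not in self.units:
            raise KeyError(f"unknown source: {instr.source}")
        # Only the interface changes here; no unit is executed at this point.
        self.units[instr.unit].source = instr.source

    def run(self, sink: str, value: int) -> int:
        """Push `value` through the currently configured path ending at `sink`."""
        chain: List[ProcessingUnit] = []
        name = sink
        while name != "IN":
            unit = self.units[name]
            if unit.source is None or len(chain) >= len(self.units):
                raise ValueError("data path is incomplete or cyclic")
            chain.append(unit)
            name = unit.source
        for unit in reversed(chain):
            value = unit.op(value)
        return value


dfu = DataFlowDesignationUnit([
    ProcessingUnit("mul2", lambda x: x * 2),
    ProcessingUnit("add3", lambda x: x + 3),
])

# First data path: IN -> mul2 -> add3
dfu.decode_and_set(DataFlowInstruction("mul2", "IN"))
dfu.decode_and_set(DataFlowInstruction("add3", "mul2"))
result_a = dfu.run("add3", 5)

# Same units, reconfigured by instructions only: IN -> add3 -> mul2
dfu.decode_and_set(DataFlowInstruction("add3", "IN"))
dfu.decode_and_set(DataFlowInstruction("mul2", "add3"))
result_b = dfu.run("mul2", 5)
```

In this sketch the designation (`decode_and_set`) is deliberately separate from execution (`run`), mirroring the statement that the interfaces are designated independently of the execution timing of the processing unit.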
Conventionally, the only way to handle complicated data processing is to prepare dedicated circuitry and implement a special instruction for using the circuitry, which increases the hardware costs. In contrast, in the system of the present invention, such as the program product, data processing system and control method thereof, the interfaces of the processing unit serving as an arithmetic logic unit are described, making it possible to introduce the structure of pipeline control and data path control into an instruction set, i.e., the program product. This allows various kinds of data processing to be described with the program and executed with suitable hardware, whereby this invention provides a data processing system having both software flexibility and the high-speed performance of dedicated circuitry. Moreover, these data paths can be implemented without discontinuing execution of main processing or general-purpose processing, so the hardware is dynamically reconfigured during execution of an application.
Moreover, the present invention provides means that is effective not only in execution of parallel processing near a program counter, but also in simultaneous pseudo-execution of two or more objects and simultaneous pseudo-execution of two or more functions. In other words, in the conventional instruction set, two or more processings respectively based on remote program counters, such as data processings and algorithm executions having different contexts, cannot be activated simultaneously. In contrast, in the present invention, the data flows are appropriately designated with the data flow designation instructions, enabling the above processings to be performed regardless of the program counters.
Accordingly, with this instruction set, a data path that appears, from the application side, to be effective in improving parallel processing performance can be incorporated in advance from the software, so that the data path (data flow) thus implemented is activated from the software at the instruction level as required. Since these data paths are used not only for data processing corresponding to a specific purpose but also for purposes such as operating as a general state machine, the structure of this invention has an extremely high degree of freedom.
Moreover, the present invention enables a data path formed from a combination of the processing units to be changed by designating the interfaces of the processing units according to the data flow designation instruction. Accordingly, unlike an architecture that changes the connections between transistors, like the FPGA, the data paths are defined merely by switching the interfaces between processing units having an appropriate and/or specific data processing function. Therefore, the hardware is reconfigured in a short time. Moreover, because the data processing system of the present invention does not have an architecture requiring general-purpose use at the transistor level like the FPGA, the mounting or packaging density is improved, whereby a compact, economical data processor such as a system LSI can be provided. In addition, since the redundant structure is reduced, the processing speed is increased and the AC characteristics are improved.
Thus, in the program, data processing system and control method thereof according to the present invention, the instruction defining the interfaces of at least one processing unit included in the data processing system is recorded or described. Therefore, data flows become describable and the independence of the data paths is improved. As a result, structures are readily provided that conduct the data flow designation while executing another instruction of the program, and that even allow an internal data path of the data processing system in the idle state to be lent for more urgent processing that is being executed in another external data processor or another data processing system within the same chip.
Moreover, it is desirable that the content or function of processing in the processing units, which are combined to configure data paths, is changeable or variable according to the data flow designation instruction. In other words, in the data flow designation unit and the data flow designation step, it is desirable that the content of processing in the processing unit is changeable or variable according to the data flow designation instruction. This improves the flexibility of the data path formed from a combination of the processing units, whereby a larger number of kinds of data processing can be conducted by the data-flow-type process with reduced hardware resources, allowing for improvement in performance.
The FPGA architecture may be employed in individual processing units. As described above, however, it takes a long time to dynamically change or reconfigure the hardware, and additional hardware for reducing that reconfiguration time is required. This makes it difficult to dynamically control the hardware within the processing unit during execution of an application. Should a plurality of RAMs be provided with a bank structure for instantaneous switching, switching on the order of several to several tens of clocks would require a considerable number of bank structures. Thus, it is basically required to make each of the macro cells within the FPGA independently programmable and to detect the time or timing of switching in order to implement a program-based control mechanism. However, the current FPGA architecture is not sufficient to deal with such a structure, and a new instruction control mechanism for designating switching at an appropriate timing is required.
Accordingly, in the present invention, it is desirable to employ as the processing unit a circuit unit including a specific internal data path(s). The processing units having somewhat compact data paths are prepared as templates and combinations of the data paths are designated so as to conduct the data-flow-type processing. In addition, a part of the internal data path of the processing unit is selected according to the data flow designation instruction so as to change the function or content of processing performed in the processing unit. As a result, the hardware becomes more flexibly reconfigured in a short time.
For example, with a processing unit including at least one logic gate and internal data path(s) connecting the logic gate with the input/output interfaces, the processing content of the processing unit can be changed by changing the order of data input to or output from the logic gate, changing the connection between logic gates, or selecting a logic gate, and these changes and/or selections are made merely by selecting a part of the internal data path that is prepared in advance. Therefore, the content of processing in the processing unit is varied in a shorter time than in the FPGA, which reconfigures the circuitry at the transistor level. Moreover, the use of internal data paths that are prepared in advance for particular purposes reduces the number of redundant circuit elements and increases the area utilization efficiency of the transistors. Accordingly, the mounting or packaging density becomes high and an economical processing system is provided. In this system, data paths suitable for high-speed processing are provided and the AC characteristics of the system are also excellent. Therefore, in the present invention, it is desirable that the data flow designation unit and step are capable of selecting a part of the internal data path of the processing unit according to the data flow designation instruction.
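The idea of changing a unit's function merely by selecting one of its prepared internal data paths can be sketched as follows. This is an illustrative model only: the class `TemplateUnit`, the path names, and the particular gates and operations are assumptions chosen for the example and are not taken from the original text.

```python
from typing import Callable, Dict


class TemplateUnit:
    """Hypothetical processing unit whose internal data paths are fixed in advance.

    Nothing is rewired at the gate or transistor level; a selection field
    simply picks one of the prepared paths.
    """

    INTERNAL_PATHS: Dict[str, Callable[[int, int], int]] = {
        "and": lambda a, b: a & b,    # select the AND gate
        "or": lambda a, b: a | b,     # select the OR gate
        "xor": lambda a, b: a ^ b,    # select the XOR gate
        "sub": lambda a, b: a - b,    # subtractor, operands in given order
        "rsub": lambda a, b: b - a,   # same subtractor, input order swapped
    }

    def __init__(self) -> None:
        self.selected = "and"

    def select_path(self, path: str) -> None:
        """Model of the selection made by a data flow designation instruction."""
        if path not in self.INTERNAL_PATHS:
            raise ValueError(f"no such prepared internal data path: {path}")
        self.selected = path

    def execute(self, a: int, b: int) -> int:
        return self.INTERNAL_PATHS[self.selected](a, b)


unit = TemplateUnit()

unit.select_path("xor")
xor_result = unit.execute(0b1100, 0b1010)

unit.select_path("sub")
sub_result = unit.execute(7, 2)

unit.select_path("rsub")   # only the order of the inputs changes
rsub_result = unit.execute(7, 2)
```

Because the set of paths is closed, a request for a path that was not prepared is rejected rather than synthesized, which is the trade-off the text contrasts with transistor-level reconfiguration in the FPGA.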
It is also desirable that the data flow designation unit has a function as a scheduler for managing the interfaces of the processing units, in order to manage a schedule that retains the interface of each processing unit set based on the data flow designation instruction. For example, in the case where matrix calculation is performed only for a fixed time and filtering is conducted thereafter, the connections between the processing units within the data processing system are made prior to execution of each kind of processing, and each connection is held using a time counter. Replacing the time counter with another comparison circuit or an external event detector enables more complicated, flexible scheduling to be implemented.
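The time-counter scheduling in the matrix-then-filter example can be sketched as below. This is a minimal model under assumptions: the names `ScheduleEntry` and `InterfaceScheduler`, the clock counts, and the choice to keep holding the last configuration after the schedule ends are all hypothetical and not specified in the original text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScheduleEntry:
    """Hypothetical schedule entry: hold `config` for `duration` clocks."""
    config: str
    duration: int


class InterfaceScheduler:
    """Holds each interface configuration for a fixed time using a down counter."""

    def __init__(self, entries: List[ScheduleEntry]) -> None:
        if not entries or any(e.duration < 1 for e in entries):
            raise ValueError("need at least one entry, each lasting >= 1 clock")
        self.entries = list(entries)
        self.index = 0
        self.counter = self.entries[0].duration

    def tick(self) -> str:
        """Advance one clock and return the configuration active during it."""
        active = self.entries[self.index].config
        self.counter -= 1
        # When the counter expires, switch to the next scheduled configuration.
        # Assumption: after the last entry, its configuration is simply kept.
        if self.counter == 0 and self.index + 1 < len(self.entries):
            self.index += 1
            self.counter = self.entries[self.index].duration
        return active


scheduler = InterfaceScheduler([
    ScheduleEntry("matrix", 3),   # matrix calculation for a fixed time
    ScheduleEntry("filter", 2),   # filtering thereafter
])
trace = [scheduler.tick() for _ in range(6)]
```

Swapping the expiry test `self.counter == 0` for a comparison against some other value or an external event flag would correspond to the comparison circuit or external event detector mentioned above.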
Moreover, it is desirable that the input and/or output interfaces of a processing block formed from a plurality of processing units are defined according to the data flow designation instruction. Since the interfaces of a plurality of processing units become changeable or reconfigurable with a single instruction, the data paths associated with the plurality of processing units become changeable or reconfigurable with that single instruction. Accordingly, in the data flow designation unit, it is desirable to change or configure the input and/or output interfaces in a processing block formed from a plurality of processing units according to the data flow designation instruction.
It is more desirable to provide a memory storing a plurality of configuration data defining the input and/or output interfaces in the processing block, and, in the data flow designation unit or step, to change the input and/or output interfaces in the processing block by selecting one of the plurality of configuration data stored in the memory according to the data flow designation instruction. Since the configuration data is designated with the data flow designation instruction, changing of the interface of the plurality of processing units is controlled from the program without making the instruction itself redundant.
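As a rough illustration of selecting stored configuration data with a short instruction, the sketch below packs a configuration number into an instruction word and uses it to index a small configuration memory. The 16-bit word layout, the opcode value `0xD0`, the unit names and the memory contents are all assumptions made for this example; the original text does not specify an encoding.

```python
from typing import Dict, List

OPCODE_DATAFLOW = 0xD0  # hypothetical opcode for a data flow designation instruction

# Hypothetical configuration memory: each entry defines every interface in the
# processing block ("IN"/"OUT" denote the block's external input and output).
CONFIG_MEMORY: List[Dict[str, str]] = [
    {"mul": "IN", "add": "mul", "OUT": "add"},
    {"add": "IN", "mul": "add", "OUT": "mul"},
    {"mul": "IN", "add": "IN", "OUT": "mul"},
]


def encode(config_id: int) -> int:
    """Build a 16-bit word: upper 8 bits opcode, lower 8 bits configuration id."""
    if not 0 <= config_id <= 0xFF:
        raise ValueError("config_id must fit in 8 bits")
    return (OPCODE_DATAFLOW << 8) | config_id


class ProcessingBlock:
    """Processing block whose interfaces are set from the configuration memory."""

    def __init__(self, memory: List[Dict[str, str]]) -> None:
        self.memory = memory
        self.interfaces: Dict[str, str] = {}

    def execute(self, word: int) -> bool:
        """Return True if `word` was a data flow designation instruction."""
        if (word >> 8) & 0xFF != OPCODE_DATAFLOW:
            return False  # some other instruction; interfaces are untouched
        config_id = word & 0xFF
        if config_id >= len(self.memory):
            raise IndexError(f"no configuration data stored at {config_id}")
        # One short instruction sets all interfaces of the block at once.
        self.interfaces = dict(self.memory[config_id])
        return True


block = ProcessingBlock(CONFIG_MEMORY)
word = encode(1)
handled = block.execute(word)
```

The instruction word stays the same size however many units the block contains, since only the index travels in the instruction while the interface details stay in the memory.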