1. Field of the Invention
The present invention relates to an information processing apparatus, information processing method, and storage medium and, more particularly, to a data-driven information processing apparatus, information processing method, and storage medium.
2. Description of the Related Art
An example of conventional methods of efficiently processing data through parallel execution of processing circuits implemented by hardware components is a bus pipeline connection method (see, for example, Japanese Patent No. 2734246).
According to this connection method, data input to an input terminal from an external memory via an I/F are processed in the order in which the processing circuits are connected, and then output from an output terminal to an external memory or the like. The processing order is therefore fixed by the order in which the hardware components were connected at implementation time, and it is impossible to process the data in an arbitrary processing order, including an order in which processing steps are swapped.
To solve the above problem, a method of connecting processing circuits by a ring bus is disclosed (see, for example, Japanese Patent Laid-Open No. 01-023340 and Japanese Patent Nos. 2834210 and 2518293).
As a technique for performing filter processes of images in parallel, there is provided a method of adding a control code to data, sending the data to a ring bus, and receiving the data in accordance with the control code. This enables a plurality of processors to receive data of an overlapping portion (see, for example, Japanese Patent Laid-Open No. 63-247858).
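For illustration only, the control code can be pictured as a bitmask that names every processor that must receive a given image row. The following Python sketch assumes that the image is split into horizontal bands and that the filter has a fixed vertical radius; neither detail is taken from the cited reference, and `receiver_mask` and its parameters are hypothetical names.

```python
def receiver_mask(row: int, height: int, num_procs: int, radius: int) -> int:
    """Return a bitmask whose bit p is set when processor p needs `row`.

    Assumption: processor p owns rows [p * band, (p + 1) * band) and also
    needs `radius` extra rows above and below its band for the filter.
    """
    band = -(-height // num_procs)  # ceiling division: rows per processor
    mask = 0
    for p in range(num_procs):
        first = p * band - radius                       # inclusive
        last = min((p + 1) * band, height) + radius     # exclusive
        if first <= row < last:
            mask |= 1 << p
    return mask
```

Rows near a band boundary receive a mask with more than one bit set, which is how a single transfer on the ring bus can serve the overlapping portion needed by neighboring processors.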
The following method is also disclosed. That is, to limit the decrease in processing speed caused by bus conflicts while allowing the configuration of image processing to be readily changed, a plurality of image processing units and an (input/output) control unit are connected in a ring shape. Data are packetized, and then transferred in one direction on the ring bus (see, for example, Japanese Patent No. 03907471).
Furthermore, the following method is proposed. That is, to implement a high communication bandwidth, a pipelined computer graphics system has a bus structure in which graphics processing elements are coupled in a ring shape. Each graphics processing element has a core processing unit and an interface unit which are coupled to receive a command and an information signal from a preceding processing element in the ring, and send information to a succeeding processing element in the ring. The above ring bus has a clock signal (CLK), a buffered information ready signal (B_Rdy), an un-buffered information ready signal (U_Rdy), a busy signal (Busy), a type field signal (Type[8:0]), and an information field signal (Info[31:0]). The clock signal (CLK), type field signal (Type[8:0]), and information field signal (Info[31:0]) are sent downstream on the ring. On the other hand, the buffered information ready signal (B_Rdy), the un-buffered information ready signal (U_Rdy), and the busy signal (Busy) are supplied upstream on the ring (see, for example, Japanese Patent No. 03880724).
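The signal set described above can be summarized as two small records, one per direction of travel. This is a minimal Python sketch for orientation only: the class names are hypothetical, the clock signal is omitted because it carries no payload, and only the field widths Type[8:0] (9 bits) and Info[31:0] (32 bits) come from the description.

```python
from dataclasses import dataclass

TYPE_BITS = 9    # Type[8:0]
INFO_BITS = 32   # Info[31:0]


@dataclass(frozen=True)
class DownstreamSignals:
    """Signals sent downstream on the ring (CLK omitted)."""
    type_field: int
    info: int

    def __post_init__(self):
        # Reject values that do not fit in the stated field widths.
        if not 0 <= self.type_field < (1 << TYPE_BITS):
            raise ValueError("type_field does not fit in 9 bits")
        if not 0 <= self.info < (1 << INFO_BITS):
            raise ValueError("info does not fit in 32 bits")


@dataclass(frozen=True)
class UpstreamSignals:
    """Signals supplied upstream on the ring."""
    b_rdy: bool   # buffered information ready
    u_rdy: bool   # un-buffered information ready
    busy: bool    # busy
```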
A data overflow occurs when the data traffic amount exceeds an upper limit of an amount of data circulating on the ring bus. As a method of avoiding such a data overflow, there is disclosed a technique for inserting a delay packet, temporarily saving data from the ring bus, or suppressing output to the ring bus (see, for example, Japanese Patent Laid-Open No. 2-171975 or 7-325800).
Moreover, a memory access conflict occurs when processors that are connected by a ring bus and operate independently of each other concentrate their accesses on processing data, reference data, and the like in a shared memory. To avoid such a memory access conflict, there is proposed a technique in which each processor has a local distributed memory including a two-port memory, and data necessary for processing are distributed to and collected in that memory as needed (see, for example, Japanese Patent Publication No. 6-46413).
On the other hand, there is also disclosed a technique for writing (W) or reading (R) internal data by issuing a command for a storage unit connected to each distributed processor instead of using a two-port memory (see, for example, Japanese Patent No. 4359490).
Conventionally, however, configuration data such as setting parameters and processing data such as image data are not separated in a distributed internal memory (or register) of each processor placed on a ring bus.
On the other hand, in recent hardware implementations, many parameters may be needed to increase the degree of freedom of image processing. As the number of parameters increases, the time needed to set them lowers the performance of the image processing. How to transfer such setting parameters at high speed is therefore a problem.
Japanese Patent Laid-Open No. 2-171975 does not disclose a control method when writing/reading data or setting parameters in/from a distributed internal memory (or register).
Consider an image processing apparatus in which a plurality of processors are connected by a single ring bus that carries data in one direction. Each processor connected to the ring bus includes a distributed memory (or register) that holds setting values and the like. In configuration data processing for this memory, it is essential to time-division multiplex the setting values and the like with processing data such as image data and transfer them together. The configuration data processing includes write (W) processing, read (R) processing, and exchange (Ex) processing. If a processor cannot receive a data packet because it is busy, a stall bit is set for the data packet, and the data packet continues to circulate on the ring bus. It is known that if the number of packets circulating on the ring bus exceeds a certain number, a deadlock occurs in which data can neither be input to the ring bus nor extracted from the ring bus.
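The stall and deadlock behavior described above can be reproduced with a toy model. The following Python sketch is not the disclosed hardware: it assumes one ring slot per processor, that each processor can hold exactly one result, and that a result is always addressed to the next processor; `Packet`, `step`, and `cycles_with_progress` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    dest: int            # index of the processor that should receive the packet
    stall: bool = False  # set when the destination was busy as the packet passed


def step(slots: List[Optional[Packet]], busy: List[bool]) -> bool:
    """Advance the ring by one cycle.

    slots[i] is the packet currently in front of processor i (None if empty).
    busy[i] is True while processor i holds a result it could not yet emit.
    Returns True if any processor received or emitted a packet.
    """
    n = len(slots)
    progress = False
    for i in range(n):
        pkt = slots[i]
        if pkt is not None and pkt.dest == i:
            if busy[i]:
                pkt.stall = True        # cannot receive: mark it, let it circulate
            else:
                slots[i] = None         # receive the packet
                busy[i] = True          # hold the result until a slot is free
                progress = True
        elif pkt is None and busy[i]:
            slots[i] = Packet(dest=(i + 1) % n)  # emit the held result
            busy[i] = False
            progress = True
    slots[:] = [slots[-1]] + slots[:-1]  # shift all slots one step downstream
    return progress


def cycles_with_progress(slots, busy, cycles: int) -> int:
    """Count the cycles in which at least one packet was received or emitted."""
    return sum(step(slots, busy) for _ in range(cycles))
```

In this model, a ring whose slots are all occupied while every processor is holding a result never makes progress: no processor can receive because it is busy, and none can emit because no slot is empty. Leaving a single slot free is enough for a processor to emit its result and for work to resume.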
To avoid such a deadlock, image processing target data should be controlled to be input at an appropriate speed. The appropriate speed should be determined in consideration of the image processing speed of the processors used for the processing, and the packet traffic that circulates on the ring bus due to a change in processing order. The same goes for configuration data, which should be controlled to be input at a speed that depends on the read/write processing speed of the distributed memory (or register) of each processor.
In practice, the amount of image data may be large, as in high-resolution print image processing, or small, as in preview image generation processing, which handles an extremely small number of pixels such as 160×120 pixels. In the latter case, for example, when a one-dimensional lookup table (to be referred to as “LUT1D” hereinafter) for 10-bit R, G, and B input data is set before executing the processing, it is necessary to transfer configuration data of 1024 entries×3 colors=3072 entries. This data amount corresponds to 16% of the 160×120=19,200 pixels of processing data. If there are two LUT1Ds, their data amount corresponds to roughly 30%. A large configuration data amount thus has a large influence on the total processing amount.
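The figures in this example can be checked with a short calculation. The Python sketch below assumes that one LUT1D entry and one pixel each cost one transfer on the ring bus, which the text does not state explicitly; the function names are hypothetical. It computes the configuration data amount both relative to the pixel count and as a share of all transfers.

```python
# Hypothetical helper names; the sizes come from the example in the text.
PREVIEW_PIXELS = 160 * 120      # preview image size in pixels
LUT1D_ENTRIES = (1 << 10) * 3   # 1024 entries x 3 colors for 10-bit R, G, B


def config_to_image_ratio(num_luts: int, pixels: int = PREVIEW_PIXELS) -> float:
    """Configuration entries divided by image pixels."""
    return num_luts * LUT1D_ENTRIES / pixels


def config_share_of_traffic(num_luts: int, pixels: int = PREVIEW_PIXELS) -> float:
    """Configuration entries as a share of all transfers (entries + pixels)."""
    config = num_luts * LUT1D_ENTRIES
    return config / (config + pixels)
```

Relative to the 19,200 pixels, one LUT1D gives 3072/19,200 = 0.16 and two give 0.32, which is of the order of the 30% quoted above. Measured as a share of all transfers, the same cases give about 0.14 and 0.24.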
In such a situation, a technique for suppressing data transmission for a specified period of time has been proposed, as disclosed in, for example, Japanese Patent Laid-Open No. 7-325800. In general, however, configuration data can be processed at a higher speed than processing data. With this technique, the configuration data are transferred at the low speed set for the processing data, and the setting processing speed that could otherwise be achieved is therefore not obtained.
As described above, the conventional techniques cannot control a configuration data transfer rate independently of an image data transfer rate prior to image processing, and therefore cannot fully exploit the actual processing performance.
In consideration of the above problems, the present invention provides a technique for controlling a configuration data transfer rate independently of an image data transfer rate prior to image processing, and executing the processing at higher speed.