The present invention relates to a multiprocessor type time varying image encoding system which executes encoding in units of blocks by assigning processing tasks substantially equally to digital signal processor modules (hereinafter referred to as "DMM's"), each comprising a plurality of digital signal processors (hereinafter referred to as "DSP's"). The present invention also relates to a data bus control method which employs DSP's, and to an image processor which subjects an input signal, for example, a television signal, to high efficiency encoding.
Referring to FIG. 17, which is a block diagram showing the arrangement of a conventional multiprocessor type time varying image encoding system disclosed, for example, in PCS 88P15.2 "ARCHITECTURE OF A FULL MOTION 64KBIT/S VIDEO CODEC", a CPU 171 controls the system, and a frame store/common memory 172 stores input data 1000. DSP's (Digital Signal Processors) 173a to 173h execute encoding according to an encoding program, and local memories 174a to 174h store data. A VME bus 175 connects together the CPU 171 and the DSP's 173a to 173h. A memory bus 176 connects together the frame store/common memory 172 and the local memories 174a to 174h. Reference numeral 1001 denotes transmission data.
In this system, the DSP's are arranged in parallel to execute processing for respective regions of an input image which are fixedly assigned thereto.
In the arrangement shown in FIG. 17, six of the eight DSP's 173a to 173h process the luminance signal: the input image is divided by vertical lines into six equal regions, one assigned to each of these six DSP's. The remaining two DSP's process the two kinds of color difference signal. Each DSP thus encodes only the region assigned to it.
In general, time varying image encoding involves feedback control in which a picture frame is divided into L regions (L is any integer; L = 18 in this example), and an encoding control parameter (TCR) for the (i+1)-th region is set on the basis of the amount of information generated up to the i-th region, which has already been encoded. FIG. 18(a) shows one example of how a picture frame is divided among the DSP's and how it is divided for feedback control. For simplicity of explanation, the example of FIG. 18(a) uses three DSP's and encodes only the luminance signal. The picture frame is divided into three regions A, B and C, one for each DSP, and each of these regions is further subdivided into three regions for feedback control, giving A1 to A3, B1 to B3 and C1 to C3.
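The feedback loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the control rule of the system described here: the subregion labels follow FIG. 18(a), but the bit model `bits_for`, the proportional update in `encode_frame`, and the convention that a larger TCR means coarser coding (fewer bits) are all hypothetical. The point it shows is only the dependency structure: the TCR used for region i+1 is computed from the bits generated up to region i.

```python
from typing import Callable, List, Tuple

# Hypothetical sketch: the bit model and the TCR update rule are assumptions,
# not the TCR rule used by the system described in the text.


def subregion_labels(dsp_regions: str = "ABC", parts: int = 3) -> List[str]:
    """Labels A1..A3, B1..B3, C1..C3 in encoding order."""
    return [f"{r}{k}" for r in dsp_regions for k in range(1, parts + 1)]


def encode_frame(labels: List[str],
                 bits_for: Callable[[str, float], int],
                 frame_budget: int,
                 tcr0: float = 1.0,
                 gain: float = 0.5) -> List[Tuple[str, float, int]]:
    """Encode subregions in order. The TCR used for region i+1 is derived
    only from the bits generated up to region i (feedback control)."""
    tcr, used, log = tcr0, 0, []
    for i, label in enumerate(labels):
        bits = bits_for(label, tcr)                    # stand-in for the real encoder
        used += bits
        log.append((label, tcr, bits))
        target = frame_budget * (i + 1) / len(labels)  # budget share so far
        error = (used - target) / frame_budget         # > 0 means over budget
        # Assumed convention: a larger TCR means coarser coding, fewer bits.
        tcr = max(0.1, tcr * (1.0 + gain * error))
    return log
```

With a toy model such as `lambda label, tcr: int(1000 / tcr)` and a budget of 9000 bits, every subregion lands exactly on target and the TCR stays at 1.0; halving the budget makes the TCR rise from the second subregion onward.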
Referring back to FIG. 17, the input data 1000 is written into the frame store/common memory 172 one frame at a time.
The CPU 171 instructs the eight DSP's 173a to 173h to transfer data sequentially. In response, each of the DSP's 173a to 173h transfers the input data for its assigned region, together with the already coded feedback data for the regions needed to encode that assigned region, from the frame store/common memory 172 to its local memory 174a to 174h through the memory bus 176.
After completion of the transfer, each of the DSP's 173a to 173h divides its first assigned region into blocks, which serve as processing units, and sequentially executes several different kinds of processing task on each block in a predetermined order. The DSP's 173a to 173h then transfer the coded data to the CPU 171 through the VME bus 175. In addition, the DSP's 173a to 173h locally decode the coded data to prepare feedback data and transfer it to the frame store/common memory 172 through the memory bus 176.
After completion of the processing for the first assigned regions, the DSP's 173a to 173h stand by until the CPU 171 instructs them to start processing for the subsequent regions.
The CPU 171 receives the coded data from the DSP's 173a to 173h through the VME bus 175, rearranges the data into the order determined by the transmission format, adds multiplex information to prepare transmission data 1001, and sends the transmission data to the transmission line. The CPU 171 also monitors whether the DSP's 173a to 173h have completed the processing for their assigned regions. When it detects that all of the DSP's 173a to 173h have done so, the CPU 171 instructs them to start processing for the subsequent regions.
The conventional multiprocessor type time varying image encoding system arranged as described above, however, suffers from the following problems. When the amount of computation required varies spatially and temporally, as it does in time varying image encoding [see FIG. 18(b)], a DSP that has completed the processing for its assigned region must wait until the other DSP's complete the processing for their assigned regions. Consequently, the processing efficiency per DSP is low. The number of DSP's arranged in parallel must therefore be chosen to handle the maximum processing load of the assigned regions, so the number of DSP's required becomes extremely large, and the overhead grows as the number of parallel DSP's increases. Where the processing block size differs from task to task, the data cannot be divided into blocks smaller than the largest block size, which limits the number of DSP's that can be employed. Conversely, when the number of parallel DSP's is small, each local memory must have a large capacity and feedback control becomes difficult.
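The waiting problem can be quantified with a small model. The sketch below assumes hypothetical per-block processing costs and compares two schedules: fixed assignment of contiguous, equal-sized regions (as in the conventional system), and a greedy scheme that hands each block, in order, to whichever DSP becomes free first. The greedy scheme is only an illustrative alternative chosen for this sketch; it is not presented as the method of the present invention. Efficiency is the fraction of DSP time spent on useful work.

```python
import heapq
from typing import List

# Hypothetical model: block_costs are made-up processing times per block.


def fixed_assignment_time(block_costs: List[int], n_dsp: int) -> int:
    """Contiguous, equal-count regions; the frame is done when the
    slowest DSP finishes its region."""
    per = -(-len(block_costs) // n_dsp)  # ceiling division
    return max(sum(block_costs[i:i + per])
               for i in range(0, len(block_costs), per))


def dynamic_assignment_time(block_costs: List[int], n_dsp: int) -> int:
    """Greedy: each block, in order, goes to the earliest-free DSP."""
    free_at = [0] * n_dsp
    heapq.heapify(free_at)
    for cost in block_costs:
        t = heapq.heappop(free_at)        # earliest time some DSP is free
        heapq.heappush(free_at, t + cost)
    return max(free_at)


def efficiency(block_costs: List[int], n_dsp: int, makespan: int) -> float:
    """Useful work divided by total DSP time available."""
    return sum(block_costs) / (n_dsp * makespan)
```

For costs `[1, 1, 1, 1, 1, 1, 9, 9, 9]` on three DSP's, the fixed assignment finishes at 27 time units (efficiency about 0.41) because one DSP receives all three expensive blocks, while the greedy assignment finishes at 11 with no idle time.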
In the conventional multiprocessor type time varying image encoding system, the input data and the feedback data are transferred from the frame store/common memory 172 to the local memories 174a to 174h through the single memory bus 176. However, image encoding requires the transfer of a large amount of data, for example, about 1,400 words per block when motion compensation and discrete cosine transform encoding are performed with a block of 16×16 picture elements as one processing unit. This causes no problem when image data at predetermined positions are transferred sequentially to the local memories of the processors, as in the prior art. If, however, adaptive block assignment is adopted, in which each DMM is adaptively assigned blocks at random, the transfer speed of the bus system becomes a bottleneck and the processing efficiency drops.
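A back-of-envelope calculation shows why the bus becomes the bottleneck. The sketch assumes that every block is transferred in full at the figure of about 1,400 words per 16×16 block given above; the frame size, frame rate and bus capacity are hypothetical parameters chosen for illustration. A utilization above 1.0 means the bus cannot keep up.

```python
# Hypothetical parameters: frame size, frame rate and bus capacity are
# illustrative; only the ~1,400 words per 16x16 block comes from the text.


def blocks_per_frame(width: int, height: int, block: int = 16) -> int:
    """Number of whole block x block processing units in one frame."""
    return (width // block) * (height // block)


def required_words_per_second(width: int, height: int, fps: float,
                              words_per_block: int = 1400) -> float:
    """Bus words per second if every block is transferred in full."""
    return blocks_per_frame(width, height) * words_per_block * fps


def bus_utilization(width: int, height: int, fps: float,
                    bus_words_per_second: float,
                    words_per_block: int = 1400) -> float:
    """Fraction of the bus capacity consumed by block transfers."""
    return (required_words_per_second(width, height, fps, words_per_block)
            / bus_words_per_second)
```

For a 352×288 frame (396 blocks) this gives 554,400 words per frame; at an assumed 10 frames per second and an assumed bus capacity of 5,000,000 words per second, the utilization is about 1.11, so the transfers alone would exceed the bus capacity.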
FIG. 17 also shows a conventional data bus control method that employs a plurality of DSP's.
In the figure, a VME bus 175 is connected on one side to the CPU 171 and on the other to the frame store/common memory 172 and the DSP's 173a to 173h. The DSP's 173a to 173h are provided with local memories 174a to 174h, respectively. The local memories 174a to 174h and the frame store/common memory 172 are connected to each other through a memory bus 176. Input data 1000 is input to the frame store/common memory 172, and transmission data 1001 is input to and output from the CPU 171.
Even in the case of parallel processing wherein the DSP's 173a to 173h execute processing for respective regions of an input image which are fixedly assigned thereto as in the illustrated example, bus contention occurs when the DSP's 173a to 173h transfer the input data from the frame store/common memory 172 to the respective local memories 174a to 174h at the time of starting processing for the assigned regions, or when the DSP's 173a to 173h transfer the feedback data from the local memories 174a to 174h to the frame store/common memory 172 after completion of the processing for the assigned regions. When bus contention occurs, the DSP's 173a to 173h stand by until they receive a common memory access instruction.
In task distributing parallel processing, in which regions and tasks are assigned to the parallel-arranged DSP's 173a to 173h variably as needed, outputting a common memory access request each time a task is completed makes bus contention occur even more frequently as the number of parallel DSP's increases, lowering the processing efficiency of the DSP's 173a to 173h.
In the conventional bus control method described above, a common memory access request is output whenever it becomes necessary to access the common memory. Accordingly, when two or more processors output a common memory access request at the same time, bus contention occurs. A processor that is not granted use of the common memory can then perform no operation until it obtains permission, so the bus transfer speed becomes a bottleneck and the processing efficiency drops. The conventional bus control method therefore leaves these problems unsolved.
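The cost of on-demand requests can be illustrated with a single shared bus. The text does not specify the arbiter, so this sketch assumes first-come-first-served arbitration with ties broken by processor number and a fixed transfer time per request; both are hypothetical. It returns how long each processor stalls before being granted the bus.

```python
from typing import List, Tuple

# Assumed arbiter: first-come-first-served, ties broken by processor id,
# every transfer occupies the bus for the same fixed time.


def serve_requests(requests: List[Tuple[int, float]],
                   transfer_time: float) -> List[float]:
    """requests: (processor_id, request_time) pairs. Returns the stall
    (wait before the bus is granted) for each request, in input order."""
    order = sorted(range(len(requests)),
                   key=lambda k: (requests[k][1], requests[k][0]))
    bus_free = 0.0
    waits = [0.0] * len(requests)
    for k in order:
        _, t = requests[k]
        start = max(t, bus_free)          # bus granted when it is idle
        waits[k] = start - t
        bus_free = start + transfer_time  # bus busy until transfer ends
    return waits
```

If four processors all request at time 0 with a transfer time of 1, they stall for 0, 1, 2 and 3 units, a total of T·N(N−1)/2 = 6; if the same requests are staggered by one transfer time each, no processor stalls at all.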
FIG. 19 is a block diagram of a conventional image processor disclosed, for example, in Japanese Patent Public Disclosure No. 62-86464. In this image processor, one picture frame is divided into a plurality of sectional frames #1 to #3, as exemplarily shown in FIG. 20, and unit processors (i.e., unit signal processors) are assigned to the sectional frames #1 to #3, respectively, so that image signals are processed in parallel by a plurality of unit processors to achieve high efficiency encoding of time varying image signals (e.g., television signals). In FIG. 19, reference numeral 191 denotes an input bus for an image signal, e.g., a television signal; 192, a feedback bus for coded/decoded partial frame signals; 193, an output bus for outputting the result of encoding; and 194a to 194c, unit processors assigned to the sectional frames #1 to #3, respectively. Each of the unit processors 194a to 194c has a fetching unit 195, a processing unit 196 and an output unit 197. In synchronism with a fetching instruction for the assigned sectional frame region, the fetching unit 195 fetches and stores the input image signal (partial frame signal) for that region from the input bus 191, together with a coded/decoded signal (described later) for neighbor processing from the feedback bus 192. One example of neighbor processing is the technique disclosed in Japanese Patent Public Disclosure No. 62-266678. The processing unit 196 subjects the stored image data to processing such as encoding/decoding. The output unit 197 sends the coded signal resulting from the processing in the processing unit 196 to the output bus 193 and, in synchronism with the subsequent fetching signal, also sends the above-described coded/decoded signal to the other unit processors through the feedback bus 192 as an auxiliary input image signal.
FIGS. 21 and 23 show the relationship between the signal fetching time and the signal processing time in each of the unit processors 194a to 194c with regard to the signals on the buses. In the figures, the unit processors which are in charge of the sectional frames #1 to #3 are represented by #1 to #3, respectively, for simplification of the explanation.
Input partial frame signals S1 to S3, which are television signals corresponding to the sectional frames A to C, flow on the input bus 191 in temporal sequence, as shown in FIG. 21. The unit processor 194a, for example, fetches the input partial frame signal S1 for #1 from the input bus 191 and stores it in the fetching unit 195 in synchronism with fetching timing such as that shown in FIG. 21. The input partial frame signals S1 to S3 arrive at a constant rate of F (a natural number) frames per second. Accordingly, the processing of the fetched input partial frame signals S1 to S3 must be completed before the next frame's input partial frame signals S1 to S3 are fetched, that is, within the frame period 1/F.
The coded signals obtained as the result of the processing are output to the output bus 193 at the same time as the subsequent fetching operation. In the motion compensating interframe encoding method that is often adopted as a high efficiency image encoding technique, encoding is performed on the difference between an input image P and an image Q in the decoded frame immediately preceding the present frame, where Q is displaced from the position of P by an amount corresponding to the motion, as shown in FIG. 22. The encoding therefore requires an area of the coded/decoded frame that is enlarged by an amount corresponding to the motion, i.e., data covering a wider area than that covered by each of the input partial frame signals S1 to S3. To provide this data, the coded/decoded partial frame signals F1 to F3 are output to the feedback bus 192 from the output units 197 and fetched into the fetching units 195 in synchronism with the fetching timing shown in FIG. 21. The fetching time is thereby lengthened by the time t required to fetch data for the area exceeding that covered by each of the input partial frame signals S1 to S3. Thus, the encoding is executed using coded/decoded partial frame signals F1 to F3 covering areas wider than the assigned sectional frames, and the resulting signals O1 to O3 are output to the output bus 193.
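Both consequences of motion compensation described above can be shown on toy data. In this sketch, `fetch_rows` computes the rows of the previous decoded frame that a sectional frame must fetch if motion vectors can reach a hypothetical ±`motion_range` rows, which is the source of the extra fetch time t; `mc_difference` forms the interframe difference P − Q for one block given an assumed motion vector. Frames are plain nested lists, and the function names and sign convention for the vector are choices made for this illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]

# Illustrative only: names, the +/- motion_range model and the vector sign
# convention are assumptions made for this sketch.


def fetch_rows(region: Tuple[int, int], motion_range: int,
               frame_height: int) -> Tuple[int, int]:
    """Rows [y0, y1) of a sectional frame -> rows of the previous decoded
    frame that must be fetched, clipped to the frame boundaries."""
    y0, y1 = region
    return max(0, y0 - motion_range), min(frame_height, y1 + motion_range)


def mc_difference(cur: Frame, prev: Frame, y: int, x: int, size: int,
                  mv: Tuple[int, int]) -> Frame:
    """P - Q, where P is the size x size block at (y, x) of the current
    frame and Q is the block at (y + dy, x + dx) of the previous decoded
    frame. The caller must keep Q inside prev."""
    dy, dx = mv
    return [[cur[y + i][x + j] - prev[y + dy + i][x + dx + j]
             for j in range(size)]
            for i in range(size)]
```

For a middle sectional frame covering rows 96 to 192 of a 288-row frame with an assumed motion range of 8 rows, `fetch_rows` returns (88, 200), i.e. 16 extra rows of feedback data; the top and bottom sectional frames need extra rows on one side only. When the assumed vector matches the actual displacement, `mc_difference` returns an all-zero block, which is what makes interframe coding efficient.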
In the conventional image processor having the above-described arrangement, a kind of pipeline processing is executed on the assumption that the processing time of each of the unit processors 194a to 194c falls within a predetermined time 1/F. In image processing such as high efficiency encoding, however, the processing time varies with the input image, so the number of divisions of the picture frame must be set on the basis of the longest processing time, as described above. Even if the average processing time is much shorter than the longest processing time, the number of divisions cannot be reduced, and a large number of unit processors must ultimately be prepared.
In the arrangement shown in FIG. 21, the sum of the time required for each of the unit processors 194a to 194c to fetch its partial image signals and the time required to process them (for example, by encoding/decoding) does not exceed the period in which one picture frame is input from the input bus 191. Accordingly, the processing can continue without delay. However, if one part of the picture frame, for example the region #2, contains more motion than the other regions, the processing time in the unit processor 194b, which is in charge of region #2, becomes longer than that in the other unit processors 194a and 194c, as shown by the hatching in FIG. 23, and the unit processors 194a and 194c are left waiting.
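The timing condition and the resulting waiting time can be written out directly. This sketch uses hypothetical fetch and processing times in seconds. `frame_slack` checks each unit processor against the frame period 1/F (negative slack means the deadline is missed); `idle_times` assumes, as a simplification, that the next fetch is issued only when the slowest unit processor finishes, so every other processor waits for the difference.

```python
from typing import List

# Hypothetical timing model: all times in seconds; the assumption that the
# next fetch waits for the slowest processor is a simplification.


def frame_slack(fetch: List[float], process: List[float],
                fps: float) -> List[float]:
    """Time left in the frame period 1/F after fetching and processing,
    per unit processor. Negative slack means the deadline is missed."""
    period = 1.0 / fps
    return [period - (f + p) for f, p in zip(fetch, process)]


def idle_times(fetch: List[float], process: List[float]) -> List[float]:
    """Waiting time per unit processor if the next fetch is issued only
    when the slowest unit processor has finished."""
    busy = [f + p for f, p in zip(fetch, process)]
    slowest = max(busy)
    return [slowest - b for b in busy]
```

With F = 10, fetch times of 0.02 s for all three processors and processing times of 0.05, 0.09 and 0.05 s (region #2 containing more motion), processor #2 overruns the 0.1 s period by 0.01 s, and processors #1 and #3 each wait 0.04 s per frame.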
Thus, since in the prior art the sectional frames assigned to the unit processors 194a to 194c are contiguous regions, if processing in one unit processor takes relatively long because the picture content is unevenly distributed (i.e., the amount of data to be processed differs greatly between regions), the other unit processors are adversely affected and the processing capacity falls, even if the image processor as a whole has sufficient capacity to process the input image signal within a predetermined period of time. This problem could be solved by narrowing the sectional frame assigned to each unit processor and increasing the number of divisions of one picture frame. That solution, however, raises another problem: the cost of the image processor rises because more unit processors are used.