1. Field of the Invention
The present invention relates to a digital signal processing system and, more particularly, to a video codec system which turns input video data into divisions thereof, the divisions being coded in parallel and composed again for transmission to a receiving side.
2. Description of the Prior Art
FIG. 1 is a block diagram of a video coder implementing a typical prior art video codec method illustratively shown in "A Real-time Video Signal Processor Suitable for Motion Picture Coding Applications" (IEEE GLOBECOM '87, pp. 453-457, 1987). In this figure, reference numeral 1 is input video data; 2 is a plurality of digital signal processors (DSP's) which are disposed in parallel and which code the input video signal in parallel; 3 is a pair of data transfer controllers for controlling the division of the data and the transfer thereof to the DSP's 2; 4 is the data to be transferred from the data transfer controllers 3 to each of the DSP's 2; and 5 is the data processed by the DSP's 2.
In operation, the data transfer controllers 3 turn a single frame of the input video data 1 into divisions and distribute them to the DSP's 2. After being processed by the DSP's, the transferred data 4 is forwarded as the processed data 5 to the next processing block. FIG. 2(a) shows the data area to be processed by each of the DSP's 2. As indicated, the input video data 1 is turned in this case into four divisions, A through D, for parallel processing by the DSP's 2. All DSP's equally share the burden of the processing. The four areas constitute the single frame of the input video data 1 that was divided earlier.
A case may be assumed in which the input video data 1 is coded by a prior art interframe video coding method or its equivalent. This method generally involves conditional pixel compensation processing. That is, what is coded by this method is only those portions of video data whose differential between a given input frame and its preceding frame exceeds a certain level of magnitude; the remaining data portions are replaced by the previous frame data when coded. There may thus occur a case where the number of pixels is the same for the areas to be covered by the DSP's 2 but the amount of operation needed to perform the processing is not the same for these areas. In that case, the amount of operational requirements or the required operation time is proportional to the rate of effective pixels.
FIG. 2(b) is an example of how effective pixels are distributed where the input video data 1 is turned into four divisions, A through D, by the interframe coding method. The time required for the DSP's 2 to operate on each block of data is equivalent to the time taken by the DSP 2 whose number of effective pixels is the largest.
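The load imbalance described above can be sketched as follows. This is a minimal illustration only, not the method of the cited reference: the threshold value, the quadrant layout of divisions A through D (FIG. 2 is not reproduced here), and the function names are all assumptions made for the example.

```python
# Hypothetical sketch: count "effective" pixels per division and measure imbalance.
THRESHOLD = 10  # assumed difference level; the text does not give a value


def effective_pixel_counts(prev, curr, threshold=THRESHOLD):
    """Count pixels whose interframe difference exceeds the threshold.

    prev and curr are equal-sized lists of rows of integer pixel values.
    The frame is split into quadrants A (upper left), B (upper right),
    C (lower left) and D (lower right); this layout is an assumption.
    """
    rows, cols = len(curr), len(curr[0])
    half_r, half_c = rows // 2, cols // 2
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for r in range(rows):
        for c in range(cols):
            if abs(curr[r][c] - prev[r][c]) > threshold:
                if r < half_r:
                    key = "A" if c < half_c else "B"
                else:
                    key = "C" if c < half_c else "D"
                counts[key] += 1
    return counts


def parallel_efficiency(counts):
    """Average load divided by the largest load (1.0 means perfectly balanced)."""
    worst = max(counts.values())
    if worst == 0:
        return 1.0
    return sum(counts.values()) / (len(counts) * worst)


# Example frame pair: motion is confined almost entirely to division A.
prev = [[0] * 8 for _ in range(8)]
curr = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        curr[r][c] = 100
curr[6][6] = 100

counts = effective_pixel_counts(prev, curr)
print(counts, parallel_efficiency(counts))
```

In this example the DSP handling division A determines the frame time while the other three are nearly idle, so the efficiency figure is well below 1.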
Where the numbers of effective pixels are unevenly distributed or vary over time throughout single frame input video data, the prior art video codec method, structured as outlined above, has been dependent for its processing time on the performance of the DSP whose processing time is the longest. A disadvantage of this scheme is that the overall processing efficiency per frame tends to be reduced. Another disadvantage is that where coding extends across different processing areas, the coordination of processes between the DSP's becomes complicated.
FIGS. 3 through 5 are views explaining how a typical prior art motion compensation method works, an example thereof being found illustratively in "Interframe Coding Using Motion Compensation and Predictive Coding" (Hideo Kuroda, Naoki Takekawa, Hideo Hashimoto; 85/1 Vol. J68-B No. 1, pp. 77-84; periodical of the Japan Society of Telecommunications Researchers). This example describes in particular a way to carry out a total search algorithm.
In FIGS. 3 through 5, reference numeral 21 is an input signal that conveys input video data; 22 is an input frame buffer that temporarily stores a single frame of input data; and 23 is a current input block with a block size of $l_1 \times l_2$ for motion compensation at a given location in the current input frame. Reference numeral 24 is a motion vector search range delimited by the limits $l_1+2m$, $l_2+2n$ within which exists a block to be matched with the current input block 23 in preceding input frame reproduction data. In this case, the number of blocks (M blocks) to be searched for is given as

$$M = (2m+1) \times (2n+1) \qquad (100)$$
Thus the search range is between -m and +m pixels horizontally, and between -n and +n pixels vertically.
Motion compensation works as follows: In an interframe codec system, a frame-to-frame correlation is found between the current input frame data and the preceding input frame reproduction data. By use of this correlation, a process is performed in units of blocks to obtain the predictive signal closest to the current input frame data.
The motion vector search range 24 in the preceding input frame reproduction data is searched for the block whose block-to-block distortion is the smallest relative to the current input block, i.e., whose correlation thereto is the highest. A typical condition for this case to work is that the sum of absolute differential values be the smallest. This process provides motion vector and predictive signal data.
Also in FIGS. 3 through 5, reference numeral 25 is a motion compensation circuit that obtains a predictive signal by making correlation approximations on the current input block 23 of the input signal 21 and on the motion vector search range 24 given as the preceding input signal reproduction data; 26 is a predictive signal output by the motion compensation circuit 25; 27 is motion vector information that is also output by the motion compensation circuit 25.
Reference numeral 29 is a coder that outputs a coded signal 30 by coding a differential signal 28 derived from the difference between the input block signal 23 and the predictive signal 26; 31 is decoder that decodes the coded signal 30 following the coding by the coder 29.
Reference numeral 34 is a frame memory which adds the decoded signal 32 from the decoder 31 and the predictive signal 26 from the motion compensation circuit 25 to generate reproduction data 33 for storage, thereby providing the motion compensation circuit 25 with the vector search range 24. Numeral 35 is a transmission buffer, and 36 is a transmission signal.
Referring now to FIGS. 4 and 5, the operations involved will be further described. A block X is assumed as the current input block 23 located in a specific position inside the current input frame and measuring $l_1 \times l_2$. With respect to the block X, there is calculated the amount of distortion among M blocks inside the motion vector search range 24 in the preceding input frame reproduction data. The calculation yields the block having the least distortion. This is the least distortion block "yi" whose position relative to the current input block 23 is obtained as a motion vector V. At the same time, a signal "ymin" corresponding to the block "yi" is output as the predictive signal 26.
The interframe codec system is also capable of generating the predictive signal 26 on the signal receiving side. For example, there may be assumed M motion vectors V to be searched for in a given motion vector search range 24, M being an integer larger than 1. In this case, the amount of distortion between the preceding frame block located in the motion vector V and the current input block is represented by the sum of absolute differential values therebetween. The distortion "di" is given as

$$d_i = \sum_{j=1}^{L} \left| x_j - y_{ij} \right| \qquad (101)$$

The input block is given as

$$X = \{x_1, x_2, \ldots, x_L\}$$

The block to be searched for is given as

$$y_i = \{y_{i1}, y_{i2}, \ldots, y_{iL}\}$$

where i = 1 to M, and L is equivalent to $l_1 \times l_2$. The motion vector V is given as

$$V = V_i \, \{\min d_i \mid i = 1 \sim M\} \qquad (102)$$
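The total search described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the cited reference: frames are plain lists of rows, candidate blocks falling outside the frame are simply skipped, and the names `sad`, `extract` and `full_search` are hypothetical.

```python
def sad(block_x, block_y):
    """Distortion d_i: sum of absolute differences over two equal-sized blocks."""
    return sum(abs(x - y)
               for row_x, row_y in zip(block_x, block_y)
               for x, y in zip(row_x, row_y))


def extract(frame, top, left, height, width):
    """Cut a height x width block out of a frame stored as a list of rows."""
    return [row[left:left + width] for row in frame[top:top + height]]


def full_search(prev, curr, top, left, l1, l2, m, n):
    """Examine every candidate in the -m..+m by -n..+n search range.

    Returns ((dx, dy), distortion) for the least distortion block, where
    (dx, dy) is the position of that block in the preceding frame relative
    to the current input block.
    """
    x_block = extract(curr, top, left, l2, l1)
    best_vector, best_d = None, None
    for dy in range(-n, n + 1):
        for dx in range(-m, m + 1):
            t, l = top + dy, left + dx
            # Assumption: candidates outside the frame are not evaluated.
            if t < 0 or l < 0 or t + l2 > len(prev) or l + l1 > len(prev[0]):
                continue
            d = sad(x_block, extract(prev, t, l, l2, l1))
            if best_d is None or d < best_d:
                best_vector, best_d = (dx, dy), d
    return best_vector, best_d


# Example: a 4x4 patch sits at row 6, column 5 of the preceding frame and
# at row 4, column 4 of the current frame.
SIZE = 16
patch = [[10 + 4 * i + j for j in range(4)] for i in range(4)]
prev = [[0] * SIZE for _ in range(SIZE)]
curr = [[0] * SIZE for _ in range(SIZE)]
for i in range(4):
    for j in range(4):
        prev[6 + i][5 + j] = patch[i][j]
        curr[4 + i][4 + j] = patch[i][j]

vector, distortion = full_search(prev, curr, top=4, left=4, l1=4, l2=4, m=3, n=3)
print(vector, distortion)
```

Every one of the (2m+1) x (2n+1) candidates inside the frame is evaluated, which is what makes the total search reliable but costly.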
In the case above, the amount of operation, illustratively represented by S1, is obtained using the following expression in which "a" stands for a number of machine cycles needed to add absolute differential values and "b" for a number of machine cycles to carry out a compare operation:

$$S_1 = M(L \times a + b) \qquad (103)$$
An example may be assumed where a=1 machine cycle; b=2 machine cycles; $l_1=8$; $l_2=8$; m=8; and n=8. In that case, L=64 and M=289. As a result, one gets:

$$S_1 \approx 19000 \qquad (104)$$
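For reference, the arithmetic behind this figure can be written out from expressions (100) and (103) with the example values given above:

```latex
\begin{aligned}
L   &= l_1 \times l_2 = 8 \times 8 = 64 \\
M   &= (2m+1) \times (2n+1) = 17 \times 17 = 289 \\
S_1 &= M(L \times a + b) = 289 \times (64 \times 1 + 2) = 289 \times 66 = 19074 \approx 19000
\end{aligned}
```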
The volume of operation S1, which thus amounts to about 19,000 machine cycles, is a very large value considering the hardware configuration involved. The requirement has been met by use of high-speed operation systems featuring pipeline processing or the like in keeping with the cycles of the frames making up the video signal.
How to simplify the hardware configuration has been a big challenge. Japanese Patent Laid-open No. 63-181585, "TV Signal Motion Compensation Interframe Coding Apparatus," proposes a method for compensating tree-search motions in order to reduce the amount of operation involved.
As shown in FIG. 1, the prior art tree search motion compensation method involves disposing first target blocks to be searched for (○) spaced equally apart at a low concentration inside the motion vector search range 24. When the block having the least distortion is detected from among the first target blocks, second target blocks (□) are disposed within a narrowed range around the least distortion block (○). When the block having the least distortion is again detected from among the second target blocks, third target blocks (Δ) are disposed within a further narrowed range around the least distortion block (□). Search for and detection of the least distortion blocks thus continue until the block having the least distortion within the motion vector search range 24, in this case block (Δ), is identified.
In the case above, the amount of operation S2 is given as

$$S_2 = \{9 \times L \times a + 9 \times b\} \times 3 \qquad (105)$$

Under the same condition as given earlier, one gets:

$$S_2 \approx 1800$$
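Written out with the same example values (a=1, b=2, L=64), expression (105) evaluates as follows:

```latex
\begin{aligned}
S_2 &= \{9 \times L \times a + 9 \times b\} \times 3 \\
    &= (9 \times 64 \times 1 + 9 \times 2) \times 3 \\
    &= 594 \times 3 = 1782 \approx 1800
\end{aligned}
```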
The operation count of about 1,800 machine cycles achieved with the tree search motion compensation method is an appreciable reduction from the high operational requirement of the total search method.
Since the prior art motion compensation method is constructed as outlined above, attempts to perform the total search, which is highly reliable, during motion compensation have inevitably led to vastly increased amounts of operation. This has required setting up hardware on large scales. Likewise, attempts to reduce the amount of operation by use of the tree search method or the like have resulted in the deterioration in the system's ability to detect the least distortion block. That is, there is a growing possibility that a block located away from the true least distortion block will be selected during matching operation of the initial low concentration block search. Where that scheme is employed, there have been increasing numbers of cases in which the system fails to detect the predetermined least distortion and incorrectly passes a judgment of no correlation between blocks. There has been little choice but to accept the resulting inefficiency in data transmission.
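The trade-off between the two search methods can be illustrated with a short sketch. It is an illustration under assumptions, not the method of the cited Japanese application: the step sizes (4, 2, 1), the function name `tree_search`, and both distortion surfaces are hypothetical, and the distortion is passed in as a function standing in for the sum of absolute differences.

```python
def tree_search(distortion, steps=(4, 2, 1)):
    """Three-stage tree search over motion vectors.

    distortion(vx, vy) returns the distortion of the candidate at (vx, vy).
    Each stage evaluates 9 candidates around the current centre, then the
    centre moves to the best one and the spacing is narrowed.
    Returns ((vx, vy), distortion, number_of_candidates_evaluated).
    """
    cx, cy = 0, 0
    best_d = None
    evaluated = 0
    for step in steps:
        stage_best = None
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                d = distortion(cx + dx, cy + dy)
                evaluated += 1
                if stage_best is None or d < stage_best[0]:
                    stage_best = (d, cx + dx, cy + dy)
        best_d, cx, cy = stage_best
    return (cx, cy), best_d, evaluated


def smooth_surface(vx, vy):
    """Hypothetical well-behaved surface with its minimum at (3, -2)."""
    return abs(vx - 3) + abs(vy + 2)


def skewed_surface(vx, vy):
    """Hypothetical surface, also zero only at (3, -2), that misleads the coarse stage."""
    return abs(16 * (vy + 2) + (vx - 3))


print(tree_search(smooth_surface))   # finds the true minimum
print(tree_search(skewed_surface))   # settles on a nearby but wrong vector
```

Both runs evaluate 27 candidates, matching the factor 9 x 3 in expression (105). On the first surface the search reaches the true least distortion vector; on the second, the coarse first stage steps away from it and the later stages cannot recover, which is the kind of detection failure described above.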
FIG. 8 is a block diagram illustratively showing a typical prior art video coding system, "Real-time Video Signal Processor Module" (in Proc. ICSSP '87, Apr. 1987, pp. 1961-1964). In FIG. 8, reference numeral 51 is an input terminal through which input data is entered; 52 is a plurality of processors (M units) for performing signal processing of the input data; 53 is an output terminal through which the result of the processing by the processors 52 is output via an output bus; 70 is one frame of screen data to be input through the input terminal 51; and 71 is a plurality of divided windows constituting one frame of the screen data 70.
FIG. 7 is a block diagram showing a typical high efficiency coding algorithm. In this figure, reference numeral 51 is the input terminal; 60 is a motion compensation circuit that performs motion compensation on the input data from the input terminal 51; 61 is an interframe differentiation circuit that takes the difference between the data from the motion compensation circuit 60 and the data from the input terminal 51; 62 is a block discrimination circuit that separates the data from the interframe differentiation circuit 61 into significant and insignificant blocks; 63 is a codec circuit that codes and decodes the significant block data coming from the block discrimination circuit 62; 64 is an interframe addition circuit that adds the decoded data from the codec circuit 63 and the data from the motion compensation circuit 60; 65 is a coding frame memory that stores the data from the interframe addition circuit 64; 66 is a pre-processing circuit that includes the motion compensation circuit 60 and the interframe differentiation circuit 61; 67 is a post-processing circuit that contains the codec circuit 63 and the interframe addition circuit 64; and 68 is an output terminal through which the processed output data is output.
In operation, this video coding system addresses motion video signals as follows: The system divides one page of screen data 70 into M windows of screen data 71 which are assigned to the processors 52. It takes a single frame time for the processors 52 to get their respective window data 71. Then it takes another single frame time for the processors 52 to carry out the necessary process required of them. The results are synchronized between the processors 52 for output onto an output bus. At this time, the individually processed window data 71 are composed again into a single frame format.
When the processing method described above is employed, the time T required to turn one frame into M divisions for processing is given as ##EQU2## where,
$T_f$: time required for one processor to process one frame
$T_{fn}$: time required for an n-th processor to perform its processing per window
Therefore, increasing the number of data divisions allows processors 52 of a relatively low speed version to perform high speed video processing. Meanwhile, the slowest processor 52 determines the overall processing speed.
FIG. 7 thus illustrates the algorithm of a high-performance coder that addresses motion video screens. In this setup, the motion compensation circuit 60 performs motion compensation on all input data coming from the input terminal 51. After differentiation with the input data by the interframe differentiation circuit 61, only the significant blocks extracted by the block discrimination circuit 62 are sent to the codec circuit 63 for coding and decoding. At this point, the following relationship exists between the significant block ratio inside the windows $\alpha$ and the window processing time T: ##EQU3## where,
a, b: constants
$B_N$: number of blocks inside windows
FIG. 9 illustrates the relationship given by the expression (2) above. In the prior art video coding system, the processors 52 synchronize with one another in carrying out their input and output. That is, the same maximum processing time needs to be assigned to each of the processors 52. As shown in FIG. 9, the system develops during operation an idle time which is represented by the area of the shaded portion.
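The idle time that arises from this synchronization can be sketched numerically. The per-window processing times below are hypothetical values chosen for illustration, and the function name `idle_time` is an assumption; the sketch takes the per-window times as given rather than deriving them from the significant block ratio.

```python
def idle_time(window_times):
    """Idle time when every processor is allotted the longest window time.

    window_times holds the processing time each processor actually needs
    for its window. Returns (allotted_slot, total_idle, utilization).
    """
    slot = max(window_times)
    total_idle = sum(slot - t for t in window_times)
    utilization = sum(window_times) / (slot * len(window_times))
    return slot, total_idle, utilization


# Hypothetical per-window times for four processors, in arbitrary units.
times = [2.0, 5.0, 3.0, 8.0]
slot, idle, utilization = idle_time(times)
print(slot, idle, utilization)
```

With these values each processor is allotted 8 units although only one of them needs that long, so nearly half of the total processor time goes unused.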
In cases where it takes different times to perform the processing depending on the block to be processed, the prior art video coding system, structured as described above, requires assigning the longest processing time equally to each of its processors 52. This has led to the problem of unnecessarily increasing the number of processors despite the redundancy in their performance.
FIG. 10 is a block diagram of a typical prior art digital signal processing system in its simplified form presented in "A DSP Architecture for 64 Kbps Motion Video Codec" (International Symposium on Circuits and Systems, ISCAS '88, pp. 227-230, 1988). In FIG. 10, reference numeral 81 is an instruction memory that stores a microprogram instruction word; 82 is an instruction execution control circuit that reads an instruction word from the instruction memory 81, interprets it, and performs operational control accordingly; 83 is a data input bus that mainly transfers data and control signals; 84 is a data memory which stores operation data and which has a plurality of input/output ports; 85 is a data operation circuit that performs various operations on up to two pieces of input data coming from the data memory 84 via the data input bus 83; 86 is an address generation circuit that generates addresses independently for two pieces of data input to the data operation circuit 85 and one piece of data output therefrom; and 87 is a data output bus that transfers the results of the data operation.
The operations involved will now be described by referring to the flowchart in FIG. 11. This is an example in which two pieces of input data comprising "n" bits ("n" is an integer greater than 0) are subjected to a binary operation by the data operation circuit 85. The data (comprising "n" bits) resulting from the operation is subjected to a limiting process in which "m" bits ("m" is an integer equal to or smaller than "n") are regarded as significant bits and handled as such.
The instruction execution control circuit 82 notifies the instruction memory 81 of the address given via an address path 101. The corresponding instruction word is read from the instruction memory 81 via a data path 102. The instruction execution control circuit 82 then interprets the instruction word that was read, provides the address generation circuit 86 with a control signal via a data path 104 and, as required, transmits data or the like onto the data input bus 83 via a data path 103.
The control signal causes the address generation circuit 86 to notify the data memory 84 of the addresses of the two pieces of input data ("n" bits each) needed for the operation involved. In turn, the data memory 84 sends the two pieces of input data onto the data input bus 83 via a data path 105. The data operation circuit 85 receives via a data path 106 the two pieces of input data placed on the data input bus 83. The data operation circuit 85 performs the binary operation specified by the instruction execution control circuit 82 by way of the data path 103. The resulting data (of "n" bits) is transmitted to the data output bus 87 via a data path 108. The data placed on the data output bus 87 is input via a data path 109 to the data memory 84, and is stored at the address therein given by the address generation circuit 86 via a data path 107. The above processes constitute step ST1.
In step ST2, following the above-described input operation, the data operation circuit 85 admits, again via the data path 106, the operation result data from the data memory 84. The data operation circuit 85 then executes a MAX instruction (whose significant bit count is "m"), one of the instructions in the instruction set, as specified by the instruction execution control circuit 82. The MAX instruction is an instruction which, when executed, takes the larger of the two maximum values: one represented by the operation result data and the other by the data of "m" significant bits, and handles the chosen value as the resulting output. A check is made to see if the operation result data exceeds the maximum value. If it does, a limiting process is carried out. The above-described output operation causes the data resulting from executing the MAX instruction to be stored into the data memory 84. This completes step ST2.
In step ST3, by performing the input operation described above, the data operation circuit 85 admits again via the data path 106 the data resulting from executing the MAX instruction, the data being retrieved from the data memory 84. Then a MIN instruction (whose significant bit count is "m"), another instruction in the instruction set, is executed. The MIN instruction is an instruction which, when executed, takes the smaller of the two minimum values: one represented by the operation result data and the other by the significant bit count "m", and handles the value as the resulting output. A check is made to see if the operation result data is smaller than the minimum value. If it is, a limiting process is carried out. The output operation described above causes the data resulting from executing the MIN instruction to be stored into the data memory 84. This completes step ST3.
The limiting process will now be described in more detail by referring to FIG. 12. In this example, operation result data of "n" bits (MSB and LSB denote the most and the least significant bit, respectively) is assumed, as shown in FIG. 12(a). In a limiting process involving "m" significant bits (m<n), the high-order (n-m) bits are considered the data equivalent to the MSB. The remaining "m" bits are regarded unchanged as the "m" bit data if the operation result data falls within the range that can be represented by the "m" bits. If the operation result data exceeds the maximum value that can be represented by the "m" bits, the maximum value is regarded as the "m" bit data; if the operation result data is smaller than the minimum value that can be represented by the "m" bits, the minimum value is regarded as the "m" bit data [FIG. 12(b)].
Where the significant bit count "m" equals "n", the limiting process is equivalent in effect to a case where no limiting process is carried out.
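The effect of the limiting process can be sketched as follows. The sketch assumes signed two's complement data, which the text suggests (the high-order bits being treated as equivalent to the MSB) but does not state outright; the function names are hypothetical, and the two comparisons stand in for the separate MAX and MIN instructions rather than reproducing the processor's instruction set.

```python
def limit_to_bits(value, m):
    """Clamp value to the range representable in m bits.

    Assumption: signed two's complement, so the range is
    -2**(m-1) .. 2**(m-1) - 1.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    lowest = -(1 << (m - 1))
    highest = (1 << (m - 1)) - 1
    value = max(value, lowest)   # one compare-and-select pass
    value = min(value, highest)  # a second compare-and-select pass
    return value


def limited_add(a, b, m):
    """Binary operation followed by the two limiting passes: three steps in all."""
    return limit_to_bits(a + b, m)


print(limited_add(100, 100, 8))    # result above the 8-bit maximum is limited
print(limited_add(-100, -100, 8))  # result below the 8-bit minimum is limited
print(limited_add(20, 30, 8))      # in-range result passes through unchanged
```

The operation and the two limiting passes are carried out one after another, mirroring the three-step sequence ST1 through ST3.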
Since the prior art digital signal processing system is constructed as described above, performing the limiting process on operation result data has required executing as many as three instructions including an operation. This has led to the problem of reduced processing efficiency in prior art systems of this type.