1. Field of the Invention
The present invention relates to a coding apparatus for compressing and encoding digital video data and other data at a high speed, suitable for application to, for example, digital video cassette tape recorders (DVCs) for home use, and a method of the same, as well as a decoding apparatus for decoding the compressed and coded data at a high speed and a method of the same.
2. Description of the Related Art
As an example of a coding method and decoding method for images now in use, an explanation will be given of the method which has become the standard compression and coding method for moving pictures and which is applied to home DVCs and other digital video apparatuses.
FIGS. 1A to 1E are views of the flow of encoding and decoding of digital video data.
At the time of encoding, first, as shown in FIG. 1A, the image of one frame to be encoded is divided into processing units referred to as "macroblocks" (MB). This processing is referred to as "blocking".
Next, five macroblocks in one frame are selected according to a predetermined rule and dispersed to different positions as shown in FIG. 1B to form a video segment. This processing is referred to as "shuffling". The shuffling is carried out so as to make the amount of encoding uniform by dispersing continuous data. Note that the number of video segments in one frame generated by the shuffling is one-fifth of the total number of macroblocks. Note also that the algorithm of the shuffling (the method of selection of the five macroblocks) is determined in advance and does not change at execution time due to, for example, a change of the compression rate.
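The dispersal described above can be sketched in a few lines. The following Python sketch assumes, purely for illustration, a simple equal-region selection rule (segment k takes the k-th macroblock of each of five equal regions of the frame); the actual DVC shuffle pattern is fixed by the standard and is more elaborate.

```python
def shuffle_segments(num_mbs):
    # Disperse the frame's macroblocks into video segments of five:
    # divide the macroblocks into five equal regions and let segment k
    # take the k-th macroblock of each region.  (Illustrative rule
    # only; the real DVC shuffle pattern is defined by the standard.)
    assert num_mbs % 5 == 0
    region = num_mbs // 5
    segments = []
    for k in range(region):
        segments.append([r * region + k for r in range(5)])
    return segments
```

Note that, consistent with the text, the number of segments produced is one-fifth of the total number of macroblocks, and every macroblock appears in exactly one segment.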
Next, discrete cosine transform (DCT), weighting, quantization, and variable length coding (VLC) are applied to the formed video segment to encode the five macroblocks (MB0 to MB4) to five compressed data units of fixed length referred to as "sync blocks" as shown in FIG. 1C.
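The transform stage can be illustrated with a one-dimensional orthonormal DCT pair; the two-dimensional DCT applied to each block is this transform applied along rows and then along columns. This is a textbook sketch, not the DV-specified implementation, and the weighting, quantization, and VLC stages are omitted here.

```python
import math

def dct_1d(x):
    # 1-D DCT-II with orthonormal scaling.
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
        c = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(c * s)
    return out

def idct_1d(X):
    # Inverse transform (DCT-III), recovering the original samples.
    N = len(X)
    out = []
    for n in range(N):
        s = sum((math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) * X[k]
                * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N))
        out.append(s)
    return out
```

With this orthonormal scaling the inverse transform recovers the input exactly (up to floating point error), which is the property the IDCT stage of the decoder relies upon.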
When looking at the compressed data of the individual macroblocks at the time of this encoding, they sometimes cannot be contained in the corresponding sync blocks. This is because, at the time of quantization, the generated amount of encoding is controlled so that the compressed data of the five macroblocks is contained in the five sync blocks as a whole. Therefore, after the variable length coding (VLC), the compressed data is moved among the five sync blocks. This processing is referred to as "framing".
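The movement of compressed data among the five sync blocks can be sketched as follows. The packing order used here (each macroblock's own block first, overflow then appended into whatever space remains, in block order) is an assumption for illustration; the actual DV framing order is defined by the standard.

```python
def frame_segment(mb_streams, capacity):
    # Pack five variable-length macroblock bit streams (given here as
    # strings of '0'/'1') into five fixed-length sync blocks.  Each
    # stream is placed in its own block first; overflow bits are then
    # moved into the leftover space of the other blocks, in order.
    assert len(mb_streams) == 5
    assert sum(len(s) for s in mb_streams) <= 5 * capacity
    blocks = [s[:capacity] for s in mb_streams]
    spill = "".join(s[capacity:] for s in mb_streams)
    for i in range(5):
        room = capacity - len(blocks[i])
        blocks[i] += spill[:room]
        spill = spill[room:]
    return blocks
```

As in the text, an individual stream may exceed its own sync block, but rate control guarantees that the five streams together fit the five blocks, so the packing always succeeds.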
The framed compressed data is sent to for example a video recording and reproduction apparatus or other recording system.
Next, an explanation will be made of the decoding.
The decoding proceeds by a reverse flow to that of the encoding.
First, the compressed data sent from a video apparatus or other reading system is processed by moving the compressed data among the five sync blocks as mentioned above in order to extract the compressed data of the macroblocks. This processing is referred to as deframing.
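Deframing can be sketched as the inverse of an illustrative framing rule in which each macroblock's own sync block is filled first and overflow is appended into leftover space in block order. The rule and the explicit `lengths` parameter are assumptions for illustration; in practice the stream boundaries are discovered during variable length decoding.

```python
def deframe_segment(blocks, lengths, capacity):
    # Recover the five macroblock bit streams from five fixed-length
    # sync blocks.  `lengths` gives the original stream lengths.
    streams = [b[:min(l, capacity)] for b, l in zip(blocks, lengths)]
    # Re-read the overflow ("spill") area in the order it was written:
    # the leftover region of each block, taken in block order.
    spill = "".join(b[min(l, capacity):] for b, l in zip(blocks, lengths))
    for i in range(5):
        need = lengths[i] - len(streams[i])
        streams[i] += spill[:need]
        spill = spill[need:]
    return streams
```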
Next, the extracted compressed data of each macroblock is subjected to variable length decoding (VLD), inverse quantization, inverse weighting, and inverse discrete cosine transform (IDCT) to decode the video segments as shown in FIG. 1D.
Next, the five macroblocks (MB0 to MB4) comprising each video segment are returned to their original positions in the frame. Namely, the data of each macroblock is stored at the address corresponding to its position in the image in a frame memory etc. This processing is referred to as deshuffling.
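Deshuffling can be sketched as inverting the dispersal rule used at shuffling time. Here it is assumed, purely for illustration, that segment k holds the k-th macroblock of each of five equal regions of the frame; the actual DVC pattern is standardized.

```python
def deshuffle(segments, num_mbs):
    # Store each macroblock's data back at its original frame
    # position, inverting the illustrative equal-region dispersal
    # rule (segment k holds the k-th macroblock of each region).
    region = num_mbs // 5
    frame = [None] * num_mbs
    for k, seg in enumerate(segments):
        for r, mb_data in enumerate(seg):
            frame[r * region + k] = mb_data
    return frame
```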
Finally, the frame which has been divided into macroblocks is converted to data of a raster scan system as shown in FIG. 1E. This processing is referred to as deblocking.
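The conversion from block order to raster scan order can be sketched as follows. The tile size and the row-major tile layout are assumptions for illustration (DV macroblocks are actually groups of 8x8 DCT blocks).

```python
def deblock(tiles, tiles_w, tiles_h, bs):
    # tiles: list of bs x bs pixel blocks in row-major tile order.
    # Emit the full raster scan rows of the reconstructed frame.
    rows = []
    for ty in range(tiles_h):
        for y in range(bs):
            row = []
            for tx in range(tiles_w):
                row.extend(tiles[ty * tiles_w + tx][y])
            rows.append(row)
    return rows
```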
Then, the deblocked image data is sent to the video input of for example a display.
Next, an explanation will be made of the configuration of the processing apparatus of the related art, the processing algorithm, and the flow of the processing for this encoding and decoding.
First, an explanation will be made of the encoding.
As the coding apparatus for carrying out the encoding as mentioned above, an apparatus having a configuration as shown in FIG. 2 has been used.
A coding apparatus 80 shown in FIG. 2 is configured by a blocking unit 81, a shuffling unit 82, a motion detection unit 83, a DCT unit 84, a classification unit 85, a data amount estimation unit 86, a quantization unit 87, a VLC unit 88, and a framing unit 89 connected as illustrated.
In such a coding apparatus 80, sequentially input video data are divided into macroblocks at the blocking unit 81 which are then used to generate video segments at the shuffling unit 82.
Next, motion is detected at the motion detection unit 83, the coding mode of the DCT to be carried out at the DCT unit 84 is determined, and the DCT is actually carried out at the DCT unit 84. Further, the DCT unit 84 carries out weighting by applying a predetermined filter to the obtained DCT result.
Then, the classification unit 85 determines the class number for determining the quantization step, the data amount estimation unit 86 estimates the amount of data based on the class number to determine the quantization step, and the quantization unit 87 carries out the quantization with respect to the DCT result obtained at the DCT unit 84.
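The interplay of classification and data amount estimation can be sketched as choosing the finest quantization step whose estimated coded size fits the fixed sync block budget. The candidate step set and the estimator (an assumed 6 bits per nonzero quantized coefficient) are illustrative assumptions, not the DV-specified tables.

```python
def pick_quant_step(coeffs, budget_bits, steps=(1, 2, 4, 8, 16)):
    # Try quantization steps from finest to coarsest and return the
    # first whose estimated code size fits the budget.  Coarser steps
    # zero out more coefficients, shrinking the estimate.
    for q in steps:
        est = sum(6 for c in coeffs if abs(c) // q > 0)
        if est <= budget_bits:
            return q
    return steps[-1]
```

This mirrors the text's rate control: the step is chosen per class so that the five macroblocks' compressed data fits the five sync blocks as a whole.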
The quantized DCT result is subjected to variable length coding at the VLC unit 88 and then subjected to framing at the framing unit 89 to generate sets of five sync blocks, and the generated compressed data is output from the coding apparatus 80.
Further, a processing algorithm in a case where such encoding is carried out by a general purpose processing apparatus using for example a digital signal processor (DSP) is shown in FIG. 3.
As shown in FIG. 3, when the encoding is started by the processing apparatus (step S110), first, blocking is carried out with respect to sequentially input video data to divide it into macroblocks (step S111) and then shuffling is carried out to generate the video segments (step S112).
Next, motion detection is carried out to detect the parts having motion and determine the coding mode of the DCT (step S113), then DCT is actually carried out and weighting is carried out with respect to the obtained DCT result by applying a predetermined filter (step S114).
Next, the class number for determining the quantization step is determined (step S115), the amount of data is estimated based on the class number to determine the quantization step (step S116), and quantization is carried out with respect to the DCT result obtained at step S114 (step S117).
Next, the quantized DCT result is subjected to variable length coding (step S118) and framing is carried out to generate sets of five sync blocks (step S119).
The processings of step S111 to step S119 are sequentially carried out for one frame's worth of data (step S120) and further repeatedly carried out with respect to sequentially input frames (step S121). When the processing is carried out with respect to all frames to be encoded, the series of encoding is ended (step S122).
A timing chart of the processing in a CPU of the processing apparatus where coding is carried out according to the algorithm shown in FIG. 3 is shown in FIG. 4.
Note that, in FIG. 4, the processings VSj-BLK to SFL indicate processings of blocking (BLK) and shuffling (SFL) with respect to a (j+1)th video segment j (step S111 and step S112 in FIG. 3); the processings VSj-DCT to Q indicate processings from the motion detection to the quantization (Q) with respect to the (j+1)th video segment j (step S113 to step S117 in FIG. 3); and the processings VSj-VLC to FRM indicate processings of variable length coding (VLC) and framing (FRM) with respect to the (j+1)th video segment j (step S118 and step S119 in FIG. 3).
As illustrated, the processing apparatus sequentially carries out the processing steps of the flow chart shown in FIG. 3 for every video segment.
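The sequential, per-segment schedule of FIG. 3 and FIG. 4 can be sketched as nested loops. The stage grouping below matches the timing chart's labels; the stage bodies themselves are omitted and the returned log merely records the execution order.

```python
def encode_stream(frames, segs_per_frame):
    # Sequential flow of FIG. 3: for each input frame, each video
    # segment passes through blocking/shuffling (BLK-SFL), motion
    # detection through quantization (DCT-Q), and VLC/framing
    # (VLC-FRM) in order, one segment at a time.
    log = []
    for f in range(frames):
        for j in range(segs_per_frame):
            for stage in ("BLK-SFL", "DCT-Q", "VLC-FRM"):
                log.append((f, j, stage))
    return log
```

It is this strictly sequential, single-processor schedule that the later discussion of parallel processing seeks to improve upon.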
Next, an explanation will be made of the decoding.
As the decoding apparatus for carrying out the decoding as mentioned above, an apparatus having the configuration shown in FIG. 5 has been used.
A decoding apparatus 90 shown in FIG. 5 is configured by a deframing unit 91, a VLD unit 92, an inverse quantization unit 93, an IDCT unit 94, a deshuffling unit 95, and a deblocking unit 96 connected as illustrated.
In a decoding apparatus 90 having such a configuration, compressed data which is encoded by the coding apparatus 80 shown in for example FIG. 2 and sequentially input is first processed by moving the data among the five sync blocks at the deframing unit 91 to restore the data corresponding to the macroblocks. Then, variable length decoding is carried out at the VLD unit 92 with respect to the obtained data for every macroblock, and inverse quantization is carried out at the inverse quantization unit 93.
Next, the inversely quantized DCT result is processed by carrying out inverse DCT at the IDCT unit 94 to transform it to pixel data and by applying a filter inversely weighted relative to that used at the DCT unit 84 to return it to the data before weighting.
Next, the obtained pixel data is returned to the data of the original pixel position at the deshuffling unit 95 and converted to the data of the raster scan system at the deblocking unit 96 to reproduce the original image data.
The processing algorithm where such decoding is carried out by the general purpose processing apparatus as mentioned above is shown in FIG. 6.
As shown in FIG. 6, when the decoding is started by the processing apparatus (step S130), first, the sequentially input compressed data is processed by moving the data among sync blocks to restore the data corresponding to the macroblocks (step S131) and the obtained data for every macroblock is subjected to variable length decoding (step S132).
Next, inverse quantization is carried out with respect to the variable length decoded data (step S133), the inversely quantized DCT result is subjected to inverse DCT and the result thereof is inversely weighted to transform it to the original pixel data (step S134), the result is returned to the data of the original position in the frame by deshuffling (step S135), and the result is further converted to the data of the raster scan system (step S136).
The processings of step S131 to step S136 are sequentially carried out for one frame's worth of data (step S137) and further repeatedly carried out with respect to the sequentially input frames (step S138). Then, when the processing is carried out with respect to all frames to be decoded, the series of decoding is ended (step S139).
A timing chart of the processing in the CPU of the processing apparatus when the decoding is carried out according to the algorithm shown in FIG. 6 is shown in FIG. 7.
Note that, in FIG. 7, the processings VSj-DFR to VLD indicate deframing (DFR) and variable length decoding (VLD) with respect to the (j+1)th video segment j (step S131 and step S132 in FIG. 6); the processings VSj-IQ to IDCT indicate inverse quantization (IQ) and inverse DCT (IDCT) with respect to the (j+1)th video segment j (step S133 and step S134 in FIG. 6); and the processings VSj-DSF to DBL indicate deshuffling (DSF) and deblocking (DBL) with respect to the (j+1)th video segment j (step S135 and step S136 in FIG. 6).
As illustrated, in the processing apparatus, the processing steps of the flow chart shown in FIG. 6 are sequentially carried out for every video segment.
Summarizing the disadvantage to be solved by the invention, there is a demand for higher speed encoding and decoding of image and other data using a parallel processing apparatus having a plurality of processing apparatuses. In the parallel processing apparatuses and parallel processing methods heretofore, however, there have been various disadvantages, so sufficiently high speed processing could not be achieved.
For example, where the coding and decoding are carried out by parallel processing, heretofore, the processings of the steps have been distributed among a plurality of processing apparatuses and executed in parallel. In the case of such a parallel processing method, if the execution times of the processings in the processing apparatuses are equal, the loads become uniform and very efficient processing can be carried out, but usually the execution times are not equal. For this reason, the loads of the processing apparatuses become unequal, so highly efficient processing cannot be carried out.
Further, in such a parallel processing method, in the case of, for example, the above image data, since the processing with respect to one unit of data such as one video segment is divided among and carried out by a plurality of processing apparatuses, it is necessary to synchronize the apparatuses and control the communication accompanying the transfer of data, so there also exists a disadvantage that the configuration of the apparatus, the control method, etc. become complex.
Further, since the processings to be carried out in the processing apparatuses differ, it is necessary to prepare a processing program for each individual processing apparatus and to separately control the processing of each processing apparatus, so there also exists a disadvantage that the configuration of the apparatus, the control method, etc. become further complex.