1. Field of the Invention
The present invention relates to a data processing system, and more specifically to data input processing for picture coding in a data processing system.
2. Description of Related Art
High-performance picture coding (data compression of picture information) is being standardized by several organizations in order to ensure the compatibility of compressed data across various applications. Two standards are important for moving picture coding. In the field of communications, the relevant standard is CCITT (International Telegraph and Telephone Consultative Committee) Recommendation H.261. For package media such as CD-ROM (compact disc read-only memory), it is the MPEG (Moving Picture Experts Group) format of the ISO (International Organization for Standardization).
The picture coding processing now being standardized is realized by combining a plurality of unit processing algorithms, as shown in FIG. 1. In the processing shown, the main path consists of a movement prediction processing 101, a frame difference processing 102, a DCT (discrete cosine transform) processing 103, a quantization processing 104, a zigzag scanning and zero detection processing 105, and an entropy coding processing 106. In addition, to reconstruct the predicted picture on the coding side, an inverse quantization processing 107, an inverse DCT processing 108 and a frame addition processing 109 are performed. The result is stored in a frame memory 110 and fed back to the movement prediction processing 101 for the succeeding picture.
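The coding loop of FIG. 1 can be sketched for a single 8×8 block as follows. This is a minimal illustration under stated assumptions, not the standardized coder. The predicted block is taken as given, so the movement prediction processing 101 and the frame memory 110 are not modeled. The DCT is a direct orthonormal 8×8 DCT-II, and the quantizer uses a hypothetical uniform step `qstep` rather than the H.261 or MPEG quantizer. All function names are illustrative only.

```python
import math

N = 8  # block size (8x8 pixels)


def _scale(k):
    # Normalization factor of the orthonormal DCT-II.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(k, x):
    return math.cos((2 * x + 1) * k * math.pi / (2 * N))


def dct2(block):
    """Forward 8x8 DCT (processing 103), computed directly from the definition."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(u, x) * _basis(v, y)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coef):
    """Inverse 8x8 DCT (processing 108)."""
    return [[sum(_scale(u) * _scale(v) * coef[u][v] * _basis(u, x) * _basis(v, y)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def code_block(current, predicted, qstep):
    """Return (quantized levels, locally reconstructed block) for one block.

    Assumption: a hypothetical uniform quantizer with step `qstep`.
    """
    # Processing 102: prediction error (frame difference).
    diff = [[current[i][j] - predicted[i][j] for j in range(N)] for i in range(N)]
    # Processings 103 and 104: DCT, then uniform quantization. The levels would
    # go on to zigzag scanning / zero detection (105) and entropy coding (106).
    levels = [[round(c / qstep) for c in row] for row in dct2(diff)]
    # Processings 107 and 108: inverse quantization and inverse DCT.
    recon_diff = idct2([[lv * qstep for lv in row] for row in levels])
    # Processing 109: add the decoded difference back to the prediction; the
    # result would be stored in frame memory 110.
    recon = [[predicted[i][j] + recon_diff[i][j] for j in range(N)] for i in range(N)]
    return levels, recon
```

When the prediction is already exact, `code_block` returns levels that are all zero. Detecting that case is the job of the zero detection in processing 105.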
Most of the above processings are executed separately for each of the regions obtained by dividing the whole picture into a mesh. Some operate on 8×8 pixel regions (called "blocks"), for example the DCT processing 103. Others operate on 16×16 pixel regions (called "macro-blocks"), for example the movement prediction processing 101. In the macro-block layer, the brightness signal Y consists of 16×16 pixels. Each of the color difference signals Cb and Cr consists of 8×8 pixels, obtained by halving the resolution in both the vertical and horizontal directions. A macro-block therefore contains four blocks of the brightness signal and one block of each of the two color difference signals, i.e., six blocks in total.
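The macro-block structure just described can be illustrated with a short sketch. The way the color difference planes are reduced is an assumption here: each 2×2 group of pixels is averaged, which is only one possible pre-processing filter. The helper names are hypothetical. The sketch returns the six blocks in the order Y1 to Y4, Cb, Cr, holding 6 × 64 = 384 values in total.

```python
def subsample_2x2(plane):
    """Halve a colour-difference plane vertically and horizontally.

    Assumption: simple 2x2 averaging is used as the subsampling filter.
    """
    return [[(plane[2 * r][2 * c] + plane[2 * r][2 * c + 1] +
              plane[2 * r + 1][2 * c] + plane[2 * r + 1][2 * c + 1]) / 4
             for c in range(len(plane[0]) // 2)]
            for r in range(len(plane) // 2)]


def split_macroblock(y, cb_full, cr_full):
    """Split one 16x16 macro-block into its six 8x8 blocks.

    y, cb_full, cr_full: 16x16 lists of samples. Returns [Y1, Y2, Y3, Y4, Cb, Cr],
    where Y1..Y4 are the top-left, top-right, bottom-left and bottom-right
    quarters of the brightness plane.
    """
    for plane in (y, cb_full, cr_full):
        if len(plane) != 16 or any(len(row) != 16 for row in plane):
            raise ValueError("expected a 16x16 plane")

    def quarter(r0, c0):
        return [row[c0:c0 + 8] for row in y[r0:r0 + 8]]

    luma = [quarter(0, 0), quarter(0, 8), quarter(8, 0), quarter(8, 8)]
    return luma + [subsample_2x2(cb_full), subsample_2x2(cr_full)]
```

Every block that later undergoes zero discrimination comes from this six-block unit. A macro-block can be skipped only if all 384 of its values quantize to zero.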
The part of the coding processing to which the present invention is directed will now be described in more detail. The zigzag scanning and zero detection processing 105 is performed after the DCT processing 103 and the quantization processing 104. Zigzag scanning reads the coefficients in order from low to high spatial frequency in both the vertical and horizontal directions (see FIG. 6). Human vision is relatively insensitive to distortion in the restored picture even when the high frequency DCT coefficients are represented with low precision. The quantization is therefore made coarser as the frequency increases. As a result, zero values tend to appear in the high frequency components of the quantized DCT coefficients. Zigzag scanning gathers these zeros, which are scattered over the high frequency region, into long runs. This raises the coding efficiency (compression ratio) of the zero-run-length coding performed within each block.
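A minimal sketch of the zigzag scan and of in-block zero-run-length coding follows. The generated scan order is the conventional 8×8 zigzag order, running from low to high spatial frequency. Two parts are simplifications: the result is represented as (zero run, level) pairs, and trailing zeros are omitted, since a real coder would signal them with a single end-of-block code. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order,
    i.e. from low to high spatial frequency."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col: one anti-diagonal
        lo, hi = max(0, s - n + 1), min(s, n - 1)
        # Even diagonals run bottom-left to top-right, odd ones top-right to bottom-left.
        rows = range(hi, lo - 1, -1) if s % 2 == 0 else range(lo, hi + 1)
        order.extend((r, s - r) for r in rows)
    return order


def run_level_pairs(block):
    """Zigzag-scan a quantized block and return (zero_run, level) pairs.

    Zeros after the last non-zero coefficient are not emitted; a real coder
    would signal them with a single end-of-block code.
    """
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs
```

Consider a block whose only non-zero values are 12 at (0, 0), -3 at (0, 1) and 5 at (2, 0). The pairs are (0, 12), (0, -3) and (1, 5). The single zero at (1, 0) becomes a run of one, and the remaining 60 zeros are left to the end-of-block code.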
In H.261 and MPEG picture coding, moreover, the zero-run-length concept is applied not only within blocks but also in the macro-block layer. Two kinds of information are encoded to raise the compression ratio: which blocks in a macro-block have all-zero DCT coefficients, and how many consecutive macro-blocks have all-zero DCT coefficients.
The former is coded in a field called "CBP" (Coded Block Pattern) in the macro-block layer, which indicates which of the six blocks have all-zero DCT coefficients. The latter is coded in a field called "MBA" (Macro Block Address), which indicates the number of macro-blocks skipped because all of their DCT coefficients are zero.
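The two macro-block layer fields can be sketched as follows. The bit weighting follows H.261, where the first brightness block is the most significant of the six CBP bits (32) and Cr is the least significant (1). A bit is set when its block contains at least one non-zero coefficient, so a zero bit marks an all-zero block. For simplicity, the sketch assumes a macro-block is skipped exactly when all six of its blocks are zero. Real coders also take motion vectors and other side information into account. The function names are hypothetical.

```python
def is_all_zero(block):
    return all(v == 0 for row in block for v in row)


def coded_block_pattern(macroblock):
    """Six-bit pattern for blocks ordered Y1, Y2, Y3, Y4, Cb, Cr.

    Y1 maps to the most significant bit (32) and Cr to the least (1); a bit is
    1 when its block has at least one non-zero coefficient.
    """
    cbp = 0
    for block in macroblock:
        cbp = (cbp << 1) | (0 if is_all_zero(block) else 1)
    return cbp


def coded_macroblocks(macroblocks):
    """Return (index, skipped_before, cbp) for every macro-block that is coded.

    Simplifying assumption: a macro-block is skipped exactly when its CBP is 0.
    """
    result, skipped = [], 0
    for index, mb in enumerate(macroblocks):
        cbp = coded_block_pattern(mb)
        if cbp == 0:
            skipped += 1
        else:
            result.append((index, skipped, cbp))
            skipped = 0
    return result
```

H.261 transmits MBA as a difference between macro-block addresses. Under the simplification above, that difference equals `skipped_before + 1`. Both fields depend on knowing, for every block, whether all of its coefficients are zero.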
As seen from the above, in order to realize the coding standard of MPEG and H.261, the CBP coding and the MBA coding based on the zero detection after the zigzag scanning are necessary.
On the other hand, since moving pictures are processed, the processing illustrated in FIG. 1 must be executed in real time. For example, if a picture of 360 pixels horizontally by 240 pixels vertically is processed at 30 frames per second, the processing period for one picture is 33.3 ms, and the processing period for one macro-block is about 100 μs. In other words, a processing device operating at 1 MHz would have only 100 system clocks available for each macro-block. A processing device operating at 50 MHz (cycle time 0.02 μs) can ensure real-time processing if each macro-block is processed within 5000 clocks.
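The timing budget above can be checked with a few lines of arithmetic. In this sketch the function name is hypothetical. The macro-block count is kept fractional (360/16 = 22.5) so that the rough estimate in the text is reproduced; a real coder would pad the picture to a whole number of macro-blocks.

```python
def clocks_per_macroblock(width, height, fps, clock_hz, mb_size=16):
    """Return (frame period [s], macro-block period [s], clocks per macro-block)."""
    frame_period = 1.0 / fps
    macroblocks = (width / mb_size) * (height / mb_size)   # 22.5 x 15 for 360x240
    mb_period = frame_period / macroblocks
    return frame_period, mb_period, mb_period * clock_hz
```

`clocks_per_macroblock(360, 240, 30, 50e6)` gives about 33.3 ms, 98.8 μs and 4938 clocks, which the text rounds to 100 μs and 5000 clocks. With `clock_hz=1e6`, the budget falls to about 99 clocks.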
With current digital electronics technology, picture coding requires an extremely high-performance processing device, which must sustain on the order of 500 to 1000 MOPS (million operations per second) on average. A current RISC (reduced instruction set computer) CPU (central processing unit) chip delivers on the order of 50 to 100 MIPS (million instructions per second). Picture coding therefore requires roughly ten times the performance of a current RISC CPU chip.
In order to realize the above mentioned high performance, two approaches have been adopted. The first approach is to use a high speed parallel-processing DSP (digital signal processor) so that most of the necessary processings are executed by a DSP program. The second approach is to couple together a plurality of specialized processors so that most of the necessary processings are executed by specialized hardware.
In the first approach, which uses a high-speed parallel-processing DSP, operation units are typically added for heavy-load processing in order to raise the processing performance. One example is shown in U.S. Pat. No. 4,823,201 to Simon et al. FIG. 2 shows the system disclosed in FIG. 2 of that patent. There, a micro sequencer 226 uses a relatively long 48-bit instruction word to control the following units in parallel: a data path 242, an ALU (arithmetic and logic unit) 244, a pixel interpolator 246, an entropy decoder 230, two input/output FIFOs (first-in first-out memories) 232 and 234, and an output FIFO 236. Among these functional units, the pixel interpolator 246 and the entropy decoder 230 can be regarded as the operation units added for heavy-load processing.
In the second approach, which uses specialized processors, specialized operation units are provided to speed up heavy-load processing such as the movement prediction processing 101 and the DCT processing 103. One example is shown in "Multimedia International Standardized System 'MPEG' Fixed, Moving Picture Compression/Expansion Chip Appears", NIKKEI ELECTRONICS No. 554, pp. 147-154, Jan. 6, 1992.
This specialized-processor approach provides operation units for heavy-load processing such as the movement prediction processing 101. However, it discloses no hardware for the block skip discrimination needed in the CBP coding and the MBA coding.
With a general-purpose processor, zero block discrimination would be realized by an instruction sequence. Data is read from memory, a test instruction is executed by the ALU, and a conditional branch is taken on the basis of the test result.
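The instruction sequence just described amounts to the loop below. It is sketched in Python, with a counter standing in for the number of words processed; the function name is hypothetical. The sketch shows that the loop can exit early on a non-zero word. Confirming that a block is entirely zero, however, requires examining every word, and that is precisely the case that leads to a skip.

```python
def all_zero_sequential(words):
    """Model of the load / test / conditional-branch loop on a general-purpose
    processor. Returns (all_zero, words_examined)."""
    examined = 0
    for word in words:          # load one word from memory
        examined += 1
        if word != 0:           # ALU test, then branch out on a non-zero result
            return False, examined
    return True, examined
```

For an all-zero macro-block, the loop runs 384 times.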
Furthermore, when either the multiplier or the multiplicand is zero, the product is also zero, so the multiplication need not actually be performed. Japanese Patent Application Laid-open Publication JP-A-59-066747 discloses a technique that adds hardware for determining, before the multiplication is executed, whether the multiplier or the multiplicand is zero. This speeds up processing whose result is zero. In the disclosed system, a zero detection circuit is added at the inputs of the multiplication circuit, so that zero detection is performed in parallel with the multiplication.
In brief, as shown in FIG. 3, a multiplier and a multiplicand supplied from a pair of inputs 302 and 303 are applied to a multiplication circuit 301 and also through a pair of gates 305 and 306 to a zero detection circuit 307, so that a zero flag 308 can be raised without waiting for the result 304 of the multiplication. The multiplication circuit 301 can be controlled by a control circuit 309 on the basis of the zero flag 308.
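The behavior of the FIG. 3 arrangement can be modeled in a few lines. This is only a functional sketch with a hypothetical function name. It reproduces what the zero flag 308 tells the control circuit 309, namely that the product is zero whenever either operand is zero. It does not reproduce the timing advantage, since the hardware detects zero in parallel with the multiplication circuit 301.

```python
def multiply_with_zero_flag(multiplier, multiplicand):
    """Return (product, zero_flag).

    In the FIG. 3 circuit the zero detection runs in parallel with the
    multiplication; sequential software can only model this by testing first.
    """
    zero_flag = multiplier == 0 or multiplicand == 0
    if zero_flag:
        return 0, True                       # result known without multiplying
    return multiplier * multiplicand, False
```

The circuit examines only two operands at a time.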
For the zero discrimination in the entropy coding, it is necessary to determine whether all of the data in a block (8×8 = 64 data items) and in a macro-block (64×6 = 384 data items) are zero. Executing this on a general-purpose processor requires repeating the zero discrimination at least once per data item. The 384 repetitions per macro-block take up a substantial share of the time available for real-time moving picture processing. Even if one zero discrimination takes only one clock on the 50 MHz general-purpose processor, the 384 repetitions occupy about 7.7% of the processing time for one macro-block (384 of its 5000 clocks). This share cannot be neglected in comparison with the other processing, and the zero discrimination should therefore be speeded up further to ensure real-time operation.
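The percentage follows from the budget stated earlier, in which 384 one-clock tests are set against 5000 clocks per macro-block. The sketch below makes the arithmetic explicit. Its function name is hypothetical, and its parameters default to the figures in the text. A `clocks_per_item` value greater than one is a hypothetical setting for a processor on which each load, test and branch costs several clocks.

```python
def zero_check_share(items=384, clocks_per_item=1, clock_hz=50e6, period_s=100e-6):
    """Fraction of the per-macro-block clock budget spent on one-word-at-a-time
    zero discrimination. Returns (clocks used, clock budget, share)."""
    budget = clock_hz * period_s          # clocks available per macro-block
    used = items * clocks_per_item        # one test per data item
    return used, budget, used / budget
```

With the defaults, the sketch returns 384 clocks used out of a 5000-clock budget, a share of 384/5000 = 0.0768. Setting `clocks_per_item=3` raises the share to about 0.23.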
In this connection, the processor of U.S. Pat. No. 4,823,201 mentioned above has no added hardware for executing the zero discrimination other than by the instruction sequence described above.
On the other hand, suppose hardware were built to perform the zero discrimination of a large amount of data, such as 64 or 384 words, all at once by extending the two-word circuit of Japanese Patent Application Laid-open Publication JP-A-59-066747. That hardware would become extremely large and therefore impractical.
Moreover, the zero discrimination must be completed before the entropy coding, including the CBP coding and the MBA coding, begins. For this reason, the concept of Japanese Patent Application Laid-open Publication JP-A-59-066747, in which zero detection is executed in parallel with the multiplication, is not suitable for the entropy coding.