1. Field of the Invention
The present invention relates to a signal-processing apparatus that performs audio and image compression/decompression at high speed by use of a parallel processor and dedicated hardware, and an electronic apparatus using the same.
2. Description of the Related Art
In response to the recent trend toward higher performance and smaller size in image processing apparatuses and image display apparatuses that handle moving images, the ISO (International Organization for Standardization) and the ITU-T (International Telecommunication Union Telecommunication Standardization Sector) are jointly working on the standardization of MPEG-4 AVC (Advanced Video Coding) as a next-generation compression and decompression technology. MPEG-4 AVC achieves a high image compression rate by introducing new technologies such as a 4×4-pixel integer transform, intra prediction in up to nine directions, seven sub-macro-block types, up to 16 motion vectors per macro-block, multi-frame reference, an in-loop de-blocking filter and arithmetic coding. It aims to reduce the amount of code to 50% of that of MPEG-2, which is already in practical use.
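The 4×4 integer transform can be illustrated with the core matrix commonly given for MPEG-4 AVC (H.264). The following is a minimal sketch, assuming only the core transform Y = C·X·Cᵀ is of interest; the per-coefficient scaling that the standard folds into quantization is omitted, and the names `CF` and `forward_core_transform` are hypothetical. It shows that the transform needs only integer additions, subtractions and small multiplications.

```python
# Core 4x4 forward integer transform matrix of MPEG-4 AVC / H.264.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a 4x4 matrix."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Compute Y = CF * X * CF^T using integer arithmetic only.

    The scaling step that the standard merges into quantization is
    deliberately left out of this sketch.
    """
    return matmul(matmul(CF, block), transpose(CF))


flat = [[1] * 4 for _ in range(4)]
print(forward_core_transform(flat))
# A flat block yields only a DC coefficient (16); all AC coefficients are 0.
```

Because every matrix entry is ±1 or ±2, a hardware implementation can use shifts and adders instead of multipliers.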
However, the newly introduced coding tools adopt algorithms that prioritize coding efficiency; therefore, the processing amount is large and implementation in embedded systems is difficult.
Prior signal-processing apparatuses that perform compression and decompression with this coding method have used parallel processing by processors and dedicated hardware structures.
An example of speeding up signal processing through parallel processing by processors is disclosed in Document 1 (Japanese Patent Application No. H03-508269). Document 1 shows a parallel processor that combines an SIMD (Single Instruction Multiple Data) parallel data processor, which has one control stream and multiple data streams, with an MIMD (Multiple Instruction Multiple Data) parallel data processor, which has multiple control streams and multiple data streams.
FIG. 16, which is based on FIG. 1 of Document 1, is a block diagram illustrating a prior signal-processing apparatus combining an SIMD parallel data processor 902 and an MIMD parallel data processor 903.
The signal-processing apparatus comprises a system controller 901 that controls the entire processor, the SIMD parallel data processor 902, the MIMD parallel data processor 903, a shared memory bus 904 and a shared memory 905.
The system controller 901 executes application programs.
The SIMD parallel data processor 902 comprises an overall controller 910, calculators 911 to 914 and local memories 915 to 918. One calculator and one local memory constitute one processor. The overall controller 910 executes the program and issues the same instruction to all of the calculators 911 to 914. The calculators 911 to 914 process the data stored in the local memories 915 to 918, respectively, according to that same instruction.
The MIMD parallel data processor 903 comprises an overall controller 920, controllers 921 to 924, calculators 925 to 928 and local memories 929 to 932. One controller, one calculator and one local memory constitute one processor. Each of the controllers 921 to 924 executes a different program and issues a different instruction to the corresponding one of the calculators 925 to 928, which processes the data stored in the corresponding one of the local memories 929 to 932. The overall controller 920 performs synchronization and monitoring control for the entire MIMD parallel data processor 903.
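The difference between the two control models can be sketched in software. This is a minimal, sequential illustration under stated assumptions: an "instruction" is modeled as a Python function applied element-wise to a processor's local memory, and the names `simd_execute` and `mimd_execute` are hypothetical. Real hardware would run the lanes concurrently, and the synchronization performed by the overall controller 920 is not modeled.

```python
from typing import Callable, List

Memory = List[int]
Instruction = Callable[[int], int]


def simd_execute(instruction: Instruction,
                 local_memories: List[Memory]) -> List[Memory]:
    """One control stream: every calculator applies the same instruction
    to the data in its own local memory."""
    return [[instruction(x) for x in mem] for mem in local_memories]


def mimd_execute(programs: List[List[Instruction]],
                 local_memories: List[Memory]) -> List[Memory]:
    """Multiple control streams: each processor runs its own program
    (a list of instructions) on its own local memory."""
    results = []
    for program, mem in zip(programs, local_memories):
        data = list(mem)
        for instruction in program:
            data = [instruction(x) for x in data]
        results.append(data)
    return results


mems = [[1, 2], [3, 4], [5, 6], [7, 8]]  # four local memories
print(simd_execute(lambda x: x * 2, mems))
# [[2, 4], [6, 8], [10, 12], [14, 16]]
print(mimd_execute([[lambda x: x + 1], [lambda x: x * 3], [], [abs, lambda x: -x]],
                   mems))
# [[2, 3], [9, 12], [5, 6], [-7, -8]]
```

The SIMD form needs only one instruction decoder, which suits simple, data-heavy work; the MIMD form lets each processor follow its own control flow, which suits complicated, branch-heavy work.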
In the parallel data processor described above, when the target processing is simple and the amount of data to be processed is large, the SIMD parallel data processor 902 performs the processing, whereas when the target processing is complicated and the amount of data to be processed is small, the MIMD parallel data processor 903 performs the processing.
Another speed-up method implements a calculator optimized for the target processing as dedicated hardware. As an example, Document 2 (Japanese Patent Application No. 2000-118434) discloses a technology that speeds up image processing by performing variable-length encoding/decoding with dedicated hardware.
FIG. 17, which is based on FIG. 1 of Document 2, is a block diagram illustrating a prior image processor 1001 combining an SIMD parallel data processor and dedicated hardware.
The image processor 1001 is connected to an external video input device 1009, a video output device 1010 and an external memory 1011 through an external video data bus 1008. The image processor 1001 comprises an instruction memory 1002, a processor 1003, SIMD calculating means 1004, VLC (Variable-Length Coding) processing means 1005, an external data interface 1006 and an internal data bus 1007. The VLC processing means 1005 is implemented as dedicated hardware.
The processor 1003 decodes the instructions held in the instruction memory 1002 and executes scalar operations, bit manipulation, and comparison and branch instructions. The processor 1003 also controls the SIMD calculating means 1004, the VLC processing means 1005, the external data interface 1006, the video input device 1009 and the video output device 1010.
The video input device 1009 inputs the video signals from the outside, and the video output device 1010 outputs the video data to the outside.
The image data input through the video input device 1009 is transferred to the external memory 1011 and is then transferred through the external data interface 1006 as required by the processing performed by the SIMD calculating means 1004. The SIMD calculating means 1004 performs motion compensation, DCT and quantization, and obtains transformed coefficient data. Next, the VLC processing means 1005 variable-length encodes the transformed coefficient data to generate a bit stream.
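Variable-length coding assigns shorter codewords to more probable values and produces a bit stream whose length depends on the data. As a generic illustration only, the sketch below uses Exp-Golomb codes, which MPEG-4 AVC uses for many syntax elements. This is an assumption chosen for clarity; it is not the coding performed by the VLC processing means 1005 of Document 2, and the function names are hypothetical.

```python
def exp_golomb_ue(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword: M leading zeros followed by the
    (M + 1)-bit binary representation of code_num + 1."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    info = bin(code_num + 1)[2:]  # binary of code_num + 1 without '0b'
    return "0" * (len(info) - 1) + info


def encode_signed_values(values) -> str:
    """Map signed values to code numbers as in signed Exp-Golomb
    (positive v -> 2v - 1, non-positive v -> -2v) and concatenate
    the resulting codewords into one bit string."""
    bits = []
    for v in values:
        code_num = 2 * v - 1 if v > 0 else -2 * v
        bits.append(exp_golomb_ue(code_num))
    return "".join(bits)


print(exp_golomb_ue(0), exp_golomb_ue(3))  # 1 00100
print(encode_signed_values([1, -1, 0]))    # 0100111
```

Because each codeword's length is known only after its value is coded, the bit position of the next codeword depends on all previous ones, which is one reason such coding is often moved to dedicated hardware.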
The SIMD calculating means 1004, which comprises eight parallel pipeline calculators, is capable of efficiently performing routine processing such as DCT.
The signal-processing apparatus that combines an SIMD parallel data processor and an MIMD parallel data processor, typified by Document 1 described above, is flexible enough to handle various coding algorithms. For prior coding methods, it can therefore deliver sufficient image-processing performance by raising the degree of parallelism, because prior motion detection processing operates on macro-blocks of not less than 8×8 pels and not more than 16×16 pels, that is, 64 to 256 pels per block.
However, in MPEG-4 AVC the smallest sub-macro-block size is 4×4 pels, that is, only 16 pels per block. With the prior signal-processing apparatus, therefore, providing 16 or more parallel calculators does not further improve processing efficiency.
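The arithmetic behind this limit can be made explicit. The sketch below assumes, as a simplification, that each block is processed independently with one pel assigned to each calculator per cycle; the function name `calculator_utilization` is hypothetical.

```python
import math


def calculator_utilization(pels_per_block: int, num_calculators: int) -> float:
    """Fraction of calculator-cycles doing useful work when one block is
    processed at a time, one pel per calculator per cycle."""
    cycles = math.ceil(pels_per_block / num_calculators)
    return pels_per_block / (num_calculators * cycles)


for p in (8, 16, 32, 64):
    print(p,
          calculator_utilization(4 * 4, p),    # 4x4 sub-macro-block
          calculator_utilization(16 * 16, p))  # 16x16 macro-block
# 8 1.0 1.0
# 16 1.0 1.0
# 32 0.5 1.0
# 64 0.25 1.0
```

For a 4×4 block, the cycle count per block is already 1 at 16 calculators, so adding calculators only lowers utilization, whereas a 16×16 block keeps up to 256 calculators busy under this model.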
Moreover, in the arithmetic coding/decoding processing of MPEG-4 AVC, the probability of occurrence is updated according to the contexts of neighboring macro-blocks as coding proceeds, so coding must be performed bit by bit and cannot be parallelized. That is, with the prior signal-processing apparatus, processing performance for MPEG-4 AVC cannot be improved even if the degree of parallelism of the MIMD parallel data processor is increased.
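The serial dependency can be seen in a simplified adaptive model. This sketch computes the ideal code length that an arithmetic coder would approach. It assumes a simple exponential-decay probability estimator with a hypothetical adaptation `rate`, and a hypothetical context rule based on neighbor flags. MPEG-4 AVC's CABAC instead uses a finite-state probability table, so this is an illustration of the dependency, not the standard's algorithm.

```python
import math


def context_from_neighbors(left_flag: int, top_flag: int) -> int:
    """Hypothetical context rule: count how many neighbors have the flag set."""
    return left_flag + top_flag


def adaptive_code_length(bins, contexts, rate=1 / 16):
    """Ideal code length in bits of a bin sequence coded with adaptive
    binary probability models (simplified estimator, assumed rate).

    bins     -- sequence of 0/1 values in coding order
    contexts -- context index for each bin
    """
    p_one = {}  # per-context probability that the next bin is 1
    total = 0.0
    for b, ctx in zip(bins, contexts):
        p = p_one.get(ctx, 0.5)
        total += -math.log2(p if b else 1.0 - p)
        # The updated probability depends on this bin, so the next bin
        # in the same context cannot be coded until this one is known.
        p_one[ctx] = p + rate * (b - p)
    return total


ctx = context_from_neighbors(1, 0)
print(adaptive_code_length([0, 0, 0, 0], [ctx] * 4))  # under 4 bits
```

Each iteration reads the state written by the previous one, so the loop cannot be split across parallel processors without changing the result.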
In the de-blocking filter of MPEG-4 AVC, the filter parameters are calculated for each 4×4-pel sub-macro-block, and filtering is performed based on the result. When an SIMD calculator is used, the filtering itself can be performed in parallel, but the calculators cannot be used effectively for the determination processing, whose outcome can differ from one edge and one line of samples to the next.
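The determination step can be sketched with the sample-level test used in the MPEG-4 AVC de-blocking filter. A line across an edge is filtered only if the boundary strength is non-zero and the differences around the edge fall below the thresholds alpha and beta. In this sketch, alpha, beta and the boundary strength `bs` are assumed to be given as inputs; in the standard, they are derived from tables and coding conditions. The function names are hypothetical.

```python
def filter_line(p, q, alpha, beta, bs):
    """Decide whether one line of samples across a block edge is filtered.

    p = [p0, p1] samples on one side (p0 adjacent to the edge)
    q = [q0, q1] samples on the other side (q0 adjacent to the edge)
    """
    return (bs > 0
            and abs(p[0] - q[0]) < alpha
            and abs(p[1] - p[0]) < beta
            and abs(q[1] - q[0]) < beta)


def edge_decisions(lines, alpha, beta, bs):
    """Per-line decisions for the lines crossing one 4x4 block edge."""
    return [filter_line(p, q, alpha, beta, bs) for p, q in lines]


lines = [([10, 10], [12, 12]),   # small step: filtered
         ([10, 10], [90, 90]),   # real image edge: kept
         ([10, 20], [12, 12]),   # texture on the p side: kept
         ([11, 11], [11, 11])]   # flat: filtered
print(edge_decisions(lines, alpha=15, beta=4, bs=1))
# [True, False, False, True]
```

Neighboring lines of the same edge take different branches, so on an SIMD calculator the lanes whose lines are not filtered sit idle during the filtering step.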
The signal-processing apparatus that combines an SIMD parallel data processor and dedicated hardware is typified by Document 2 described above. Adopting dedicated hardware for the arithmetic coding/decoding processing, which requires high processing performance, improves performance; however, performing motion detection, which has the largest processing amount, on the SIMD parallel data processor causes the following problems.
MPEG-4 AVC introduces motion compensation with ¼-pel precision, which requires 6-tap filtering to generate half-pel samples. Further, because the 4×4-pel sub-macro-block size is introduced, up to 16 motion vectors can be set per macro-block. Motion detection that searches at ¼-pel precision over such small sub-macro-blocks and calculates up to 16 motion vectors per macro-block therefore requires a drastically larger amount of processing.
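The sub-pel interpolation can be sketched for one row of luma samples. The sketch uses the MPEG-4 AVC 6-tap half-pel filter with taps (1, −5, 20, 20, −5, 1), rounding and clipping to 8 bits, and the rounded-up average for a quarter-pel sample between two neighbors. Only the horizontal case is shown, and the function names are hypothetical.

```python
def clip255(x: int) -> int:
    """Clip a value to the 8-bit sample range."""
    return max(0, min(255, x))


def half_pel(e: int, f: int, g: int, h: int, i: int, j: int) -> int:
    """Half-pel sample between g and h from six full-pel samples in a row,
    using the (1, -5, 20, 20, -5, 1) filter with rounding and clipping."""
    return clip255((e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5)


def quarter_pel(a: int, b: int) -> int:
    """Quarter-pel sample as the rounded-up average of two neighbors."""
    return (a + b + 1) >> 1


row = [10, 20, 30, 40, 50, 60]
b = half_pel(*row)            # between 30 and 40 on a linear ramp
print(b, quarter_pel(30, b))  # 35 33
```

Every half-pel candidate costs six multiply-accumulate operations per sample before any matching cost is evaluated, and this work is repeated for each of up to 16 sub-macro-block motion vectors.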
For the SIMD parallel data processor to perform such motion detection processing, the degree of parallelism of its calculators must be increased and its operating frequency must be set high. The resulting capability of the SIMD parallel data processor then exceeds what decoding requires, so the processor as a whole cannot be used efficiently.
Furthermore, even if an attempt is made to improve processing performance by increasing the degree of parallelism of the SIMD parallel data processor, the block size of 4×4 pels (16 pels) means that the effective degree of parallelism cannot exceed 16.