1. Field of the Invention
The present invention relates to a data processor, and more particularly to improvements for efficiently performing media processing, such as decoding of image or audio data and demodulation of communicated data.
2. Background Art
When performing media processing such as decoding of image or audio data or demodulation of communicated data, it is often necessary to process data arranged in the form of a matrix, that is, a two-dimensional array of N rows and M columns. To process such a data matrix at high speed, data processors for media processing tend to employ an architecture suited to parallel data processing. It is commonly believed that high-speed processing of a data matrix can be achieved by quantitative expansion, i.e., by increasing the number of processing elements (hereafter simply referred to as “PE”). A PE is a hardware expansion unit that includes an operation unit and a register for feeding data to the operation unit. Conventional data processors for media processing are designed for parallel processing by incorporating as many PEs as there are operations to be executed in parallel.
The construction of a data processor that processes a data matrix of 16 rows and 16 columns is explained below. A matrix of data elements with 16 rows and 16 columns is stored in a memory device. The 16 columns of the matrix are assigned in one-to-one correspondence to 16 PEs. Each time the 16 data elements constituting one row are read from the memory device, the 16 PEs simultaneously process the data elements belonging to their assigned columns. This enables the 16 data elements of one row to be processed in one cycle. By repeating this for 16 cycles, the processing of the 16 by 16 data matrix is completed.
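The column-per-PE scheme described above can be modeled in software as a rough sketch. The function name `process_matrix` and the per-element operation `op` are hypothetical illustrations, not part of the described hardware; the loop over `pe` stands in for 16 PEs that, in hardware, would act in the same cycle, each touching only the element of its own column.

```python
from typing import Callable, List, Tuple

N_COLS = 16  # one PE is assigned to each column

Matrix = List[List[int]]


def process_matrix(matrix: Matrix, op: Callable[[int], int]) -> Tuple[Matrix, int]:
    """Apply `op` to every element, reading one row per cycle.

    PE j only ever handles row[j], so no data element has to move
    between columns. Returns the processed matrix and the cycle count.
    """
    result: Matrix = []
    cycles = 0
    for row in matrix:  # one row is read from the memory device per cycle
        # In hardware all 16 PEs run in parallel; here they are listed in order.
        result.append([op(row[pe]) for pe in range(N_COLS)])
        cycles += 1
    return result, cycles


m = [[r * 16 + c for c in range(16)] for r in range(16)]
out, cycles = process_matrix(m, lambda x: x * 2)
print(cycles)  # 16 cycles for a 16 by 16 matrix
```

Because each row is consumed in a single step, the cycle count equals the number of rows, matching the 16 cycles stated above.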
Here, parallel execution of 16 operations is possible only if the operation units of the 16 PEs are simultaneously fed with the data elements they use as operands. Suppose instead that the operation unit of each PE needs, as an operand, a data element belonging to a column other than the one assigned to it. In that case, the data elements of the matrix must be rearranged before they are supplied to the PEs. If this rearrangement takes time, the data elements cannot be supplied to the PEs simultaneously, and the processing efficiency of the PEs drops significantly. Thus, an architecture having a plurality of PEs has the weakness that its processing efficiency drops significantly whenever a data element belonging to one column must be fed to a PE corresponding to another column.
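The effect of such a rearrangement on throughput can be illustrated with a simplified sketch. Here `src_col[j]` names the column whose element PE j needs, and `rearrange_cycles` is a hypothetical, freely chosen cost for reordering one row; neither the function nor the cost figure comes from the text above, and real hardware may overlap or pipeline these steps differently.

```python
from typing import Callable, List, Sequence, Tuple

N_COLS = 16  # one PE per column


def process_with_rearrangement(
    matrix: List[List[int]],
    src_col: Sequence[int],
    op: Callable[[int], int],
    rearrange_cycles: int,
) -> Tuple[List[List[int]], int]:
    """PE j takes its operand from column src_col[j] of each row.

    `rearrange_cycles` is an assumed cost (in cycles) of reordering one row
    before it can be fed to the PEs. Returns the result and total cycles.
    """
    result: List[List[int]] = []
    cycles = 0
    in_place = all(src_col[j] == j for j in range(N_COLS))
    for row in matrix:
        if in_place:
            operands = list(row)  # every PE already sees its own column
        else:
            # Rearrangement step: move each element to the PE that needs it.
            operands = [row[src_col[j]] for j in range(N_COLS)]
            cycles += rearrange_cycles  # the PEs stall while the row is reordered
        result.append([op(x) for x in operands])
        cycles += 1  # all 16 PEs then operate in parallel
    return result, cycles


m = [[r * 16 + c for c in range(16)] for r in range(16)]
shift = [(j + 1) % 16 for j in range(16)]  # PE j needs column j + 1
_, base = process_with_rearrangement(m, list(range(16)), lambda x: x, 3)
_, slow = process_with_rearrangement(m, shift, lambda x: x, 3)
print(base, slow)  # 16 vs 64 under the assumed 3-cycle rearrangement cost
```

Under this assumed cost, every row spends more time being reordered than being processed, which is the drop in PE efficiency the paragraph above describes.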
Particularly in media processing, there are many instances in which data elements must exchange positions. If the processing efficiency decreases every time such an exchange is made, it is impossible to meet the strict specifications required of digital household electrical appliances. To solve this problem, developers of data processors go to the trouble of narrowing down the types of operations to be performed by the PEs in media processing and redesigning the architecture for each type of operation. Because such redesign is laborious, developing data processors for media processing is widely considered to take a long time.