1. Field of the Invention
The present invention relates to an apparatus, a method and a data processing element (DPE) for efficient parallel processing of multimedia data, and more particularly, to a parallel data processing array including a plurality of DPEs for processing a massive amount of multimedia data, and an apparatus and a method for transferring a massive amount of data between the DPEs.
This work was supported by the IT R&D program of MIC/IITA [2006-S-048-02, Embedded DSP Platform for Audio/Video Signal Processing].
2. Description of the Related Art
With developments in information technology (IT), the number of portable products, as well as residential electronic products, that can process multimedia data such as video and audio has remarkably increased. The use of such multimedia products is expanding in various product groups related to DVD technology, MPEG-2 technology, moving image reproduction functions of mobile phones, HDTVs, etc. An image or audio file for a corresponding multimedia product includes a massive amount of data if it is raw data. For example, when each pixel of an image having a screen size of 1920×1200 is expressed in 24 bits, a transmission performance of 1.66 Gbps is required in order to transmit 30 frames per second in a continuous serial bitstream. As the frame rate increases, higher transmission performance is required, and thus most images and sounds are transferred after being compressed using an advanced compression technology.
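The 1.66 Gbps figure above follows from multiplying the pixel count, the bit depth, and the frame rate. The following is a minimal sketch of that arithmetic; the function name `raw_video_bitrate_bps` is hypothetical and is introduced only for illustration.

```python
def raw_video_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Return the bit rate, in bits per second, of uncompressed video."""
    # Bits per frame times frames per second.
    return width * height * bits_per_pixel * frames_per_second


# Values taken from the example in the text: 1920x1200, 24-bit pixels, 30 fps.
bitrate = raw_video_bitrate_bps(1920, 1200, 24, 30)
print(f"{bitrate / 1e9:.2f} Gbps")  # prints "1.66 Gbps"
```

Doubling the frame rate in this sketch doubles the result, which is the dependence on frame rate noted above.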
Various compression technologies for multimedia data exist, including MPEG-2, MPEG-4, H.264, bit sliced arithmetic coding (BSAC), advanced audio coding plus (AAC+), etc. Hardware having a function of coding and decoding an image is required so as to enable the use of such compression technologies. Most mobile and residential multimedia devices include a very large scale integration (VLSI) circuit for a multimedia codec in order to perform coding and decoding in real time.
The performance required of VLSI for a codec differs according to the complexity or characteristics of the codec algorithm. Recent multimedia codecs require a data processing performance of 0.6 giga instructions per second (GIPS) to 1.5 GIPS, and it is predicted that within several years multimedia codecs will require a data processing performance of 2 GIPS to 5 GIPS. Accordingly, a high performance chip for codecs is required.
When various kinds of multimedia codecs are to be embodied as hardware within a short period of time, a processor array structure or a parallel processing system is used in order to realize high performance. A programmable processor can implement various multimedia codecs within a short period of time, but at the cost of reduced processing speed. A programmable processor having an array structure overcomes this disadvantage because it can process multimedia data in parallel, and thus can process the multimedia data efficiently.
While processing multimedia data, the same operation is repeatedly performed on a series of data streams, and thus parallel processing can be easily adopted for processing multimedia data. Parallel processing means that tasks for processing data are independently assigned to each processor, and the assigned tasks are performed simultaneously.
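The idea of assigning independent tasks that apply the same operation to different parts of a data stream can be sketched in software as follows. This is only an illustration under stated assumptions, not the structure of the DPE array: the helper names `process_block`, `split_into_blocks` and `process_in_parallel` and the per-sample operation are hypothetical, and Python threads merely model the assignment of tasks to workers rather than true hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def process_block(block):
    """Hypothetical per-block operation: halve every sample in the block."""
    return [sample // 2 for sample in block]


def split_into_blocks(stream, num_workers):
    """Split the stream into at most num_workers contiguous blocks."""
    if not stream:
        return []
    block_size = -(-len(stream) // num_workers)  # ceiling division
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def process_in_parallel(stream, num_workers=4):
    """Apply the same operation to every block, one independent task per block."""
    blocks = split_into_blocks(stream, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() returns results in the order the blocks were submitted.
        results = list(pool.map(process_block, blocks))
    # Concatenate the processed blocks back into a single stream.
    return [sample for block in results for sample in block]


samples = list(range(10))
processed = process_in_parallel(samples, num_workers=4)
```

Because no block depends on another, the result is the same as processing the whole stream sequentially, which is the property that makes such workloads easy to parallelize.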
In order for a processor array to process multimedia data efficiently, there is a need for an efficient structure for each of the plurality of data processing elements (DPEs) constituting the processor array and for a method of efficiently interconnecting the DPEs.