1. Field of the Invention
The present invention relates to a method and apparatus for data transmission between processors, and more particularly, to a method and apparatus for efficiently transferring a massive amount of multimedia data between two processors in a system having at least two processors that process multimedia data.
This work was supported by the IT R&D program of MIC/IITA. [2006-S-048-01, Embedded DSP Platform for Audio/Video Signal Processing]
2. Description of the Related Art
With the development of information technology (IT) industries, the number of products that can process multimedia data, such as video and audio, has remarkably increased, not only among portable products but also among residential electronic products. The use of such multimedia products is expanding in various product groups, such as DVD technology, MPEG-2 technology, moving image reproduction functions of mobile phones, HDTVs, etc. The images and audio handled by such multimedia products amount to a massive quantity of data in raw form. For example, when each pixel of an image having a screen size of 1920×1200 is expressed in 24 bits, a transmission performance of 1.66 Gbps is required in order to transmit 30 frames per second in a continuous serial bitstream. As the frame rate increases, higher transmission performance is required, and thus most images and sounds are processed using an advanced compression technology.
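By way of illustration only, the arithmetic behind the 1.66 Gbps figure above can be reproduced with the following minimal sketch. The function name `raw_video_bitrate` is hypothetical and is introduced solely for this example; only the resolution, bit depth, and frame rate are taken from the text.

```python
def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed bit rate, in bits per second, of a video stream."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * frames_per_second


# Values from the example in the text: 1920x1200 pixels, 24 bits per pixel, 30 frames/s.
rate_bps = raw_video_bitrate(1920, 1200, 24, 30)
print(f"{rate_bps} bit/s = {rate_bps / 1e9:.2f} Gbps")  # 1658880000 bit/s = 1.66 Gbps
```

Doubling the frame rate in this sketch doubles the required bit rate, which is consistent with the observation that higher frame rates demand higher transmission performance.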
Various compression technologies for multimedia data exist, including MPEG-2, MPEG-4, H.264, bit sliced arithmetic coding (BSAC), advanced audio coding plus (AAC+), etc. Hardware having a function of coding and decoding an image is required so as to enable the use of such compression technologies. Most mobile and residential multimedia devices include a very large scale integration (VLSI) chip for a multimedia codec in order to perform coding and decoding in real time.
The performance required of a VLSI for a codec differs according to the complexity or characteristics of the codec algorithm. Recent multimedia codecs require a data processing performance of 0.6 giga instructions per second (GIPS) to 1.5 GIPS, and it is predicted that, several years from now, multimedia codecs will require a data processing performance of 2 GIPS to 5 GIPS. Accordingly, a high performance chip for codecs is required.
A codec chip can be realized in a processor or an application specific integrated circuit (ASIC).
In the case of a processor, when a new codec is used, the processor can realize the new codec in a short time, and even after the processor is manufactured in VLSI, the processor retains the flexibility to realize another new codec in a short time. However, the speed of processing data is low.
In the case of an ASIC, the speed of processing data is high, but it takes a long time, up to several months, to realize a new codec in an actual ASIC. Moreover, once an ASIC has been realized in VLSI for a certain codec, a new codec chip must be manufactured whenever a new codec is to be used.
In order to compensate for the low speed of a single processor, a plurality of processors can be realized in one VLSI. In processing multimedia data, the same operation is performed repeatedly on a series of data streams, and thus the VLSI structure for processing the data can be arranged in parallel. When the VLSI structure is parallel, tasks for processing data can be assigned independently to each processor, and the assigned tasks can be performed simultaneously. Accordingly, unlike a single general purpose processor that processes general data, processors arranged in parallel can process multimedia data at the required speed. Such parallel processors maintain the intrinsic advantages of a processor, such as short development time and flexibility, while providing high performance. A set of parallel processors having a structure suitable for processing streams is referred to as a stream processor, and each processor in a stream processor is referred to as a processor element.
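By way of illustration only, the independent assignment of tasks described above can be sketched in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular stream processor: the names `scale_block`, `split_stream`, and `process_stream` are hypothetical, the per-block operation is an arbitrary placeholder, and worker threads merely stand in for processor elements to show how blocks of a stream are assigned, not to demonstrate actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_block(block):
    """Hypothetical per-block operation: the same operation applied to every sample."""
    return [min(255, 2 * sample) for sample in block]


def split_stream(stream, block_size):
    """Divide a data stream into consecutive blocks of at most block_size samples."""
    return [stream[i:i + block_size] for i in range(0, len(stream), block_size)]


def process_stream(stream, block_size, num_elements):
    """Assign each block to one of num_elements workers and reassemble the results."""
    blocks = split_stream(stream, block_size)
    with ThreadPoolExecutor(max_workers=num_elements) as pool:
        # pool.map returns results in the same order as the input blocks.
        processed = list(pool.map(scale_block, blocks))
    return [sample for block in processed for sample in block]


print(process_stream(list(range(8)), block_size=2, num_elements=4))
# [0, 2, 4, 6, 8, 10, 12, 14]
```

Because each block is processed without reference to any other block, the blocks can be handled simultaneously. In an actual stream processor, the blocks would still have to be delivered to and collected from the processor elements.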
The most important issue in a stream processor is the transfer of data between processor elements. As described above, multimedia data is a flow of a massive amount of data that requires repetitive operations. In order to transfer such data between a plurality of processor elements in the stream processor, the communication bandwidth should be maximized.
Conventionally, a bus structure is used to transfer data between two processor elements. The bus structure includes a series of wires and a protocol that controls the transfer of data over the wires, so that stream data is transmitted continuously during each clock cycle. Examples of the bus structure include advanced microcontroller bus architecture (AMBA), CoreConnect, peripheral component interconnect (PCI), and PCI extended (PCI-X).
However, due to a basic characteristic of the bus structure, the transmission bandwidth available for transmitting data between processor elements in the stream processor is limited, and thus the bus structure is not suitable for transmitting a massive amount of multimedia data.
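By way of illustration only, the bandwidth limitation can be sketched with a simplified model in which the peak bandwidth of a shared bus is the bus width multiplied by the clock frequency, and that bandwidth is divided equally among transfers contending for the bus. The 32-bit width, the 33 MHz clock, the equal-sharing model, and all function names below are assumptions chosen for the example and are not taken from the text. Protocol overhead such as arbitration is ignored, so real figures would be lower.

```python
def peak_bus_bandwidth_bps(bus_width_bits, clock_hz):
    """Ideal peak bandwidth of a bus that moves one word per clock cycle."""
    return bus_width_bits * clock_hz


def per_transfer_bandwidth_bps(bus_width_bits, clock_hz, concurrent_transfers):
    """Bandwidth per transfer when the shared bus is divided equally (assumed model)."""
    return peak_bus_bandwidth_bps(bus_width_bits, clock_hz) / concurrent_transfers


# Raw video requirement from the earlier example: 1920x1200, 24 bits/pixel, 30 frames/s.
REQUIRED_BPS = 1920 * 1200 * 24 * 30

# Assumed bus parameters: 32 bits wide, clocked at 33 MHz.
peak = peak_bus_bandwidth_bps(32, 33_000_000)
shared = per_transfer_bandwidth_bps(32, 33_000_000, concurrent_transfers=4)

print(f"required: {REQUIRED_BPS / 1e9:.2f} Gbps")
print(f"bus peak: {peak / 1e9:.3f} Gbps, sufficient: {peak >= REQUIRED_BPS}")
print(f"per transfer with 4 contenders: {shared / 1e9:.3f} Gbps")
```

Under these assumptions, even the ideal peak of the bus falls short of the raw video requirement, and the shortfall grows as more processor elements contend for the same wires.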
Accordingly, a method of quickly transmitting a massive amount of multimedia data between two predetermined processor elements is required.