Digital media decoders reconstruct media by decoding the compressed digital data that represent it. The present invention relates to efficient on-chip memory utilization for programmable media processors. Such media processors include, but are not limited to, programmable video and/or audio processors. The main function of such processors is decoding binary bit-streams that conform to certain coding standards or formats and performing the appropriate processing in real time. In the case of video decoding processors, examples of video coding formats are MPEG1, MPEG2, MPEG4, H.263, Microsoft Video 8, RealVideo 8 and the emerging standard H.26L. In the case of audio decoding processors, examples of audio coding formats include MPEG layers 1 and 2, Dolby AC-3, MP3, AAC, etc.
A typical programmable media processor contains a microprocessor, a few hardware accelerators and memory. These modules are communicatively coupled through an interface of some kind, and the processor typically also includes means of accessing external memory and a host central processing unit (CPU). The microprocessor controls the other modules by issuing appropriate instructions and/or commands. The efficiency of the whole processor depends on how media processing tasks are partitioned and assigned to the different modules and how memory is utilized.
A video or audio decoding process usually involves tasks such as bit-stream syntax parsing, control information/decision calculation, special transformations, filtering, post-processing, etc. For example, in the case of MPEG2 video decoding, syntax parsing includes header information extraction and variable length decoding (VLD). Control information calculation includes motion vector calculation. Special processing includes inverse quantization (IQ) and inverse discrete cosine transformation (IDCT). Filtering comprises motion compensation (MC). Such tasks are usually carried out by different modules in the media processor in a pipelined fashion. Pipelined operation is efficient, and it is possible because media coding is usually based on partitioned picture or audio data, with each media segment represented by a distinct data unit (DU). For example, video coding is based on macroblock (MB) data units, typically of size 16×16 pixels. Because of the high complexity and continuing evolution of media coding formats, the control information and intermediate data passed along the pipeline would become a bottleneck to efficient execution, or make cost-effective design infeasible, if conventional methods of passing control information and intermediate data were adopted.
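As an illustrative sketch only, and not a description of any actual decoder, the pipelined flow of data units through the decoding stages may be modeled in C as follows. The stage names follow the MPEG2 example above, but the stage bodies and all identifiers here are hypothetical placeholders:

```c
#include <stdint.h>

#define MB_SIZE 16

/* One data unit (DU): a 16x16 macroblock, as in the video example. */
typedef struct {
    int     id;                        /* macroblock index           */
    int16_t coeff[MB_SIZE * MB_SIZE];  /* working data for the stages */
} macroblock_t;

typedef void (*stage_fn)(macroblock_t *);

/* Placeholder stage bodies; a real decoder would implement VLD, IQ,
 * IDCT and MC here.  Each stage just marks the data so the ordering
 * is observable. */
static void vld(macroblock_t *mb)  { mb->coeff[0] += 1; }
static void iq(macroblock_t *mb)   { mb->coeff[0] *= 2; }
static void idct(macroblock_t *mb) { mb->coeff[0] += 3; }
static void mc(macroblock_t *mb)   { mb->coeff[0] *= 4; }

/* Software model of the pipeline: at "cycle" t, stage s processes
 * macroblock (t - s), so up to four macroblocks are in flight at
 * once -- this overlap is what makes the pipelined operation
 * efficient. */
int run_pipeline(macroblock_t *mbs, int n) {
    stage_fn stages[] = { vld, iq, idct, mc };
    const int n_stages = 4;
    for (int t = 0; t < n + n_stages - 1; t++)
        for (int s = 0; s < n_stages; s++) {
            int i = t - s;
            if (i >= 0 && i < n)
                stages[s](&mbs[i]);
        }
    /* For zero-initialized input: ((0 + 1) * 2 + 3) * 4 = 20. */
    return mbs[0].coeff[0];
}
```

Each macroblock passes through every stage exactly once and in order, while successive macroblocks occupy successive stages concurrently.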
In typical decoding methods, a data unit includes two main parts: a header portion containing control data and a data portion containing substantive data. In one conventional method of propagating controls along the decoding functions of the pipeline, the controls for the current data unit, e.g., a macroblock in MPEG2 video, are extracted by the microprocessor and passed to the next decoding function before the microprocessor continues to the next data unit. Another decoding element, such as an accelerator, then starts processing that data unit using the control data latched in registers between the microprocessor and the accelerator, and so on down the pipeline. This scheme is very expensive because a media coding format usually defines many controls, and different coding formats define different ones. It is also inflexible and hard to extend as coding formats evolve, because the pipeline is fixed.
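A minimal sketch of this register-latched scheme, with purely illustrative field names, is shown below. The microprocessor extracts the header controls for one data unit and latches them into dedicated registers; the accelerator then reads those registers. Because every control field a format defines needs its own dedicated register, the register bank grows with each format supported, which is the cost and inflexibility noted above:

```c
#include <stdint.h>

/* Hypothetical control-register bank between the microprocessor and
 * one accelerator.  Each field stands for one dedicated hardware
 * register; a new coding format with different controls would
 * require a different set of registers. */
typedef struct {
    uint8_t mb_type;       /* e.g. intra vs. inter macroblock   */
    uint8_t quant_scale;   /* step size for inverse quantization */
    int16_t mv_x, mv_y;    /* motion vector for MC               */
    /* ...one latched register per control field per format...    */
} control_regs_t;

/* Microprocessor side: latch the controls extracted from the header
 * of the current data unit before moving on to the next one. */
void latch_controls(control_regs_t *regs, uint8_t type,
                    uint8_t q, int16_t mvx, int16_t mvy) {
    regs->mb_type = type;
    regs->quant_scale = q;
    regs->mv_x = mvx;
    regs->mv_y = mvy;
}

/* Accelerator side: read the latched controls when processing the
 * data unit (here, trivially return the quantizer for inter MBs). */
int accelerator_reads(const control_regs_t *regs) {
    return regs->mb_type == 1 ? regs->quant_scale : 0;
}
```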
In another conventional method of passing data along the pipeline, each decoding module has its own input and output buffers. This is sometimes referred to as a double-buffered scheme. In such a scheme, while one decoding element, such as an accelerator, is producing data into one buffer, another decoding element is reading data from the other buffer. Both the intermediate data and the controls can be passed this way. Such a system is usually not efficient enough to support multiple media coding formats. For example, in MPEG4 video, after variable-length decoding by a variable-length decoder, further processing of the decoded run/length pairs includes run/length decoding (RLD), inverse scan (IS), AC/DC prediction, inverse quantization, inverse discrete cosine transform, motion compensation and deblocking. In one implementation, each of these tasks is hardwired as a separate hardware accelerator. In another implementation, several of them are hardwired into one hardware accelerator. In the former case, many buffers would be needed between the prolonged decoding functions of the pipeline, resulting in an expensive design. In the latter case, the system would not be flexible enough to handle different coding formats efficiently and cost-effectively, as various media formats usually require quite different task partitioning and grouping, making it difficult to predetermine the buffer requirement for each accelerator.
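The double-buffered ("ping-pong") hand-off between two adjacent pipeline modules can be sketched as follows; the buffer size and all names are illustrative assumptions, not drawn from any particular design:

```c
#include <stdint.h>

#define BUF_WORDS 64  /* illustrative buffer size */

/* One ping-pong buffer pair between a producing module and a
 * consuming module.  While the producer fills buf[produce_idx],
 * the consumer drains the other buffer; the roles then swap. */
typedef struct {
    int32_t buf[2][BUF_WORDS];
    int     produce_idx;   /* which buffer the producer writes */
} pingpong_t;

void pingpong_init(pingpong_t *p) { p->produce_idx = 0; }

/* Producer side: buffer currently being filled. */
int32_t *produce_buffer(pingpong_t *p) {
    return p->buf[p->produce_idx];
}

/* Consumer side: buffer currently being drained. */
const int32_t *consume_buffer(const pingpong_t *p) {
    return p->buf[1 - p->produce_idx];
}

/* Swap roles once both sides have finished the current data unit. */
void pingpong_swap(pingpong_t *p) {
    p->produce_idx = 1 - p->produce_idx;
}
```

Note that every adjacent pair of decoding functions needs its own such buffer pair, which is why a prolonged pipeline of many accelerators multiplies the buffer cost, as described above.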
The present invention introduces an efficient way to manipulate the control information and data along the pipeline by proper on-chip memory utilization.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.