The present invention relates to a multiple reference frame architecture for video coding, and more particularly, to methods for coding (encoding or decoding) digital media data in which prediction information and prediction error information are respectively carried by different bit stream sections.
In a multiple reference frame architecture (for example, an apparatus complying with the H.264 specifications), performing multi-frame motion compensation introduces problems such as complicated memory access behavior and a high memory access rate of a main memory, where the main memory can be a dynamic random access memory (DRAM) accessed by a processor of the apparatus. Typically, the processor and the main memory are positioned in different chips within the apparatus, so the memory bandwidth of the main memory may be insufficient given the complicated memory access behavior and/or the high memory access rate.
According to the related art, some suggestions for reducing the corresponding memory requirement (e.g., the memory requirement of the DRAM) have been proposed in order to solve some of the problems mentioned above. One suggestion is to scale the decoded pictures; however, the picture quality is usually degraded by scaling. Another suggestion is to compress the decoded pictures in a simpler way, without randomly accessing a macroblock (MB). With this suggestion, however, it is also very hard to prevent the picture quality from being degraded. According to another suggestion, just-in-time decoding of specific frames may be applied, but the corresponding computation load is too heavy for a cost-efficient hardware architecture.
The overall performance of an architecture implemented with at least one of the aforementioned suggestions is typically degraded because of some inherent characteristics of multi-frame motion compensation. For example, referring to the situation shown in FIG. 1, reference data of an MB may be derived from multiple frames. In addition, more motion vectors and more intra information are involved than in single-frame motion compensation. Additionally, some issues related to long-term memory management might be encountered. Thus, according to the related art, even if the goal of reducing the corresponding memory requirement is achieved, it is hard to avoid introducing unwanted side effects.
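The point that reference data of one MB may be derived from multiple frames can be illustrated with a minimal sketch. It is not taken from any figure or standard: the `Partition` type, its field names, the whole-pixel motion vectors, and the sample values are all assumptions chosen for illustration. The sketch lists the reference regions one MB would read and the distinct frames those reads touch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Partition:
    """One partition of a macroblock (all fields are illustrative assumptions)."""
    x: int          # left edge of the partition in the current picture
    y: int          # top edge of the partition in the current picture
    width: int
    height: int
    ref_frame: int  # index of the reference frame this partition predicts from
    mv_x: int       # motion vector, whole-pixel units for simplicity
    mv_y: int


def reference_fetches(partitions: List[Partition]) -> List[Tuple[int, int, int, int, int]]:
    """Return one (ref_frame, x, y, width, height) read per partition."""
    return [
        (p.ref_frame, p.x + p.mv_x, p.y + p.mv_y, p.width, p.height)
        for p in partitions
    ]


def frames_touched(partitions: List[Partition]) -> List[int]:
    """Return the sorted, distinct reference frames the macroblock reads from."""
    return sorted({p.ref_frame for p in partitions})


# A 16x16 MB at (32, 16) split into four 8x8 partitions (hypothetical values).
multi_frame_mb = [
    Partition(32, 16, 8, 8, ref_frame=0, mv_x=-3, mv_y=1),
    Partition(40, 16, 8, 8, ref_frame=2, mv_x=5, mv_y=0),
    Partition(32, 24, 8, 8, ref_frame=2, mv_x=4, mv_y=-2),
    Partition(40, 24, 8, 8, ref_frame=5, mv_x=0, mv_y=7),
]
# The same MB predicted as a single 16x16 block from one frame.
single_frame_mb = [Partition(32, 16, 16, 16, ref_frame=0, mv_x=-3, mv_y=1)]

print(frames_touched(multi_frame_mb))
print(frames_touched(single_frame_mb))
```

In this toy case the multi-frame MB issues four separate reads spread over three reference frames, whereas the single-frame MB issues one read from one frame, which is the kind of scattered access pattern the paragraph above attributes to multi-frame motion compensation.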
An essential characteristic of the multiple reference frame architecture of the related art is that a large amount of information (such as the MB type, the reference frame list, the motion vector difference(s), the coded block pattern, the transform type, the residual, and so on) is all encoded in the same MB layer. The conventional encoding is therefore imperfect, since decoded results may occasionally contain redundant information.
More particularly, prediction information (for example, inter-frame information such as motion vector information and reference frame information, or intra-frame information such as information of an intra prediction mode) and prediction error information (for example, information of a coded block pattern or residual information) are encoded in the same MB layer according to a conventional syntax, where each reference frame list is individually encoded, and each reference frame is independent of the others. As a result, a multiple reference frame architecture implemented according to the conventional syntax, such as that utilized in the procedure 10 shown in FIG. 2 (e.g., the MB layer syntax as shown in H.264 Standard Section 7.3.5), suffers from a heavy load. A novel method is therefore required for decreasing the amount of the redundant information mentioned above in order to reduce the heavy load.
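The interleaving described above can be sketched as follows. This is a simplified model and not the actual H.264 bit stream syntax, which is entropy coded and contains many conditional elements. The `Macroblock` type, the flat list of (name, value) pairs, the helper functions, and the grouping of `coded_block_pattern` and `residual` as prediction error elements are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Macroblock:
    """Simplified MB record; real syntax has many more conditional elements."""
    mb_type: str
    ref_idx: List[int]            # reference frame indices (prediction info)
    mvd: List[Tuple[int, int]]    # motion vector differences (prediction info)
    coded_block_pattern: int      # prediction error info
    residual: List[int]           # prediction error info


# Assumed grouping, following the classification given in the text.
PREDICTION_ERROR = {"coded_block_pattern", "residual"}


def write_mb_layer(mbs: List[Macroblock]) -> List[Tuple[str, Any]]:
    """Emit every element of every MB into one interleaved stream."""
    stream: List[Tuple[str, Any]] = []
    for mb in mbs:
        stream += [
            ("mb_type", mb.mb_type),
            ("ref_idx", mb.ref_idx),
            ("mvd", mb.mvd),
            ("coded_block_pattern", mb.coded_block_pattern),
            ("residual", mb.residual),
        ]
    return stream


def error_elements_before_mb(stream: List[Tuple[str, Any]], mb_index: int) -> int:
    """Count prediction error elements a sequential parser passes
    before it reaches the mb_type of MB number mb_index."""
    seen_mbs = -1
    count = 0
    for name, _ in stream:
        if name == "mb_type":
            seen_mbs += 1
            if seen_mbs == mb_index:
                return count
        elif name in PREDICTION_ERROR:
            count += 1
    raise IndexError("mb_index out of range")


# Three hypothetical MBs, each predicted from a different reference frame.
mbs = [
    Macroblock("P_16x16", [0], [(1, -2)], 0b0011, [4, 0, -1]),
    Macroblock("P_16x16", [2], [(0, 3)], 0b0001, [7]),
    Macroblock("P_16x16", [1], [(-4, 0)], 0b1111, [2, 2, 0, 5]),
]
stream = write_mb_layer(mbs)
print(error_elements_before_mb(stream, 2))
```

Under these assumptions, a parser that only wants the reference frame and motion vector information of the third MB still has to step through the coded block pattern and residual of the two preceding MBs, because both kinds of information share one layer.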