Advances in video coding technology and standardization, together with rapid improvements in network infrastructure, storage capacity, and computing power, have enabled a growing number of video applications. Video transmission systems that deliver real-time services over the Internet and mobile communication networks are characterized by a wide range of connection qualities and receiving devices. For example, receiving devices with different capabilities may range from cell phones with small display screens and limited computing power to high-end personal computers with high-definition displays and substantial computing power. To address the problems posed by these characteristics of video transmission systems, scalable video coding (SVC) may be a highly attractive solution for video frame transmission.
SVC is an extension of the H.264/AVC standard that standardizes the encoding of a high-quality video bitstream that also contains one or more subset bitstreams. A subset bitstream can represent a lower spatial resolution (smaller screen), a lower temporal resolution (lower frame rate), or a lower video quality than the bitstream from which it is derived. For example, spatial scalability in H.264/SVC may support at most 8 layers with different spatial resolutions. In addition, inter-layer dependencies may be exploited to improve coding efficiency. Preferably, a low-resolution layer (e.g., a base layer) is referenced by a high-resolution layer (e.g., an enhancement layer) when the high-resolution layer is being coded at a video encoder. Accordingly, inter-layer intra prediction, inter-layer residual prediction, and/or inter-layer motion prediction may be employed by the video encoder to generate coded enhancement layer frames.
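As a rough illustration of inter-layer intra prediction with a 2:1 spatial ratio, the following Python sketch predicts an enhancement layer block from an upsampled base layer reconstruction and keeps only the difference (the residual). The function names are hypothetical, nearest-neighbour upsampling is a simplified stand-in for the interpolation filters actually specified by H.264/SVC, and transform, quantization, and entropy coding are omitted.

```python
from typing import List

Frame = List[List[int]]  # 2-D array of sample values (rows of pixels)


def upsample_2x(base: Frame) -> Frame:
    """Nearest-neighbour 2x upsampling (simplified stand-in for SVC upsampling filters)."""
    out: Frame = []
    for row in base:
        wide = [px for px in row for _ in range(2)]  # duplicate each sample horizontally
        out.append(wide)
        out.append(list(wide))                       # duplicate each row vertically
    return out


def inter_layer_intra_residual(enh: Frame, base_recon: Frame) -> Frame:
    """Encoder side: residual = enhancement samples - upsampled base layer prediction."""
    pred = upsample_2x(base_recon)
    return [[e - p for e, p in zip(er, pr)] for er, pr in zip(enh, pred)]


def reconstruct_enhancement(residual: Frame, base_recon: Frame) -> Frame:
    """Decoder side: enhancement samples = upsampled base layer prediction + residual."""
    pred = upsample_2x(base_recon)
    return [[r + p for r, p in zip(rr, pr)] for rr, pr in zip(residual, pred)]


if __name__ == "__main__":
    base = [[10, 20], [30, 40]]                   # hypothetical 2x2 base layer block
    enh = [[11, 9, 21, 19], [10, 12, 20, 22],
           [31, 29, 41, 39], [30, 32, 40, 42]]    # hypothetical 4x4 enhancement block
    res = inter_layer_intra_residual(enh, base)
    print(res)  # [[1, -1, 1, -1], [0, 2, 0, 2], [1, -1, 1, -1], [0, 2, 0, 2]]
    assert reconstruct_enhancement(res, base) == enh
```

Note that the decoder can only form this prediction once the co-located base layer samples have been reconstructed, which is why the base layer decoding result must be available when the enhancement layer is decoded.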
Regarding the decoding flow performed at a video decoder, the conventional design fully decodes a base layer frame to generate a complete decoding result, stores the complete decoding result in an external memory, and decodes an enhancement layer frame by reading information provided by the complete decoding result stored in the external memory. However, such a conventional design requires a large storage capacity to buffer the complete decoding result of a base layer frame and a large bandwidth to access the external memory.
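To give a sense of scale, the following Python sketch estimates the buffer size and external memory traffic of the conventional design. It assumes, purely for illustration, an 8-bit 4:2:0 base layer of 960x540 at 30 frames per second, and it counts only reconstructed samples, each written once and read once. A real decoder may also store motion data, residuals, and mode information for inter-layer prediction and may read samples more than once, so these figures are a lower bound. The function names are hypothetical.

```python
def yuv420_frame_bytes(width: int, height: int, bit_depth: int = 8) -> int:
    """Bytes for one 4:2:0 frame: a full-size luma plane plus two quarter-size chroma planes."""
    bytes_per_sample = (bit_depth + 7) // 8          # 8-bit -> 1 byte, 10-bit -> 2 bytes
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


def external_memory_traffic(width: int, height: int, fps: int,
                            reads_per_frame: int = 1, bit_depth: int = 8) -> int:
    """Bytes per second: one write of the base layer result plus `reads_per_frame` reads of it."""
    frame = yuv420_frame_bytes(width, height, bit_depth)
    return frame * (1 + reads_per_frame) * fps


if __name__ == "__main__":
    print(yuv420_frame_bytes(960, 540))           # 777600 bytes buffered per base layer frame
    print(external_memory_traffic(960, 540, 30))  # 46656000 bytes/s of write + read traffic
```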