H.264 SVC (Scalable Video Coding) includes spatial scalability (different picture sizes), quality scalability (different bit rates) and temporal scalability (different frame rates). In spatial scalability, video is coded at multiple spatial resolutions. Each spatial resolution is coded as a layer. The data and decoded samples of lower resolutions are used to predict data or samples of higher resolutions to reduce the bit rate when coding higher resolutions.
Referring to FIG. 1, an H.264 SVC decoder 30 is shown handling two layers of spatial scalability. The decoder 30 receives a base layer stream (i.e., BASELAYER_STREAM) and a target layer stream (i.e., TARGETLAYER_STREAM). The BASELAYER_STREAM is decoded into base layer coefficients (i.e., BASELAYER_COEFFICIENTS) by an H.264 CABAC/CAVLC decoder 34. The TARGETLAYER_STREAM is decoded into target layer coefficients (i.e., TARGETLAYER_COEFFICIENTS) by an H.264 CABAC/CAVLC decoder 40. The BASELAYER_COEFFICIENTS are presented to a transform stage circuit 32, which transforms the base layer coefficients into base layer information (i.e., BASELAYER_INFORMATION). The BASELAYER_INFORMATION includes the information of all macroblocks together with the residuals and intra samples of the base layer picture. The BASELAYER_INFORMATION is stored in a memory 36 and is then presented from the memory 36 to a transform stage circuit 38. The circuit 38 receives both the BASELAYER_INFORMATION and the TARGETLAYER_COEFFICIENTS and presents target layer samples (i.e., TARGETLAYER_SAMPLES).
In conventional approaches, hardware typically handles SVC layer by layer. The hardware decodes one layer, collects all necessary information, stores the information in memory and then uses the information to decode the next layer. After that layer is decoded, its information is used in turn to decode the next higher layer. Up to eight layers may be coded in an SVC stream.
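The layer-by-layer flow described above can be sketched as follows. This is a minimal illustration only, not a decoder: the names `decode_layer` and `decode_layer_by_layer`, the dictionary standing in for the memory, and the string placeholders are all hypothetical and serve only to trace how each layer's information passes through memory before the next layer is decoded.

```python
from typing import Dict, List, Optional, Tuple

MAX_SVC_LAYERS = 8  # upper bound on layers stated in the text


def decode_layer(stream: str, base_info: Optional[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for entropy decoding plus transform of one layer.

    Returns (samples, layer_info). Strings are used only to trace data flow;
    a real decoder would produce pictures and macroblock data.
    """
    predicted_from = base_info if base_info is not None else "nothing"
    samples = f"samples({stream}, predicted from {predicted_from})"
    layer_info = f"info({stream})"
    return samples, layer_info


def decode_layer_by_layer(streams: List[str]) -> Tuple[str, List[str]]:
    """Decode layers lowest first, passing each layer's information via memory."""
    if not 1 <= len(streams) <= MAX_SVC_LAYERS:
        raise ValueError("expected between 1 and 8 layer streams")
    memory: Dict[str, str] = {}  # stands in for on-chip or external memory
    trace: List[str] = []        # records the memory traffic that results
    samples = ""
    for index, stream in enumerate(streams):
        base_info = memory.get("layer_info")  # None for the lowest layer
        if base_info is not None:
            trace.append(f"read info for layer {index - 1}")
        samples, layer_info = decode_layer(stream, base_info)
        if index < len(streams) - 1:
            # Only layers that serve as a base for a higher layer are stored.
            memory["layer_info"] = layer_info
            trace.append(f"write info for layer {index}")
    return samples, trace
```

With three layer streams, the trace contains one full write and one full read of layer information for every layer below the top one, which is the memory traffic the conventional approach incurs.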
In conventional approaches, the H.264 SVC decoder 30 will typically decode the entire BASELAYER_STREAM, collect all of the BASELAYER_INFORMATION, and then store the BASELAYER_INFORMATION in the memory 36. Later in the decoding process, the BASELAYER_INFORMATION is retrieved from the memory 36 to decode the TARGETLAYER_STREAM. After the current target layer stream is decoded, the current target layer becomes the base layer for the next layer. Since the BASELAYER_INFORMATION contains the information of all the macroblocks as well as the residuals and intra samples of the entire base layer picture, a significant amount of memory space is needed. Also, if the memory is an external device, a significant amount of bus bandwidth is needed.
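To give a rough sense of scale, the following sketch estimates the storage and bus traffic for one base layer picture. Every figure in it is an assumption made for illustration and does not come from the text: 16x16 macroblocks with 4:2:0 chroma, a worst-case buffer that reserves both a full residual plane (2 bytes per sample) and a full sample plane (1 byte per sample), a hypothetical 64 bytes of macroblock information per macroblock, and exactly one write and one read of the buffer per picture.

```python
def base_layer_info_bytes(width: int, height: int,
                          mb_info_bytes: int = 64,
                          residual_bytes_per_sample: int = 2,
                          sample_bytes: int = 1) -> int:
    """Estimate bytes needed to hold base layer information for one picture.

    All default sizes are illustrative assumptions, not values from a spec.
    """
    mbs_wide = (width + 15) // 16   # round up to whole macroblocks
    mbs_high = (height + 15) // 16
    # 4:2:0 assumption: one 16x16 luma block plus two 8x8 chroma blocks.
    samples_per_mb = 16 * 16 + 2 * 8 * 8
    per_mb = mb_info_bytes + samples_per_mb * (residual_bytes_per_sample
                                               + sample_bytes)
    return mbs_wide * mbs_high * per_mb


def bus_bytes_per_second(width: int, height: int, frames_per_second: int) -> int:
    """Assume the buffer is written once and read once for every picture."""
    return 2 * base_layer_info_bytes(width, height) * frames_per_second


if __name__ == "__main__":
    print(base_layer_info_bytes(1280, 720))     # bytes per picture
    print(bus_bytes_per_second(1280, 720, 30))  # bytes per second
```

Under these assumptions a 1280x720 base layer needs about 4.4 MB per picture, and at 30 pictures per second the write-then-read pattern moves roughly 263 MB per second across the bus. Different storage formats would change the numbers, but not the fact that the cost scales with the full picture size.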
Because this approach uses a large amount of memory for base layer information, a hardware implementation is costly either way. If the information is stored in on-chip memory, the chip die size will increase. If the information is stored in external memory, system performance will be limited by bus bandwidth.
It would be desirable to implement a chip to decode an H.264 SVC bitstream using a minimal amount of memory.