Conventional video coding standards, such as the Moving Picture Experts Group (MPEG)-1 and H.261/263/264 standards, incorporate motion estimation and motion compensation in order to remove temporal redundancies between video frames. The scalable extension to the H.264/AVC (Advanced Video Coding) standard currently enables fine granularity scalability (FGS), according to which the quality of a video sequence may be improved by increasing the bit rate in increments of ten percent or less. Currently, FGS information is not considered to be a separate “layer,” but instead is stored along with the “base layer” relative to which it is encoded. However, when forming subsequent enhancement layers, it would be beneficial to have the option of basing the enhancement upon the base layer either with or without FGS.
Conventional systems, though moderately useful, suffer from at least two substantial problems. First, scalability does not always follow a “linear” path. For example, it may be desirable to have a low spatial resolution base layer encoded at some minimal acceptable quality, with FGS used to enhance the quality. It may also be desirable to have a spatial enhancement encoded relative to the base layer alone (i.e., excluding FGS). This could be desired, for example, where bit rate constraints on a transmission channel do not permit the “expense” of transmitting the extra FGS data when only a spatial enhancement is desired.
In the currently planned H.264/AVC scalability extension, the FGS information is not considered to be a separate layer. Consequently, there is no mechanism for specifying whether the spatial enhancement layer is encoded relative to the base layer with or without FGS. In other words, the choice must be “hard wired.”
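By way of illustration only, the ambiguity described above may be sketched as follows. The data model is hypothetical and does not reproduce the syntax of the H.264/AVC scalability extension: the names Layer, has_fgs, ref_dependency_id and possible_references are assumptions introduced for this sketch. Because the FGS data is merely a property of the base layer rather than a layer of its own, an enhancement layer that names its reference layer leaves two candidate reconstructions open.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Layer:
    """Hypothetical layer record; not the standard's actual syntax."""
    dependency_id: int                       # layer identifier
    has_fgs: bool = False                    # FGS data stored with this layer
    ref_dependency_id: Optional[int] = None  # layer this one is encoded relative to


def possible_references(layer: Layer, layers: Dict[int, Layer]) -> List[str]:
    """List every reconstruction the given layer could have been encoded against."""
    if layer.ref_dependency_id is None:
        return []
    ref = layers[layer.ref_dependency_id]
    options = [f"layer {ref.dependency_id} without FGS"]
    if ref.has_fgs:
        # FGS is not a separate layer, so nothing in the record says
        # whether it was included in the reference or not.
        options.append(f"layer {ref.dependency_id} with FGS")
    return options


base = Layer(dependency_id=0, has_fgs=True)
spatial = Layer(dependency_id=1, ref_dependency_id=0)
layers = {layer.dependency_id: layer for layer in (base, spatial)}

candidates = possible_references(spatial, layers)
print(candidates)
```

In this sketch the spatial enhancement layer yields two candidates, and no field distinguishes between them; a decoder would have to assume one of them by convention.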
Second, the progressive enhancement/refinement slices (i.e., FGS slices) and the corresponding base layer picture are currently envisioned as being in the same picture and therefore in the same access unit. These items also have the same value of DependencyId. This architecture is less than optimal for system-layer operations. In a media file format, e.g., the AVC file format specified in ISO/IEC 14496-15, metadata information is typically stored for each sample containing a picture or an access unit. The above picture (access unit) definition therefore requires a streaming server to parse into each sample, even for non-FGS scalable streaming (i.e., when truncation of FGS slices is not needed to reach the desired scalable presentation point). From this point of view, the current design forces the media file format used for storage of scalable video content to be more complex, which in turn implies more complex streaming server operations.
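The following sketch, offered only as an illustration under stated assumptions, shows why a server must look inside each sample. The Slice and Sample records and the functions select_by_dependency_id and select_without_fgs are hypothetical and are not taken from ISO/IEC 14496-15. Since the FGS slices share the DependencyId of their base layer picture and reside in the same sample, selection by DependencyId alone cannot separate them; the server has to inspect every slice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slice:
    """Hypothetical slice record; field names are assumptions."""
    dependency_id: int
    is_fgs: bool
    payload: bytes


@dataclass
class Sample:
    """One file-format sample holding a whole access unit."""
    slices: List[Slice]


def select_by_dependency_id(sample: Sample, dependency_id: int) -> List[Slice]:
    """Layer-level selection: FGS slices come along with their base picture."""
    return [s for s in sample.slices if s.dependency_id <= dependency_id]


def select_without_fgs(sample: Sample, dependency_id: int) -> List[Slice]:
    """Slice-level selection: every slice in the sample must be examined."""
    return [
        s for s in sample.slices
        if s.dependency_id <= dependency_id and not s.is_fgs
    ]


sample = Sample(slices=[
    Slice(dependency_id=0, is_fgs=False, payload=b"base"),
    Slice(dependency_id=0, is_fgs=True, payload=b"fgs"),
    Slice(dependency_id=1, is_fgs=False, payload=b"spatial"),
])

by_layer = select_by_dependency_id(sample, 0)
base_only = select_without_fgs(sample, 0)
print(len(by_layer), len(base_only))
```

Were the FGS slices carried in their own sample or layer, per-sample metadata could in principle answer the same question without the per-slice scan.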