In the first video coding standards (up to MPEG-2 and H.263), the video is assumed to be rectangular and to be described in terms of three separate channels: one luminance channel and two chrominance channels. With MPEG-4, additional channels have been introduced, for example the alpha channel (also referred to as the “arbitrary shape channel” in MPEG-4 terminology), the disparity channel, or the depth channel. The spatial and temporal resolutions of these channels are described at the sequence level (Video Object Layer, or VOL, in MPEG-4 terminology).
For the frame rate, only one description is given for all channels, as follows. The temporal resolution of the sequence is described by means of the following syntactic elements: “vop_time_increment_resolution” (coded on 16 bits), “fixed_vop_rate” (coded on 1 bit), and “fixed_vop_time_increment” (coded on 1 to 16 bits), as defined for instance on pages 36 and 112 of the MPEG-4 document w3056, “Information Technology—Coding of audio-visual objects—Part 2: Visual”, ISO/IEC/JTC1/SC29/WG11, Maui, USA, December 1999. These elements are now described in more detail.
The syntactic element “vop_time_increment_resolution” is a 16-bit unsigned integer that indicates the number of evenly spaced subintervals, called ticks, within one modulo time (the modulo time being the fixed interval of one second).
The syntactic element “fixed_vop_time_increment” represents the number of ticks between two successive VOPs in the display order. The length of a tick is determined by “vop_time_increment_resolution”: one tick lasts 1/vop_time_increment_resolution of a second. The value of “fixed_vop_time_increment” lies in the range [0, vop_time_increment_resolution], and the number of bits used to code it is the minimum number of unsigned integer bits required to represent that range. This element is present in the bitstream only if “fixed_vop_rate” is “1”, and its value must equal the constant distance between the display times of any two successive VOPs in the display order. In this case, the fixed VOP rate is given by the ratio “vop_time_increment_resolution”/“fixed_vop_time_increment”, a zero value of “fixed_vop_time_increment” being forbidden.
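As an illustration only, the field width and the fixed VOP rate described above can be sketched as follows. The function names are hypothetical, the closed range [0, vop_time_increment_resolution] is taken as written in the text, and the restriction of the resolution to 1..65535 is an assumption of this sketch.

```python
from fractions import Fraction


def bits_for_range(vop_time_increment_resolution: int) -> int:
    """Minimum number of unsigned bits able to represent every value
    in [0, vop_time_increment_resolution] (closed range, as in the text)."""
    # Assumption of this sketch: a non-zero 16-bit unsigned resolution.
    if not 1 <= vop_time_increment_resolution <= 0xFFFF:
        raise ValueError("resolution must fit in 16 bits and be non-zero")
    # bit_length() is the number of bits needed for the largest value.
    return max(1, vop_time_increment_resolution.bit_length())


def fixed_vop_rate(vop_time_increment_resolution: int,
                   fixed_vop_time_increment: int) -> Fraction:
    """Fixed VOP rate in Hz: ticks per second divided by ticks per VOP."""
    if fixed_vop_time_increment == 0:
        raise ValueError("a zero fixed_vop_time_increment is forbidden")
    return Fraction(vop_time_increment_resolution, fixed_vop_time_increment)
```

For example, a resolution of 30000 ticks per second with an increment of 1001 ticks gives a rate of 30000/1001 Hz (about 29.97 Hz), and the increment field would occupy 15 bits under the closed-range reading.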
The syntactic element “fixed_vop_rate” is a one-bit flag which indicates that all VOPs (pictures in MPEG-4 terminology) are coded with a fixed VOP temporal rate. Its value is “1” if and only if the distance between the display times of any two successive VOPs in the display order in the video object layer is constant. In this case, the VOP rate can be derived from the “fixed_vop_time_increment” syntactic element. If the value of the flag is “0”, the display time between any two successive VOPs in the display order can be variable: it is then indicated by the time stamps provided in the VOP header.
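A minimal sketch of the fixed-rate case is given below: when “fixed_vop_rate” is “1”, successive display times are separated by a constant number of ticks. The function name is hypothetical, and the choice of time 0 for the first VOP is an assumption made only for the example.

```python
from fractions import Fraction


def fixed_rate_display_times(vop_time_increment_resolution: int,
                             fixed_vop_time_increment: int,
                             count: int) -> list:
    """Display times, in seconds, of `count` successive VOPs at a fixed rate.

    Assumption: the first VOP is displayed at time 0.
    """
    if vop_time_increment_resolution <= 0 or fixed_vop_time_increment <= 0:
        raise ValueError("resolution and increment must be positive")
    # Duration of one VOP interval: ticks per VOP divided by ticks per second.
    step = Fraction(fixed_vop_time_increment, vop_time_increment_resolution)
    return [n * step for n in range(count)]
```

With 30 ticks per second and 2 ticks between VOPs, the first three VOPs are displayed at 0, 1/15 and 2/15 of a second, that is, at 15 Hz.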
In either case (fixed VOP rate or not), the display time of each encoded VOP is retrieved from the bitstream by means of a syntactic element “vop_time_increment”, coded on 1 to 16 bits in the VOP header (see pp. 40 and 120 of the MPEG-4 document already cited). It can take a value in the range [0, vop_time_increment_resolution]. The number of bits representing this value is calculated as the minimum number of unsigned integer bits required to represent that range. The local time base, in units of seconds, is recovered by dividing this value by “vop_time_increment_resolution”.
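The recovery of the local time base can be illustrated by the following sketch, which simply divides the decoded “vop_time_increment” by the resolution. The function name is hypothetical, and the range check follows the closed range stated in the text.

```python
from fractions import Fraction


def local_time_seconds(vop_time_increment: int,
                       vop_time_increment_resolution: int) -> Fraction:
    """Local time base in seconds: ticks elapsed divided by ticks per second."""
    if vop_time_increment_resolution <= 0:
        raise ValueError("resolution must be positive")
    # Range check taken from the text: [0, vop_time_increment_resolution].
    if not 0 <= vop_time_increment <= vop_time_increment_resolution:
        raise ValueError("vop_time_increment out of range")
    return Fraction(vop_time_increment, vop_time_increment_resolution)
```

For instance, a “vop_time_increment” of 15 with a resolution of 30 ticks per second corresponds to half a second within the current modulo time.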
From the previous indications, it can be seen that, unfortunately, all channels have to share the same description. It is not possible to describe, for instance, a video sequence encoded at a frame rate of 30 Hz in luminance, 15 Hz in chrominance and 10 Hz in shape.