1. Field of the Invention
Embodiments of the present invention generally relate to limiting the maximum size of an encoded picture in an encoded video bit stream using sub-picture based rate control.
2. Description of the Related Art
In some video applications, it is important to ensure that the encoded size of a picture does not exceed a specified maximum limit. As used herein, the term picture refers to a frame (for progressive video) or a field of a frame (for interlaced video) and the term frame refers to a complete image captured during a known time interval. For example, in video conferencing, the glass-to-glass delay, i.e., the end-to-end delay, is strongly influenced by the encoded picture size. If the encoded picture is very large, the transmission time for that picture (assuming a constant bit rate channel) will be correspondingly long. The long transmission time will in turn cause the decoder receiving the encoded picture to incur a large buffering delay, which is undesirable from a real-time interaction perspective. Similarly, in interactive gaming, the end-to-end delay is strongly influenced by the encoded picture size, e.g., the delay should be less than 100 ms to provide for real-time interaction with a video game.
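The relationship between encoded picture size and transmission time on a constant bit rate channel can be illustrated with a short calculation. The following is a minimal sketch only; the function names, the channel rate, and the delay budget are hypothetical values chosen for illustration, and the sketch assumes that transmission time is simply the picture size in bits divided by the channel rate.

```python
def transmission_time_ms(picture_bytes: int, channel_bps: int) -> float:
    """Time to send one encoded picture over a constant bit rate channel."""
    # bytes -> bits, seconds -> milliseconds
    return picture_bytes * 8 * 1000 / channel_bps


def max_picture_bytes(transmit_budget_ms: int, channel_bps: int) -> int:
    """Largest picture (in bytes) that fits in the given transmission budget.

    Assumption: transmit_budget_ms is the share of the end-to-end delay
    that is allotted to transmission alone (a hypothetical parameter).
    """
    return transmit_budget_ms * channel_bps // 8000


if __name__ == "__main__":
    # Illustrative numbers only: a 50,000 byte picture on a 1 Mbit/s channel.
    print(transmission_time_ms(50_000, 1_000_000))
    # Illustrative numbers only: a 40 ms transmission budget on the same channel.
    print(max_picture_bytes(40, 1_000_000))
```

Under these assumptions, a picture that is ten times larger than its neighbors takes ten times as long to send, which is why a cap on the maximum picture size translates directly into a cap on the worst-case transmission delay.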
Another scenario in which control over the picture size is desirable is when H.241 is used during video encoding. H.241 refers to the ITU-T Recommendation H.241, entitled “Extended Video Procedures and Control Signals for H.300 Series Terminals”, which establishes the communication procedures for the use of advanced video codecs, including H.264, with H.300 series terminals such as the communication terminals of ITU-T Recs. H.310, H.320, H.321, H.322, H.323 and H.324. These communication procedures include control, indication, capability exchange and transport mechanisms. H.241 also specifies that the maximum size of Network Abstraction Layer (NAL) units generated by a video codec is constrained by the size of the maximum transmission unit (MTU) of an IP network. That is, to avoid IP-layer packet fragmentation, H.241 states that NAL units should be substantially shorter than the MTU size of the network. For example, on an Ethernet network with a 1472 byte MTU, H.241 recommends a maximum size of 1200 bytes for a NAL unit to allow for addition of a header without exceeding the MTU size of the network.
To reduce the impact of packet losses in video streaming over an IP network, the NAL units may be generated such that each NAL unit contains an independently decodable piece of video data, i.e., a slice of a picture in a video stream. That is, in H.264 and other coding standards, a picture may be segmented into sequences of macroblocks, referred to as slices, that are separately encoded. When the size of an encoded picture is large, the number of NAL units (or slices) may also be large due to the MTU size constraint. This increases the encoding time for the picture, as additional overhead is incurred for slice header generation each time a new slice is started.
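The dependence of the slice count on the encoded picture size can be sketched as follows. This is an illustrative lower bound only, assuming one slice per NAL unit and ignoring header bytes; the function name and the example picture sizes are hypothetical, and the 1200 byte limit is the H.241 example value cited above.

```python
def min_nal_units(picture_bytes: int, max_nal_bytes: int) -> int:
    """Lower bound on the NAL units (slices) needed for one encoded picture.

    Assumption: each NAL unit carries exactly one slice and can be filled
    completely. Real encoders end slices on macroblock boundaries, so the
    actual count may be higher.
    """
    # Integer ceiling division without floating point.
    return -(-picture_bytes // max_nal_bytes)


if __name__ == "__main__":
    # Illustrative picture sizes with the 1200 byte NAL unit limit.
    for size in (6_000, 50_000):
        print(size, min_nal_units(size, 1200))
```

Since a slice header is generated for every slice, the header generation overhead in this simplified model grows in proportion to the encoded picture size.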
Further, it is the responsibility of the video encoder to enforce the MTU size constraint. Typically, the decision to end a slice and begin a new one due to the MTU size constraint is made in the entropy coding stage of the video encoder. If the video encoder has a pipelined architecture in which multiple macroblocks are processed concurrently in different coding stages, the macroblocks in the pipeline are assumed to be in the same slice and may have data/encoding dependencies. When the decision is made to start a new slice at entropy coding, at least some of the macroblocks in the pipeline may need to be re-encoded, thus increasing the encoding time for the picture.
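The cost of deciding slice boundaries at the entropy coding stage can be illustrated with a simplified model. The sketch below is not the method of any particular encoder: the greedy packing rule, the function name, and the pipeline_depth parameter are assumptions made for illustration, and the re-encode count is a worst-case figure in which every macroblock in flight at the moment of the decision must be re-encoded.

```python
def pack_slices(mb_sizes, max_slice_bytes, pipeline_depth=1):
    """Greedily group macroblocks into slices under a byte limit.

    mb_sizes        -- entropy-coded size of each macroblock, in bytes
    max_slice_bytes -- size limit per slice (e.g., derived from the MTU)
    pipeline_depth  -- hypothetical number of macroblocks in flight,
                       counting the one currently at entropy coding

    Returns (slices, reencoded), where slices is a list of lists of
    macroblock indices and reencoded is the worst-case number of
    macroblocks that would have to be re-encoded.
    """
    slices, current, used, reencoded = [], [], 0, 0
    for i, size in enumerate(mb_sizes):
        # The overflow is only discovered once macroblock i has been
        # entropy coded, i.e., after later macroblocks entered the pipeline.
        if current and used + size > max_slice_bytes:
            slices.append(current)
            current, used = [], 0
            # Worst case: macroblock i and those behind it in the pipeline
            # were processed as if they belonged to the previous slice.
            reencoded += min(pipeline_depth, len(mb_sizes) - i)
        # Note: a single macroblock larger than the limit still gets its
        # own slice here; this sketch does not handle that case further.
        current.append(i)
        used += size
    if current:
        slices.append(current)
    return slices, reencoded


if __name__ == "__main__":
    # Five macroblocks of 500 bytes each, 1200 byte limit, three in flight.
    print(pack_slices([500] * 5, 1200, pipeline_depth=3))
```

In this model, each additional slice boundary adds up to pipeline_depth macroblocks of repeated work, so a larger encoded picture increases both the number of slice headers and the amount of re-encoding.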
Such increases in encoding time may result in the video encoder not being able to achieve real-time encoding, i.e., the encoder may take more time to encode a picture than the time between capture of two successive pictures. The encoder may then start dropping pictures in order to meet the real-time coding requirement, which reduces the quality of the encoded video.