1. Field of the Invention
The present invention relates to MPEG-4 video coding, and more particularly, to a bit rate control method and apparatus for MPEG-4 video coding.
2. Description of the Related Art
The Moving Picture Experts Group (MPEG) has proposed a method of compressing moving pictures by removing temporal redundancy and spatial redundancy. Temporal redundancy is removed using motion compensation, and spatial redundancy is removed by applying a discrete cosine transform (DCT) to still frames.
MPEG-4 is an object-based technique of compressing moving pictures according to MPEG compression standards. Unlike conventional techniques, MPEG-4 enables individual coding of an object having an arbitrary shape.
FIG. 1 shows a hierarchy of MPEG-4. A video session (VS) 110 denotes an entire image sequence. The VS 110 comprises one or more video objects (VO) 120. For example, when a person appears in the middle of a background, the person's sequential motions alone can be described using a single VO, or the background sequence can be described separately. Each VO 120 comprises one or more video object layers (VOL) 130. The VOL 130 gives each VO 120 its spatial and temporal resolution.
The lowermost level, a video object plane (VOP) 150, refers to the data of a VO at a given instant, at the resolution of that VO. In addition, a further class, a group of VOPs (GOV) 140, exists between the VOL 130 and the VOP 150 to enable random access. If the GOV 140 exists, coding starts from a mode in which prediction in the temporal direction is not performed.
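The hierarchy described above can be sketched as nested data structures. This is only an illustrative sketch: the class and field names (`VOP`, `GOV`, `VOL`, `VO`, `VS`, `intra_coded`, `make_gov`) and the sample resolution and frame rate are assumptions chosen for the example, not syntax elements taken from the MPEG-4 standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VOP:
    """Video object plane: the data of one video object at one instant."""
    time_index: int
    intra_coded: bool = False  # True when coded without temporal prediction


@dataclass
class GOV:
    """Group of VOPs: an optional unit that serves as a random access point."""
    vops: List[VOP] = field(default_factory=list)


@dataclass
class VOL:
    """Video object layer: fixes the spatial and temporal resolution."""
    width: int
    height: int
    frame_rate: float
    govs: List[GOV] = field(default_factory=list)


@dataclass
class VO:
    """Video object: for example a person or a background."""
    name: str
    layers: List[VOL] = field(default_factory=list)


@dataclass
class VS:
    """Video session: the entire sequence."""
    objects: List[VO] = field(default_factory=list)


def make_gov(start: int, length: int) -> GOV:
    """Build a GOV whose first VOP is coded without temporal prediction."""
    return GOV([VOP(start + i, intra_coded=(i == 0)) for i in range(length)])


# Hypothetical session: one "person" object, one layer, two GOVs of three VOPs.
session = VS([VO("person", [VOL(176, 144, 15.0, [make_gov(0, 3), make_gov(3, 3)])])])
```

In this sketch a decoder could begin at the second GOV, because its first VOP does not depend on any earlier VOP.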
Most MPEG-4 systems encode raw video data into a variable bit rate (VBR) bitstream using fixed quantization. In this case, if the data traffic of the output bitstream varies suddenly, the output buffer is very likely to overflow or underflow. When raw video data is instead encoded into a constant bit rate (CBR) bitstream, data traffic is held at a constant level irrespective of the kind of input image by adaptively adjusting the quantization.
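The buffer behavior described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the function name `simulate_buffer`, the half-full starting level, the clamping of fullness at the buffer limits, and all numeric values are hypothetical and chosen only to make the effect visible.

```python
def simulate_buffer(frame_bits, channel_bits_per_frame, buffer_size):
    """Track output buffer fullness frame by frame.

    Each frame adds its coded bits to the buffer, and the channel drains a
    fixed number of bits per frame interval. Returns (fullness, event) pairs.
    """
    fullness = buffer_size // 2  # assumption: the buffer starts half full
    history = []
    for bits in frame_bits:
        fullness += bits - channel_bits_per_frame
        if fullness > buffer_size:
            event = "overflow"
            fullness = buffer_size  # clamp so the simulation can continue
        elif fullness < 0:
            event = "underflow"
            fullness = 0
        else:
            event = "ok"
        history.append((fullness, event))
    return history


# Hypothetical VBR traffic: a burst of large frames followed by small ones.
vbr_frames = [4000, 4000, 20000, 20000] + [500] * 5
# Hypothetical CBR traffic: every frame matches the channel rate.
cbr_frames = [5000] * 9

vbr_events = [event for _, event in simulate_buffer(vbr_frames, 5000, 20000)]
cbr_events = [event for _, event in simulate_buffer(cbr_frames, 5000, 20000)]
```

With these values the VBR sequence overflows during the burst and underflows after the run of small frames, while the CBR sequence keeps the buffer level unchanged.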
The rate control methods can be categorized as frame-based rate control methods, in which outputted data traffic is controlled in frame units, or macro-block-based rate control methods, in which outputted data traffic is controlled in macro-block units. The macro-block-based rate control method enables more accurate control of bit rates, but requires more complicated and difficult techniques than the frame-based method. Accordingly, the frame-based rate control method is typically used.
In real-time video communications, video encoding requires accurate rate control. Accurate rate control should satisfy the end-to-end delay constraint and should also enable estimation of the rate distortion (RD) function of the video encoder, so that the buffer used for encoding neither overflows nor underflows. If the amount of data stored in the buffer is too high, the encoder reduces the buffer delay and skips a frame to be encoded in order to avoid buffer overflow. Once a frame is skipped because of overflow, motion in the decoded images appears unnatural because the encoded video sequence is discontinuous.
When video is encoded on a frame basis, a suitable quantizer should be selected in view of the limit on the number of bits available for the quantized data. This point is important for constructing a suitable and adaptive rate distortion model.
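The frame skipping described above can be sketched as a simple policy. This is an illustrative sketch only: the function name `encode_with_skips`, the rule of skipping a frame when it would push the buffer past a fraction of its size, and the 0.8 default threshold are assumptions, not values taken from any particular encoder.

```python
def encode_with_skips(frame_bits, channel_bits_per_frame, buffer_size,
                      skip_threshold=0.8):
    """Return the indices of frames skipped to protect the buffer.

    A frame is skipped when adding its bits would push the buffer fullness
    above skip_threshold * buffer_size (a hypothetical policy).
    """
    fullness = 0
    skipped = []
    for index, bits in enumerate(frame_bits):
        if fullness + bits > skip_threshold * buffer_size:
            skipped.append(index)  # frame dropped: no bits enter the buffer
        else:
            fullness += bits
        # The channel drains the buffer once per frame interval.
        fullness = max(0, fullness - channel_bits_per_frame)
    return skipped


# Hypothetical sequence in which the third frame is unusually large.
skipped_frames = encode_with_skips([6000, 6000, 12000, 6000, 6000],
                                   channel_bits_per_frame=5000,
                                   buffer_size=10000)
```

The skipped index marks the point at which the decoded sequence becomes discontinuous, which is the artifact that accurate rate control tries to avoid.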
Meanwhile, an adaptive rate distortion model may be based on a self-organizing learning Petri net (SOLPN). A Petri net is a useful mathematical tool for modeling various events or actions. Petri nets were first developed in 1962 by Carl Adam Petri in West Germany. A Petri net comprises two types of nodes, i.e., places and transitions, each of which can be coupled by an arc to a node of the other type. Here, a transition is a function that generates an output signal corresponding to an input signal, and a place is a space for storing an input or output signal. A learning Petri net (LPN) is obtained by adding a learning ability, such as that provided by a neural network, to the Petri net.
FIG. 2 shows a basic learning structure of an LPN. Each transition, excluding the input transition, comprises a predetermined number of input places and a predetermined number of output places. For simplicity, different transitions do not share the same input or output places. Although a limited number of transitions and places are shown in FIG. 2, more transitions or places may be coupled to each other in parallel or in series to form different structures.
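The place and transition structure described above can be sketched with the classical token formulation of a Petri net, in which places hold tokens and a transition may fire when each of its input places holds at least one token. This is a sketch of an ordinary Petri net only; it does not include the learning ability of an LPN, and the class name `PetriNet` and the place and transition names are hypothetical.

```python
class PetriNet:
    """Minimal ordinary Petri net: places hold tokens, transitions move them."""

    def __init__(self, marking):
        self.marking = dict(marking)  # place name -> token count
        self.transitions = {}         # transition name -> (inputs, outputs)

    def add_transition(self, name, inputs, outputs):
        """Connect a transition to its input and output places by arcs."""
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name):
        """Consume one token per input place, produce one per output place."""
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


# Two transitions coupled in series through an intermediate place.
net = PetriNet({"p_in": 1, "p_mid": 0, "p_out": 0})
net.add_transition("t1", ["p_in"], ["p_mid"])
net.add_transition("t2", ["p_mid"], ["p_out"])
net.fire("t1")
net.fire("t2")
```

After both transitions fire, the single token has moved from the input place to the output place. Note that arcs only ever join a place to a transition or a transition to a place.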
The foregoing LPN has the learning and reproducing abilities of a neural network. However, unlike neural networks, the LPN has the characteristics of a distribution function. Parameters of the LPN are pre-set based on a user's experience, as in the case of ordinary neural networks. In the LPN, since the numbers of transitions and places between an input layer and an output layer, and the connections between them, are fixed in advance according to the user's experience, the output values can be quite inaccurate. For this reason, the SOLPN was proposed.
The SOLPN is a self-organizing LPN in which the learning rate is high and accurate modeling is enabled, because learning is driven by training samples rather than by the user's experience.
The DCT-based video encoder uses various rate distortion (RD) models. One of the RD models encodes respective image blocks and intelligently selects the best parameter. However, a method of using this RD model is not suitable for real-time encoding due to complicated calculations.
In another method, a quantizer is selected based on a predetermined mathematical model and a control parameter is estimated from RD data of a coding system. Although this method is suitable for real-time encoding, frame skips occur frequently and channel bandwidth is wasted during a low-delay application service. Also, to obtain a high coding efficiency, more complex RD algorithms are needed, and even more experiments are required to obtain suitable control parameters.
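Model-based quantizer selection of this kind can be sketched as follows. The text above does not specify the mathematical model, so this example assumes a quadratic rate model of the form R = a*MAD/Q + b*MAD/Q^2, where MAD is a frame complexity measure and a and b are control parameters estimated from earlier frames. The function names, the parameter values, and the search strategy are hypothetical; only the QP range of 1 to 31 comes from the surrounding description.

```python
def predicted_bits(qp, mad, a, b):
    """Assumed quadratic rate model: bits predicted for one frame at this QP."""
    return a * mad / qp + b * mad / (qp * qp)


def select_qp(target_bits, mad, a, b, qp_min=1, qp_max=31):
    """Pick the smallest QP whose predicted bits fit within the target.

    For non-negative a and b the predicted rate falls as QP grows, so the
    first QP that fits is also the one with the finest quantization.
    """
    for qp in range(qp_min, qp_max + 1):
        if predicted_bits(qp, mad, a, b) <= target_bits:
            return qp
    return qp_max  # even the coarsest quantizer exceeds the target


# Hypothetical frame: complexity 10, model parameters a=1000 and b=2000.
chosen_qp = select_qp(target_bits=2000, mad=10, a=1000, b=2000)
```

When no QP in the range meets the target, the sketch falls back to the coarsest quantizer, which is the situation in which an encoder might resort to skipping a frame.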
A rate control scheme based on a self-organizing map was disclosed in “Rate Control Algorithm Using SOFM-based Neural Network,” Electronics Letters, vol. 36, No. 12, pp. 1041-1158, 2000. This scheme organizes a frame-based global RD model using a neural classifier. The disclosed scheme may lead to good results, but requires off-line training. That is, to collect all the variations that image characteristics can produce, a plurality of video samples should be encoded using a fixed quantization parameter (QP) that is varied from 1 to 31. Consequently, this method is not suitable for on-line control and cannot easily update the configuration or structure of the corresponding neural network.