1. Field of the Invention
The present invention relates to video encoding technology, and particularly to a method and system for two-pass video encoding using sliding windows.
2. Description of the Related Art
A video sequence (VS) can be seen as a series of static frames, and it requires considerable storage capacity and transmission bandwidth. A 90-min full color video stream, for example, having 640×480 pixels/frame and 15 frames/second, requires bandwidth of 640×480 (pixels/frame)×3 (bytes/pixel)×15 (frames/sec)=13.18 (MB/sec) and a file size of approximately 13.18 (MB/sec)×90×60=69.50 (GB). Such a sizeable digital video stream is difficult to store and to transmit in real time; thus, many encoding techniques have been introduced.
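The arithmetic above can be checked with a minimal sketch. It assumes binary units (1 MB = 1024² bytes, 1 GB = 1024³ bytes), which is the convention that reproduces the 13.18 MB/sec figure; the function name `raw_video_size` is hypothetical. Computed without intermediate rounding, the file size comes to about 69.52 GB, slightly above the 69.50 GB obtained from the rounded rate.

```python
def raw_video_size(width, height, bytes_per_pixel, fps, minutes):
    """Return (bytes per second, total bytes) for uncompressed video."""
    bytes_per_sec = width * height * bytes_per_pixel * fps
    total_bytes = bytes_per_sec * minutes * 60
    return bytes_per_sec, total_bytes


# Assumption: binary units, which match the figures quoted in the text.
MIB = 1024 ** 2
GIB = 1024 ** 3

rate, total = raw_video_size(640, 480, 3, 15, 90)
print(f"bandwidth: {rate / MIB:.2f} MB/sec")
print(f"file size: {total / GIB:.2f} GB")
```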
MPEG standards ensure that video encoders create standardized files that can be opened and played on any system with a standards-compliant decoder. Digital video contains spatial and temporal redundancies, which can be removed during encoding without significant loss of quality. MPEG coding is a generic standard, intended to be independent of any specific application, and encodes video by exploiting statistical redundancies in the temporal and spatial directions. Spatial redundancy arises from the similarity in color values shared by adjacent pixels. MPEG employs intra-frame spatial encoding on redundant color values using DCT (Discrete Cosine Transform) and quantization. Temporal redundancy arises from the similarity between successive video frames, in which much of the content is unchanged or merely displaced by motion. MPEG relies on prediction, more precisely motion-compensated prediction, for temporal encoding between frames. To perform temporal encoding, MPEG utilizes I-frames, P-frames, and B-frames. An I-frame is an intra-coded frame, a single image heading a sequence, encoded only within the frame with no reference to previous or subsequent frames. P-frames are forward-predicted frames, encoded with reference to a previous I- or P-frame, with pointers to information in that previous frame. B-frames are bidirectionally predicted frames, encoded with reference to a previous reference frame, a subsequent reference frame, or both. The motion vectors employed may be forward, backward, or both.
MPEG achieves compression through motion compensation and by quantizing the coefficients produced by applying a DCT to 8×8 blocks of pixels in an image. Quantization is basically division of each DCT coefficient by a quantization scale related to the quality level, with higher scales giving greater compression but lower quality, and lower scales the reverse.
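The trade-off described above can be illustrated with a simplified sketch. It assumes a single uniform quantization scale applied to every coefficient (actual MPEG encoders also apply a per-coefficient quantization matrix), and the coefficient values and function names are hypothetical.

```python
def quantize(coeffs, scale):
    """Divide each DCT coefficient by the scale and round to an integer level."""
    return [int(round(c / scale)) for c in coeffs]


def dequantize(levels, scale):
    """Reconstruct approximate coefficients from the quantized levels."""
    return [level * scale for level in levels]


def zero_count(levels):
    """Number of levels quantized to zero (these cost few bits to encode)."""
    return sum(1 for level in levels if level == 0)


def max_error(coeffs, scale):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    recon = dequantize(quantize(coeffs, scale), scale)
    return max(abs(c - r) for c, r in zip(coeffs, recon))


# Hypothetical DCT coefficients for part of one block.
coeffs = [312.0, -47.5, 18.2, -6.1, 3.3, -1.4, 0.8, 0.2]

for scale in (2, 16):
    levels = quantize(coeffs, scale)
    print(scale, levels, zero_count(levels), max_error(coeffs, scale))
```

With the larger scale, more coefficients collapse to zero, which compresses better, while the reconstruction error grows, which lowers quality.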
Typical MPEG video encoders utilize a constant bitrate (CBR) for each group of pictures (GOP) regardless of the complexity of the video interval. Bitrate is used to represent video quality and defines how many bits one second of video occupies. Such encoders assume equal weighting of bit distribution among GOPs, which reduces the degrees of freedom of the encoding task. CBR encoders enforce different quantizing scales for each frame type to achieve good quality streams within a GOP. The CBR method works adequately when the complexity of the source varies slowly over time, so that the encoding algorithm has sufficient time to adjust itself. However, if the statistical features of the source change rapidly over time, a CBR operation may result in good frame quality within a short time window (e.g., a few frames or a GOP) but inconsistent quality across the video as a whole.
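The equal weighting among GOPs can be expressed as a short sketch. The function name `cbr_gop_budget` and the numeric values are hypothetical and serve only to show that, under CBR, every GOP receives the same bit budget whatever its content.

```python
def cbr_gop_budget(bitrate_bps, frames_per_gop, fps):
    """Bits available to one GOP under CBR: bitrate times GOP duration."""
    return bitrate_bps * frames_per_gop / fps


# Hypothetical stream: 4 Mbit/sec, 15-frame GOPs, 30 frames/sec.
budget = cbr_gop_budget(4_000_000, 15, 30)

# Every GOP gets the same budget, whether it is simple or complex.
budgets = [budget for _gop in ("static scene", "fast action", "scene cut")]
print(budgets)
```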
Since a VS is inherently variable, a better encoding approach employs a variable bitrate (VBR) encoding algorithm. Generally speaking, a VBR encoder produces a non-constant output bitrate over a period of time, encoding a complex frame with a higher bitrate than a simple one. VBR encoders use the same or different quantization scales throughout the entire VS to achieve constant video quality.
Although VBR algorithms are feasible for constant video quality, the distribution of bitrates may not be optimal for a VS encoded in only a single pass. In order to allocate the given bitrate budget among different frames more properly, a rate-control scheme needs to redistribute the bitrate for each frame in the entire VS under all possible quantization scales. Two-pass variable bitrate (VBR) technology has been introduced to achieve this object. FIG. 1 is a schematic diagram of a conventional two-pass encoding method with VBR. The first pass encoding 21 encodes the entire source 20a to acquire the statistical features and then determines the optimal distribution of bitrates that satisfies the requisite constraints. Thereafter, the second pass encoding 22 encodes the entire source 20a using the redistribution of bitrates according to the acquired statistical features and generates a VBR video stream 20b. It is noted that the VS encoded in the first pass is discarded and is not carried into the second pass.
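The two-pass flow described above can be sketched as follows. The description does not state how the optimal distribution is determined, so this sketch assumes a simple rule in which each frame receives bits in proportion to a complexity value measured in the first pass; the `complexity` field, the function names, and the numbers are all hypothetical.

```python
def first_pass(frames):
    """Pass 1: scan the entire source and collect per-frame statistics.

    Assumption: each frame carries a precomputed 'complexity' value standing
    in for whatever statistical features a real encoder would measure.
    """
    return [frame["complexity"] for frame in frames]


def allocate_bits(complexities, total_bits):
    """Distribute the bit budget in proportion to frame complexity.

    Assumption: proportional allocation, chosen only for illustration.
    """
    total_complexity = sum(complexities)
    return [total_bits * c / total_complexity for c in complexities]


def second_pass(frames, budgets):
    """Pass 2: re-encode the entire source using the per-frame budgets."""
    return [{"frame": f["name"], "bits": b} for f, b in zip(frames, budgets)]


# Hypothetical four-frame source and a total budget of 100,000 bits.
source = [
    {"name": "f0", "complexity": 1},
    {"name": "f1", "complexity": 3},
    {"name": "f2", "complexity": 2},
    {"name": "f3", "complexity": 4},
]

stats = first_pass(source)
budgets = allocate_bits(stats, 100_000)
stream = second_pass(source, budgets)
print(stream)
```

Nothing encoded in the first pass is reused here; only the statistics pass to the second stage, so the source is traversed twice.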
Although this solution is feasible, the entire two-pass VBR encoding method is time intensive. The encoded result is available only after both passes have been completed, which hinders applications requiring quick response.
In view of the described limitations, a need exists for a system and method providing an efficient approach to satisfy the requirement of quick response.