1. Field of the Invention
The present invention relates to a video decoding system. More particularly, the present invention relates to a video decoding system capable of generating a resampling reference picture, and a resampling reference picture generation apparatus and a method thereof.
2. Description of Related Art
During a video encoding process, a video encoder generally divides an image frame into non-overlapping macroblocks of the same size, with a unit of 16×16 pixels. Then, the video encoder performs intra prediction and inter prediction on the macroblocks, so as to eliminate spatial and temporal redundancy. Thereafter, the video encoder performs a discrete cosine transform (DCT), quantization and entropy coding on a residual block obtained by subtracting the prediction block from the original block, so as to obtain a bit stream.
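The encode-side steps above (residual computation, 2-D DCT, and quantization) can be sketched as follows. This is a minimal illustrative sketch with hypothetical helper names, not the method of any particular standard; real encoders use integer transforms and per-coefficient quantization matrices.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: coefficients = C @ block @ C.T.
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the 1/sqrt(n) normalization
    return c

def encode_block(original, prediction, q_step):
    # Residual block = original block minus prediction block.
    residual = original.astype(float) - prediction
    C = dct_matrix(residual.shape[0])
    coeffs = C @ residual @ C.T        # 2-D DCT of the residual
    return np.round(coeffs / q_step)   # uniform scalar quantization
```

The quantized levels would then be entropy coded into the bit stream; entropy coding is omitted here for brevity.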
Next, referring to FIG. 1, FIG. 1 is a system block diagram illustrating a conventional video decoding system 10. The conventional video decoding system 10 includes a variable length decoding (VLD) unit 11, an inverse quantization (IQ) unit 12, an inverse discrete cosine transform (IDCT) unit 13, an adder 14, an in-loop filter 15, a selector 16, an intra prediction unit 17, a motion compensation unit 18 and a frame memory 19. The coupling relations among these devices are shown in FIG. 1, and are not further described herein.
A decoding flow of the video decoding system 10 is the reverse of the aforementioned encoding flow. First, the VLD unit 11 performs entropy decoding on the bit stream. Next, the IQ unit 12 performs inverse quantization on the output stream of the VLD unit 11. Thereafter, the IDCT unit 13 performs an inverse discrete cosine transform on an output stream of the IQ unit 12 to obtain the residual block. Next, the adder 14 adds the residual block to an intra prediction block or an inter prediction block to obtain one block of a reconstruction frame. Here, the intra prediction unit 17 generates the intra prediction block, and the motion compensation unit 18 generates the inter prediction block according to a plurality of image frames stored in the frame memory 19.
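The IQ, IDCT, and adder stages of this decoding flow might be sketched as below. The helper names are hypothetical and the orthonormal DCT here is a stand-in for whatever transform a given standard specifies.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix (same basis used on the encode side).
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def decode_block(levels, prediction, q_step):
    # IQ: rescale the quantized levels back to approximate coefficients.
    coeffs = levels * q_step
    # IDCT: transform the coefficients back into a residual block.
    C = dct_matrix(levels.shape[0])
    residual = C.T @ coeffs @ C
    # Adder: residual plus (intra or inter) prediction block gives
    # one block of the reconstruction frame.
    return residual + prediction
```

With all-zero levels the block reduces to the prediction, which matches the intuition that a perfectly predicted block carries no residual.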
Finally, the in-loop filter 15 of the video decoding system 10 filters the block of the reconstruction frame, so as to obtain one block of a relatively smooth output frame without blocking artifacts, and transmits the block of the output frame to a display device or to the frame memory 19 to serve as one block of a reference picture for a next inter prediction. Note that the reference picture used for the next inter prediction has to be an I frame or a P frame.
Moreover, in most video standards, a frame that can be reconstructed without referring to any other frame during the decoding process is referred to as an I frame, a frame that is reconstructed by referring to a previous non-B frame is referred to as a P frame, and a frame that is reconstructed by simultaneously referring to a previous non-B frame and a future non-B frame is referred to as a B frame, where a non-B frame refers to an I frame or a P frame.
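The reference rules for the three frame types can be captured in a small illustrative function (the function name and string labels are hypothetical):

```python
def reference_frames(frame_type, prev_non_b=None, next_non_b=None):
    # Reference pictures required by each frame type, where a "non-B"
    # frame means an I frame or a P frame.
    if frame_type == 'I':
        return []                                # no reference needed
    if frame_type == 'P':
        return [prev_non_b]                      # previous non-B frame only
    if frame_type == 'B':
        return [prev_non_b, next_non_b]          # previous and future non-B frames
    raise ValueError('unknown frame type: ' + frame_type)
```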
Referring to FIG. 2, FIG. 2 is a schematic diagram illustrating the frames displayed by a display device, and their display times, after a bit stream is decoded. As shown in FIG. 2, the frames displayed by the display device after the bit stream is decoded are sequentially an I frame I0, B frames B1-B3, a P frame P4, B frames B5-B7 and a P frame P8.
As described above, the I frame I0 can be reconstructed without referring to any frame, and the P frame can only be reconstructed by referring to the previous non-B frame. For example, reconstruction of the P frame P4 must refer to the I frame I0, and reconstruction of the P frame P8 must refer to the P frame P4. The B frame can only be reconstructed by simultaneously referring to the previous and the future non-B frames. For example, reconstructions of the B frames B1-B3 all refer to the I frame I0 and the P frame P4, and reconstructions of the B frames B5-B7 all refer to the P frames P4 and P8.
Next, referring to FIG. 3, FIG. 3 is a schematic diagram illustrating decoding time corresponding to the frames of FIG. 2. During the decoding process, a sequence of the generated reconstruction frames is different from that of the frames displayed on the display device. After the bit stream is decoded, the first generated frame is the I frame I0. Then, the P frame P4 is reconstructed by referring to the I frame I0. Thereafter, the B frames B1-B3 are sequentially reconstructed by referring to the P frame P4 and the I frame I0. Thereafter, the P frame P8 is reconstructed by referring to the P frame P4. Then, the B frames B5-B7 are sequentially reconstructed by referring to the P frames P4 and P8.
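The reordering from display order to decode order described above can be sketched as follows, assuming (as in FIG. 2 and FIG. 3) that each run of B frames references the nearest non-B frames on either side:

```python
def decode_order(display_order):
    # Reorder frames from display order into decode order: each run of
    # B frames is deferred until the future non-B frame it references
    # has been reconstructed.
    decode, pending_b = [], []
    for frame in display_order:
        if frame.startswith('B'):
            pending_b.append(frame)     # B frames wait for the next non-B frame
        else:
            decode.append(frame)        # I/P frame is decoded first
            decode.extend(pending_b)    # then the deferred B frames
            pending_b = []
    return decode + pending_b
```

For the sequence of FIG. 2, this yields I0, P4, B1-B3, P8, B5-B7, matching the decoding times shown in FIG. 3.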
During the video encoding process, due to an insufficient channel bandwidth, a compression ratio has to be adjusted to satisfy the bandwidth limitation, and commonly used methods are to adjust a quantization value or to decrease a frame resolution. However, if the frame resolution is changed at the video decoding system, the reference picture has to be correspondingly enlarged or reduced to match a size and a shape of the frame to be predicted.
Referring to FIG. 4, FIG. 4 is a schematic diagram illustrating a reference picture R1 being adjusted to a reference picture R2. As described above, when a size and a shape of a next frame are different from those of the reference picture R1, the size and shape of the reference picture R1 have to be changed. In this example, the size and shape of the next frame are assumed to match those of the reference picture R2, so that the reference picture R1 is adjusted into the reference picture R2.
A reference picture resampling (RPR) algorithm is an algorithm for changing the size and shape of the reference picture before the reference picture is referenced, so as to match the size and shape of the frame to be predicted.
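As an illustrative sketch of such resampling, a reference picture can be resized with bilinear interpolation; this is a generic example with hypothetical names, not the filter mandated by any particular RPR algorithm:

```python
import numpy as np

def resample_reference(ref, new_h, new_w):
    # Bilinear resampling of a reference picture to a new resolution
    # (up sampling or down sampling, depending on the target size).
    h, w = ref.shape
    ys = np.linspace(0.0, h - 1.0, new_h)    # sample positions in the source
    xs = np.linspace(0.0, w - 1.0, new_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)           # clamp at the picture border
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                  # vertical interpolation weights
    wx = (xs - x0)[None, :]                  # horizontal interpolation weights
    top = ref[y0][:, x0] * (1 - wx) + ref[y0][:, x1] * wx
    bot = ref[y1][:, x0] * (1 - wx) + ref[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

Production decoders typically use longer interpolation filters, but the structure — mapping each target sample back to a source position and filtering — is the same.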
A TMN 3.0 decoding program conforming to the H.263+ standard, developed by the University of British Columbia in Canada, is taken as an example. When the TMN 3.0 decoding program decodes each of the frames, if a resolution of a current frame is found to be different from the resolution of the reference picture, suitable resampling is first performed on the reference picture, and then the generated resampling reference picture serves as the reference picture for the current frame. Here, the suitable resampling of the reference picture is a down sampling or an up sampling of the reference picture.
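The resolution check described above might be sketched as below. This is a simplified illustration of the decision, not the actual TMN 3.0 source; the nearest-neighbour resampler is a placeholder for a real resampling filter.

```python
import numpy as np

def nearest_resample(ref, new_h, new_w):
    # Simple nearest-neighbour resampling (placeholder for a real filter).
    h, w = ref.shape
    ys = np.arange(new_h) * h // new_h
    xs = np.arange(new_w) * w // new_w
    return ref[ys][:, xs]

def reference_for(current_shape, ref_picture, resample=nearest_resample):
    # Resample the reference picture only when the current frame's
    # resolution differs from the reference picture's resolution.
    if ref_picture.shape != current_shape:
        return resample(ref_picture, *current_shape)  # up or down sampling
    return ref_picture                                # reuse as-is
```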
Referring to FIG. 5, FIG. 5 is a diagram illustrating a relation between decoding time and frames in a decoding process according to the algorithm applied by the TMN 3.0 decoding program. In the example of FIG. 5, the bit stream only includes I frames and P frames. During the decoding process of FIG. 5, reconstruction of the P frame P1 refers to the I frame I0, and reconstructions of the P frames P2 and P3 respectively refer to the P frames P1 and P2. It should be noted that the resolutions (i.e. the frame sizes) of the P frames P3 and P4 are different. Therefore, according to the algorithm applied by the TMN 3.0 decoding program, the reference picture resampling is first performed on the P frame P3 to generate the resampling reference picture P3′ (i.e. the P frame P3′), and then reconstruction of the P frame P4 can refer to the P frame P3′. In this example, the reference picture resampling performed on the P frame P3 means that the P frame P3′ is generated by performing up sampling on the P frame P3.
The reconstructions of the P frames P5 and P6 respectively refer to the P frames P4 and P5, and the reconstructions of the P frames P8 and P9 respectively refer to the P frames P7 and P8. Since the resolutions of the P frames P6 and P7 are different, according to the algorithm applied by the TMN 3.0 decoding program, the reference picture resampling is first performed on the P frame P6 to generate the resampling reference picture P6′ (the P frame P6′), and then reconstruction of the P frame P7 can refer to the P frame P6′. In this example, the reference picture resampling performed on the P frame P6 means that the P frame P6′ is generated by performing down sampling on the P frame P6.
During the decoding process, according to the algorithm applied by the TMN 3.0 decoding program, resamplings of the P frames P3 and P6 have to be performed first to obtain the P frames P3′ and P6′, which serve as references for the P frames P4 and P7, so that the P frames P4 and P7 can be successfully reconstructed. Therefore, when the P frames P4 and P7 are decoded, the TMN 3.0 decoding program inevitably increases the number of operation clock cycles and memory accesses for a computer device executing such a decoding program.
In summary, according to the algorithm applied by the TMN 3.0 decoding program, extra time is required for calculation, and the number of memory accesses is increased. Therefore, regarding a real-time video service, when the resolution of the reference picture is required to be changed, the frames probably cannot be displayed on the display device in real time due to the extra time consumed by the TMN 3.0 decoding program. Moreover, since the TMN 3.0 decoding program inevitably increases the memory accesses when the resolution of the reference picture is changed, the required memory bandwidth has to be accordingly increased.