A video coding scheme using hierarchical B-pictures performs hierarchical prediction with B-pictures, which predict motion in both directions, in order to add temporal scalability to the block-based video coding schemes used in existing international video standards, such as MPEG-1, MPEG-2, MPEG-4 Part 2 Visual, and MPEG-4 Part 10 AVC (Advanced Video Coding), also known as ITU-T H.264, in the same manner as a motion compensated temporal filtering (MCTF) scheme. In this manner, the encoded bitstream can be decoded by a video system conforming to the existing international standard.
The existing video coding technique based on hierarchical B-pictures performs encoding in units whose length is a power of 2, and this length generally equals the size of a GOP (group of pictures). FIG. 1 shows the encoding concept for a video sequence where the GOP size is 8.
FIG. 2 shows a process of performing prediction using the hierarchical B-picture structure in a GOP having a size of 16. First, a bidirectional prediction picture “B1” can be predicted from both intra pictures “I”. Second, bidirectional prediction pictures “B2” can be predicted using the pictures “B1” and “I”. Third, bidirectional prediction pictures “B3” can be predicted using the pictures “I” and “B2”, and the pictures “B1” and “B2”. Fourth, bidirectional prediction pictures “B4” can be predicted using the pictures “I” and “B3”; “B1” and “B3”; and “B2” and “B3”. After this hierarchical prediction, a bitstream is generated using the existing international video standard. Temporal scalability can be realized with the hierarchical B-picture structure by adopting each of the pictures “I”, “B1” and “I” as a base layer, and each of the pictures “B2”, “B3” and “B4” as an enhancement layer.
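The prediction order described above can be illustrated with a short sketch. It assumes that the pictures of one GOP are indexed from 0 to the GOP size, with intra pictures at both ends, and that each B-picture is predicted from the two nearest pictures of a lower level; the function name and the returned layout are hypothetical and are not taken from any standard.

```python
def hierarchical_b_structure(gop_size):
    """Map each picture index in one GOP to (level, left_ref, right_ref).

    Assumption: pictures 0 and gop_size are the intra pictures "I"
    (level 0, no references); level k corresponds to the label "Bk".
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of 2")
    structure = {0: (0, None, None), gop_size: (0, None, None)}
    level, step = 1, gop_size
    while step > 1:
        half = step // 2
        # Each picture at this level sits midway between two
        # already-coded pictures that are `step` apart.
        for mid in range(half, gop_size, step):
            structure[mid] = (level, mid - half, mid + half)
        level += 1
        step = half
    return structure


def label(level):
    """Return the picture label used in the text for a given level."""
    return "I" if level == 0 else f"B{level}"


if __name__ == "__main__":
    s = hierarchical_b_structure(16)
    for idx in sorted(s):
        lvl, left, right = s[idx]
        print(idx, label(lvl), left, right)
```

For a GOP size of 16, picture 8 is “B1” with references 0 and 16, a temporal distance of 8 pictures on each side, whereas each “B4” picture references its immediate neighbors.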
In the process of constructing the hierarchical B-picture structure as in FIG. 2, when the picture “B1” or “B2” is predicted, the prediction efficiency is likely to be low because the reference frames are temporally distant. The prediction efficiency is closely related to the degree of motion, which is one of the characteristics of the video sequence.
FIG. 3 shows data obtained by dividing a part of the “Foreman” QCIF (Quarter Common Intermediate Format) 15 Hz video sequence into 8-sized GOPs and then performing hierarchical B-picture-based encoding on the divided sequence. It can be seen that the pictures have little motion within the GOP, and the encoded data show that good prediction has been made. In this manner, hierarchical B-picture structure-based encoding produces a good prediction result for a static video.
Meanwhile, FIG. 4 shows data obtained by dividing a part of the “Football” QCIF 15 Hz video sequence into 8-sized GOPs and then performing hierarchical B-picture-based encoding on the divided sequence. It can be seen from the figure that the image frames change dynamically within a GOP. Thus, it can be concluded that, in a dynamic video sequence, prediction using the hierarchical B-picture structure does not perform well and more intra blocks are generated in the “B1” image frame. In other words, the coding efficiency depends on the degree of motion in the video.
FIG. 5 shows an example where too many intra blocks are included in a prediction frame due to poor motion prediction when the “Football” QCIF 15 Hz video sequence is encoded.
On the basis of the fact that, in a dynamic video sequence, the larger the GOP size, the lower the prediction efficiency of the prediction picture, experiments have been performed while varying the GOP size. FIGS. 6 and 7 show graphs of coding efficiency results with four different GOP sizes (1, 2, 4 and 8) for the “Football” QCIF sequence at 7.5 Hz and 15 Hz, respectively. As shown, the smaller the GOP size, the higher the coding efficiency.
FIGS. 8 to 10 show the hierarchical B-picture construction process with three different GOP sizes, 8, 4 and 2, respectively, for the 16th to 24th frames of the “Football” QCIF 15 Hz sequence. It can be seen that, when the GOP size is decreased, the number of intra frames within the same span of frames increases, but the coding efficiency is nevertheless improved. Thus, it can be expected that, in a dynamic video sequence, the smaller the GOP size, the higher the coding efficiency.
In contrast, for the static “Foreman” QCIF 15 Hz video sequence, graphs representing the coding results with different GOP sizes 8, 4, 2 and 1 are shown in FIG. 11. As shown, in the static video sequence, the larger the GOP size, the higher the coding efficiency.
FIG. 12 shows the frame-based PSNR (Peak Signal-to-Noise Ratio) results of hierarchical B-picture-based coding for the 17th to 24th frames of the “Football” sequence at QCIF 15 Hz at the same bit rate, for three different GOP sizes: 8, 4 and 2. As shown in this figure, the 2-sized GOP has the best coding efficiency.
FIG. 13 shows the frame-based PSNR results of hierarchical B-picture-based coding for the 137th to 144th frames of the “Foreman” sequence at QCIF 15 Hz at the same bit rate, for three different GOP sizes: 8, 4 and 2. As shown in this figure, the 8-sized GOP shows the best coding efficiency.
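For reference, the frame-based PSNR used in such comparisons can be computed as in the following minimal sketch. It assumes 8-bit samples (peak value 255) and that each frame is given as a flat sequence of sample values; whether the plotted results use luma samples only is not stated here, so this is an illustration rather than the exact measurement procedure.

```python
import math


def psnr(original, decoded, peak=255):
    """Return the PSNR in dB between two equally sized frames.

    Assumption: each frame is a flat, non-empty sequence of sample
    values, and `peak` is the maximum possible sample value.
    """
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("frames must be non-empty and of equal size")
    # Mean squared error between the original and decoded samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

At the same bit rate, a higher PSNR for a frame indicates higher coding efficiency for that frame.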
Although the foregoing description explains the relationship between the GOP size and the coding efficiency using examples of a dynamic video sequence with many motion variations and a static video sequence with few motion variations, a single video sequence generally includes various degrees of motion variation. For example, there are various degrees of motion variation in the “Foreman” video sequence, as can be seen in FIG. 14, which shows the frame-based PSNR results of hierarchical B-picture-based coding for the 97th to 104th frames of the “Foreman” sequence at QCIF 15 Hz at the same bit rate, for three different GOP sizes: 8, 4 and 2. It can be seen from the figure that the 8-sized GOP has lower coding efficiency than the 4- or 2-sized GOP, which is the opposite of the overall result for the “Foreman” video sequence. The 4- or 2-sized GOP may slightly improve the overall coding efficiency. It can be expected that the first four frames have the best coding efficiency when the GOP size is 2, while the last four frames have the best coding efficiency when the GOP size is 4.
In view of the PSNR results for the 97th to 112th frames of the “Foreman” QCIF 15 Hz sequence in FIG. 15, it is possible to obtain the optimal coding efficiency when, as shown in FIG. 14, the first four frames are encoded with a 2-sized GOP, the next four frames are encoded with a 4-sized GOP, and the remaining eight frames are encoded with an 8-sized GOP.
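The mixed-size partition described above can be sketched as follows. The sketch assumes that each GOP covers a run of consecutive frames counted from the first frame of the segment and that each segment length is a multiple of its GOP size; the function name and the (first, last, size) tuple layout are hypothetical and chosen only for illustration.

```python
def partition_into_gops(first_frame, segments):
    """Split consecutive frames into GOPs of varying size.

    `segments` is a list of (frame_count, gop_size) pairs, applied in
    order starting at `first_frame`. Returns a list of
    (first, last, gop_size) tuples with inclusive frame numbers.
    """
    gops = []
    start = first_frame
    for frame_count, gop_size in segments:
        if gop_size < 1 or frame_count % gop_size:
            raise ValueError("frame_count must be a multiple of gop_size")
        for _ in range(frame_count // gop_size):
            gops.append((start, start + gop_size - 1, gop_size))
            start += gop_size
    return gops


if __name__ == "__main__":
    # Frames 97 to 112: four frames with 2-sized GOPs, four frames
    # with a 4-sized GOP, and eight frames with an 8-sized GOP.
    for gop in partition_into_gops(97, [(4, 2), (4, 4), (8, 8)]):
        print(gop)
```

Under these assumptions the 16 frames are covered by two 2-sized GOPs, one 4-sized GOP and one 8-sized GOP.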
Accordingly, when performing hierarchical B-picture-based coding of a video sequence, it is possible to achieve a high coding efficiency by intelligently selecting the GOP size.