1. Field of the Invention
The invention relates to a video encoding method, and more particularly to a video encoding method that supports editing at scene changes.
2. Description of the Related Art
In MPEG (Moving Picture Experts Group), there are three picture types: I-picture, P-picture and B-picture. I-pictures are coded without referring to other pictures. I-pictures provide the coded sequence with access points, which are the starting points for the decoding process, but are coded with only moderate compression. P-pictures are coded more efficiently using motion-compensated prediction from a past I-picture or P-picture and are generally used as a reference for further prediction. B-pictures provide the highest degree of compression but require both past and future reference pictures for motion compensation. B-pictures are never used as references for prediction. The organization of the three picture types in a sequence is very flexible. The choice of the sequence is determined by the encoder and will depend on the requirements of the application.
Because B-pictures must refer to both past and future reference pictures, the encoding of a B-picture is delayed until its future reference picture has been coded. Therefore, the display order differs from the coding order. This is called the reordering of B-pictures.
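The reordering described above can be illustrated with a minimal sketch. The function name `to_coding_order` and the picture labels (type letter followed by display index, as in P3, B4, I6) are assumptions made for this example only; it is not taken from any particular encoder.

```python
def to_coding_order(display):
    """Reorder pictures from display order to coding order.

    Each picture is a label such as "I6", "P9" or "B4" (assumed format:
    type letter followed by its display index). B-pictures are held back
    until the following reference picture (I or P) has been emitted.
    """
    coding = []
    pending_b = []  # B-pictures waiting for their future reference
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)
        else:
            coding.append(pic)        # reference picture is coded first
            coding.extend(pending_b)  # then the B-pictures that preceded it
            pending_b = []
    # Trailing B-pictures have no future reference in this input; a real
    # encoder would keep them pending until the next reference arrives.
    coding.extend(pending_b)
    return coding


print(to_coding_order(["P3", "B4", "B5", "I6", "B7", "B8", "P9"]))
```

With the display order P3, B4, B5, I6, B7, B8, P9, the sketch yields the coding order P3, I6, B4, B5, P9, B7, B8, so each B-picture follows the future reference it depends on.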
In MPEG-1, a group-of-pictures (hereinafter GOP) structure is used to enclose a number of pictures in a group for manipulation. A GOP contains one I-picture, some P-pictures and some B-pictures. In the coding order, a GOP begins with an I-picture and ends before the next I-picture. In MPEG-2, the GOP structure is optional.
Generally, an encoder employs a fixed GOP structure. The size of a GOP is defined as N, and the distance between two reference pictures is defined as M. FIG. 1 illustrates a GOP with N=15 and M=3.
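As an illustration of the parameters N and M, the following sketch lists the picture types of one fixed GOP. It assumes the GOP is written in display order starting at its I-picture; the drawing referred to above may arrange the pictures differently (for example with leading B-pictures). The function name `gop_pattern` is hypothetical.

```python
def gop_pattern(n: int, m: int) -> str:
    """Return the picture types of one GOP of size n in display order.

    Assumption for this sketch: the pattern starts at the I-picture, and
    every m-th picture after it is a P-picture (so m is the distance
    between two reference pictures); all other pictures are B-pictures.
    """
    if n <= 0 or m <= 0:
        raise ValueError("n and m must be positive")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")
        elif i % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


print(gop_pattern(15, 3))  # IBBPBBPBBPBBPBB
print(gop_pattern(12, 3))  # IBBPBBPBBPBB
```

Under this assumption, N=15 and M=3 gives one I-picture, four P-pictures and ten B-pictures per GOP.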
Typically, if the input signal for the encoder is in NTSC (National Television System Committee) format (29.97 fps), the GOP structure with N=15 and M=3 is used. If the input signal is in PAL (25 fps) or film format (24 fps), the GOP structure with N=12 and M=3 is used. These fixed default settings can achieve a good balance between the complexity of an encoder and the coding performance of most types of videos.
Typically, the editing process cuts the whole video sequence into pieces based on the scenes, and then rearranges them to form a new video sequence. If a video sequence is coded with a fixed pattern composed of only I- and P-pictures, like IPPPPIPPPP . . . , the situation is quite simple. If a scene change occurs in an I-picture of the video sequence (IPPPPIPPPP . . . ), the video sequence can be cut into two parts without any loss. If a scene change occurs in a P-picture of the video sequence, the former part of the video sequence needs no further processing, but the remaining part of the video sequence has to be re-encoded. The first P-picture of the remaining part of the video sequence has to be decoded and then re-encoded as an I-picture. However, because the re-encoded I-picture differs from the original P-picture, errors will propagate to the pictures that refer to it. Re-encoding the whole remaining part of the GOP up to the next I-picture would be a better solution, but it should be noted that re-encoding degrades the image quality significantly.
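The I- and P-picture case above can be summarized in a short sketch. The function `plan_ip_cut`, its parameters and the returned dictionary are hypothetical names introduced for illustration; the sketch only decides which pictures of the remaining part need processing and performs no actual coding.

```python
def plan_ip_cut(pattern: str, cut: int, reencode_rest: bool = False) -> dict:
    """Plan a cut in a sequence containing only I- and P-pictures.

    pattern       -- picture types, e.g. "IPPPPIPPPP"
    cut           -- index of the first picture of the new scene
    reencode_rest -- if True, also re-encode the P-pictures up to the
                     next I-picture instead of accepting error propagation
    """
    if set(pattern) - {"I", "P"}:
        raise ValueError("pattern may contain only I- and P-pictures")
    if not 0 < cut < len(pattern):
        raise ValueError("cut must fall inside the sequence")
    if pattern[cut] == "I":
        # Cutting at an I-picture: both parts remain decodable as they are.
        return {"convert_to_i": None, "reencode": []}
    # Cutting at a P-picture: it must be decoded and re-encoded as an I-picture.
    next_i = pattern.find("I", cut)
    end = len(pattern) if next_i == -1 else next_i
    rest = list(range(cut + 1, end)) if reencode_rest else []
    return {"convert_to_i": cut, "reencode": rest}


print(plan_ip_cut("IPPPPIPPPP", 5))
print(plan_ip_cut("IPPPPIPPPP", 2, reencode_rest=True))
```

The first call cuts at an I-picture and requires no work; the second converts picture 2 to an I-picture and re-encodes pictures 3 and 4, which trades additional quality loss from re-encoding for the absence of error propagation.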
If there are B-pictures in the coded sequence, video editing becomes more complex. Please refer to FIG. 2. If a scene change occurs in the picture just after the I-picture in the coding order, such as picture B4, cutting at picture I6 can separate the two scenes easily. However, even though pictures P3 and B4 belong to different scenes, there would be some macroblocks in pictures B4 and B5 that refer to picture P3. Therefore, pictures B4 and B5 have to be re-encoded with reference to picture I6 only. Discarding pictures B4 and B5 is the easiest way, but losing the first pictures of a scene would not be acceptable.
If a scene change occurs in picture B5, both the former part and the remaining part of the GOP contain pictures that have to be re-encoded. Picture B4 has to be re-encoded as a P-picture and then appended to the former part. In the remaining part, the coded data of picture B4 is removed and picture B5 has to be re-encoded.
If a scene change occurs in picture I6, the remaining part of the GOP only requires removal of the coded data of pictures B4 and B5. However, the former part of the GOP requires a complicated process. One solution is to re-encode picture B5 as a P-picture, and then re-encode picture B4 with reference to pictures P3 and B5. Another solution is to change the two B-pictures B4 and B5 to two P-pictures.
If a scene change occurs in picture B7, the former part of the GOP does not need any additional processing, and a new I-picture has to be generated for the remaining part. One choice is to change picture B7 to an I-picture, and then re-encode the remaining GOP. However, because B-pictures are usually coded with a lower quality than I- and P-pictures, a better choice is to change picture P9 to an I-picture, and re-encode the remaining GOP. The pictures B4 and B5 are B-pictures with only backward reference. This method also reduces the number of P-pictures, which reduces the error caused by referring to a re-encoded picture.
If a scene change occurs in picture B8, the former part of the GOP only requires picture B7 to be re-encoded as a P-picture. For the remaining part of the GOP, picture P9 can be changed to an I-picture and the rest of the GOP re-encoded.
Finally, if a scene change occurs in picture P9, the former part of the GOP is processed as in the case of picture I6. For the remaining part of the GOP, picture P9 has to be changed to an I-picture, and the rest of the GOP then re-encoded.
All other situations can be processed by the methods described above, even if the number of B-pictures between two reference pictures increases to three or more.
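The cases discussed for FIG. 2 can be collected in a small lookup table, sketched below. The step descriptions are transcribed from the text; the names `EDIT_STEPS`, `close_former_part` and `needs_reencoding` are hypothetical. Two points are interpretations rather than statements of the text: the former part in the B4 case is listed as empty because no step is described for it, and the P9 case applies the I6 procedure to pictures I6, B7 and B8 by analogy. Where the text offers alternatives, only one is listed.

```python
def close_former_part(prev_ref: str, b1: str, b2: str) -> list:
    """First solution described for a scene change at a reference picture:
    the last B-picture becomes a P-picture, the other is re-encoded."""
    return [f"re-encode {b2} as a P-picture",
            f"re-encode {b1} with reference to {prev_ref} and {b2}"]


CONVERT_P9 = ["change P9 to an I-picture", "re-encode the rest of the GOP"]

# Key: picture of FIG. 2 at which the scene change occurs.
# "former" is the part before the cut, "remaining" the part after it.
EDIT_STEPS = {
    "B4": {"former": [],  # no step described in the text
           "remaining": ["re-encode B4 and B5 with reference to I6 only"]},
    "B5": {"former": ["re-encode B4 as a P-picture and append it"],
           "remaining": ["remove coded data of B4", "re-encode B5"]},
    "I6": {"former": close_former_part("P3", "B4", "B5"),
           "remaining": ["remove coded data of B4 and B5"]},
    "B7": {"former": [], "remaining": CONVERT_P9},
    "B8": {"former": ["re-encode B7 as a P-picture"], "remaining": CONVERT_P9},
    "P9": {"former": close_former_part("I6", "B7", "B8"),  # by analogy with I6
           "remaining": CONVERT_P9},
}


def needs_reencoding(scene_change_at: str) -> bool:
    """True if any listed step for this case re-encodes a picture."""
    steps = EDIT_STEPS[scene_change_at]
    return any("re-encode" in s for part in steps.values() for s in part)


print(all(needs_reencoding(pic) for pic in EDIT_STEPS))
```

In this table every case involves at least one re-encoding step, whichever picture the scene change falls on.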
Generally, I-pictures are designed for random access and for preventing error propagation. P-pictures use motion compensation to remove the temporal redundancy between the current picture and the reference picture and thereby improve the compression performance. However, if there is almost no temporal redundancy between the current picture and the reference picture, as at a scene change, coding the picture as a P-picture yields no benefit. In this case, coding the picture as an I-picture can achieve the same coding quality with fewer bits. Therefore, an encoder has to detect the existence of a scene change and then start a new GOP. Much research has been done on scene change detection and on algorithms for adjusting the rate control. A general idea is to measure the difference between the current picture and the reference picture from the result of motion estimation. If more than a certain percentage of macroblocks select the intra-coded mode, the encoder can conclude that little temporal redundancy exists, and a scene change is thereby detected.
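The general detection idea above may be sketched as follows. The function name `is_scene_change`, the mode labels "intra" and "inter", and the default threshold of 0.7 are assumptions chosen for illustration; the text does not specify a particular percentage.

```python
def is_scene_change(mb_modes, threshold: float = 0.7) -> bool:
    """Detect a scene change from per-macroblock mode decisions.

    mb_modes  -- one label per macroblock of the current picture, "intra"
                 or "inter", as chosen by motion estimation (assumed format)
    threshold -- fraction of intra-coded macroblocks above which the
                 picture is treated as a scene change (hypothetical value)
    """
    if not mb_modes:
        raise ValueError("mb_modes must not be empty")
    intra = sum(1 for mode in mb_modes if mode == "intra")
    return intra / len(mb_modes) > threshold


# 80% of the macroblocks chose intra coding: little temporal redundancy.
print(is_scene_change(["intra"] * 8 + ["inter"] * 2))  # True
# 30% intra: the reference picture still predicts the picture well.
print(is_scene_change(["intra"] * 3 + ["inter"] * 7))  # False
```

In practice the threshold would be tuned to the content and to the motion estimator, since fast motion can also raise the share of intra-coded macroblocks.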
However, if the encoder simply starts a new GOP when a scene change is detected, without taking any further measures, the re-encoding of some pictures is unavoidable during the editing of the video sequence, as described above.