The present invention relates to the field of audio/visual content. More specifically, one embodiment provides a system for editing bitstreams in a compressed environment.
The amount of multimedia content and digital video is growing and has become essential for media applications. The Moving Picture Experts Group ("MPEG") has developed a series of standards (MPEG-1, MPEG-2, ...) to provide a means for representing digital video and audio signals in a compressed form.
In an MPEG environment, video sequences are represented by compressed bitstreams, which are composed of group of pictures ("GOP") units. A GOP is usually fixed at a certain number of frames, such as 15 frames, and can contain intra ("I"), predicted ("P"), and bi-directional ("B") frames. An I frame can be independently encoded or decoded and contains only information present in the frame itself. However, a P or B frame must be encoded or decoded using information from a reference frame, which can be either an I or P frame. Accordingly, a P frame is encoded or decoded depending on a past reference frame, and a B frame can be encoded or decoded with a dependence on a past frame, a future frame, or both past and future frames. Further, each GOP can be independently decoded without reference to other GOPs.
Manipulation of these video sequences has become increasingly popular and various methods have been proposed to edit MPEG bitstreams. For example, a straightforward way to edit MPEG bitstreams is to decode all the segments, edit the segments, and then re-encode the edited decompressed frames into new MPEG bitstreams. However, this approach has two major drawbacks: (1) the process is computationally intensive, and (2) quality losses accumulate over multiple edits.
Additionally, compressed domain editing solutions have been developed. When applying editing operations, e.g., cut and paste operations, to MPEG video bitstreams, two important issues must be taken into account: (1) frame type conversion and (2) buffer constraints. A frame type conversion involves decoding a frame of a GOP and re-encoding it as another frame type. For example, a B frame could be decoded and re-encoded as an I frame. However, decoding and re-encoding frames could present problems related to buffer control. I, P, and B frames each contain a different number of bits, with a ratio commonly cited in the art of 100:10:1 for I, P, and B frames respectively. Thus, I frames contain the most bits and the most information, P frames contain fewer, and B frames contain the fewest. A conversion of a B frame to an I frame would therefore result in a large bitrate increase. In other words, under this ratio the converted I frame would contain about 100 times as many bits as the original B frame. Also, in constant-bit-rate encoding, where video sequences are encoded with rate control constraints, the bitrate increase could result in overflow/underflow issues at the decoder buffer.
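The effect of the 100:10:1 ratio on a converted frame can be illustrated with a short calculation. The following is a minimal sketch only: the relative sizes are the nominal ratio quoted above rather than measured values, and the frame pattern and the names RELATIVE_BITS, gop_bits, and conversion_factor are hypothetical.

```python
# Nominal relative frame sizes from the 100:10:1 ratio (illustrative, not measured).
RELATIVE_BITS = {"I": 100, "P": 10, "B": 1}


def gop_bits(frame_types):
    """Sum the relative bit cost of a sequence of frame types, e.g. "IBBP"."""
    return sum(RELATIVE_BITS[t] for t in frame_types)


def conversion_factor(old_type, new_type):
    """Factor by which a frame's relative bit cost changes after re-encoding."""
    return RELATIVE_BITS[new_type] / RELATIVE_BITS[old_type]


# Hypothetical broken GOP whose first frame is a B frame.
broken_gop = "BBPBBP"
# The same frames after the first frame is re-encoded as an I frame.
converted = "I" + broken_gop[1:]

print(conversion_factor("B", "I"))                # 100.0
print(gop_bits(broken_gop), gop_bits(converted))  # 24 123
```

Under these assumed sizes, a single B to I conversion raises the cost of this six-frame group roughly fivefold, which is the kind of jump that strains a constant-bit-rate decoder buffer.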
FIG. 1 shows an example of a compressed domain editing solution: a cut and paste editing operation on first and second MPEG bitstreams in which a frame type conversion is conducted by simply re-encoding a few frames to generate a shorter GOP starting with an I frame.1 Basically, a segment is cut out of MPEG bitstream 1 and pasted into the new MPEG bitstream. As shown, the segment contains a first broken GOP of four frames, a GOP, an indeterminate number of GOPs, a GOP, and a second broken GOP of three frames. Additionally, a second segment is cut out of MPEG bitstream 2 and pasted into the new MPEG bitstream. As shown, the second segment contains a first broken GOP of three frames, a GOP, an indeterminate number of GOPs, a GOP, and a second broken GOP of four frames.
1 J. Meng and S. F. Chang, "Buffer Control Techniques for Compressed-domain Video Editing," The Proceedings of IEEE International Conference on Image Processing, pp. 600-03, 1996. J. Meng and S. F. Chang, "CVEPS: A Compressed Video Editing and Parsing System," ACM Multimedia Conference, Boston, Mass., November 1996.
A frame type conversion is conducted by re-encoding the first frame of the first broken GOP in each segment as an I frame. As shown in FIG. 1, the first frame of the first broken GOP in the first segment is converted from a B frame to an I frame, and the first frame of the first broken GOP in the second segment is converted from a P frame to an I frame. In addition to the frame type conversion, the four newly created GOPs have a shorter GOP size. For example, the first GOP of the first segment is four frames in length, the last GOP of the first segment is three frames, the first GOP of the second segment is three frames, and the last GOP of the second segment is four frames.
Some drawbacks of the approach in FIG. 1 can be summarized in terms of video quality, bitrate control, and flexibility. The frame type change from a B frame to an I frame usually generates the worst video quality because a B frame is bi-directionally predicted from its reference frames and therefore contains only minimal information, whereas an I frame should be coded with much more information. Also, a large bitrate change from a B frame to an I frame (e.g., by a factor of 50 or more) usually makes it difficult to control the bitrate given the video buffer constraint mentioned above. Further, if a segment contains only one frame, bitrate and buffer control become complicated and difficult, resulting in less flexibility.
A system and method for editing a bitstream is provided by virtue of the present invention. In one embodiment, a first segment is cut from a first bitstream, which contains multiple GOPs. Additionally, a second segment is cut from a second bitstream, which also contains multiple GOPs. In cutting the segments from the bitstreams, the first and last GOPs in the segments can be cut between frames in the GOP. Thus, a segment can contain a broken GOP at the beginning and/or the end of the segment where a frame type conversion might be required.
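How a cut between frames yields broken GOPs at the ends of a segment can be sketched with simple index arithmetic. This is an illustration only: it assumes a fixed GOP size with GOP boundaries at multiples of that size, frame indices counted from zero, and an exclusive end index; the function name split_segment is hypothetical.

```python
def split_segment(start, end, gop_size):
    """Describe the frames [start, end) cut from a stream of fixed-size GOPs.

    Returns (first_broken_len, complete_gops, last_broken_len), where a
    length of 0 means the cut fell exactly on a GOP boundary.
    """
    if not (0 <= start < end) or gop_size <= 0:
        raise ValueError("expected 0 <= start < end and gop_size > 0")

    first_boundary = -(-start // gop_size) * gop_size  # first GOP start at or after `start`
    last_boundary = (end // gop_size) * gop_size       # last GOP start at or before `end`

    if first_boundary > last_boundary:
        # The whole cut lies inside one GOP, so it is a single broken GOP.
        return (end - start, 0, 0)

    first_broken = first_boundary - start
    complete = (last_boundary - first_boundary) // gop_size
    last_broken = end - last_boundary
    return (first_broken, complete, last_broken)


# With 15-frame GOPs, cutting frames 11..47 leaves a four-frame broken GOP,
# two complete GOPs, and a three-frame broken GOP.
print(split_segment(11, 48, 15))  # (4, 2, 3)
```

The example values were chosen so that the broken GOP lengths match the four-frame and three-frame broken GOPs of the first segment described for FIG. 1.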
The number of frames of the broken GOP is then compared to a predetermined threshold value. If the number of frames of the broken GOP is less than or equal to the threshold value, the broken GOP is combined with a neighboring GOP. For example, the broken GOP is combined with the next GOP if the broken GOP is the first GOP in the cut segment, or with the previous GOP if the broken GOP is the last GOP in the cut segment. Therefore, the newly created GOP is longer than a regular, standard-size GOP. Additionally, the reference frames in the broken GOP will be converted to either a B or P frame depending on the frame type of the reference frame and the position of the reference frame. For example, either a P frame is converted to a B frame or an I frame is converted to a P frame. Further, most of the B frames in the broken GOP will be modified depending on their positions. If a B frame needs to be converted, it will use prediction in one direction only: B frames in the first broken GOP will become backward prediction only, whereas B frames in the last broken GOP will become forward prediction only. However, if the last frame in the last broken GOP is a P frame, the B frames between the I frame and the P frame will remain unchanged. Additionally, only B frames in the broken GOP need to be changed; frames inside a complete GOP remain unchanged.
J. Meng and S. F. Chang, "Tools for Compressed-Domain Video Indexing and Editing," SPIE Conference on Storage and Retrieval for Image and Video Database, San Jose, February 1996.
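The threshold comparison can be sketched as a small decision function. This is a minimal sketch under stated assumptions: it models only the frame counts, not the frame type or prediction conversions; the threshold of 4 and the 15-frame neighbor used in the example are hypothetical values; and the names Decision and decide are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str      # "merge_with_next", "merge_with_previous", or "new_gop"
    gop_length: int  # frame count of the resulting GOP


def decide(broken_len, neighbor_len, threshold, position):
    """Choose how to handle a broken GOP at the "first" or "last" end of a segment."""
    if position not in ("first", "last"):
        raise ValueError("position must be 'first' or 'last'")
    if broken_len <= threshold:
        # Short broken GOP: absorb it into the neighboring complete GOP,
        # which produces a GOP longer than the regular size.
        action = "merge_with_next" if position == "first" else "merge_with_previous"
        return Decision(action, broken_len + neighbor_len)
    # Longer broken GOP: it stands alone as a new, shorter GOP.
    return Decision("new_gop", broken_len)


print(decide(3, 15, threshold=4, position="first"))
# Decision(action='merge_with_next', gop_length=18)
print(decide(6, 15, threshold=4, position="last"))
# Decision(action='new_gop', gop_length=6)
```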
However, if the number of frames in the broken GOP is greater than the threshold value, a new GOP is created with the frames of the broken GOP. Therefore, in this case, the newly created GOP is shorter than a regular, standard-size GOP. Additionally, the first reference frame in the first broken GOP is converted to an I frame. For example, a first P frame becomes an I frame in the newly created GOP. Additionally, any B frames right before the new I frame are converted to backward prediction only.
In the last broken GOP case, if the last frame in the last broken GOP is a P frame, the directional conversion of the B frames does not need to be applied. But, if the last frame in the last broken GOP is a B frame, all B frames right after the last P frame will be converted from bi-directional prediction to forward prediction. Finally, the edited segments from the first and second bitstream are combined to create a new bitstream.
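The conversions for a broken GOP that becomes its own new GOP can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: frames are given as a string of frame types in display order; the labels "B_bwd" and "B_fwd" are hypothetical markers for backward-only and forward-only prediction; the function names are hypothetical; and treating an I frame as the anchor when the last broken GOP contains no P frame is an assumption not stated in the text.

```python
def convert_first_broken_gop(frames):
    """First broken GOP: promote the first reference frame to I and make
    the B frames before it backward prediction only."""
    out = list(frames)
    first_ref = next((i for i, t in enumerate(out) if t in ("I", "P")), None)
    if first_ref is None:
        raise ValueError("broken GOP has no reference frame to promote")
    out[first_ref] = "I"
    for i in range(first_ref):
        out[i] = "B_bwd"  # every frame before the first reference is a B frame
    return out


def convert_last_broken_gop(frames):
    """Last broken GOP: if it ends on a B frame, make the B frames after the
    last reference frame forward prediction only; otherwise change nothing."""
    out = list(frames)
    if out and out[-1] == "B":
        # Assumption: fall back to an I frame as anchor if there is no P frame.
        last_ref = max((i for i, t in enumerate(out) if t in ("I", "P")), default=-1)
        for i in range(last_ref + 1, len(out)):
            out[i] = "B_fwd"
    return out


print(convert_first_broken_gop("BBPBBP"))
# ['B_bwd', 'B_bwd', 'I', 'B', 'B', 'P']
print(convert_last_broken_gop("IBBPBB"))
# ['I', 'B', 'B', 'P', 'B_fwd', 'B_fwd']
print(convert_last_broken_gop("IBBP"))
# ['I', 'B', 'B', 'P']
```

In each case only the B frames at the cut edge lose one prediction direction, since the reference frame they would have used on the far side of the cut is no longer part of the segment.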
Although only two bitstreams were used to describe this process, it is noted that there is no limit to the number of bitstreams that can be cut, edited, and pasted into a new bitstream.
In an embodiment of a computer-readable medium, a computer system includes instructions for editing a plurality of bitstreams. The computer-readable medium comprises: one or more instructions for cutting a plurality of segments from the plurality of bitstreams, the plurality of segments comprising at least one group of frames, wherein the at least one group of frames comprises at least one broken group of frames; one or more instructions for comparing a number of frames of the at least one broken group of frames in the plurality of segments to a threshold value; one or more instructions for editing the plurality of segments according to the comparison; and one or more instructions for creating a new bitstream by combining the edited plurality of segments.
In an embodiment of a computer data signal embodied in a carrier wave, the signal is generated by a method and includes instructions for editing a plurality of bitstreams comprising: one or more instructions for cutting a plurality of segments from the plurality of bitstreams, the plurality of segments comprising at least one group of frames, wherein the at least one group of frames comprises at least one broken group of frames; one or more instructions for comparing a number of frames of the at least one broken group of frames in the plurality of segments to a threshold value; one or more instructions for editing the plurality of segments according to the comparison; and one or more instructions for creating a new bitstream by combining the edited plurality of segments.
A further understanding of the nature and advantages of the invention herein may be realized by reference to the remaining portions of the specification and the attached drawings.