1. Field of the Invention
The present invention relates generally to methods for editing video files. More particularly, the invention relates to various methods and apparatuses for processing video frames and video segments to facilitate editing. In one aspect, methods and apparatuses are disclosed for copying a segment of video from an input data stream and processing the copied segment with a stitcher so that the segment is independent of information contained in the original input data stream.
2. Description of the Related Art
MPEG (Moving Picture Experts Group) is a standard promulgated by the International Organization for Standardization (ISO) to provide a syntax for compactly representing digital video and audio signals. The syntax generally requires that a minimum number of rules be followed when bit streams are encoded so that a receiver of the encoded bit stream may unambiguously decode the received bit stream. As is well known to those skilled in the art, a bit stream will also include a “system” component in addition to the video and audio components. Generally speaking, the system component contains information required for combining and synchronizing each of the video and audio components into a single bit stream. Specifically, the system component allows audio/video synchronization to be realized at the decoder.
Since the initial unveiling of the first MPEG standard, entitled MPEG-1, a second MPEG standard known as MPEG-2 has been introduced. In general, MPEG-2 provides an improved syntax to enable a more efficient representation of broadcast video. By way of background, MPEG-1 was optimized to handle data at a rate of 1.5 Mbits/second and reconstruct about 30 video frames per second, with each frame having a resolution of 352 pixels by 240 lines (NTSC), or about 25 video frames per second, each frame having a resolution of 352 pixels by 288 lines (PAL). Therefore, decoded MPEG-1 video generally approximates the perceptual quality of consumer video tapes (VHS). In comparison, MPEG-2 is designed to represent CCIR 601-resolution video at data rates of 4.0 to 8.0 Mbits/second and provide a frame resolution of 720 pixels by 480 lines (NTSC), or 720 pixels by 576 lines (PAL). For simplicity, except where distinctions between the two versions of the MPEG standard exist, the term “MPEG” will be used to reference video and audio encoding and decoding algorithms promulgated in current as well as future versions.
Typically, a decoding process begins when an MPEG bit stream containing video, audio and system information is demultiplexed by a system decoder that is responsible for producing separate encoded video and audio bit streams that may subsequently be decoded by a video decoder and an audio decoder. Attention is now directed at the structure of an encoded video bit stream. Generally, an encoded MPEG video bit stream is organized in a distinguishable data structure hierarchy. At the highest level in the hierarchy is a “video sequence” which may include a sequence header, one or more groups of pictures (GOPs) and an end-of-sequence code. GOPs are subsets of video sequences, and each GOP may include one or more pictures. As will be described below, GOPs are of particular importance because they allow access to a defined segment of a video sequence, although in certain cases, a GOP may be quite large.
Each picture within a GOP is then partitioned into several horizontal “slices” defined from left to right and top to bottom. The individual slices are in turn composed of one or more macroblocks which identify a square area of 16-by-16 pixels. As described in the MPEG standard, a macroblock includes four 8-by-8 pixel “luminance” components, and two 8-by-8 “chrominance” components (i.e., chroma red and chroma blue).
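As a rough illustration of the macroblock arithmetic described above, the following Python sketch counts the macroblocks in a picture and the samples carried by one macroblock. The function and constant names are illustrative assumptions and are not taken from the MPEG standard; dimensions that are not a multiple of 16 are simply rounded up here.

```python
# Illustrative sketch only; names are hypothetical, not from the MPEG standard.
MACROBLOCK_SIZE = 16   # a macroblock covers 16-by-16 pixels
BLOCK_SIZE = 8         # each component block is 8-by-8 samples
LUMA_BLOCKS = 4        # four luminance blocks per macroblock
CHROMA_BLOCKS = 2      # one chroma-red and one chroma-blue block


def macroblock_grid(width, height):
    """Return (columns, rows) of macroblocks needed to cover a picture."""
    cols = -(-width // MACROBLOCK_SIZE)   # ceiling division
    rows = -(-height // MACROBLOCK_SIZE)
    return cols, rows


def samples_per_macroblock():
    """Total luminance plus chrominance samples stored for one macroblock."""
    return (LUMA_BLOCKS + CHROMA_BLOCKS) * BLOCK_SIZE * BLOCK_SIZE


cols, rows = macroblock_grid(352, 240)    # MPEG-1 NTSC resolution from the text
print(cols, rows, cols * rows)            # 22 15 330
print(samples_per_macroblock())           # 384
```

Under these assumptions a 352-by-240 picture is covered by 330 macroblocks, and each macroblock carries 384 samples (256 luminance and 128 chrominance).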
Because a large degree of pixel information is similar or identical between pictures within a GOP, the MPEG standard takes particular advantage of this temporal redundancy and represents selected pictures in terms of their differences from a particular reference picture. The MPEG standard defines three general types of encoded picture frames. The first type of frame is an intra-frame (I-frame). An I-frame is encoded using information contained in the frame itself and is not dependent on information contained in previous or future frames. As a result, an I-frame generally defines the starting point of a particular GOP in a sequence of frames.
A second type of frame is a predicted-frame (P-frame). P-frames are generally encoded using information contained in a previous I-frame or P-frame. As is well known in the art, P-frames are known as forward predicted frames. The third type of frame is a bi-directional-frame (B-frame). B-frames are encoded based on information contained in both past and future frames, and are therefore known as bi-directionally predicted frames. Therefore, B-frames provide more compression than both I-frames and P-frames, and P-frames provide more compression than I-frames. Although the MPEG standard does not require that a particular number of B-frames be arranged between any I-frames or P-frames, most encoders select two B-frames between I-frames and P-frames. This design choice is based on factors such as the amount of memory in the encoder and the characteristics and definition needed for the material being coded.
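The dependencies among the three frame types can be illustrated with a short Python sketch. It assumes, for simplicity, that frames are given in display order as a string of the characters I, P and B, and that each B-frame references the nearest I-frame or P-frame on either side. The name `reference_frames` is hypothetical and is used for illustration only.

```python
def reference_frames(types):
    """Map each frame index to the indices of the frames it is predicted from.

    `types` is a display-order string such as "IBBPBBP" (an assumption of
    this sketch; a real bit stream stores frames in coded order).
    """
    anchors = [i for i, t in enumerate(types) if t in "IP"]
    refs = {}
    for i, t in enumerate(types):
        earlier = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if t == "I":
            refs[i] = []                        # intra-coded: no references
        elif t == "P":
            refs[i] = earlier[-1:]              # previous I-frame or P-frame
        else:
            refs[i] = earlier[-1:] + later[:1]  # nearest past and future anchors
    return refs


deps = reference_frames("IBBPBBP")
print(deps[0])  # []
print(deps[3])  # [0]
print(deps[4])  # [3, 6]
```

With two B-frames between anchors, as in the common encoder choice mentioned above, only the I-frame at index 0 can be decoded without any other frame.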
Although the MPEG standard defines a convenient syntax for compactly encoding video and audio bit streams, significant difficulties arise when a segment of an encoded bit stream is clipped out for use in a new bit stream. In particular, because P-frames use information from previous frames in the bit stream, and B-frames use information from both previous and future frames, clips must be performed at I-frames. That is, the clipped segment must have an I-frame as a starting frame and a P-frame or an I-frame as the final frame in the clipped segment. Performing clips at I-frames therefore avoids producing video clips whose beginning and ending frames cannot be decoded without the reference frames contained in the original bit stream.
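The restriction described above can be made concrete with a small sketch that lists the positions at which a conventional clip may begin or end. It again assumes a display-order string of frame types, which is a simplification, and `conventional_clip_points` is a hypothetical helper name.

```python
def conventional_clip_points(types):
    """Return (legal_starts, legal_ends) for clipping without re-encoding.

    A clip may start only on an I-frame and may end only on an I-frame
    or a P-frame, following the constraint stated in the text.
    """
    starts = [i for i, t in enumerate(types) if t == "I"]
    ends = [i for i, t in enumerate(types) if t in "IP"]
    return starts, ends


starts, ends = conventional_clip_points("IBBPBBPBBPBBIBBP")
print(starts)  # [0, 12]
print(ends)    # [0, 3, 6, 9, 12, 15]
```

In this 16-frame example only two frames are usable as a starting point, which illustrates why clipping at I-frames alone cannot give frame-accurate edits.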
Unfortunately, typical encoded video bit streams have a large number of P-frames and B-frames in between I-frames. Consequently, the locations at which a clip may be performed are disadvantageously limited, which renders encoded MPEG bit streams unsuitable for the video editing industry, which demands frame-accurate precision.
In view of the foregoing, what is needed is a method and apparatus for editing video bit streams with frame-accurate precision. In particular, there is a need for a method and apparatus for clipping segments in a video bit stream which allow clipped segments of video to begin and end at any frame within the bit stream without losing the ability to decode frames in the clipped segment.
To achieve the foregoing in accordance with the purpose of the present invention, a method and apparatus for editing a video file through the use of an editing engine is disclosed. In one embodiment of this invention, the editing engine is used to clip segments of video from an MPEG bit stream file and to process portions of the clipped segment to generate a bit stream segment that is independent of information contained in the original bit stream file. Generally, the editing engine processes the clipped segment in two processing passes through an edit list provided by an application requesting a particular editing operation. In the first processing pass, the editing engine preferably generates glue segments for the clipped segment based on the type of frames located at the beginning and at the end of the clipped segment. In the second processing pass, any glue segments generated in the first pass may be stitched to any unprocessed portion of the clipped segment. Once any glue segments and unprocessed portions are stitched in a time-ordered sequence, the stitched segment is output to the application. Advantageously, the stitched segment will not require information contained in the original bit stream file in order to accurately decode video frames in the clipped segment.
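The two-pass flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed editing engine: frames are a display-order string of I, P and B characters, a group is assumed to begin at each I-frame, an in-glue segment is assumed to run from the mark-in frame to the end of its group, and an out-glue segment is assumed to be needed only when the mark-out frame is a B-frame. All function names are hypothetical.

```python
def group_start(types, index):
    """Index of the I-frame that opens the group containing `index`."""
    while index > 0 and types[index] != "I":
        index -= 1
    return index


def group_end(types, index):
    """Index of the last frame before the next I-frame (or end of stream)."""
    while index + 1 < len(types) and types[index + 1] != "I":
        index += 1
    return index


def pass_one(types, mark_in, mark_out):
    """Pass 1: choose the frame ranges that need glue (decode and re-encode)."""
    glue = []
    if types[mark_in] != "I":
        glue.append((mark_in, min(group_end(types, mark_in), mark_out)))
    if types[mark_out] == "B":
        start = max(group_start(types, mark_out), mark_in)
        if not glue or start > glue[-1][1]:
            glue.append((start, mark_out))
        else:
            glue[-1] = (glue[-1][0], mark_out)  # both marks share one group
    return glue


def pass_two(mark_in, mark_out, glue):
    """Pass 2: stitch glue ranges and untouched ranges in time order."""
    pieces, cursor = [], mark_in
    for start, end in glue:
        if cursor < start:
            pieces.append(("copy", cursor, start - 1))
        pieces.append(("glue", start, end))
        cursor = end + 1
    if cursor <= mark_out:
        pieces.append(("copy", cursor, mark_out))
    return pieces


types = "IBBPBBPIBBPBBPIBBPBBP"
print(pass_two(4, 16, pass_one(types, 4, 16)))
# [('glue', 4, 6), ('copy', 7, 13), ('glue', 14, 16)]
```

In this example only six frames are re-encoded, while the complete group in the middle is copied unchanged, which is the motivation for separating glue generation from stitching.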
In another embodiment, a method for clipping a segment from a video file having a multiplicity of video frames is disclosed. Preferably, at least some of the frames in the video file are encoded as predicted frames. The method includes selecting a mark-in location in the video file that defines the beginning of the clipped segment. A mark-out location defining the end of the clipped segment is also selected in the video file. Once the mark-in and mark-out locations are selected, the method decodes a first frame associated with one of the mark-in location and the mark-out location. The first frame is preferably a predictive frame that has an associated first format. The first frame is then re-encoded into a second format and stored. The method then proceeds to create a clipped segment that includes the re-encoded first frame.
In yet another embodiment, a method for copying a segment from a video file is disclosed. The method includes the steps of selecting a mark-in location in the video file such that the mark-in location defines the beginning of the copied segment. Once the mark-in location is selected, each of the frames positioned between the mark-in location and a final group frame associated with a group of frames that includes the mark-in location is decoded. Preferably, each decoded frame will have an associated first format. The decoded frames are then re-encoded into an associated second format such that the re-encoded second format of at least one of the decoded frames is different from its associated first format. The re-encoded frames are then stored. The method then generates a copied segment that includes at least the re-encoded frames.
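A minimal sketch of the mark-in step is given below. It assumes a display-order string of frame types, treats the "final group frame" as the last frame before the next I-frame, and assumes that only the first re-encoded frame changes type (to an I-frame) while later frames keep their types. The text requires only that at least one frame change format, so this particular choice, and the name `mark_in_glue`, are illustrative assumptions.

```python
def mark_in_glue(types, mark_in):
    """Return ((start, end), original_types, reencoded_types) for the in-glue.

    The range runs from the mark-in frame to the final frame of its group.
    """
    end = mark_in
    while end + 1 < len(types) and types[end + 1] != "I":
        end += 1
    original = types[mark_in:end + 1]
    # Assumption: the first frame is re-encoded as an I-frame so that the
    # segment no longer depends on frames before the mark-in location.
    reencoded = "I" + original[1:]
    return (mark_in, end), original, reencoded


print(mark_in_glue("IBBPBBPIBBPBBP", 4))
# ((4, 6), 'BBP', 'IBP')
```

Here the B-frame at the mark-in location has a first format (B) that differs from its re-encoded second format (I), while the frames from index 7 onward are left untouched.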
Although the advantages are numerous, a particular advantage of this invention is that the generated copied video segment will not require information from the original input stream in order to decode video frames in the copied segment. Specifically, the copied segment will be a frame-accurate segment that is an independently playable output stream. Further, the copied video segment may be joined with other copied segments to create new edited video streams. It should also be appreciated that the editing engine of this invention may process any type of editing request by creating an appropriate editing operator. Additionally, new editing operators may be installed by future applications requesting a particular editing operation.