1. Field of the Invention
The present invention relates generally to methods for editing video files. More particularly, the invention relates to various methods and apparatuses for processing video frames and video segments to facilitate editing. In one aspect, methods and apparatuses for copying a segment of video from an input data stream and processing the copied segment to be independent of information contained in the original input data stream are disclosed.
2. Description of the Related Art
MPEG (Moving Picture Experts Group) is a standard promulgated by the International Organization for Standardization (ISO) to provide a syntax for compactly representing digital video and audio signals. The syntax generally requires that a minimum number of rules be followed when bit streams are encoded so that a receiver of the encoded bit stream may unambiguously decode the received bit stream. As is well known to those skilled in the art, a bit stream will also include a "system" component in addition to the video and audio components. Generally speaking, the system component contains information required for combining and synchronizing each of the video and audio components into a single bit stream. Specifically, the system component allows audio/video synchronization to be realized at the decoder.
Since the initial unveiling of the first MPEG standard, entitled MPEG-1, a second MPEG standard known as MPEG-2 has been introduced. In general, MPEG-2 provides an improved syntax to enable a more efficient representation of broadcast video. By way of background, MPEG-1 was optimized to handle data at a rate of 1.5 Mbits/second and reconstruct about 30 video frames per second, with each frame having a resolution of 352 pixels by 240 lines (NTSC), or about 25 video frames per second, each frame having a resolution of 352 pixels by 288 lines (PAL). Therefore, decoded MPEG-1 video generally approximates the perceptual quality of consumer video tapes (VHS). In comparison, MPEG-2 is designed to represent CCIR 601-resolution video at data rates of 4.0 to 8.0 Mbits/second and provide a frame resolution of 720 pixels by 480 lines (NTSC), or 720 pixels by 576 lines (PAL). For simplicity, except where distinctions between the two versions of the MPEG standard exist, the term "MPEG" will be used to reference video and audio encoding and decoding algorithms promulgated in current as well as future versions.
Typically, a decoding process begins when an MPEG bit stream containing video, audio and system information is demultiplexed by a system decoder that is responsible for producing separate encoded video and audio bit streams that may subsequently be decoded by a video decoder and an audio decoder. Attention is now directed to the structure of an encoded video bit stream. Generally, an encoded MPEG video bit stream is organized in a distinguishable data structure hierarchy. At the highest level in the hierarchy is a "video sequence" which may include a sequence header, one or more groups of pictures (GOPs) and an end-of-sequence code. GOPs are subsets of video sequences, and each GOP may include one or more pictures. As will be described below, GOPs are of particular importance because they allow access to a defined segment of a video sequence, although in certain cases, a GOP may be quite large.
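By way of illustration only, the hierarchy described above may be sketched as nested records. The class names, field names and the helper functions `build_sequence` and `picture_count` are hypothetical and are not drawn from the MPEG syntax itself; the sketch models only the containment relationship (video sequence, then GOPs, then pictures), not the encoded bit layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Picture:
    # Hypothetical one-letter tag for the picture type: "I", "P", or "B".
    frame_type: str


@dataclass
class GroupOfPictures:
    # A GOP holds one or more pictures.
    pictures: List[Picture] = field(default_factory=list)


@dataclass
class VideoSequence:
    # Illustrative stand-ins for the sequence header and end-of-sequence code.
    sequence_header: dict = field(default_factory=dict)
    gops: List[GroupOfPictures] = field(default_factory=list)
    has_end_code: bool = True


def build_sequence(gop_patterns, header=None):
    """Build a sequence from strings such as "IBBP", one string per GOP."""
    gops = [GroupOfPictures([Picture(t) for t in pattern])
            for pattern in gop_patterns]
    return VideoSequence(sequence_header=dict(header or {}), gops=gops)


def picture_count(sequence):
    """Total number of pictures across all GOPs in the sequence."""
    return sum(len(gop.pictures) for gop in sequence.gops)


if __name__ == "__main__":
    seq = build_sequence(["IBBP", "IBBPBBP"], {"width": 352, "height": 240})
    print(picture_count(seq))
```

In this sketch, access to a defined segment of the sequence corresponds to indexing into `gops`, which is the level at which the hierarchy permits entry.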
Each picture within a GOP is then partitioned into several horizontal "slices" defined from left to right and top to bottom. The individual slices are in turn composed of one or more macroblocks which identify a square area of 16-by-16 pixels. As described in the MPEG standard, a macroblock includes four 8-by-8 pixel "luminance" components, and two 8-by-8 "chrominance" components (i.e., chroma red and chroma blue).
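As a worked example of the partitioning just described, the following sketch counts the macroblocks in a picture and the 8-by-8 blocks in a macroblock. The function names are hypothetical, and rounding a dimension up to the next multiple of 16 is an assumption made for pictures whose size is not an exact multiple of the macroblock size; the resolutions quoted in this description divide evenly.

```python
MACROBLOCK_SIZE = 16  # a macroblock covers 16-by-16 pixels
BLOCK_SIZE = 8        # each component block is 8-by-8


def macroblock_grid(width, height):
    """Return (columns, rows, total) of macroblocks covering a picture.

    Assumption: partial macroblocks at the edges count as whole ones
    (ceiling division).
    """
    cols = -(-width // MACROBLOCK_SIZE)
    rows = -(-height // MACROBLOCK_SIZE)
    return cols, rows, cols * rows


def blocks_per_macroblock():
    """Four luminance blocks plus two chrominance blocks."""
    luminance = (MACROBLOCK_SIZE // BLOCK_SIZE) ** 2  # 2 x 2 = 4
    chrominance = 2  # chroma red and chroma blue
    return luminance + chrominance


if __name__ == "__main__":
    for w, h in [(352, 240), (352, 288), (720, 480), (720, 576)]:
        print((w, h), macroblock_grid(w, h))
    print(blocks_per_macroblock())
```

Under these assumptions, a 352-by-240 picture is covered by 22 columns and 15 rows of macroblocks (330 in total), and a 720-by-480 picture by 45 columns and 30 rows (1350 in total).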
Because a large degree of pixel information is similar or identical between pictures within a GOP, the MPEG standard takes particular advantage of this temporal redundancy and represents selected pictures in terms of their differences from a particular reference picture. The MPEG standard defines three general types of encoded picture frames. The first type of frame is an intra-frame (I-frame). An I-frame is encoded using information contained in the frame itself and is not dependent on information contained in previous or future frames. As a result, an I-frame generally defines the starting point of a particular GOP in a sequence of frames.
A second type of frame is a predicted-frame (P-frame). P-frames are generally encoded using information contained in a previous I-frame or P-frame. As is well known in the art, P-frames are known as forward predicted frames. The third type of frame is a bi-directional-frame (B-frame). B-frames are encoded based on information contained in both past and future frames, and are therefore known as bi-directionally predicted frames. Consequently, B-frames provide more compression than both I-frames and P-frames, and P-frames provide more compression than I-frames. Although the MPEG standard does not require that a particular number of B-frames be arranged between any I-frames or P-frames, most encoders select two B-frames between I-frames and P-frames. This design choice is based on factors such as the amount of memory in the encoder and the characteristics and definition needed for the material being coded.
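The dependencies among the three frame types may be illustrated with a short sketch. It assumes, for simplicity, that frames are listed in display order as a string of the letters I, P and B, that a P-frame refers to the nearest preceding I-frame or P-frame, and that a B-frame refers to the nearest I-frame or P-frame on each side. The function name `reference_indices` is hypothetical, and the sketch does not model the transmission order used in an actual bit stream.

```python
def reference_indices(frame_types):
    """For each frame, return a tuple of the indices of frames it depends on.

    frame_types: string such as "IBBPBBP", in display order (an assumption).
    """
    anchors = [i for i, t in enumerate(frame_types) if t in ("I", "P")]
    refs = []
    for i, t in enumerate(frame_types):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if t == "I":
            # Intra-coded: needs no other frame.
            refs.append(())
        elif t == "P":
            # Forward predicted from the previous I-frame or P-frame.
            refs.append((prev_anchor,) if prev_anchor is not None else ())
        else:
            # Bi-directionally predicted from past and future anchors.
            refs.append(tuple(a for a in (prev_anchor, next_anchor)
                              if a is not None))
    return refs


if __name__ == "__main__":
    for index, deps in enumerate(reference_indices("IBBPBBP")):
        print(index, deps)
```

For the common arrangement of two B-frames between anchors, the B-frames at positions 1 and 2 depend on the frames at positions 0 and 3, while the P-frame at position 3 depends only on the I-frame at position 0.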
Although the MPEG standard defines a convenient syntax for compactly encoding video and audio bit streams, significant difficulties arise when a segment of an encoded bit stream is clipped out for use in a new bit stream. In particular, because P-frames use information from previous frames in the bit stream, and B-frames use information from both previous and future frames, clips must be performed at I-frames. That is, the clipped segment must have an I-frame as a starting frame and a P-frame or an I-frame as the final frame in the clipped segment. Performing clips at I-frames therefore avoids producing video clips whose beginning and ending frames cannot be decoded without the reference frames contained in the original bit stream.
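The constraint stated above can be expressed as a simple check. The sketch below assumes frames are given in display order as a string of the letters I, P and B, with inclusive start and end indices; the function name `is_decodable_clip` is hypothetical. It tests only the rule as stated here (an I-frame first, an I-frame or P-frame last) and does not address other properties of real bit streams.

```python
def is_decodable_clip(frame_types, start, end):
    """Return True if frames start..end (inclusive) form a clip that can be
    decoded without frames outside the clip, under the rule stated above.
    """
    if not (0 <= start <= end < len(frame_types)):
        raise ValueError("clip indices out of range")
    starts_on_intra = frame_types[start] == "I"
    ends_on_anchor = frame_types[end] in ("I", "P")
    return starts_on_intra and ends_on_anchor


if __name__ == "__main__":
    stream = "IBBPBBPBBIBBP"
    print(is_decodable_clip(stream, 0, 3))   # starts at I, ends at P
    print(is_decodable_clip(stream, 0, 4))   # ends at B
    print(is_decodable_clip(stream, 3, 6))   # starts at P
```

A clip ending on a B-frame fails because that frame needs a later reference frame, and a clip starting on a P-frame or B-frame fails because it needs an earlier one.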
Unfortunately, typical encoded video bit streams have a large number of P-frames and B-frames between successive I-frames. Consequently, the locations at which a clip may be performed are disadvantageously limited, which renders encoded MPEG bit streams unsuitable for the video editing industry, which demands frame-accurate precision.
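To give a sense of how restrictive this limitation is, the following sketch lists the positions at which a clip could begin and the fraction of frames they represent. The 15-frame GOP pattern used in the example is an assumption chosen for illustration, not a value taken from this description, and the function names are hypothetical.

```python
def clip_start_positions(frame_types):
    """Indices at which a clip may begin, i.e. the I-frames."""
    return [i for i, t in enumerate(frame_types) if t == "I"]


def clip_start_fraction(frame_types):
    """Fraction of all frames that are permissible clip starting points."""
    if not frame_types:
        return 0.0
    return len(clip_start_positions(frame_types)) / len(frame_types)


if __name__ == "__main__":
    # Assumed pattern: one I-frame followed by 14 P-frames and B-frames.
    gop = "IBBPBBPBBPBBPBB"
    stream = gop * 2
    print(clip_start_positions(stream))
    print(clip_start_fraction(stream))
```

With the assumed pattern, only one frame in fifteen is a permissible starting point, so an editor seeking an arbitrary frame would on average be several frames away from the nearest usable clip point.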
In view of the foregoing, what is needed is a method and apparatus for editing video bit streams with frame-accurate precision. In particular, there is a need for a method and apparatus for clipping segments in a video bit stream which allow clipped segments of video to begin and end at any frame within the bit stream without losing the ability to decode frames in the clipped segment.