The present invention is related to the capture, editing and playback of motion video and associated audio in digital form, wherein the motion video data is compressed using interframe and intraframe techniques.
Several systems are presently available for capture, editing and playback of motion video and associated audio. A particular category of such systems includes digital nonlinear video editors. Such systems store motion video data as digital data, representing a sequence of digital still images, in computer data files on a random access computer readable medium. A still image may represent a single frame, i.e., two fields, or a single field of motion video data. Such systems generally allow any particular image in the sequence of still images to be randomly accessed for editing and for playback. Digital nonlinear video editors have several benefits over previous video tape-based systems which provide only linear access to video information.
Since digital data representing motion video may consume large amounts of computer memory, particularly for full motion broadcast quality video (e.g., sixty fields per second for NTSC and fifty fields per second for PAL), the digital data typically is compressed to reduce storage requirements. There are several kinds of compression for motion video information. One kind of compression is called “intraframe” compression, which involves compressing the data representing each still image independently of other still images. Commonly-used intraframe compression techniques employ a transformation to the frequency domain from the spatial domain, for example, by using discrete cosine transforms. The resulting values typically are quantized and encoded. Commonly-used motion video compression schemes using intraframe compression include “motion-JPEG” and “I-frame only” MPEG. While intraframe compression reduces redundancy of data within a particular image, it does not reduce the significant redundancy of data between adjacent images in a motion video sequence. For intraframe compressed image sequences, however, each image in the sequence can be accessed individually and decompressed without reference to the other images. Accordingly, intraframe compression allows purely nonlinear access to any image in the sequence.
More compression can be obtained for motion video sequences by using what is commonly called “interframe” compression. Interframe compression involves predicting one image using another. This kind of compression often is used in combination with intraframe compression. For example, a first image may be compressed using intraframe compression, and typically is called a key frame. The subsequent images may be compressed by generating predictive information that, when combined with other image data, results in the desired image. Intraframe compressed images may occur every so often throughout the sequence. Several standards use interframe compression techniques, such as MPEG-1 (ISO/IEC 11172-1 through 5), MPEG-2 (ISO/IEC 13818-1 through 9) and H.261, an International Telecommunication Union (ITU) standard. MPEG-2, for example, compresses some images using intraframe compression (called I-frames or key frames), and other images using interframe compression techniques, for example, by computing predictive errors between images. The predictive errors may be computed for forward prediction (called P-frames) or bidirectional prediction (called B-frames). MPEG-2 is designed to provide broadcast quality full motion video.
For interframe compressed image sequences, the interframe compressed images in the sequence can be accessed and decompressed only with reference to other images in the sequence. Accordingly, interframe compression does not allow purely nonlinear access to every image in the sequence, because an image may depend on either previous or following images in the sequence. Generally speaking, only the intraframe images in the sequence may be accessed nonlinearly. However, in some compression formats, such as MPEG-2, some state information needed for decoding or displaying an intraframe compressed image, such as a quantization table, also may occur elsewhere in the compressed bitstream, eliminating the ability to access even intraframe compressed images nonlinearly.
One approach to handling the playback of serially dependent segments in an arbitrary sequence is described in U.S. Pat. No. 4,729,044 (Keisel). In this system, the dependency between images in a segment is due to the linear nature of the storage media, i.e., video tape. Several tapes containing the same material are used. For any given segment to be played back, an algorithm is used to select one of the tapes from which the material should be accessed. At the same time, a tape for a subsequent segment is identified and cued to the start of the next segment. As a result, several identical sources are processed in parallel in order to produce the final program.
In nonlinear systems, the need for multiple copies of video sources to produce arbitrary sequences of segments has been avoided by the random-access nature of the media. Arbitrary sequences of segments from multiple data files are provided by pipelining and buffering nonlinear accesses to the motion video data. That is, while some data is being decompressed and played back, other data is being retrieved from a data file, such as shown in U.S. Pat. No. 5,045,940 (Peters et al.).
In such systems, video segments still may need to be processed in parallel in order to produce certain special effects, such as dissolves and fades between two segments. One system that performs such effects is described in PCT Publication No. WO 94/24815 (Kurtze et al.). In this system, two video streams are blended by a function αA+(1-α)B, wherein A and B are corresponding pixels in corresponding images of the two video streams. A common use of this system is to play segment A, and to cause a transition to segment B over several images. The data required for segment B is loaded into a buffer and decompressed while A is being played back so that decoded pixels for segment B are available at the time the transition is to occur. Similar systems also are shown in U.S. Pat. No. 5,495,291 (Adams) and U.S. Pat. No. 5,559,562 (Ferster). When using interframe compression, if a second segment starts with an interframe image, the processing of the second segment may have to begin earlier during processing of a previous first segment to allow the desired image of the second segment to be available. Ideally, the second segment should be processed from a previous intraframe compressed image. However, these preceding images are not used in the output.
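The blending function described above may be illustrated by the following minimal sketch. The function names `blend_pixel` and `dissolve`, the representation of an image as a flat list of pixel values, and the linear ramp of the blend factor from 1 to 0 are assumptions made for illustration only; the cited publication is not stated to use this particular ramp.

```python
def blend_pixel(a, b, alpha):
    """Return alpha*A + (1 - alpha)*B for one pair of corresponding pixels."""
    return alpha * a + (1.0 - alpha) * b


def dissolve(images_a, images_b):
    """Blend corresponding images of two streams over a transition.

    Assumption: alpha ramps linearly from 1.0 (only segment A visible)
    on the first image to 0.0 (only segment B visible) on the last image.
    """
    n = len(images_a)
    if n != len(images_b):
        raise ValueError("both streams must supply the same number of images")
    output = []
    for i, (image_a, image_b) in enumerate(zip(images_a, images_b)):
        alpha = 1.0 - i / (n - 1) if n > 1 else 0.0
        output.append([blend_pixel(a, b, alpha)
                       for a, b in zip(image_a, image_b)])
    return output


if __name__ == "__main__":
    # Three two-pixel images per stream.
    for image in dissolve([[100, 100]] * 3, [[0, 200]] * 3):
        print(image)
```

Decoded pixels of both segments must be available for every image of the transition, which is why the data for segment B is decompressed while segment A is still being played back.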
A problem arises when a third segment of interframe and intraframe compressed video is to be played. In particular, the second segment must be long enough to allow the first image of the third segment to be completely processed from a previous intraframe compressed image. If only two channels of decoders are available, this processing for the third segment would be performed using the same decoder used to process the first segment, after the first segment is processed. In some cases, the first decoder also may output several images after the last desired image is output. The minimum size of any second segment is referred to as the cut density. While the cut density in principle can be reduced to a single field by using only intraframe compression, interframe compression provides better compression. Accordingly, it is desirable to minimize the cut density using interframe compression.
Another problem in designing a system that is compatible with some standards, such as MPEG-2, is that there are many options that may or may not be present in a coded bitstream. For example, an MPEG-2 formatted bitstream may include only I-frames, or I and P frames, or I, B and P frames. The order in which these frames are displayed also may be different from the order in which they are stored. Each compressed image also may result in the output of anywhere from zero to six fields. State information needed to decode any particular image, including an I-frame, may also occur at any point in the bitstream. As a result, the ability to randomly access a particular field in an arbitrary MPEG-2 compliant bitstream may be determined by the actual format of the bitstream.
Accordingly, a general aim of the present invention is to provide a system which allows nonlinear editing of interframe and intraframe compressed motion video with a minimum cut density. Another general aim in one embodiment of the invention is to allow mixed editing of interframe and intraframe compressed data streams with different compression formats.
Random access to arbitrary fields of a video segment compressed using both interframe and intraframe techniques is enhanced by including state information, for decoding and display, at appropriate points in the compressed bitstream so that each intraframe compressed image can be randomly accessed. In addition, a field index is generated that maps each temporal field to the offset in the compressed bitstream of the data used to decode the field. Additional benefits are provided by playing back segments using two or more alternatingly used decoders. The cut density may be improved by eliminating from the bitstream applied to each decoder any data corresponding to bidirectionally compressed images that would otherwise be used by the decoder to generate fields prior to the desired field.
Accordingly, one aspect of the invention is a computer system for editing motion video compressed using interframe and intraframe techniques. The computer system stores a compressed bitstream for each motion video source to be edited. Each compressed bitstream is processed to detect state information which is used to decode and/or display compressed data. The detected state information is added at appropriate points in the bitstream for each intraframe compressed image. The state information also may be properly inserted during compression. The computer system also processes the compressed bitstream to generate an index that maps each temporal field of a corresponding decompressed output image sequence to a first compressed image used to start decompressing the temporal field, and the offset in the bitstream of the data for the first compressed image. The index may be created while the motion video is captured or imported or by using a post-processing approach. The computer system provides an editing system that permits a user to specify a composition of motion video segments, wherein each segment is defined by a range specified in terms of temporal fields within a motion video source. The field index is used to identify portions of the compressed bitstream to be used to generate each of the motion video segments using the range defining the segment. Two or more decoders are used to process, alternatingly, the identified portions of the compressed bitstream for each of the motion video segments.
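The alternating use of decoders may be illustrated by the following minimal sketch. The function name `assign_decoders` and the round-robin assignment rule are assumptions made for illustration; the text above states only that the decoders are used alternatingly.

```python
def assign_decoders(segments, num_decoders=2):
    """Pair each segment of a composition with a decoder number.

    Assumption: consecutive segments are handed to decoders in strict
    rotation, so that one decoder can begin processing the next segment
    while another is still outputting fields of the current segment.
    """
    if num_decoders < 2:
        raise ValueError("alternating playback needs at least two decoders")
    return [(position % num_decoders, segment)
            for position, segment in enumerate(segments)]


if __name__ == "__main__":
    for decoder, segment in assign_decoders(["first", "second", "third"]):
        print(f"decoder {decoder} plays the {segment} segment")
```

With two decoders, the third segment falls to the decoder that played the first segment, so the second segment has to last long enough for that decoder to become ready; this is the cut density constraint.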
Another aspect of the invention is a process for enabling each intraframe image in a compressed bitstream of motion video data compressed using intraframe and interframe techniques to be randomly accessed. The compressed bitstream is processed to detect state information. The detected state information is added to the bitstream for each intraframe compressed image, thereby allowing random access to any intraframe compressed image.
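This process may be illustrated by the following minimal sketch. The representation of the bitstream as a list of `(kind, payload)` tuples, the unit kind `"state"` and the function name `insert_state_before_intraframes` are assumptions made for illustration; an actual bitstream carries state information, such as a quantization table, in format-specific headers.

```python
def insert_state_before_intraframes(units):
    """Return a copy of the bitstream in which every intraframe image is
    immediately preceded by the state information in effect at that point.

    Assumption: `units` is a list of (kind, payload) tuples in bitstream
    order, where kind is "state", "I", "P" or "B".
    """
    output = []
    current_state = None
    for kind, payload in units:
        if kind == "state":
            # Detect state information and remember the latest value.
            current_state = payload
        elif kind == "I" and current_state is not None:
            # Add a copy of the state unless one already precedes the image.
            if not output or output[-1] != ("state", current_state):
                output.append(("state", current_state))
        output.append((kind, payload))
    return output


if __name__ == "__main__":
    stream = [("state", "Q1"), ("I", "i0"), ("P", "p0"), ("I", "i1")]
    for unit in insert_state_before_intraframes(stream):
        print(unit)
```

After this processing, decoding can start at the state unit preceding any intraframe image without scanning earlier parts of the bitstream.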
Another aspect of the invention is a process for generating a field index for a compressed bitstream of motion video data compressed using intraframe and interframe techniques. In this process the number of video fields represented by each compressed image is determined. The compressed image which is used to start decompressing the bitstream to obtain the temporal field is then identified. A field index entry is then generated for each temporal field which maps the temporal field to an offset in the bitstream of the compressed motion video data which is used to start decompressing the bitstream to produce the temporal field. The index may be accessed using as an input an indication of the desired temporal field.
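The generation of a field index may be illustrated by the following minimal sketch. The type `CompressedImage`, the function name `build_field_index` and the sample offsets are hypothetical. The sketch also assumes, as a simplification, that images are stored in display order and that decompression of any field starts from the most recent intraframe image; a bitstream in which bidirectionally predicted images are reordered requires a more elaborate rule for identifying the starting image.

```python
from typing import NamedTuple


class CompressedImage(NamedTuple):
    kind: str        # "I", "P" or "B"
    offset: int      # byte offset of the image data in the bitstream
    num_fields: int  # number of video fields this image produces


def build_field_index(images):
    """Map each temporal field number to the bitstream offset at which
    decompression must start in order to produce that field.

    Assumption: stored order equals display order, and the starting
    image is always the most recent intraframe image.
    """
    index = []
    start_offset = None
    for image in images:
        if image.kind == "I":
            start_offset = image.offset
        if start_offset is None:
            raise ValueError("bitstream must begin with an intraframe image")
        # One entry per field; an image producing zero fields adds none.
        index.extend([start_offset] * image.num_fields)
    return index


if __name__ == "__main__":
    images = [
        CompressedImage("I", 0, 2),
        CompressedImage("P", 4000, 3),
        CompressedImage("P", 5000, 2),
        CompressedImage("I", 6000, 2),
    ]
    field_index = build_field_index(images)
    print(field_index[4])  # offset used to start decompressing temporal field 4
```

The index is accessed with the desired temporal field as input, here a list position, and returns the offset from which the bitstream is read.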
Another aspect of the invention is a circuit for decoding a plurality of motion video data streams compressed using interframe and intraframe techniques. This circuit includes a plurality of decoders for decoding the compressed video data. An interface receives the compressed video data, and provides the compressed video data to the decoders. This interface eliminates from the bitstream applied to each decoder any data corresponding to bidirectionally compressed images that would otherwise be used by the decoder to generate fields prior to the desired field. A switch connected to the output of the decoders controls which fields of motion video are output from the decoders so that only those fields within a range of specified temporal fields are output.
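The operation of the interface and of the switch may be illustrated in software by the following minimal sketch. The function names `drop_leading_b_images` and `select_fields`, and the representation of each compressed image as a `(kind, first_field, num_fields)` tuple giving the temporal fields it produces, are assumptions made for illustration and do not describe the circuit itself.

```python
def drop_leading_b_images(images, first_wanted_field):
    """Remove bidirectionally compressed images whose fields all precede
    the desired field. No other image is predicted from a B image, so the
    remaining images still decode correctly.

    Assumption: `images` is a list of (kind, first_field, num_fields)
    tuples in bitstream order.
    """
    kept = []
    for kind, first_field, num_fields in images:
        ends_before_range = first_field + num_fields <= first_wanted_field
        if kind == "B" and ends_before_range:
            continue
        kept.append((kind, first_field, num_fields))
    return kept


def select_fields(decoded_fields, first_wanted_field, last_wanted_field):
    """Model of the output switch: pass only fields inside the range."""
    return [field for field in decoded_fields
            if first_wanted_field <= field <= last_wanted_field]


if __name__ == "__main__":
    # Stored order I P B B P B B; each image produces two fields.
    stored = [("I", 0, 2), ("P", 6, 2), ("B", 2, 2), ("B", 4, 2),
              ("P", 12, 2), ("B", 8, 2), ("B", 10, 2)]
    print(drop_leading_b_images(stored, first_wanted_field=9))
    print(select_fields(range(14), 9, 11))
```

In this example, two of the seven images never reach the decoder, which shortens the processing needed before the first desired field and thereby helps to reduce the cut density.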
Other aspects of the invention include the processes and systems or circuits corresponding to the foregoing aspects of the invention, and their various combinations.