The disclosed invention is for use with film and more specifically with the processing of edited digital film and video.
Digital film is composed of multiple frames of digital data, each corresponding to an image captured at a different point in time, as shown in FIG. 1A. Currently, film is shot and displayed at 24 frames per second. In contrast, video is displayed at the rate of 60 “fields” per second, as shown in FIG. 1B. A field consists of one-half the number of lines forming a complete image. In video, the odd and even lines of a complete image are displayed in successive fields. This process is known in the art as “interlacing.”
In order to show digital film in a digital video environment, the digital film data must undergo a conversion process known as “3:2 pulldown.” 3:2 pulldown creates ten fields of video from four frames of film, thus allowing a film sequence to be displayed at 60 fields per second. Let A, B, C, D represent four consecutive film frames, and let AaBbCcDd denote the same four frames represented as 8 interlaced fields, as shown in FIG. 2A. The ten corresponding video fields are then
A, a, B, b, B, c, C, d, D, d
where A, B, C, D represent, respectively, the odd lines of A, B, C, D and a, b, c, d represent, respectively, the even lines of A, B, C, D as shown in FIG. 2B. The odd lines of frame B and the even lines of frame D are used twice.
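The pulldown pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name `pulldown_3_2` and the representation of a frame as an `(odd, even)` tuple are assumptions made for the example and do not come from the disclosure.

```python
def pulldown_3_2(frames):
    """Expand film frames into 3:2 pulldown video fields.

    frames: list of (odd_field, even_field) tuples (hypothetical representation).
    The number of frames must be a multiple of four.
    """
    if len(frames) % 4 != 0:
        raise ValueError("3:2 pulldown works on groups of four frames")
    fields = []
    for i in range(0, len(frames), 4):
        (A, a), (B, b), (C, c), (D, d) = frames[i:i + 4]
        # Odd field of B and even field of D are each used twice.
        fields += [A, a, B, b, B, c, C, d, D, d]
    return fields


film = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
video = pulldown_3_2(film)
```

Four frames yield ten fields, so 24 frames per second become 60 fields per second.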
In the video industry, digital film sequences and digital video sequences are often edited together. After editing in which video originating material, film originating material (after 3:2 pulldown), and computer generated effects are combined, it is often desirable to return the edited video sequence back to a film format. A film format for a digital video sequence allows for efficient compression and theater display. Yet there is no direct and simple way to return to the 24 f/s film format without introducing visual distortions, which reduce the commercial value for broadcasting as well as degrade the output of standard forms of processing, such as MPEG compression.
For example, assume that two pieces of film are to be edited together and that there will be no mid-frame splices; then there are 25 possibilities for the type of splice, corresponding to joining each of five possible types of cuts in the first piece to each of five types in the second piece:
Each of the five sections in the first column can be joined to any of the five sections in the second column.
Returning the edited sequence to a film format requires reconstituting the sequence of fields into another sequence of (approximately) the same length which has the proper cadence. Each successive group of 10 fields could then be converted to 8 fields such that adjacent pairs (2n+1, 2n+2) correspond to the same instant in time, and such that during compression the locations of redundant fields are the same as after 3:2 pulldown.
Two of these 25 combinations are “perfect” as they stand, namely pairing row 1 of the first column with row 1 of the second column, or pairing row 5 with row 2. Consider the (1,1) pairing, namely
(A,a,B,b,B,c,C,d,D,d,A′,a′,B′,b′,B′,c′,C′,d′,D′,d′).
This means that either a section of material composed of whole cycles (contiguous fields of size ten starting with type A) was removed during editing or perhaps that such a section was inserted elsewhere in the sequence, but precisely between two adjacent cycles. In this case, a sequence of two cycles could be created directly from these 20 fields, namely
(A,a,B,b,C,c,D,d,A′,a′,B′,b′,C′,c′,D′,d′)
The corresponding (virtual) film frames are A,B,C,D,A′,B′,C′,D′, obtained by simply deleting one copy of each of the fields B,d,B′,d′ and reordering the remaining fields as indicated. The (5,2) pairing, namely
(A,a,B′,b′,B′,c′,C′,d′,D′,d′)
might occur, for example, if the section BbBcCdDd is edited out. In this case, the natural rhythm could be recovered by creating four film frames, namely A, B′, C′, D′, where A=(A,a), B′=(B′,b′), etc.
Digital video, whatever its origin, is usually heavily processed, due especially to standard editing and the introduction of special effects. The disclosed method determines an instruction set for reordering an edited digital video sequence composed of digital video fields from multiple sources. When the digital video sequence is reordered, temporal cadence is provided, which allows for conversion to a digital film format through a reverse 3:2 pulldown.
Let Fold=(F1old,F2old, ... ,FNold) be the given edited sequence of video fields. The method calculates an instruction set which is then used to transform Fold into a new sequence of video fields, denoted Fnew, where most of the fields in Fnew come from Fold and the remaining fields are “upconverted” fields from Fold. This reconstitution of Fold is obtained by optimizing a set of instructions based on various constraints which express the characteristics of the pattern AaBbBcCdDd. By assigning a cost to each violation of the constraints, to each disruption of the natural flow of time, and to other undesirable properties, a real-valued function is constructed. This real-valued function can then be optimized through dynamic programming.
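To make the idea of minimizing a cost function by dynamic programming concrete, the following toy sketch may help. It is not the disclosed instruction-set optimization: it only assigns each field a position in the AaBbBcCdDd cycle, charging a hypothetical `mismatch_cost` when the assigned position disagrees with the observed label and a hypothetical `break_cost` when consecutive fields do not advance by one position. The function name, both cost values, and the single-character labels are assumptions made for illustration.

```python
CYCLE = "AaBbBcCdDd"  # the ten-field pulldown pattern


def best_phases(observed, mismatch_cost=1.0, break_cost=3.0):
    """Viterbi-style dynamic programme over cycle positions 0..9.

    Returns (total_cost, phases), where phases[t] is the cycle position
    assigned to field t under the toy cost model described above.
    """
    P = len(CYCLE)

    def label_cost(p, t):
        return 0.0 if CYCLE[p] == observed[t] else mismatch_cost

    # cost[p]: cheapest way to label fields 0..t with field t at position p
    cost = [label_cost(p, 0) for p in range(P)]
    back = []  # back[t-1][p]: best predecessor of position p at field t
    for t in range(1, len(observed)):
        new_cost, choices = [], []
        for p in range(P):
            best_q, best_c = 0, float("inf")
            for q in range(P):
                step = 0.0 if (q + 1) % P == p else break_cost
                if cost[q] + step < best_c:
                    best_q, best_c = q, cost[q] + step
            new_cost.append(best_c + label_cost(p, t))
            choices.append(best_q)
        cost = new_cost
        back.append(choices)

    # Trace the cheapest path backwards from the best final position.
    p = min(range(P), key=cost.__getitem__)
    total, path = cost[p], [p]
    for choices in reversed(back):
        p = choices[p]
        path.append(p)
    path.reverse()
    return total, path


clean_cost, clean_path = best_phases("AaBbBcCdDd")
splice_cost, splice_path = best_phases("AaBbAaBbBcCdDd")
```

In the second call the sequence is cut after the fourth field and restarts a cycle; under these assumed costs the cheapest explanation is a single cadence break at the splice rather than a run of label mismatches.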
One constraint used to define the instruction set is that only orphan fields are upconverted. An orphan field is defined as a field which does not have a partner field of the opposite parity. For example, in the sequence AaBbVvcBbDd the field c is an orphan field whereas field A has partner field a. Further, the constraint applies upconversion only when the “cost” for upconverting is less than that of any other method of restoring cadence by re-arranging existing fields. Another constraint which is used to determine the instruction set is that the ordering of the fields in Fnew is preserved from Fold. A further constraint is that the number of fields in the old and the new video sequence should be approximately equal.
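The orphan-field definition can be illustrated with a short sketch. It assumes a simplified labelling in which each field is a single letter, upper case for odd lines and lower case for even lines, and in which a partner must be an immediately adjacent field of the same frame; the function name `find_orphans` is hypothetical.

```python
def find_orphans(labels):
    """Return the indices of fields with no adjacent opposite-parity partner."""

    def partners(x, y):
        # Same frame letter, opposite case (opposite parity).
        return x != y and x.lower() == y.lower()

    orphans = []
    for i, lab in enumerate(labels):
        has_prev = i > 0 and partners(labels[i - 1], lab)
        has_next = i + 1 < len(labels) and partners(lab, labels[i + 1])
        if not (has_prev or has_next):
            orphans.append(i)
    return orphans


orphan_positions = find_orphans("AaBbVvcBbDd")
```

For the example sequence from the text, the only index reported is that of the field c.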
After the cost for the instruction set is minimized, Fold is reordered into Fnew such that each successive group of ten fields is of the form AaBbBcCdDd and thus the video field sequence of Fnew has perfect cadence. Once Fnew is determined, the new sequence of fields is converted to a film format, where each film frame corresponds to two video fields. The conversion is achieved by deleting the fifth and tenth fields of each cycle of ten fields of Fnew and reversing the order of cC and dD.
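The final conversion step can be sketched as follows, assuming Fnew already has perfect cadence and its length is a multiple of ten. The function name `inverse_pulldown` and the `(odd, even)` tuple representation of a frame are assumptions made for the example.

```python
def inverse_pulldown(fields):
    """Convert perfectly cadenced video fields back to film frames.

    Each cycle of ten fields AaBbBcCdDd becomes four (odd, even) frames.
    """
    if len(fields) % 10 != 0:
        raise ValueError("expected whole cycles of ten fields")
    frames = []
    for i in range(0, len(fields), 10):
        A, a, B, b, _dup_B, c, C, d, D, _dup_d = fields[i:i + 10]
        # The fifth and tenth fields are dropped; cC and dD are swapped
        # so that every frame is stored odd field first.
        frames += [(A, a), (B, b), (C, c), (D, d)]
    return frames


frames = inverse_pulldown(list("AaBbBcCdDd"))
```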
In one embodiment, the constraints are defined in terms of video field labels. Video field labels are labels assigned to the video fields as defined in U.S. provisional patent application serial No. 60/150,020 entitled “Video Field Labeling” filed on Aug. 20, 1999, which is incorporated herein, in its entirety, by reference. The video field labels convey information about the origin of the fields in Fold, namely whether each one is film-originating or video-originating, odd or even, its location relative to edit points, and, in the case of film-origination, its location in the AaBbBcCdDd cycle.
In another embodiment, perfect cadence is obtained using an alternative procedure. First, video field labels are determined for the edited video field sequence. Based on the designated labels, orphan fields are determined through a quick label comparison and the orphan fields are marked. The method then eliminates repeated fields. For example, the third field is eliminated from a three-field sequence forming a pattern of first repeated odd field, even field, second repeated odd field of the form BbB, or first repeated even field, odd field, second repeated even field of the form dDd.
The edited video field sequence then undergoes a reordering for all film frames so that each pair of video fields is ordered as an odd/even pairing. Then all fields designated as video-originating undergo motion compensated standards conversion or are uniformly decimated, such that 60 video-originating fields are decimated to 24 fields and then the 24 fields are each upconverted, resulting in 24 frames. After the video frames are converted, a decision is made regarding the marked orphan fields. The overall temporal length of the video sequence is determined for the video at a rate of 24 frames per second, and this is compared to the overall temporal length of the original edited video sequence, which is displayed at 60 fields per second.
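The repeated-field elimination step can be sketched as below. The sketch assumes the same simplified single-letter labels as before (upper case for odd lines, lower case for even lines) and a hypothetical function name `drop_repeated_fields`; real video field labels carry more information than this.

```python
def drop_repeated_fields(labels):
    """Remove the third field of every BbB or dDd style triple."""
    kept = []
    for lab in labels:
        is_repeat = (
            len(kept) >= 2
            and kept[-2] == lab                    # same field seen two back
            and kept[-1] != lab
            and kept[-1].lower() == lab.lower()    # partner field in between
        )
        if not is_repeat:
            kept.append(lab)
    return kept


deduplicated = "".join(drop_repeated_fields("AaBbBcCdDd"))
```

The result still contains the pairs cC and dD in even/odd order, which is what the subsequent reordering step addresses.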
Orphan fields are either upconverted to increase the temporal length of the new sequence so that the temporal length is identical to the edited video field sequence or the orphan fields are dropped to decrease the overall temporal length of the new film format video sequence. A further step may include performing a 3:2 pulldown on the new film format video sequence.
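The length comparison behind the orphan-field decision is simple arithmetic, illustrated below. The sketch assumes that every upconverted orphan contributes exactly one additional frame and that the goal is the closest achievable match in duration; the function name `orphans_to_upconvert` and this selection rule are assumptions, not details taken from the disclosure.

```python
from fractions import Fraction


def orphans_to_upconvert(n_old_fields, n_frames, n_orphans):
    """Pick how many orphan fields to upconvert (the rest are dropped).

    n_old_fields: fields in the edited sequence, shown at 60 fields/s.
    n_frames:     frames already produced for the 24 frames/s sequence.
    n_orphans:    marked orphan fields still awaiting a decision.
    """
    target = Fraction(n_old_fields, 60)  # duration of the original, in seconds

    def mismatch(k):
        # Duration difference if k orphans each become one extra frame.
        return abs(Fraction(n_frames + k, 24) - target)

    return min(range(n_orphans + 1), key=mismatch)


needed = orphans_to_upconvert(n_old_fields=60, n_frames=22, n_orphans=3)
```

Here one second of video (60 fields) has so far produced 22 frames, so upconverting two of the three orphans brings the film-format sequence to 24 frames, again one second.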