The disclosed invention relates to film and, more specifically, to the processing of edited digital video.
Digital film is composed of multiple frames of digital data, each corresponding to an image captured at a different point in time, as shown in FIG. 1A. Currently, film is shot and displayed at 24 frames per second. In contrast, video is displayed at the rate of 60 "fields" per second, as shown in FIG. 1B. A field consists of one-half the number of lines forming a complete image. In video, the odd and even lines of a complete image are displayed in successive fields. This process is known in the art as "interlacing."
In order to show digital film in a digital video environment, the digital film data must undergo a conversion process known as "3:2 pulldown." 3:2 pulldown creates ten fields of video from four frames of film. Let A, B, C, D represent four consecutive film frames, and let AaBbCcDd represent those four frames as eight interlaced fields, as shown in FIG. 2A. The ten corresponding video fields are then
A,a,B,b,B,c,C,d,D,d
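where the uppercase letters A, B, C, D denote, respectively, the odd lines of frames A, B, C, D, and the lowercase letters a, b, c, d denote, respectively, the even lines of those frames, as shown in FIG. 2B. The odd lines of frame B and the even lines of frame D are each used twice, so frames A and C contribute two fields each while frames B and D contribute three.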
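The field pattern above can be reproduced with a short sketch. It assumes each field is represented as a hypothetical (frame, parity) pair and simply copies the A,a,B,b,B,c,C,d,D,d pattern from the text; it does not model field dominance or other broadcast details.

```python
def pulldown_32(frames):
    """Expand film frames into 3:2 pulldown fields, four frames at a time.

    Each field is a (frame, parity) pair with parity "odd" or "even".
    Any trailing group of fewer than four frames is ignored in this sketch.
    """
    # One group A, B, C, D becomes A,a,B,b,B,c,C,d,D,d:
    # (index of the frame within the group, parity of the field)
    pattern = [
        (0, "odd"), (0, "even"),
        (1, "odd"), (1, "even"), (1, "odd"),
        (2, "even"), (2, "odd"),
        (3, "even"), (3, "odd"), (3, "even"),
    ]
    fields = []
    for start in range(0, len(frames) - len(frames) % 4, 4):
        group = frames[start:start + 4]
        fields.extend((group[i], parity) for i, parity in pattern)
    return fields


def as_letters(fields):
    """Render fields in the text's notation: uppercase = odd, lowercase = even."""
    return ",".join(f.upper() if p == "odd" else f.lower() for f, p in fields)


print(as_letters(pulldown_32(["A", "B", "C", "D"])))  # A,a,B,b,B,c,C,d,D,d
```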
If no further processing of the resulting video stream occurs (such as cutting and splicing, introducing fades, etc.), then a simple enumeration of the video fields is sufficient to determine the film frame from which each field was extracted. However, in the video industry, digital film sequences and digital video sequences are often edited together. After editing that combines video-originating and film-originating material, it is often desirable to convert the edited video sequence back to a film format. Preserving the temporal length of an edited video sequence has proven difficult for those in the art, because once the sequence has been cut and spliced, the film frame from which a given field originated can no longer be determined by enumeration.
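To illustrate why enumeration suffices only for unedited material, the sketch below maps a field index to its film frame using the ten-field pattern, then removes two fields to mimic a splice. The names `frame_of_field` and `FRAME_IN_GROUP` are hypothetical, and the stream is assumed to start exactly at the beginning of a pulldown group.

```python
# Film frame (0-3) within a four-frame group for each of the ten field
# positions in the 3:2 pattern A,a,B,b,B,c,C,d,D,d.
FRAME_IN_GROUP = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3]


def frame_of_field(n):
    """Film frame index of field n, assuming an UNEDITED 3:2 stream that
    starts at the beginning of a pulldown group."""
    group, position = divmod(n, 10)
    return 4 * group + FRAME_IN_GROUP[position]


# Ground truth for 20 unedited fields (film frames 0-7).
truth = [frame_of_field(n) for n in range(20)]

# Simulate an edit that removes fields 2 and 3 (B and b). Later fields
# shift position, but simple enumeration has no way of knowing that.
edited_truth = truth[:2] + truth[4:]
guessed = [frame_of_field(n) for n in range(len(edited_truth))]
mismatches = sum(g != t for g, t in zip(guessed, edited_truth))
print(mismatches, "of", len(edited_truth), "fields mislabeled")  # 12 of 18 fields mislabeled
```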
Digital video, whatever its origin, is usually heavily processed, particularly through standard editing and the introduction of special effects. The disclosed method determines labels for video fields by identifying the state of each field.
Some examples of a video field's state include the origin of the field as film or video, its relative location with respect to edit points, and, in the case of film-originating material, the location within the standard sequential pattern which results from converting film to video.
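One possible way to represent such a state is sketched below. The choice of attributes is an assumption for illustration: `origin` (film or video), a boolean `after_edit` standing in for the field's relation to edit points, and a pulldown `phase` from 0 to 9 (the position in A,a,B,b,B,c,C,d,D,d) that exists only for film-originating fields. `FieldState`, `Origin`, and `all_states` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    FILM = "film"
    VIDEO = "video"


@dataclass(frozen=True)
class FieldState:
    origin: Origin
    after_edit: bool              # hypothetical: True if an edit point falls just before this field
    phase: Optional[int] = None   # position 0-9 in the 3:2 pattern; film only

    def __post_init__(self):
        # Enforce that only film-originating fields carry a pulldown phase.
        if self.origin is Origin.FILM and not (self.phase is not None and 0 <= self.phase < 10):
            raise ValueError("film-originating fields need a phase in 0..9")
        if self.origin is Origin.VIDEO and self.phase is not None:
            raise ValueError("video-originating fields have no pulldown phase")


def all_states():
    """Enumerate the finite state space: 1 video state + 10 film phases,
    each with and without a preceding edit point."""
    states = []
    for after_edit in (False, True):
        states.append(FieldState(Origin.VIDEO, after_edit))
        states.extend(FieldState(Origin.FILM, after_edit, p) for p in range(10))
    return states
```

Keeping the state space finite (here 22 states) is what makes a dynamic-programming search over state sequences tractable; a richer description of edit points would enlarge it.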
To determine the label of a video field, the conditional probability distribution of a particular sequence of states given the entire video sequence is calculated. The labels are then obtained by maximizing this conditional probability, which may be done efficiently using dynamic programming. To determine the conditional probability, the joint probability distribution of the observed video fields and the states is determined first. This joint probability is calculated by creating a data model and a structure model for the video sequence.
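In symbols, write x = (x_1, ..., x_N) for the observed video fields and s = (s_1, ..., s_N) for their states. Bayes' rule relates the conditional distribution to the joint one, and because P(x) does not depend on s, it can be dropped when maximizing. The second line is a sketch under two additional assumptions not stated in the text: the structure model is a first-order Markov chain, and the data model factors into one term per field.

```latex
P(\mathbf{s} \mid \mathbf{x})
  = \frac{P(\mathbf{x} \mid \mathbf{s})\, P(\mathbf{s})}{P(\mathbf{x})},
\qquad
\hat{\mathbf{s}} = \arg\max_{\mathbf{s}} \; P(\mathbf{x} \mid \mathbf{s})\, P(\mathbf{s})

\hat{\mathbf{s}} = \arg\max_{s_1, \dots, s_N}
  \Big[ \log P(s_1)
      + \sum_{n=1}^{N} \log P(x_n \mid s_n)
      + \sum_{n=2}^{N} \log P(s_n \mid s_{n-1}) \Big]
```

Here P(x | s) is the data model, P(s) is the structure model, and their product P(x | s) P(s) is the joint distribution P(x, s).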
The data model is the conditional probability of observing a video field sequence given a sequence of states. It is determined by comparing fields, computing interfield differences, and normalizing the result. The structure model is the probability of each sequence of states and is determined from statistics on video field transitions in past video sequences. Multiplying the data model by the structure model yields the joint probability distribution.
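A minimal sketch of a data model along these lines follows. It assumes fields are 2-D lists of pixel values, compares each field with the field two positions earlier (the same parity), and normalizes by the mean difference over the sequence. The exponential densities in `log_likelihood` and their means are hypothetical choices, not taken from the text; they only encode that a repeated field (pattern positions 4 and 9) should differ very little from its same-parity predecessor. Video-originating states are not modeled here.

```python
import math

# Pattern positions whose field repeats the field two positions earlier:
# B at 4 repeats B at 2, and d at 9 repeats d at 7 (A,a,B,b,B,c,C,d,D,d).
REPEAT_PHASES = {4, 9}


def field_difference(f1, f2):
    """Mean absolute pixel difference between two equally sized fields."""
    total = sum(abs(a - b) for r1, r2 in zip(f1, f2) for a, b in zip(r1, r2))
    count = sum(len(row) for row in f1)
    return total / count


def normalized_differences(fields):
    """Difference of each field from the field two positions earlier, divided
    by the mean difference over the sequence. The first two fields have no
    same-parity predecessor and get None."""
    raw = [field_difference(fields[n], fields[n - 2]) for n in range(2, len(fields))]
    mean = sum(raw) / len(raw) if raw else 0.0
    scale = mean if mean > 0 else 1.0
    return [None, None][:len(fields)] + [d / scale for d in raw]


def log_likelihood(d, phase, repeat_mean=0.05, other_mean=1.0):
    """Hypothetical log P(d | phase): exponential density with a small mean
    where the pattern predicts a repeated field."""
    if d is None:
        return 0.0  # no evidence available for this field
    mean = repeat_mean if phase in REPEAT_PHASES else other_mean
    return -math.log(mean) - d / mean


# Toy input: four flat-valued film frames after 3:2 pulldown.
values = [10, 10, 50, 50, 50, 90, 90, 130, 130, 130]
fields = [[[v, v], [v, v]] for v in values]
nd = normalized_differences(fields)
print([None if d is None else round(d, 2) for d in nd])
# [None, None, 1.33, 1.33, 0.0, 1.33, 1.33, 1.33, 1.33, 0.0]
```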
From the joint probability distribution, an expression is derived for the conditional probability of a particular sequence of states given the observed video fields. Maximizing this distribution over all state sequences, which may be performed through dynamic programming, yields the most likely sequence of states consistent with the given video sequence data. Once the states are determined, labels corresponding to the states may be inserted within the video sequence.
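The maximization can be carried out with the Viterbi algorithm, a standard dynamic-programming method for first-order Markov models. The sketch below assumes that structure; the function name, the dictionary-based inputs, and the two-state "film"/"video" toy example are illustrative only. Its cost is O(N·|S|²) for N fields and |S| states, instead of enumerating all |S|^N sequences.

```python
import math


def viterbi(states, log_init, log_trans, log_emit):
    """Most probable state sequence under a first-order Markov structure model.

    states:    list of hashable states
    log_init:  dict state -> log P(s_1)
    log_trans: dict (prev, cur) -> log P(cur | prev); missing pairs are impossible
    log_emit:  list with one dict per field, state -> log P(x_n | s_n)
    """
    neg_inf = float("-inf")
    # score[s]: best log-probability of any state path ending in s so far
    score = {s: log_init.get(s, neg_inf) + log_emit[0][s] for s in states}
    backpointers = []
    for emit in log_emit[1:]:
        new_score, pointer = {}, {}
        for cur in states:
            best_prev, best = states[0], neg_inf
            for prev in states:
                cand = score[prev] + log_trans.get((prev, cur), neg_inf)
                if cand > best:
                    best_prev, best = prev, cand
            new_score[cur] = best + emit[cur]
            pointer[cur] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path, score[last]


lg = math.log
states = ["film", "video"]
log_init = {"film": lg(0.5), "video": lg(0.5)}
log_trans = {("film", "film"): lg(0.9), ("film", "video"): lg(0.1),
             ("video", "video"): lg(0.9), ("video", "film"): lg(0.1)}
film_like = {"film": lg(0.9), "video": lg(0.1)}
video_like = {"film": lg(0.1), "video": lg(0.9)}
path, _ = viterbi(states, log_init, log_trans, [film_like] * 2 + [video_like] * 3)
print(path)  # ['film', 'film', 'video', 'video', 'video']
```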
When the method is implemented as a computer program or computer program product, the computer code comprises code for receiving historic video field information regarding transitions. The computer code uses the digital video field data from the edited video sequence, calculates the conditional densities for all possible state sequences, and then uses dynamic programming to calculate the labels based on the historic video field information and the conditional densities.
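The historic transition information could be turned into structure-model probabilities by counting state transitions in previously labeled sequences. This is a sketch under stated assumptions: the history is a list of state-label sequences, transitions are first-order, and additive (Laplace) smoothing with a hypothetical pseudo-count `alpha` keeps unseen transitions from receiving probability zero. The result maps (previous, current) pairs to log-probabilities.

```python
import math
from collections import Counter


def estimate_transitions(historic_sequences, states, alpha=1.0):
    """Estimate log P(cur | prev) from previously labeled field sequences,
    with additive smoothing of strength alpha."""
    counts = Counter()
    for seq in historic_sequences:
        counts.update(zip(seq, seq[1:]))  # count adjacent (prev, cur) pairs
    log_trans = {}
    for prev in states:
        total = sum(counts[(prev, cur)] for cur in states) + alpha * len(states)
        for cur in states:
            log_trans[(prev, cur)] = math.log((counts[(prev, cur)] + alpha) / total)
    return log_trans


history = [["film", "film", "film", "video"], ["video", "video"]]
lt = estimate_transitions(history, ["film", "video"])
print(round(math.exp(lt[("film", "film")]), 2))  # 0.6
```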
The information provided by the labels is necessary for a variety of applications, including compression and reformatting. The resulting annotated video stream can then be efficiently compressed, reformatted (e.g., into whole film frames), or otherwise processed in a manner similar to unprocessed video. The state of a video field may also be used in conjunction with "cadence editing," which reorders the video sequence to obtain the proper temporal cadence and is disclosed in U.S. patent application (Number not yet available) filed on Aug. 21, 2000, entitled "Cadence Editing," claiming priority from the provisional application entitled "Cadence Editing" filed on Aug. 20, 1999, having Ser. No. 60/150,016, which is owned by the same assignee and is incorporated by reference herein in its entirety.
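As an illustration of reformatting into whole film frames, the sketch below assumes each labeled field carries a hypothetical (frame_id, parity, lines) label. It keeps the first copy of each parity per frame (discarding 3:2 repeats) and weaves odd and even fields back together, taking the first, third, ... lines from the odd field. Frames missing one parity, for example because an edit removed it, are skipped rather than reconstructed.

```python
def weave(odd_field, even_field):
    """Interleave two fields: odd lines (1st, 3rd, ...) come from odd_field,
    even lines (2nd, 4th, ...) from even_field."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.extend([odd_line, even_line])
    return frame


def rebuild_frames(labeled_fields):
    """labeled_fields: list of (frame_id, parity, lines) in stream order.
    Returns {frame_id: frame_lines} for frames with both parities present;
    repeated fields after the first copy are ignored."""
    halves = {}
    for frame_id, parity, lines in labeled_fields:
        halves.setdefault(frame_id, {}).setdefault(parity, lines)
    return {fid: weave(h["odd"], h["even"])
            for fid, h in halves.items() if "odd" in h and "even" in h}


stream = [
    ("A", "odd", ["A1", "A3"]), ("A", "even", ["A2", "A4"]),
    ("B", "odd", ["B1", "B3"]), ("B", "even", ["B2", "B4"]),
    ("B", "odd", ["B1", "B3"]),   # 3:2 repeat of B's odd field
    ("C", "even", ["C2", "C4"]),  # its odd partner was cut away by an edit
]
print(rebuild_frames(stream))
# {'A': ['A1', 'A2', 'A3', 'A4'], 'B': ['B1', 'B2', 'B3', 'B4']}
```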