1. Field of the Invention
The present invention is related to processing image sequences, and in particular, to methods and systems for converting an image sequence intended to be displayed at a first frame rate to an image sequence intended to be displayed at a second frame rate.
2. Background
As is well known, motion film is typically exposed and viewed at 24 film frames per second (fps). By contrast, NTSC video, which applies to television, is typically recorded and played back at 29.97 video fps. The 29.97 fps rate derives from the nominal 60 Hertz (Hz), or cycles per second, frequency of electricity in the United States: the original monochrome standard used 60 fields per second, and the color standard lowered the field rate slightly to 59.94 Hz. Video typically includes two fields per frame, and therefore, there are typically 59.94 fields per second.
For television, the NTSC color video standard specifies that 525 lines of information are scanned at a rate of 29.97 fps; therefore, each field scans 262.5 horizontal lines. However, typically only approximately 480 lines per frame, or 240 lines per field, are active or illuminated and contain actual picture information. The two fields of a video frame are often referred to as being "interlaced." The lines of information from the two fields of a respective frame interlace, i.e., alternate, to produce the frame. Thus, one field can contain the odd lines of a frame and the other field can contain the even lines of a frame. The two fields are also respectively referred to as "odd" and "even" fields. In addition, the NTSC video standard is not always used. Many users use proprietary standards that are similar to the NTSC video standard. For example, where a frame is encoded by only one field, the resulting video sequence can include frames with 240 lines of resolution at 60 frames per second or 240 lines of resolution at 30 frames per second.
It is a common practice in the movie and television industry to convert from the film format to the NTSC video format so that filmed works can be broadcast and displayed on a television set. Clips of filmed work are also often transferred to a video format, such as the NTSC video format, because video formats are convenient to store and view as well. Such a conversion is known as a "telecine" process, which typically converts 24 film fps to 30 video fps (in addition to the resizing or letterboxing to accommodate the difference in screen aspect ratio).
To convert 24 fps of film to 30 fps of NTSC video, duplicate or repeated fields are inserted to "pad" the 24 fps to 30 fps. The first film frame is converted into 2 video fields (1 even field and 1 odd field), the second film frame is converted into 3 video fields (2 even fields and 1 odd field), with two of the video fields being the same, the third film frame is converted into 2 video fields, the fourth film frame is converted into 3 video fields, with two of the video fields being the same, and so on. Thus, the video field to film frame pattern is "2, 3, 2, 3," where an extra video field is inserted for every other film frame. As a result, 4 frames of film convert to 5 corresponding frames of video. This is referred to as a "three-two (3:2) pull down." To return the 30 fps of video to the original 24 fps of film, a reverse process, termed inverse telecine, is performed, where 5 frames of video convert to 4 corresponding frames of film. Prior methods rely extensively on manual intervention to perform the inverse telecine process.
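The 2, 3, 2, 3 cadence described above can be illustrated with a minimal sketch. The function names `pulldown_32` and `pair_fields`, and the use of letters to stand in for film frames, are illustrative assumptions rather than part of any standard.

```python
def pulldown_32(film_frames):
    """Expand film frames into video fields using a 2, 3, 2, 3 cadence.

    Each field is returned as (film_frame, parity); parity alternates
    even, odd, even, ... across the whole output.
    """
    fields = []
    parity = 0  # 0 means the next field is even, 1 means odd
    for index, frame in enumerate(film_frames):
        count = 2 if index % 2 == 0 else 3  # 2 fields, then 3, then 2, ...
        for _ in range(count):
            fields.append((frame, "even" if parity == 0 else "odd"))
            parity ^= 1
    return fields


def pair_fields(fields):
    """Group consecutive (even, odd) fields into video frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


video = pair_fields(pulldown_32(["A", "B", "C", "D"]))
for even, odd in video:
    print(even[0], odd[0])
```

Four film frames yield ten fields, i.e., five video frames. In this sketch the third and fourth video frames each combine fields from two different film frames, which is the source of the interlacing artifacts discussed later in this section.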
One significant difficulty encountered in performing inverse telecine is handling edits, slow motion, special effects sequences, or other special cases, wherein the 2, 3, 2, 3 pattern is interrupted. For example, because of an edit or abort during final assembly, the 2, 3, 2, 3 pattern may be interrupted in the middle and restarted as follows: 2, 3, 2, [edit], 2, 3, 2, 3. To correctly return or convert this pattern to the original film pattern, a user conventionally locates the pattern break and resynchronizes the sequence by manually deleting one or more fields. This is a time consuming and expensive process, and in particular, makes it difficult to accurately perform the inverse telecine process on a large number of video clips in a short period of time.
Because of the difficulties encountered in performing the inverse telecine process, the video format is often retained when displaying a clip on a computer. However, the video format can be wasteful because the duplicate frames needlessly occupy bandwidth. Further, the display of duplicate frames causes motion in the clip to transition in a jerky or erratic manner. In addition, where video fields are interlaced, the interlacing of fields based on film frames from different times can produce artifacts, which are visible on a progressively scanned monitor, such as a computer video monitor.
The present invention is generally directed to automated methods and systems for converting image streams having a first frame rate to a second frame rate without the need for user intervention. Embodiments of the present invention obviate the effects of a telecine process, wherein additional frames are added to accomplish the frame rate conversion. In one embodiment, a statistical analysis of the differences between pixels in adjacent frames or groups of frames is performed to detect a telecine pattern, thereby identifying which frames to remove.
In another embodiment, where frames are encoded using both even and odd video fields, a statistical analysis of the differences between adjacent fields detects the telecine pattern, identifies which frames to remove, and identifies frames that are candidates for re-interleaving. The novel process disclosed herein can detect and delete the duplicate frames of the telecine process for video sequences with interlaced or non-interlaced frames, and/or of various resolutions.
Video image streams are frequently converted from a film format to a video format through a process known as a telecine process. Although the telecine process allows a sequence originally taken in film at 24 fps to be stored in a video format at 30 fps and displayed on a television monitor, the process typically results in duplicative frames, jittery motion, and interleaving of disparate frames. By providing a technique to automatically perform an inverse telecine process to substantially return the sequence to the film format, the picture quality improves and the bandwidth needed to transmit the processed sequence is reduced.
The techniques for performing the automated inverse telecine processes can be implemented in a server connected to the Internet or other network. The Internet allows a variety of users to communicate with the server. A user can upload, in real time or from a storage device, a first video sequence to the server. The server processes the uploaded video sequence either substantially in real time or in the background. While processing in real time or after processing in the background, users can download the processed video sequence from the server.
In addition, one embodiment of the present invention automatically detects whether the incoming video sequence is encoded in a single field or in multiple fields by counting the number of lines per frame and comparing the count to a predetermined amount.
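As a minimal sketch of this line-count test, a frame can be modeled as a list of pixel rows. The threshold of 300 lines is an assumed value chosen to fall between the 240-line and 480-line cases mentioned earlier; the text does not state the predetermined amount, and the function name is hypothetical.

```python
LINE_THRESHOLD = 300  # assumed value, between 240 and 480 active lines


def is_multi_field(frame_rows, threshold=LINE_THRESHOLD):
    """Return True if the frame has enough lines to hold two interlaced fields."""
    return len(frame_rows) > threshold
```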
Where the frames have been encoded in single fields, i.e., wherein a frame is composed of one field, the process computes comparisons of the adjacent frames in the sequence. The comparison can be made on all the pixels of each frame, or on a portion of the pixels, such as every other pixel, every fourth pixel, or some other interval of pixels. A history of the comparisons is maintained. One embodiment compares both the luminance and the chrominance components of a pixel. Another embodiment compares only the luminance component.
The pixels can be compared in a variety of ways. For example, the computation of the comparison can include summations of the absolute differences between pixels, summations of the squares of differences between pixels, and the like. In one embodiment, the summation is further normalized with respect to the number of pixels per frame compared. One embodiment further saturates the comparison to a predetermined amount such that a relatively large difference between frames, such as may be encountered due to an edit, does not unduly impact later statistical analysis.
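One possible form of such a comparison is sketched below as a normalized, saturated sum of absolute differences over luminance samples. The function name, the `step` subsampling parameter, and the saturation cap of 64.0 are assumptions made for illustration.

```python
def frame_difference(frame_a, frame_b, step=1, cap=64.0):
    """Mean absolute luminance difference between two frames, saturated at cap.

    Frames are lists of rows of luminance values; step > 1 compares only
    every step-th pixel in each row.
    """
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pa, pb in zip(row_a[::step], row_b[::step]):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        return 0.0
    # Normalize by the number of pixels compared, then saturate so that a
    # scene cut does not dominate later statistics.
    return min(total / count, cap)
```

Replacing `abs(pa - pb)` with `(pa - pb) ** 2` would give the sum-of-squares variant mentioned above.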
In one embodiment, the history of comparisons is kept in a collection that maintains the most recent comparisons made. When a new frame is received and a new comparison is computed, the results of the new comparison are entered into the collection. In addition, the process can detect the presence of dropped frames in the sequence of frames and fill the collection with default histories or provide another indication, such as a separate collection that maintains an indication of validity. By compensating for dropped frames, the process preserves the ability to detect the telecine pattern despite the presence of the dropped frames.
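A bounded history of this kind might be sketched as follows. The class name, the use of integer frame indices to infer dropped frames, and the `(value, valid)` entry layout are all assumptions; the text does not specify how dropped frames are detected.

```python
from collections import deque


class DifferenceHistory:
    """Keeps the most recent frame comparisons, with placeholders for drops."""

    def __init__(self, size=20):
        self.entries = deque(maxlen=size)  # oldest entries fall off the front
        self.last_index = None

    def record(self, frame_index, diff):
        """Store diff for frame_index, filling gaps with invalid placeholders."""
        if self.last_index is not None:
            for _ in range(frame_index - self.last_index - 1):
                self.entries.append((None, False))  # dropped frame
        self.entries.append((diff, True))
        self.last_index = frame_index
```

Keeping a placeholder for each dropped frame preserves the position of later entries relative to the 5-frame cadence.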
The process statistically analyzes the entries in the collection to detect the telecine pattern. The entries in the collection are further grouped into at least two groups for the statistical analysis. A first group includes comparisons between frames where the comparisons were made about 5 frame positions apart. A second group includes comparisons of at least a portion of the other frames. The statistical analysis can include computations such as means, variances, and standard deviations. In one embodiment, the statistical analysis of the first group and the second group are compared to predetermined amounts. In another embodiment, the statistical analysis of the first group is compared relative to the statistical analysis of the second group or a combination of relative comparison and comparison to predetermined amounts. Where the comparison of the statistical analysis indicates that the differences in the first group are relatively low, then the telecine pattern is detected.
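The grouping and comparison can be sketched as below for one candidate frame position. The function names, the absolute limit of 1.0, and the ratio of 4.0 are assumed values for illustration; the sketch combines a comparison to a predetermined amount with a relative comparison between the two groups.

```python
from statistics import mean


def split_groups(diffs, phase, period=5):
    """Split a history into entries at the candidate position and the rest."""
    first = [d for i, d in enumerate(diffs) if i % period == phase]
    second = [d for i, d in enumerate(diffs) if i % period != phase]
    return first, second


def looks_like_telecine(first, second, abs_limit=1.0, ratio=4.0):
    """True when the first group is low both absolutely and relative to the rest."""
    if not first or not second:
        return False
    low, rest = mean(first), mean(second)
    return low <= abs_limit and rest > ratio * low
```

Requiring the second group to be strictly larger than a multiple of the first keeps an entirely static sequence, where every difference is zero, from being reported as a telecine pattern.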
One embodiment of the present invention can rotatably search for the telecine pattern in the 5 frame positions possible in the 3:2 telecine pattern. Where the telecine pattern is found and the frame of interest is found to conform to the duplicate frame in the telecine pattern, the frame is deleted. Where the telecine pattern is found, but the position of the frame of interest is outside the position of the duplicate frame of the telecine pattern, the frame is not deleted and the process continues to process other frames.
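The rotating search and the deletion decision might look like the following sketch. It assumes that entry `i` of `diffs` is the difference between frame `i` and the frame before it, so that history positions and frame positions share the same index; the names and threshold values are hypothetical.

```python
def find_duplicate_phase(diffs, period=5, abs_limit=1.0, ratio=4.0):
    """Try each of the 5 frame positions; return the one that fits, or None."""
    for phase in range(period):
        first = [d for i, d in enumerate(diffs) if i % period == phase]
        second = [d for i, d in enumerate(diffs) if i % period != phase]
        if not first or not second:
            continue
        low = sum(first) / len(first)
        rest = sum(second) / len(second)
        if low <= abs_limit and rest > ratio * low:
            return phase
    return None


def should_delete(frame_index, phase, period=5):
    """Delete only the frame that sits at the duplicate position."""
    return phase is not None and frame_index % period == phase
```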
The remaining frames of the sequence are re-aligned as necessary so that the remaining frames are substantially evenly spaced across intervals defined by the film frame rate of 24 frames per second (fps). Such re-alignment can be accomplished by, for example, modifying the timestamps associated with the frames.
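A minimal sketch of re-alignment by timestamp rewriting follows. The function name and the `(timestamp, frame)` pairing are assumptions; exact fractions are used here only to avoid accumulating rounding error.

```python
from fractions import Fraction


def retime(frames, fps=24, start=Fraction(0)):
    """Assign evenly spaced timestamps, in seconds, to the surviving frames."""
    period = Fraction(1, fps)
    return [(start + i * period, frame) for i, frame in enumerate(frames)]
```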
In one embodiment, where detection of the telecine pattern fails, progressively smaller and smaller subsets of the collection are analyzed to continue to search for the telecine pattern. For example, in a first iteration, the process can analyze the most recent 20 histories in the collection. Upon a failure to detect a telecine pattern in the 20 histories, the process can proceed to analyze the most recent 15 histories in the collection, and so on.
One embodiment further varies the thresholds used with the statistical analysis to detect the telecine pattern in accordance with the size of the portion of the collection searched. For example, where progressively smaller subsets of the collection are searched, the thresholds can be raised to provide protection against false detection.
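The shrinking-window search with stricter thresholds can be sketched as below. The window sizes of 20 and 15 come from the example above; the third size of 10 and the ratio values are assumptions, as are the function and parameter names.

```python
def search_windows(diffs, sizes=(20, 15, 10), ratios=(3.0, 4.0, 6.0),
                   abs_limit=1.0, period=5):
    """Search the most recent entries, shrinking the window on failure.

    Returns (phase, window_size) on success, or None. Smaller windows use a
    larger ratio, i.e., a stricter test, to guard against false detection.
    """
    for size, ratio in zip(sizes, ratios):
        if len(diffs) < size:
            continue
        offset = len(diffs) - size  # keep absolute positions for the phase
        window = diffs[offset:]
        for phase in range(period):
            first = [d for i, d in enumerate(window, start=offset) if i % period == phase]
            second = [d for i, d in enumerate(window, start=offset) if i % period != phase]
            if not first or not second:
                continue
            low = sum(first) / len(first)
            rest = sum(second) / len(second)
            if low <= abs_limit and rest > ratio * low:
                return phase, size
    return None
```

A history in which the cadence began only 15 entries ago fails the 20-entry test, because the older entries pull the first group's mean up, but passes the 15-entry test.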
One embodiment further includes a fail safe mode to maintain the deletion of frames in the absence of a detected telecine pattern. For example, where a portion of the sequence of frames is in slow motion, or the portion of the sequence of frames corresponds to a relatively static scenery shot, the difference between one frame and its adjacent frame is relatively low and the telecine pattern can be difficult to detect. Where a telecine pattern has been observed in the past, the fail safe mode can remove a frame consistent with the previously observed telecine pattern to continue to convert and return the frame sequence from the video format back to its original film format.
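One way to sketch the fail safe mode is to remember the last detected frame position and reuse it when detection fails. The class and method names are hypothetical, and the sketch does not limit how long the remembered position is reused, which the text leaves unspecified.

```python
class FailSafePhase:
    """Remembers the last detected duplicate position for use as a fallback."""

    def __init__(self):
        self.last_phase = None

    def update(self, detected_phase):
        """Return (phase_to_use, used_fail_safe) for the current frame."""
        if detected_phase is not None:
            self.last_phase = detected_phase
            return detected_phase, False
        # Detection failed: fall back to the previously observed position,
        # if there is one.
        return self.last_phase, self.last_phase is not None
```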
One embodiment further includes detection of redundant frames that were replicated to raise the frame rate from 29.97 fps to 30 fps. These redundant frames are substantially identical to an adjacent frame. In one embodiment, a redundant frame is detected when the process determines that there is no difference between the frame and an adjacent frame. The process can further condition the removal of the detected redundant frame based on a predetermined frame rate and a predetermined interval between removal of redundant frames.
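The interval condition can be sketched as follows. The function name and the default interval of 1000 frames are assumptions; the default is chosen because padding 29.97 fps to 30 fps adds roughly one frame per thousand. The sketch models only the interval condition, not the frame rate condition.

```python
def find_redundant_frames(diffs, min_interval=1000):
    """Return indices of frames to drop as 29.97-to-30 fps padding.

    diffs[i] is the difference between frame i and the frame before it.
    A frame is dropped only if it is identical to its neighbor and the
    previous removal was at least min_interval frames earlier.
    """
    dropped = []
    last = None
    for i, d in enumerate(diffs):
        if d == 0 and (last is None or i - last >= min_interval):
            dropped.append(i)
            last = i
    return dropped
```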
A similar process is used to convert a sequence of frames, where a frame from the sequence of frames is interlaced in multiple video fields. In a typical interlaced video frame, the odd and the even fields of the frame combine, or interlace, to produce the video frame. For example, the even lines of a frame are contributed by an even field and the odd lines of a frame are contributed by an odd field.
Where the frames have been encoded in multiple fields, the process performs comparisons of the adjacent fields in the sequence. Again, the comparison can be made on all the pixels of each frame, or on selected pixels. A history of the comparisons between fields is maintained in a collection. One embodiment identifiably maintains the history of the comparisons of the even fields separate from the history of the comparisons of the odd fields.
The process again statistically analyzes the entries in the collection to detect the telecine pattern. The entries in the collection are further grouped into at least four groups for the statistical analysis. The four groups are separated based on whether the entry in the collection is associated with even fields or odd fields, and whether the entry belongs to a first group or a second group. A telecine pattern, if one exists in the collection, manifests itself about once every 5 frame positions. The first group includes comparisons of fields that are evenly spaced 5 frames apart. The frame position for the first group also varies in accordance with whether the field comparisons are associated with the even fields or the odd fields. In one embodiment, the frame positions of the even and the odd field comparisons are offset by 2 frame positions (in modulo 5 arithmetic).
The statistical analysis described in connection with the single field encoded video frame sequence can be applied to the multiple field encoded video frame sequence. When a frame matches the telecine pattern indicated by the statistical analysis of the fields, the frame is deleted from the sequence and the remaining frames are time aligned according to a film frame rate. Where the frame deleted has a duplicate even field, the process invokes an interleaving process to interleave odd fields of frames where appropriate. Likewise, where the frame deleted has a duplicate odd field, the process invokes an interleaving process to interleave even fields of frames as appropriate.
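The four-way grouping can be sketched as below. The function name and dictionary keys are hypothetical, and the direction of the offset (odd position equals even position plus 2, modulo 5) is an assumption that depends on field order.

```python
def field_groups(even_diffs, odd_diffs, even_phase, offset=2, period=5):
    """Split even and odd field histories into first and second groups.

    even_diffs[i] compares the even field of frame i with the even field of
    the previous frame; odd_diffs[i] does the same for odd fields.
    """
    odd_phase = (even_phase + offset) % period  # assumed offset direction

    def split(diffs, phase):
        first = [d for i, d in enumerate(diffs) if i % period == phase]
        second = [d for i, d in enumerate(diffs) if i % period != phase]
        return first, second

    even_first, even_second = split(even_diffs, even_phase)
    odd_first, odd_second = split(odd_diffs, odd_phase)
    return {
        "even_first": even_first, "even_second": even_second,
        "odd_first": odd_first, "odd_second": odd_second,
    }
```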
Frames other than the frame with the identified telecine pattern can be inspected for re-interleaving. For example, the frame prior to the frame with the identified telecine pattern may have captured two disparate film frames in its even and odd fields. For example, the even field of the frame is compared with the odd field of the frame, and the even field of the frame is compared with the odd field of an adjacent frame. Where the comparisons indicate more similarity between the even field of the frame and the odd field of the adjacent frame, the odd field of the adjacent frame is substituted to re-interleave the frame. By re-interleaving the fields, the artifacts of viewing two disparate fields on a progressively scanned monitor are eliminated. Moreover, the re-interleaving allows the identified duplicate frame to be removed from the sequence with little or no loss of information.
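The field substitution described above can be sketched as follows. A frame is modeled as an `(even_field, odd_field)` pair, each field being a list of rows of luminance values; the function names and the mean-absolute-difference measure are assumptions.

```python
def field_difference(field_a, field_b):
    """Mean absolute difference between two fields of equal size."""
    total = 0
    count = 0
    for row_a, row_b in zip(field_a, field_b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def reinterleave(frame, adjacent):
    """Swap in the adjacent frame's odd field if it matches the even field better."""
    even, odd = frame
    _, adjacent_odd = adjacent
    if field_difference(even, adjacent_odd) < field_difference(even, odd):
        return (even, adjacent_odd)
    return frame
```

The symmetric case, in which an even field is substituted to match a frame's odd field, would follow the same shape.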
Again, the process can rotatably search for the telecine pattern in the 5 frame positions possible in the 3:2 telecine pattern. After removal of duplicate frames, the remaining frames of the sequence are re-aligned as necessary so that the remaining frames are substantially evenly spaced across intervals defined by the film frame rate of 24 frames per second (fps). Again, the portion of the collection searched to detect the telecine pattern can be varied to detect the telecine pattern. The comparisons used to detect the telecine pattern can vary with respect to the extent of the history search to desensitize the system against a false detection of the telecine pattern.
The multiple-field inverse telecine process can also include the fail safe mode described in connection with the single-field inverse telecine process. The fail safe mode allows the inverse telecine process to continue to convert the sequence of video frames even where the telecine pattern is difficult to detect. Again, the multiple-field inverse telecine process can optionally include detection and removal of the redundant frames that are the result of a conversion from a 29.97 fps frame rate to a 30 fps frame rate that is found on some video sequences.
The automated inverse telecine process may be performed on video uploaded to a Web site server by users. Once a user uploads the video, an inverse telecine module executing in the server deletes the pulldown fields and produces appropriate de-interlaced frames. These frames may then be downloaded or streamed over a network, such as the Internet, to networked terminals, such as progressively scanned monitors, for viewing.