1. Technical Field of the Invention
The present invention relates to detecting the cadence of a sequence of images.
2. Description of Related Art
The detection of the cadence of a sequence of images is based on a search for a cadence pattern, for example “10010”, in a sequence of bits representative of the motion between one field and another. Cadence is therefore understood to mean a successive repetition of at least one cadence pattern.
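A cadence search of this kind can be sketched as follows. This is a minimal illustration, assuming the motion sequence is already available as a string of '0'/'1' characters. The function name `find_cadence_phase` and the `min_repeats` parameter are hypothetical. The sketch tests every rotation (phase) of the pattern against the whole sequence.

```python
from typing import Optional


def find_cadence_phase(motion_bits: str, pattern: str, min_repeats: int = 2) -> Optional[int]:
    """Return the phase at which `pattern` repeats through `motion_bits`, or None.

    A phase p means motion_bits[i] == pattern[(i + p) % len(pattern)] for every i,
    i.e. the sequence is a successive repetition of the pattern, entered at offset p.
    The sequence must span at least `min_repeats` full periods (assumed criterion).
    """
    period = len(pattern)
    if period == 0 or len(motion_bits) < min_repeats * period:
        return None
    for phase in range(period):
        if all(bit == pattern[(i + phase) % period] for i, bit in enumerate(motion_bits)):
            return phase
    return None
```

For example, `find_cadence_phase("0010100101", "10010")` returns 1, while a sequence such as `"0011000110"` matches no rotation of the pattern and yields `None`.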
Cadence detection may be used to determine the source format of an image sequence, or to detect the absence of motion (still pictures). Several source formats exist. For example, a video camera may capture 50 or 60 frames per second. In film format, images are captured at a rate of 24 or 25 frames per second. The number of frames per second may be even lower, for example about 8 frames per second for certain Japanese animation.
There are also multiple display formats. The PAL (Phase Alternating Line) standard, used primarily in Europe, specifies the display of 50 fields per second. The NTSC (National Television System Committee) format, used primarily in the United States, specifies the display of 60 fields per second.
The standards commonly used in television specify encoding the source frames as successive interlaced fields, each containing half a frame: fields containing only the even lines of pixels of one frame for display alternate with fields containing only the odd lines of pixels of the next frame for display.
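The alternation of even-line and odd-line fields can be illustrated with a short sketch. It assumes a frame is a list of pixel rows and that frame k supplies its even rows when k is even and its odd rows when k is odd; the name `interlace` is hypothetical.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # a frame as a sequence of pixel rows


def interlace(frames: Sequence[Frame]) -> List[Tuple[str, List[List[int]]]]:
    """Encode successive frames as alternating even/odd half-frame fields.

    Frame k contributes only its even rows (0, 2, 4, ...) when k is even and
    only its odd rows (1, 3, 5, ...) when k is odd.
    """
    fields = []
    for k, frame in enumerate(frames):
        parity = "even" if k % 2 == 0 else "odd"
        start = 0 if parity == "even" else 1
        fields.append((parity, [list(row) for row in frame[start::2]]))
    return fields
```

With two four-line frames, the first field keeps rows 0 and 2 of the first frame and the second field keeps rows 1 and 3 of the second frame, so each transmitted field carries half the lines.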
More generally, an image is subdivided into one field, two fields, or even more fields, depending on the scanning mode. In this document, the term “field” therefore covers a complete image, half of an image, and even smaller fractions of an image.
Thus, when a sequence of video frames at 50 frames per second is encoded in the PAL standard, each frame is reduced to a half-frame field, the parity alternating from one frame to the next. In another example, when a sequence of frames in film format at 25 frames per second is encoded in the PAL format at 50 Hz, each film frame is subdivided into two interlaced fields. In another example, when a sequence of frames in film format at 24 frames per second is encoded in the NTSC format at 60 fields per second, each sequence of four consecutive film frames is converted into a sequence of ten half-frame fields. In these ten fields, the first three originate, for example, from the same film frame, the next two fields originate from a second film frame, etc. Thus, two of the first three fields (the first and the third, which have the same parity) are identical. Such a conversion is called a 3:2 pulldown.
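The 3:2 pulldown described above can be sketched as a mapping from film frames to (frame, parity) fields. This sketch assumes frames are identified by labels and that field parity simply alternates top/bottom across the whole output; the function name `pulldown_3_2` is hypothetical.

```python
from typing import List, Sequence, Tuple


def pulldown_3_2(frames: Sequence[str]) -> List[Tuple[str, str]]:
    """Convert film frames to (frame, parity) fields with a 3:2 pattern.

    Frames alternately contribute 3 and 2 fields; the parity of the output
    fields alternates top/bottom regardless of which frame they come from.
    """
    fields: List[Tuple[str, str]] = []
    for k, frame in enumerate(frames):
        repeats = 3 if k % 2 == 0 else 2
        for _ in range(repeats):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields
```

`pulldown_3_2("ABCD")` produces ten fields whose sources read `AAABBCCCDD`; the first and third fields are both `("A", "top")`, which is why they are identical.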
Other types of conversion also exist. These include the 2:2 pulldown, which converts a film format of 24 or 25 frames per second to the PAL format at 50 Hz; the 2:3 pulldown, which converts a 24 frames per second format to an NTSC format; the 3:2:3:2:2 pulldown, used when a television station eliminates one field out of twelve in a sequence originating from film frames; the 2:2:2:4 and 2:3:3:2 conversions for frames captured in a DVCAM format; and the 5:5, 6:4 or 8:7 conversions for frames of animated cartoons, etc.
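Each cadence notation lists how many fields successive source frames occupy, so the source frame rate it implies for a given display field rate can be computed directly. The sketch below is illustrative only; the function name `source_frame_rate` is hypothetical, and the display field rate paired with each cadence is an input chosen by the caller, not something the notation itself fixes.

```python
from fractions import Fraction


def source_frame_rate(cadence: str, field_rate: int) -> Fraction:
    """Source frames per second implied by a pulldown cadence such as "3:2".

    Each number is the count of fields one source frame occupies, so the
    average number of fields per frame is sum(counts) / len(counts).
    """
    counts = [int(c) for c in cadence.split(":")]
    fields_per_frame = Fraction(sum(counts), len(counts))
    return Fraction(field_rate) / fields_per_frame
```

For instance, `source_frame_rate("3:2", 60)` gives 24, `source_frame_rate("2:2", 50)` gives 25, and `source_frame_rate("8:7", 60)` gives 8, the order of magnitude quoted earlier for certain animation. Exact fractions avoid rounding for cadences whose field counts do not divide evenly.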
Cadence detection relies on comparing pixels in successive fields, indexed by n, to determine whether there is motion from one field to the next. A conversion typically leads to abrupt variations in motion. For example, in a 3:2 pulldown, three fields n−3, n−2, n−1 originate from the same film frame, and essentially no motion is detected between these fields. The next two fields n, n+1 originate from another film frame. Relatively substantial motion may be detected between the third field n−1 and the fourth field n, while the motion between the fourth field n and the fifth field n+1 is essentially zero. By analyzing a sequence of bits representative of the motion, called a motion sequence, determined by comparing pixels in a field sequence, one may identify a repeating pattern and thus detect that a conversion has been performed. Cadence detection is therefore based on motion sequence analysis.
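The link between a 3:2 pulldown and its motion sequence can be shown with an idealized sketch: instead of real pixel comparisons, it assumes that fields from the same film frame show no motion and fields from different frames always do. The function name `ideal_motion_bits` is hypothetical.

```python
from typing import Sequence


def ideal_motion_bits(field_sources: Sequence[str]) -> str:
    """Motion bit for each field n >= 1: '1' if field n comes from a
    different source frame than field n-1, else '0'.

    This idealizes the pixel comparisons: fields from the same frame
    show no motion, fields from different frames always do.
    """
    return "".join(
        "1" if cur != prev else "0"
        for prev, cur in zip(field_sources, field_sources[1:])
    )
```

For fields whose sources read `AAABBCCCDD`, the result is `"001010010"`, a sequence of period five that contains the pattern `"10010"` as one of its rotations.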
For a sequence of half-frame fields, these pixel comparisons may, for example, involve median calculations. The comparisons may be made between pixels in two successive fields n−1 and n, which are normally of opposite parity. One may also compare the pixels of a field n with pixels in the previous field of the same parity, n−2, etc.
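One plausible form of such comparisons is sketched below; the text does not specify the exact test, so both the median rule and the threshold are assumptions, and the function names are hypothetical. For opposite-parity fields, a pixel of field n−1 that lies between its vertical neighbours in field n is left unchanged by the median of the three values, so no motion is flagged; a pixel outside that range suggests combing, i.e. motion. For same-parity fields, co-located pixels can be compared directly.

```python
from statistics import median


def motion_opposite_parity(above: int, below: int, other: int, threshold: int) -> bool:
    """Median test between fields n and n-1 of opposite parity.

    `above` and `below` are vertically adjacent pixels in field n; `other`
    is the pixel of field n-1 lying between them. Motion is flagged when
    `other` is farther than `threshold` from the median of the three values.
    """
    return abs(other - median([above, other, below])) > threshold


def motion_same_parity(cur: int, prev2: int, threshold: int) -> bool:
    """Direct comparison of co-located pixels in fields n and n-2."""
    return abs(cur - prev2) > threshold
```

With neighbours 100 and 110, a value of 105 in the other field is consistent with a static image, whereas 200 is flagged as motion for a threshold of 10.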
For a sequence of fields which are complete frames, each pixel is present from one field to the next and pixel comparisons are relatively simple.
Cadence detection may be used in a variety of applications. For example, a cadence detector may be coupled with a deinterlacing device. A deinterlacing device is used to reconstruct the corresponding image from a half-image field. Encoding into interlaced format halves the amount of information to be transmitted. This reduction comes at the expense of image quality, and the loss is even more evident for moving images.
For video, the frame represented by a first field n−1 is not quite the same as the one represented by the next field n, because the two are separated by an interval of time and the objects represented are in motion. Therefore, a frame cannot be reconstructed from a sequence of images in interlaced video format simply by overlaying two successive fields. Deinterlacing algorithms, such as spatial interpolation or temporal interpolation with motion compensation, must be applied instead.
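The simplest form of spatial interpolation rebuilds each missing line from the known lines of the same field. The sketch below uses plain line averaging; it is a basic illustration, not the motion-compensated algorithms mentioned above, and the name `line_average` and its edge handling are assumptions.

```python
from typing import List, Optional, Sequence


def line_average(field: Sequence[Sequence[int]], parity: int) -> List[List[int]]:
    """Rebuild a full frame from one field by spatial interpolation.

    `field` holds the rows of the given parity (0: even rows, 1: odd rows).
    Each missing row is the average of the known rows above and below it;
    at the frame edges the single nearest known row is copied.
    """
    height = 2 * len(field)
    frame: List[List[int]] = [[] for _ in range(height)]
    for i, row in enumerate(field):
        frame[2 * i + parity] = list(row)
    for y in range(1 - parity, height, 2):
        above: Optional[List[int]] = frame[y - 1] if y - 1 >= 0 else None
        below: Optional[List[int]] = frame[y + 1] if y + 1 < height else None
        if above is None:
            frame[y] = list(below)  # top edge: copy the row below
        elif below is None:
            frame[y] = list(above)  # bottom edge: copy the row above
        else:
            frame[y] = [(a + b) // 2 for a, b in zip(above, below)]
    return frame
```

Because only one field is used, this approach never mixes two instants in time, at the cost of halving vertical resolution in detailed areas.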
In film format with 25 frames per second, each film frame is subdivided into two fields when encoding to the interlaced format of 50 fields per second. The reconstruction of a frame sequence from a sequence of interlaced fields may then be done simply by merging two successive fields initially corresponding to the same film frame.
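Merging two fields of opposite parity that come from the same film frame amounts to interleaving their lines. A minimal sketch, assuming each field is a list of pixel rows; the name `weave` is hypothetical here.

```python
from typing import List, Sequence


def weave(even_field: Sequence[Sequence[int]], odd_field: Sequence[Sequence[int]]) -> List[List[int]]:
    """Merge two fields of opposite parity into one frame.

    Rows of `even_field` become frame rows 0, 2, 4, ... and rows of
    `odd_field` become rows 1, 3, 5, ...
    """
    if len(even_field) != len(odd_field):
        raise ValueError("fields must have the same number of rows")
    frame: List[List[int]] = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(list(even_row))
        frame.append(list(odd_row))
    return frame
```

If both fields were taken from one frame, `weave(frame[0::2], frame[1::2])` returns that frame exactly, which is why no interpolation is needed for film-originated material.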
It may therefore be of interest, before applying a deinterlacing algorithm, to detect whether a sequence of interlaced fields originates from a film format. If so, the sequence of film frames may be exactly reconstructed simply by merging the fields.
In addition, cadence detection makes it possible to avoid applying complex deinterlacing algorithms to a sequence of interlaced fields originating from a film format. In the case described above of a field sequence resulting from a 3:2 pulldown, the motion between the third field n−1 and the fourth field n corresponds to the motion between two frames separated by 1/24th of a second, i.e. a relatively long interval of time. The motion between the fourth field n and the fifth field n+1 should essentially be zero, because these fields come from the same frame. Given these abrupt variations in motion from one field to another, a deinterlacing algorithm with motion compensation could produce artifacts during the reconstruction.
More generally, detecting the cadence of a sequence of images may permit a simpler deinterlacing of higher quality.
In another example, a cadence detector may be used for compression. For example, if a field sequence at 60 Hz results from a 3:2 pulldown, each sequence of five fields contains the same field twice. In other words, one field out of five may be removed without losing any information. A flag may be set to signal such a removal. In another example, if no motion is detected in several successive fields, all these successive fields except two fields of opposite parity may be eliminated without losing any information. Analysis of the motion sequence may thus contribute to more efficient compression.
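Removing the repeated field of a 3:2 sequence and flagging the removal can be sketched as follows. The sketch assumes each field is already identified by its (source frame, parity) pair, whereas a real encoder would establish the repetition through pixel comparisons; the name `drop_repeated_fields` and the per-field flag layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Field = Tuple[str, str]  # (source frame id, parity), an assumed field identity


def drop_repeated_fields(fields: Sequence[Field]) -> Tuple[List[Field], List[bool]]:
    """Remove fields that repeat an earlier field of the same frame and parity.

    Returns the kept fields and a parallel list of flags: flags[i] is True
    when a copy of kept[i] was removed, so a decoder could re-insert it.
    """
    kept: List[Field] = []
    flags: List[bool] = []
    seen: Dict[Field, int] = {}  # field identity -> index in `kept`
    for field in fields:
        if field in seen:
            flags[seen[field]] = True  # signal the removed repeat
        else:
            seen[field] = len(kept)
            kept.append(field)
            flags.append(False)
    return kept, flags
```

Applied to the ten fields of four film frames after a 3:2 pulldown, the sketch keeps eight fields and sets two flags, i.e. one field out of five is removed.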
However, a displayed image may be created from several combined sources. This is the case when subtitles are overlaid onto a sequence of film frames, or when an image is partitioned in order to highlight specific areas, for example variations in stock prices or graphs. The fields of a given sequence may therefore comprise zones emanating from different sources, for example a film zone which has undergone a 3:2 pulldown and a video zone directly captured at 60 frames per second.
In addition, certain compression algorithms encode images in a way that may locally introduce a 2:2 conversion. For example, the DV (Digital Video) compression algorithm may encode certain areas on the basis of corresponding parts of half-frame fields, while other areas are encoded on the basis of corresponding parts of frames.
To perform cadence detection in such combinations, it is known to divide the fields into blocks and to look for motion in each block in order to perform the cadence detection locally. For each block, pixels in a current field are compared with pixels in a previous field, and possibly in a next field. These comparisons yield, for each pixel, a pixel motion phase value representative of the motion for that pixel. Then, for each block, the pixel motion phase values of the pixels in the block are used to decide on a block motion phase value for the block. By storing the block motion phase values from one field to the next, a motion history is maintained for each block. Searching for a pattern in this history may result in detecting a conversion. For each block, depending on the desired application, parameters may be sent to a processing device such as a deinterlacing device, or to a compression device.
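The block-based procedure can be sketched in three parts: deciding one motion bit per block from per-pixel motion bits, storing a bounded history per block, and searching that history for a rotation of a cadence pattern. The decision rule (a block moves when more than `min_fraction` of its pixels move), the history length, and all names here are assumptions; the text does not specify how pixel values are combined.

```python
from collections import deque
from typing import Deque, Dict, Sequence, Tuple

BlockId = Tuple[int, int]  # (block row, block column)


def block_motion_phases(pixel_motion: Sequence[Sequence[int]], block: int,
                        min_fraction: float) -> Dict[BlockId, int]:
    """Decide one motion bit per block from per-pixel motion bits (0/1)."""
    height, width = len(pixel_motion), len(pixel_motion[0])
    phases: Dict[BlockId, int] = {}
    for by in range(0, height, block):
        for bx in range(0, width, block):
            cells = [pixel_motion[y][x]
                     for y in range(by, min(by + block, height))
                     for x in range(bx, min(bx + block, width))]
            # Assumed rule: the block moves if enough of its pixels move.
            phases[(by // block, bx // block)] = int(sum(cells) / len(cells) > min_fraction)
    return phases


class BlockHistory:
    """Keeps the last `length` block motion bits for every block."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.history: Dict[BlockId, Deque[int]] = {}

    def update(self, phases: Dict[BlockId, int]) -> None:
        for block_id, bit in phases.items():
            self.history.setdefault(block_id, deque(maxlen=self.length)).append(bit)

    def matches(self, block_id: BlockId, pattern: str) -> bool:
        """True if the block's full history repeats some rotation of `pattern`."""
        bits = "".join(str(b) for b in self.history.get(block_id, ()))
        if len(bits) < self.length:
            return False
        period = len(pattern)
        return any(all(bit == pattern[(i + p) % period] for i, bit in enumerate(bits))
                   for p in range(period))
```

Feeding ten fields in which only the top-left block follows the motion sequence `"0010100101"` makes `matches((0, 0), "10010")` true while a static block does not match, which illustrates how different zones of one image can be classified separately.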
The blocks may, for example, have a size of 16×16 pixels in a displayed image. Thus, a screen of 720×576 pixels corresponds to 45×36 = 1620 blocks. Parameters are therefore transmitted 1620 times for each field.
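The block count follows from dividing each image dimension by the block size. A small sketch, assuming partial blocks at the right and bottom edges would be counted as whole blocks (this does not arise for 720×576 with 16×16 blocks); the name `block_count` is hypothetical.

```python
def block_count(width: int, height: int, block: int = 16) -> int:
    """Number of block x block tiles needed to cover a width x height image.

    Uses integer ceiling division, so partial edge tiles count as full blocks.
    """
    blocks_x = (width + block - 1) // block
    blocks_y = (height + block - 1) // block
    return blocks_x * blocks_y
```

`block_count(720, 576)` returns 45 × 36 = 1620, matching the figure above; for a 720×480 raster it gives 45 × 30 = 1350.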
Patent application WO 02/056597, the disclosure of which is hereby incorporated by reference, describes a method in which objects are identified in multiple images. An object may be defined in that the pixels in this object move in these images according to the same motion model. Cadence detection is performed and a decision is made for each object identified.