In the NTSC standard, the picture frequency is 30 interlaced frames per second. Movies, however, are inherently progressive, and their frames are produced at a rate of 24 Hz. Displaying a sequence of film-type images (originally at 24 frames per second) on television, at the NTSC rate of 60 video fields per second, therefore requires a conversion process called “3:2 pull-down”. This technique, described for instance in the international patent application WO 97/39577, consists of creating five interlaced frames (which can therefore be displayed on television) from four original sequential (or progressive) film frames. This is obtained by dividing each of these four sequential frames in two, so as to form four odd and four even fields, and by duplicating two of these eight fields.
As illustrated in FIG. 1, which shows an original film sequence at 24 Hz on the first line and, on the second line, the field sequencing of the corresponding video sequence at 30 Hz, an additional field is inserted for each pair of film frames, for instance by splitting one film frame out of two into three fields, the other one being split as usual into two fields. In the case of a frame split into three fields (for instance, G1G2 split into F1, F2, F3, or G5G6 split into F6, F7, F8), the third one is obtained by copying the odd field (F1) or the even field (F6) alternately, in order to keep the “odd/even” sequencing. The result is the following (conditions CD1):
        F1=F3=G1
        F2=G2
        F4=G4
        F5=G3
        F6=F8=G6
        F7=G5
        F9=G7
        F10=G8, and so on.
These two additional fields obtained by duplication constitute redundant information. When encoding such sequences according to the MPEG-2 standard, it is useful to detect said information: suppressing the repeated fields frees some space to better encode the other ones, the MPEG-2 encoder concerned thus receiving both video-type image sequences at 30 Hz and original film-type image sequences at 24 Hz.
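The field sequencing of conditions CD1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a frame is modelled as a list of rows, the "odd" field is assumed to hold the 1st, 3rd, 5th, ... rows, and the names `CD1_SOURCE`, `split_frame` and `pulldown_3_2` are hypothetical, not taken from the cited documents.

```python
# For each output field F1..F10, the index of the film field G1..G8 it copies.
# The tuple is transcribed directly from conditions CD1.
CD1_SOURCE = (1, 2, 1, 4, 3, 6, 5, 6, 7, 8)


def split_frame(frame):
    """Split a progressive frame (a list of rows) into (odd_field, even_field).

    Assumption: the odd field holds rows 1, 3, 5, ... (1-based numbering).
    """
    return frame[0::2], frame[1::2]


def pulldown_3_2(film_frames):
    """Map each group of four film frames onto ten video fields (CD1)."""
    if len(film_frames) % 4 != 0:
        raise ValueError("expected a multiple of four film frames")
    video_fields = []
    for start in range(0, len(film_frames), 4):
        film_fields = []  # G1, G2, ..., G8 for this group
        for frame in film_frames[start:start + 4]:
            odd, even = split_frame(frame)
            film_fields.extend([odd, even])
        video_fields.extend(film_fields[g - 1] for g in CD1_SOURCE)
    return video_fields


# Four tiny two-row "frames"; each row is tagged with its frame and parity.
frames = [[[k + "o"], [k + "e"]] for k in ("A", "B", "C", "D")]
fields = pulldown_3_2(frames)
```

With this input, the ten resulting fields still alternate odd/even, and the two duplicated fields sit in positions F3 (a copy of F1) and F8 (a copy of F6).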
A usual criterion to detect automatically sequences coming from movies (film-type image sequences) is therefore the following: a structure of five frames (i.e. of ten fields) is analyzed by means of a subtraction of consecutive fields of the same parity. The conditions to detect the 3:2 pull-down structure (i.e. to detect in any group of ten successive fields the specific film pattern formed by the two duplicated fields) are the following (conditions CD2):
        F1=F3
        F2≠F4
        F3≠F5
        F4≠F6
        F5≠F7
        F6=F8
        F7≠F9
        F8≠F10,
as depicted in the sequence of FIG. 2, which shows how fields are sequenced for the film mode format and illustrates the set of tests (identical or not?) to be carried out for the detection of a 3:2 pull-down structure (f1, f2, . . . designate the successive frames, 1o-1e, 1o-2e, 2o-3e, . . . the corresponding pairs of fields, y the reply “yes” to the test of comparison, i.e. fields equal, and n the reply “no”, i.e. fields different). If all these conditions are satisfied, then the inverse 3:2 pull-down conversion (suppression of the two repeated fields) is performed on the corresponding group of five frames; if, on the contrary, one of these conditions is not valid, the encoder goes back to the video mode (no elimination of the repeated fields).
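The set of tests CD2 can be written as a small table-driven check. The sketch below is illustrative only: it assumes the ten fields are already aligned so that the first one is F1, it uses strict equality as a default comparison (a noise-tolerant comparison can be passed in instead), and the names `CD2` and `matches_cd2` are hypothetical.

```python
# (a, b) -> True if Fa and Fb must be equal, False if they must differ.
# Transcribed from conditions CD2.
CD2 = {
    (1, 3): True,
    (2, 4): False,
    (3, 5): False,
    (4, 6): False,
    (5, 7): False,
    (6, 8): True,
    (7, 9): False,
    (8, 10): False,
}


def matches_cd2(fields, same=lambda a, b: a == b):
    """Return True if a group of ten fields satisfies every CD2 condition."""
    if len(fields) != 10:
        raise ValueError("CD2 is defined on a group of exactly ten fields")
    return all(
        same(fields[a - 1], fields[b - 1]) == must_be_equal
        for (a, b), must_be_equal in CD2.items()
    )


# A pulled-down group (F3 repeats F1, F8 repeats F6) and a plain video group.
film_group = ["Ao", "Ae", "Ao", "Be", "Bo", "Ce", "Co", "Ce", "Do", "De"]
video_group = ["v%d" % i for i in range(1, 11)]
is_film = matches_cd2(film_group)
is_video_film = matches_cd2(video_group)
```

Keeping the conditions in a table rather than in a chain of `if` statements makes it easy to see that a single failed test sends the group back to video mode.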
However, due to the possible presence of noise on the original 3:2 pull-down sequence, the equality criterion between two fields (F1, F3 and F6, F8) may not be strictly verified. Two fields of the same parity F(N) and F(N+2) are considered. If NTOT designates the total number of pixels in a field (172800 for a full resolution), val(F(N)) designates the luminance value for a given pixel, N1 is the number of picture elements (pixels) such that ABS[val(F(N))−val(F(N+2))]>THRES1, Nm is the number of pixels such that ABS[val(F(N))−val(F(N+2))]<THRES2, N2 is defined by N2=NTOT−Nm, and THRES1, THRES2 are predetermined thresholds, then the following test, the values of Ratio1 and Ratio2 being previously chosen, is carried out:
        IF ((N1<Ratio1) and (N2<Ratio2)) THEN: F(N)=F(N+2)
        ELSE: F(N)≠F(N+2)
The first criterion (N1<Ratio1) may be called “the dissimilarity criterion” and involves the number of pixels where the field-to-field pixel difference is large, while the second one (N2<Ratio2) may be called “the likeness criterion” and involves the number of pixels where the field-to-field pixel difference is small.
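A minimal sketch of this noise-tolerant test follows. It assumes each field is given as a flat sequence of luminance values of equal length; the function name `fields_equal` and all numeric values used in the example are hypothetical and chosen only to exercise both branches.

```python
def fields_equal(field_a, field_b, thres1, thres2, ratio1, ratio2):
    """Noise-tolerant equality test between two fields of the same parity.

    field_a, field_b: flat sequences of luminance values, F(N) and F(N+2).
    Returns True when both the dissimilarity criterion (N1 < ratio1) and
    the likeness criterion (N2 < ratio2) hold.
    """
    if len(field_a) != len(field_b) or len(field_a) == 0:
        raise ValueError("fields must be non-empty and of the same size")
    diffs = [abs(a - b) for a, b in zip(field_a, field_b)]
    ntot = len(diffs)                     # NTOT: pixels per field
    n1 = sum(d > thres1 for d in diffs)   # pixels with a large difference
    nm = sum(d < thres2 for d in diffs)   # pixels with a small difference
    n2 = ntot - nm                        # N2 = NTOT - Nm
    return n1 < ratio1 and n2 < ratio2


# Illustrative values only (not taken from the source).
reference = [100] * 8
noisy_copy = [101] * 8                    # same picture plus slight noise
other = [100, 100, 100, 100, 160, 160, 160, 160]
params = dict(thres1=20, thres2=5, ratio1=1, ratio2=2)
same_despite_noise = fields_equal(reference, noisy_copy, **params)
different = fields_equal(reference, other, **params)
```

Note that Ratio1 and Ratio2 are compared directly with pixel counts here, as in the test written above; whether they are absolute counts or fractions of NTOT in a given implementation is not specified in the text.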
Problems may consequently occur within the film mode detection step, mostly in the two following contrasting situations. For static or quasi-static sequences, the dissimilarity criterion is no longer verified, since the fields are nearly all equal. Said criterion is therefore suppressed, the only conditions of CD2 that remain to be fulfilled then being F1=F3 and F6=F8. For a very noisy sequence, however, in which two identical fields may appear different, the threshold setting the likeness criterion cannot be raised too far, otherwise fields that are actually different could be considered as identical.
The European patent application previously filed by the applicant under the number 99403228.2 (PHF99621) describes an encoding method (and a corresponding encoding device) including a film mode detection step with which the above-indicated drawback is avoided. According to said document, when dealing with noisy images, the criterion for detecting automatically sequences coming from movies is modified on the basis of the following remark. By looking at the N2 statistics, the applicant has noticed that N2 for fields F1 and F3 (referenced N2[1,3]) and N2 for fields F6 and F8 (referenced N2[6,8]) are small compared to the others (more generally, N2[i,j] stands for the N2 statistics calculated for Fj−Fi). Then, by computing the difference between two consecutive N2 statistics, for instance N2[6,8]−N2[5,7], and comparing said difference, in the form of a percentage, to a predetermined threshold (according to an expression of the following form: (N2[5,7]−N2[6,8])×100/NTOT for example), a large value of percentage is obtained every five computations. Therefore, if the computed percentage is less than X %, with for instance X=30%, both fields (of the last considered pair of fields) are considered as equal, and the inverse 3:2 pull-down processing is carried out for the next five frames.
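The periodicity described here can be illustrated with a short sketch. It assumes the percentage takes the form (N2[previous pair] − N2[current pair]) × 100 / NTOT, which is only one reading of the example expression given above; the N2 values, the field size of 1000 pixels and the function names are all hypothetical. The sketch only shows where the large values fall and does not attempt to settle the exact decision rule applied to the threshold X.

```python
def n2_drop_percent(n2_prev, n2_curr, ntot):
    """Difference between two consecutive N2 statistics, as a percentage of NTOT."""
    return (n2_prev - n2_curr) * 100.0 / ntot


def consecutive_percentages(n2_series, ntot):
    """Apply n2_drop_percent to every pair of consecutive N2 statistics."""
    return [
        n2_drop_percent(prev, curr, ntot)
        for prev, curr in zip(n2_series, n2_series[1:])
    ]


# Made-up N2 statistics for the pairs [1,3], [2,4], [3,5], ... of a noisy
# film-type sequence: N2 is small for the pairs of repeated fields
# ([1,3], [6,8], [11,13]) and large for all the other pairs.
NTOT = 1000
n2_series = [50, 600, 620, 610, 590, 40, 605, 615, 600, 610, 45]
percentages = consecutive_percentages(n2_series, NTOT)

# Positions where the percentage exceeds the example value X = 30.
X = 30
peaks = [i for i, p in enumerate(percentages) if p > X]
```

With these made-up numbers the large values are five computations apart, which is the periodic signature the passage relies on; the absolute level of noise shifts all N2 values together and largely cancels out in the difference.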
An encoding device in which this preprocessing operation is included is described with reference to FIG. 3 and comprises means 31 for encoding input signals corresponding to input sequences either coming from movies or of video type, means for detecting in the input signals of the encoding device a sequence of film type, and means 33 for switching, only when such a detection has occurred, from a first to a second mode of operation of the encoding means 31 (encoding means 31 are located downstream of said switching means).
The encoding means 31 comprise in series a suppressing stage 311 and a coding stage 312, for instance an MPEG-2 coder. The detecting means consist of a detecting stage 32, illustrated in more detail in FIG. 4 and comprising first a set of subtractors 41.1, 41.2, 41.3, . . . , each provided for receiving two successive fields of the same parity and determining per pixel the difference between these fields. These subtractors are followed by a set of circuits 42.1, 42.2, 42.3, . . . provided for taking the absolute value of said difference; this value is stored in a memory 43.1, 43.2, 43.3, . . . , respectively. The differences between the successive stored absolute values are then computed in subtractors 44.1, 44.2, 44.3, . . . , and these differences, for instance multiplied by 100/NTOT as indicated above, are compared to the predefined threshold (comparison tests C1). If the fields can be considered as equal (results “TRUE” of the tests: F1=F3, and F6=F8), the conditions previously called CD2 are satisfied, and the inverse 3:2 pull-down processing is performed on the next group of five frames, in the suppressing stage 311. In the other cases (i.e. if one of the conditions CD2 is no longer valid, which corresponds to a result “FALSE” of the tests), the switching means 33 are in the opposite position, and the stage 311 is de-activated: the encoding stage goes back to the video mode (no elimination of the repeated fields: the input of the encoding device is directly connected to the input of the coding stage 312).
In the video sequences now currently handled, one or several objects coded according to the film mode may however be present, and it may then be necessary to detect these objects. As each object can be of any shape and can occupy any position within the successive images considered, it is no longer possible to use the previously described solution (i.e. a detection based on pixel statistics computed over the complete image), since the size and position of the film mode coded object(s) within any image are unknown.