Conventionally, when a cathode ray tube is used as a display element, an odd/even interlace scanning method has been used to save the bandwidth and render at high speed.
Meanwhile, in recent years, many types of display elements have been put into use, and a progressive (sequential) scanning method is widely adopted, irrespective of the display device types such as liquid crystal display, plasma display and rear projection.
The progressive (sequential) scanning method forms a screen in a single scan, unlike the odd/even interlace scanning method, in which each screen is divided into two sets of rows: even rows and odd rows. Outputs to a computer monitor are basically performed using the progressive scanning method.
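The relation between the two scanning methods can be illustrated with a minimal sketch. It assumes a frame is simply a list of rows with 0-based row indices; the function names split_fields and weave are hypothetical and are used only for illustration.

```python
def split_fields(frame):
    """Split a progressive frame (a list of rows) into an even-row field
    and an odd-row field, as in odd/even interlace scanning.
    Rows are indexed from 0 (an assumption of this sketch)."""
    even_field = frame[0::2]  # rows 0, 2, 4, ...
    odd_field = frame[1::2]   # rows 1, 3, 5, ...
    return even_field, odd_field


def weave(even_field, odd_field):
    """Rebuild a progressive frame by interleaving the two fields row by row."""
    frame = []
    for i in range(len(even_field) + len(odd_field)):
        source = even_field if i % 2 == 0 else odd_field
        frame.append(source[i // 2])
    return frame


# A 5-row, 4-pixel-wide test frame in which every pixel holds its row index.
frame = [[row] * 4 for row in range(5)]
even_field, odd_field = split_fields(frame)
restored = weave(even_field, odd_field)
```

Weaving restores the original frame only when both fields were taken from the same instant; fields captured at different times do not recombine cleanly.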
Therefore, in order to display an interlace-scanned video signal on display devices that use the progressive scanning method, such as liquid crystal display, plasma display and rear projection, it is essential to perform IP (Interlace to Progressive) conversion, which converts from the interlace scanning method to the progressive scanning method.
Corresponding thereto, today, a digital television receiver performs display processing of an image by converting from a received interlace-scanned video signal to a progressive video signal through the IP conversion.
In the above IP conversion processing, compensation is performed for the interlaced image, in which half of the information is omitted in each field. If the compensation processing is performed simply, the number of frames is halved. A variety of methods have been devised for the above compensation.
Different image quality may result depending on the suitability of the IP conversion processing and on how it is harmonized with other image control techniques.
In the meantime, with the progress of image editing techniques, a variety of synthesized images have come to be broadcast. For example, one screen of a broadcast image conforming to the NTSC system (or the PAL system) may now contain a synthesized image including the display of date, time and characters, or a synthesized image with movie subtitles, CG (computer graphics), a video camera image, etc. inserted therein.
The original image data of each image portion forming such a synthesized image may be 60 Hz broadcast data originally conforming to the NTSC system, CG data generated by a 2:2 pull-down method, or image data generated by a 2:3 pull-down method.
Namely, one synthesized image may include a video camera image imaged by the interlace method with a 60 Hz frame frequency, which is synthesized with a CG image generated by the progressive method with a 30 Hz frame frequency. Another synthesized image may include a progressive CM (commercial film) image of a frame frequency of 30 Hz, which is synthesized with a character telop generated by the interlace method of a 60 Hz frame frequency.
As such, a display screen image may be an interlace-scanned synthesized image generated through the synthesis and superposition of images of different frame frequencies, including date, time, character display, movie subtitles, CG (computer graphics) by the 2:2 pull-down method, and video camera images by the 2:3 pull-down method, which are window-displayed at local positions.
When performing IP conversion of an interlaced image generated through the synthesis and superposition of images having different frame frequencies, film mode detection is carried out on the synthesized image. Based on the detection result, the IP conversion is performed to convert to image data of the sequential scanning method.
Here, the film mode signifies the state of image data in which a film source such as a movie has been converted into an interlaced image (a telecine-processed state). The film mode detection signifies the processing that detects which conversion method to broadcast data was used in the above telecine processing: the 2:2 pull-down sequence, the 2:3 pull-down sequence, or neither.
Now, the 2:2 pull-down image data and the 2:3 pull-down image data will be explained in brief.
FIG. 1A is a diagram schematically illustrating, as an example, a data array of 2:2 pull-down image data converted from a 30 Hz progressive video signal, imaged by a digital video camera, to a 60 Hz interlaced video signal.
FIG. 1B is a diagram schematically illustrating a data array of 2:3 pull-down image data converted from a 24 Hz film progressive video signal of movie data to a 60 Hz interlaced video signal.
As shown in FIG. 1A, in the 2:2 pull-down sequence, two fields (f1t, f1b), (f2t, f2b) . . . of an interlaced video signal are generated from each frame F1, F2 . . . of a progressive video signal, and this conversion procedure is repeated.
Meanwhile, as shown in FIG. 1B, in the 2:3 pull-down sequence, three fields (f11, f12, f13) of an interlaced video signal are generated from a first frame F1 of a progressive video signal in a cinema image frame. Next, two fields (f21, f22) of the interlaced video signal are generated from a second frame F2. The above conversion procedures to obtain 3 fields and 2 fields are successively repeated for the progressive video signal frames from a frame F3 onward.
Here, in the 2:3 pull-down sequence shown in FIG. 1B, the third generated field (f13) and the eighth generated field (f33) have data identical to the first generated field (f11) and the sixth generated field (f31), respectively. Namely, the fields (f13), (f33) are repeated fields.
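The field ordering described for FIGS. 1A and 1B can be reproduced with a short sketch. The function names pulldown and repeated_fields, the dictionary layout, and the assumption that field parity simply alternates top, bottom, top, . . . along the output sequence are all illustrative choices, not part of the described configurations.

```python
def pulldown(num_frames, cadence):
    """List the interlaced fields generated from progressive frames F1, F2, ...

    cadence gives the number of fields taken from successive frames:
    (2, 2) for the 2:2 pull-down sequence, (3, 2) for the 2:3 sequence.
    """
    fields = []
    for k in range(num_frames):
        copies = cadence[k % len(cadence)]
        for j in range(1, copies + 1):
            # Assumed: parity alternates with the position in the output.
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append({"label": f"f{k + 1}{j}", "frame": k + 1, "parity": parity})
    return fields


def repeated_fields(fields):
    """Return labels of fields whose source frame and parity were already
    output, i.e. fields that carry the same data as an earlier field."""
    seen = set()
    repeats = []
    for field in fields:
        key = (field["frame"], field["parity"])
        if key in seen:
            repeats.append(field["label"])
        seen.add(key)
    return repeats


fields_23 = pulldown(4, (3, 2))  # four 24 Hz frames -> ten 60 Hz fields
fields_22 = pulldown(4, (2, 2))  # four 30 Hz frames -> eight 60 Hz fields
```

With four input frames, repeated_fields(fields_23) yields f13 and f33, the third and eighth fields, while the 2:2 sequence contains no repeated field.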
The above repeated fields (f13), (f33) are respectively inserted between the fields (f12) and (f21), and between the fields (f32) and (f41), which are mutually distant on the time axis. As a result, when viewed as one screen, a “combing noise” phenomenon is produced in the image between the repeated portion and the non-repeated portion. Further, the combing noise phenomenon is also produced at the synthesis boundary portion of a synthesized image having different frequencies.
To obtain a 60 Hz high-quality progressive image through the IP conversion after correcting the above phenomenon by interpolation etc., it is necessary to confirm whether the interlaced video signal before the IP conversion is a synthesized image. It is further necessary to confirm, for each synthesized image portion in the case of a synthesized image, or for the entire image in the case of a non-synthesized image, whether that image portion or entire image is 2:2 pull-down image data or 2:3 pull-down image data. The aforementioned film mode detection is performed for this confirmation.
Accordingly, it has been considered that, once the conversion method of the original image is known, a high quality image can be obtained by performing IP conversion for each synthesized image portion according to that conversion method, using either the transform IP conversion or the motion-compensated IP conversion.
For the above purpose, a variety of techniques have been proposed as conventional techniques to perform the film mode detection.
FIGS. 2A and 2B are explanation diagrams illustrating first and second exemplary configurations of a typical synthesized image detection unit having the film mode detection function according to such conventional techniques.
In the conventional configuration shown in FIG. 2A, there are provided two field memories 10, 11, a film mode detection function section 100, a transform IP converter 15, a motion-compensated IP converter 16, and a synthesizer 14.
In order to perform the film mode detection (2:2/2:3 pull-down sequence detection) using an interframe difference etc., field signals F(n), F(n−1) and F(n−2) for three consecutive fields are output, using the two field memories: first field memory 10 and second field memory 11.
The above field signals F(n), F(n−1) and F(n−2) for the three fields are input, on a screen-by-screen basis, to a feature amount extractor 12 constituting film mode detection function section 100.
Meanwhile, the above field signals F(n), F(n−1) and F(n−2) for the consecutive three fields are also input to transform IP converter 15 and motion-compensated IP converter 16.
Feature amount extractor 12, which constitutes film mode detection function section 100, receives the above field signals F(n), F(n−1) and F(n−2) for the three consecutive fields and detects a feature amount for one screen.
A screen film mode detector 13 receives the detected feature amount for each screen from feature amount extractor 12. Screen film mode detector 13 then retains the screen feature amounts detected by feature amount extractor 12 over a plurality of past fields, and detects the film mode from the overall motion result.
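How a retained history of screen feature amounts can reveal a pull-down sequence may be sketched as follows. This is only an illustrative model and not the detector of FIG. 2A: it assumes each feature amount is a single number measuring the difference between a field and the preceding field, and that a fixed threshold separates "motion" from "no motion". The function name detect_film_mode and the threshold are hypothetical.

```python
def detect_film_mode(features, threshold):
    """Classify a history of per-field feature amounts as "2:2", "2:3" or "video".

    A field taken from the same source frame as the preceding field is
    assumed to give a feature amount at or below the threshold.
    """
    motion = [value > threshold for value in features]
    cadences = (
        ("2:2", [True, False]),
        ("2:3", [True, False, False, True, False]),
    )
    for name, pattern in cadences:
        period = len(pattern)
        if len(motion) < 2 * period:
            continue  # too little history to confirm this cadence
        for phase in range(period):
            if all(m == pattern[(i + phase) % period] for i, m in enumerate(motion)):
                return name
    return "video"  # no pull-down cadence found


history_23 = [90, 2, 1, 80, 3, 95, 2, 2, 85, 1]        # motion on fields 1, 4, 6, 9
history_22 = [90, 2, 80, 3, 95, 2, 85, 1]              # motion on every other field
history_video = [90, 70, 80, 60, 95, 75, 85, 65, 90, 80]  # motion on every field
```

Because the decision is taken once per screen, a screen in which film and 60 Hz video portions are mixed yields a single result for the whole screen.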
Based on the film mode detection result, a signal after the IP conversion from either transform IP converter 15 or motion-compensated IP converter 16 is made effective.
The invention related to the conventional technique shown in FIG. 2A is disclosed in Patent document 1. In short, according to the invention described in Patent document 1, a feature for the overall image is extracted, and based on the overall motion result obtained from the above result, the film mode is decided.
Meanwhile, according to the conventional configuration shown in FIG. 2B, there are shown two field memories 10, 11 and a film mode detection function section 100 only, while transform IP converter 15, motion-compensated IP converter 16 and synthesizer 14 shown in FIG. 2A are omitted in the figure.
From the two field memories 10, 11, image signals for three fields, namely the present field, the field one field before, and the field two fields before, are successively input, and local areas in the screen are successively selected by a local area selector 14. The local areas here signify the respective pixel areas obtained when the screen is sectioned into a plurality, m×n, of block areas.
The signals in the local areas successively selected by local area selector 14 are input into feature amount extractor 15.
In feature amount extractor 15 for the local areas, the feature of the related local area is extracted and then forwarded to a feature amount distributor 16. Feature amount distributor 16 forwards the feature amount extracted in feature amount extractor 15 to the corresponding film mode detector among film mode detectors 17a-17n, each of which corresponds to one of the plurality of local areas. There, whether or not the image is in the film mode is detected for each local area. As a conventional technique related to the configuration shown in FIG. 2B, there is an invention disclosed in Patent document 2. According to the invention of Patent document 2, the screen area is divided in advance into a plurality of local areas having no relation with the synthesized image area, and the film mode detection is carried out for each divided area.
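The sectioning of a screen into m×n local areas, with one feature amount per area, can be sketched as follows. The sketch assumes that the feature amount is a sum of absolute pixel differences between two fields of the same size, for example F(n) and F(n−2); the function name block_features and this choice of feature are illustrative assumptions.

```python
def block_features(curr, prev, m, n):
    """Return an m-by-n grid of feature amounts, one per local area.

    curr and prev are fields of equal size given as lists of pixel rows.
    The screen is sectioned into m block rows and n block columns, and
    each grid entry sums the absolute pixel differences inside its block.
    """
    height, width = len(curr), len(curr[0])
    grid = [[0] * n for _ in range(m)]
    for y in range(height):
        block_row = y * m // height
        for x in range(width):
            block_col = x * n // width
            grid[block_row][block_col] += abs(curr[y][x] - prev[y][x])
    return grid


# Two 4x4 fields that differ only in the top-left and bottom-right corners.
curr = [[0] * 4 for _ in range(4)]
prev = [[0] * 4 for _ in range(4)]
prev[0][0] = 5
prev[3][3] = 7
grid = block_features(curr, prev, 2, 2)
```

Each grid entry would then be passed to the film mode detector assigned to that local area, which is why the number of detectors grows with m×n.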
Furthermore, there is an invention described in Patent document 3. According to the invention, a field image is divided into a plurality of blocks, as shown in FIG. 2B. Then, in regard to each divided block, a motion vector having the highest reliability between two consecutive field images having an identical property (odd or even field) is detected, so as to perform motion compensation. At the time of the motion vector detection, using the detected motion vector and the reliability information, a repeated field image included in the video signal is detected, and the film mode is decided accordingly.
[Patent document 1] Japanese Unexamined Patent Publication No. 2005-318624.
[Patent document 2] Japanese Unexamined Patent Publication No. 2005-318611.
[Patent document 3] Japanese Unexamined Patent Publication No. 2006-303910.
Here, with the film mode decision for each screen according to the invention described in Patent document 1, it is possible to detect the film mode of an image edited screen by screen. However, in the case of a synthesized image, for example when a CG area and a CM film area are relatively large, or when their features are strong, the image is detected as a 30 Hz film image, and the IP conversion for film is performed.
As a result, there arises the problem of the occurrence of a combing noise, in which a dithered image like a residual image is produced in a 60 Hz video camera image and a character telop, causing an image deviating line-by-line in a comb shape.
Also, with the film mode decision for each local area according to the inventions described in the aforementioned Patent document 2 and Patent document 3, the film mode is decided for each local area, so it is possible to obtain film mode detection and IP conversion optimal for each local area.
However, it is necessary to provide a plurality of film mode detection function sections corresponding to respective local areas, and accordingly, there is the problem that the circuit scale becomes relatively large.