This invention relates to the field of image processing, more particularly to the detection of various video formats and image scaling, most particularly to the detection of an image aspect ratio and the presence of subtitles and the scaling of the detected image to optimally fit a video screen.
Modern televisions are available with a wide screen 16:9 aspect ratio. The aspect ratio of a display is the ratio of display width to display height. The wide screen is capable of displaying more image content than the traditional 4:3 display, and has long been used by motion picture producers and theaters. Since the wide screen televisions are relatively new, however, most of the existing pre-recorded content is intended for viewing on a traditional television having a 4:3 aspect ratio and has been adapted from the original wide screen format to the traditional 4:3 format.
Several means are available to adapt a motion picture to the 4:3 television format. One alternative is to simply crop the edges of the image to yield a 4:3 image. This method loses much of the artistic content embodied in the motion picture, and sometimes even crops some or all of the characters from certain scenes. A second alternative, which is very common, is to letterbox the images.
Letterboxing occurs when the wide screen image is scaled down to fit the width of the 4:3 display screen. Scaling the image, however, results in an image that is not tall enough to fill the 4:3 display screen. Dark video lines are added above and below the scaled image to fill the display screen. Unfortunately, no standard defines the letterbox size or position. Thus, a letterboxed video source may have an image that uses any number of horizontal lines and is located anywhere within the display region.
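The arithmetic behind letterboxing can be illustrated with a short sketch. The function name `letterbox_line_budget` and the 480-line frame are assumptions chosen for illustration only; the sketch further assumes the image is scaled uniformly to span the full frame width. Because no standard fixes the position of the image, only the number of active and dark lines is computed, not where they fall.

```python
from fractions import Fraction

def letterbox_line_budget(frame_lines, frame_aspect, image_aspect):
    """Split a frame's lines into active image lines and dark fill lines.

    Hypothetical helper: assumes the wide image is scaled uniformly so that
    it spans the full width of the narrower frame.
    """
    if image_aspect < frame_aspect:
        raise ValueError("image is not wider than the frame; no letterboxing")
    # Image height shrinks by the ratio of the two aspect ratios.
    active = round(frame_lines * frame_aspect / image_aspect)
    return active, frame_lines - active

# A 16:9 image placed in a 4:3 frame of 480 lines (assumed line count).
active, dark = letterbox_line_budget(480, Fraction(4, 3), Fraction(16, 9))
print(active, dark)  # 360 120
```

Under these assumptions a 16:9 image occupies three quarters of the lines of a 4:3 frame, and the remaining quarter is dark fill that may be split in any proportion above and below the image.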
The lack of a letterbox standard does not create a problem until the letterboxed image is displayed on a wide screen display. Simply displaying the video image without any video processing yields a small 16:9 image within a large 16:9 display and is a poor utilization of the capabilities of a wide screen display. Many high-end 16:9 televisions offer multiple display modes, such as regular, panorama, cinema, and full, which apply various scaling ratios to the input video signal. These modes attempt to enable the viewer to optimize the image scaling of a particular video source to the display. But given the variations between source materials in the absence of a letterbox standard, often none of the various modes is ideal. The closest mode typically leaves some black borders, crops off some of the picture or subtitles, or does both. Additionally, some video sources mix letterboxed and non-letterboxed images. For example, broadcasts of letterboxed motion pictures include non-letterboxed commercials.
Given the drawbacks of the present display modes, an image processing system and method are needed to automatically match the image processing performed on a video signal to the aspect ratio of the display device.
Objects and advantages will be obvious, and will in part appear hereinafter and will be accomplished by the present invention which provides a method and system for processing a video signal. The method and system provide for automatically detecting letterboxing in an input video signal, and scaling a desired portion of the video signal to match a given display device, as well as detecting subtitles in the input video signal and selectively including the subtitles in the desired portion of the video signal.
According to one embodiment of the disclosed invention, a method of processing a video image is disclosed. The method comprises the steps of receiving video image data, calculating image data statistics for each line of the video image, locating at least one desired portion of the video image, and scaling the desired portion of the video image for display on a display device having a pre-determined aspect ratio.
According to one embodiment, at least one image data statistic selected from the group consisting of mean, variance, edge strength, and entropy is calculated for each line of the video image. The image data statistic is compared to a threshold, and lines whose statistic exceeds the threshold are part of the desired image portion. When more than one image data statistic is computed, the line is part of the desired portion when all of the statistics exceed their respective thresholds.
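The line-by-line test described above can be sketched as follows. This is an illustrative sketch only, not the disclosed implementation: the exact formulas (population variance, mean absolute difference between adjacent pixels as edge strength, Shannon entropy of the pixel-value histogram), the threshold values in `THRESHOLDS`, and the names `line_statistics` and `find_active_region` are all assumptions made for the example.

```python
import math
from collections import Counter

def line_statistics(line):
    """Compute four per-line statistics for a list of pixel intensities."""
    n = len(line)
    mean = sum(line) / n
    variance = sum((p - mean) ** 2 for p in line) / n
    # Edge strength (assumed definition): mean absolute neighbour difference.
    edge = sum(abs(b - a) for a, b in zip(line, line[1:])) / max(n - 1, 1)
    # Entropy (assumed definition): Shannon entropy of the value histogram.
    counts = Counter(line)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "edge": edge, "entropy": entropy}

# Hypothetical thresholds, one per statistic; real values need tuning.
THRESHOLDS = {"mean": 20.0, "variance": 10.0, "edge": 1.0, "entropy": 0.5}

def find_active_region(frame, thresholds=THRESHOLDS):
    """Return (first, last) indices of lines where every statistic exceeds
    its threshold, or None when no line qualifies."""
    active = []
    for i, line in enumerate(frame):
        stats = line_statistics(line)
        if all(stats[name] > limit for name, limit in thresholds.items()):
            active.append(i)
    return (active[0], active[-1]) if active else None

dark = [16] * 8                                   # uniform dark fill line
content = [16, 200, 40, 180, 90, 230, 60, 120]    # textured image line
frame = [dark] * 2 + [content] * 4 + [dark] * 2
print(find_active_region(frame))  # (2, 5)
```

Requiring every statistic to exceed its threshold makes the test conservative: a uniform dark line fails on all four measures, while a line must show brightness, spread, detail, and variety together to count as image content.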
Alternate embodiments of the disclosed invention selectively include subtitles in the desired portion of the video image, typically depending on the language of the subtitles and the preferences of the viewer. The language of the subtitles is detected by calculating, for each line of the video image, at least one image data statistic selected from the group consisting of mean, variance, edge strength, and entropy.
Another embodiment of the disclosed invention provides a display system. The display system comprises a signal processor and a display device. The signal processor receives an input video signal having a first aspect ratio, detects a desired portion of the input video signal, and scales the desired portion to generate an output video signal having a second aspect ratio. The display device receives the output video signal and generates an image.
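The scaling step performed by the signal processor can be illustrated with a minimal sketch. It assumes square pixels and a uniform scale factor chosen so that the desired portion fills the display as far as possible without cropping; the function name `fit_to_display` and the pixel dimensions are hypothetical.

```python
def fit_to_display(src_w, src_h, disp_w, disp_h):
    """Return (scale, out_w, out_h) for the largest uniform scaling of the
    source region that still fits inside the display (square pixels assumed)."""
    scale = min(disp_w / src_w, disp_h / src_h)
    return scale, round(src_w * scale), round(src_h * scale)

# Desired portion of a letterboxed frame: 640x360 inside a 640x480 frame.
print(fit_to_display(640, 360, 1920, 1080))  # (3.0, 1920, 1080)

# Scaling the whole 4:3 frame instead leaves the display width unfilled.
print(fit_to_display(640, 480, 1920, 1080))  # (2.25, 1440, 1080)
```

In this example, scaling only the detected 16:9 portion fills a 16:9 display exactly, whereas scaling the full 4:3 frame yields a narrower image that still contains the dark lines.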
According to one embodiment of the disclosed display system, the signal processor calculates one or more image data statistics for the input video image. The statistics, such as variance, mean, entropy, and edge strength, aid in locating the desired portion of the video image, typically by comparing the statistics on a line-by-line basis to a set of thresholds, one threshold for each statistic. The image data statistics are also used to detect the subtitles and the language of the subtitles. Depending on the preferences of the viewer, one or more languages of subtitles are included in the output video image.