Video frames are typically encoded in an interlaced format comprising a first video field (e.g., a top video field) and a second video field (e.g., a bottom video field), each video field containing alternate lines of the video frame and the two fields being temporally separated from each other. Video images are typically encoded and transmitted to a receiver in such an interlaced format as a compromise between bandwidth and video image resolution. Since each field of an interlaced video frame contains only half the lines of a full video frame, less system bandwidth is required to process and display these video frames. However, since the human eye typically cannot resolve a single video field, but rather blends the first field and the second field, the perceived image has the vertical resolution of both fields combined.
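The relationship between a frame and its two fields can be illustrated with a minimal sketch. It assumes a frame is represented as a list of scan lines with an even line count, and that the top field holds the even-indexed lines; the function names `split_fields` and `weave_fields` are hypothetical and chosen only for illustration. The sketch ignores the temporal separation between fields that exists in a real interlaced stream.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into a top and a bottom field.

    Assumption: the top field holds the even-indexed lines.
    """
    top_field = frame[0::2]     # lines 0, 2, 4, ...
    bottom_field = frame[1::2]  # lines 1, 3, 5, ...
    return top_field, bottom_field


def weave_fields(top_field, bottom_field):
    """Rebuild a full frame by alternating lines from the two fields."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
top, bottom = split_fields(frame)
```

Each field carries half the lines of the frame, which is the source of the bandwidth saving described above.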
Some types of receivers, including computers, televisions, mobile phones, computing tablets, etc., may require the use of de-interlaced video frames instead of interlaced video frames. For such receivers, the video frames encoded in an interlaced format must be de-interlaced prior to display. Typically, any missing pixels from the video frame are interpolated using the pixels of the first video field and the second video field.
There are several well-known methods to construct de-interlaced video frames. One such method is commonly referred to as the “bob” method, in which a de-interlaced video frame is constructed from a single video field that is vertically interpolated. More generally, whether to rely on spatial or temporal interpolation to interpolate image data is decided by detecting the motion of a subject in the picture. Specifically, spatial interpolation is used to interpolate image data for pixels that are sensing a subject in motion, and temporal interpolation is used to interpolate image data for pixels that are sensing a motionless subject. In this way, by switching interpolation methods according to the state of motion of the subject being sensed by individual pixels, it is possible to faithfully reproduce the sensed subject in each field of the picture being played back.
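As a rough illustration of these two ideas, the following sketch shows a "bob"-style vertical interpolation of a single field and a per-pixel choice between spatial and temporal interpolation. It is a simplified sketch under stated assumptions, not a definitive implementation: the field is assumed to be a top field stored as a list of lines of integer pixel values, missing lines are filled by averaging the lines above and below, and temporal interpolation is assumed to mean reusing the co-located pixel from the previous field. The names `bob_deinterlace` and `interpolate_pixel` are hypothetical.

```python
def bob_deinterlace(field):
    """Build a full-height frame from one field by vertical interpolation.

    Assumption: `field` is a top field, so each missing line lies just
    below an existing line. The last missing line has no line below it
    and is filled by repeating the last field line.
    """
    frame = []
    for i, line in enumerate(field):
        frame.append(list(line))
        next_line = field[i + 1] if i + 1 < len(field) else line
        frame.append([(a + b) // 2 for a, b in zip(line, next_line)])
    return frame


def interpolate_pixel(above, below, previous, moving):
    """Choose an interpolation method for one missing pixel.

    above, below: neighbouring pixels in the current field (spatial).
    previous: co-located pixel from the previous field (temporal).
    moving: True if motion was detected at this pixel.
    """
    if moving:
        return (above + below) // 2  # spatial interpolation
    return previous                  # temporal interpolation


bobbed = bob_deinterlace([[10, 20], [30, 40]])
```

The per-pixel switch depends entirely on the quality of the `moving` decision, which is why the motion detection step matters.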
Conventionally, such detection of motion is achieved by calculating differences of the image data of identical pixels among even-numbered and odd-numbered fields, and then comparing those differences with a predetermined threshold value. If the differences are greater than the threshold value, the subject being sensed by the pixels in question is recognized to be in motion.
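The conventional fixed-threshold test can be sketched as follows. This is a minimal illustration that assumes the two fields being compared have the same dimensions and hold integer pixel values; the function name `detect_motion` and the threshold value of 10 are hypothetical and not taken from any particular system.

```python
def detect_motion(field_a, field_b, threshold):
    """Flag each pixel as moving when the field-to-field difference
    at that position exceeds a fixed threshold."""
    return [
        [abs(a - b) > threshold for a, b in zip(line_a, line_b)]
        for line_a, line_b in zip(field_a, field_b)
    ]


earlier_field = [[100, 100], [50, 50]]
later_field = [[100, 140], [52, 50]]
motion_map = detect_motion(earlier_field, later_field, threshold=10)
```

Here only the pixel whose value changed from 100 to 140 is flagged; the change from 50 to 52 falls under the threshold and is treated as a motionless subject.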
In this way, whether the subject being sensed by the pixels for which image data is going to be interpolated is in motion is judged by comparing the field-to-field differences of the image data of identical pixels with a constant, predefined threshold value. However, as long as such a threshold value is kept constant, erroneous judgments can occur. For example, in a case where motion was present up to the field immediately preceding the one currently being reproduced but is no longer present in the current field, the motion that had been recognized up to the previous field leads to an erroneous judgment that the motion is still present in the current field. In addition, the predefined threshold value has no relationship to any of the other pixels in the current video field/frame, which can lead to inaccurate results during interpolation.
This makes faithful reproduction of the real image impossible, and sometimes causes flickering or the like while a motion picture is being played back. Therefore, it would be desirable to provide new methods and systems for motion detection in video fields that can use an adaptive threshold value, conserve processing power, and increase system bandwidth.