1. Field of the Invention
This invention relates to the field of video compensation. More specifically, the invention relates to detecting and correcting motion artifacts in video source signals.
2. Background Art
In North America, the video displayed on a normal television screen is an interlaced video signal that follows a standard called NTSC (National Television System Committee) video. This is not the same kind of video displayed on most computer screens, because computer screens mostly use non-interlaced display devices.
Interlaced video simply means that each picture frame displayed on the television screen consists of two video fields displayed one after the other. The first field is commonly known as the odd field, and the second as the even field. Since interlaced video is displayed at 30 frames (i.e. 60 fields) every second, the odd field is displayed in the first one sixtieth (1/60) of a second and the even field in the second one sixtieth of a second.
Each display monitor comprises a series of horizontal and vertical lines. For example, the resolution of an NTSC television monitor is approximately 858 horizontal counts by 525 vertical lines; the actual resolution, excluding blanking lines, is 720 by 480. In a television display, the odd field of the interlaced video signal is displayed on the odd-numbered (i.e. 1, 3, 5, . . . ) horizontal lines of the monitor and the even field on the even-numbered (i.e. 0, 2, 4, 6, . . . ) horizontal lines. Thus, at any brief instant, alternating lines of the television screen carry no video (i.e. are blank). However, because the display rate is faster than the human eye can perceive, a viewer is not able to discern the blanked lines.
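The field-to-line assignment described above can be sketched in Python. This is a minimal illustration, assuming a frame is held as a list of scan lines numbered from 0 as in the text; the names `split_fields` and `field_on_screen` are hypothetical and not part of any standard library. The sketch separates a frame into its odd and even fields and shows which lines are blank while only one field is being drawn.

```python
from typing import List, Tuple

Line = List[int]     # one horizontal line of luminance samples
Frame = List[Line]   # a frame is a top-to-bottom list of lines, numbered from 0


def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Return (odd_field, even_field) for an interlaced frame."""
    odd_field = frame[1::2]    # lines 1, 3, 5, ...
    even_field = frame[0::2]   # lines 0, 2, 4, ...
    return odd_field, even_field


def field_on_screen(frame: Frame, show_odd: bool, blank: int = 0) -> Frame:
    """What the screen holds while only one field is being drawn:
    lines belonging to the other field are left blank."""
    parity = 1 if show_odd else 0
    return [line if i % 2 == parity else [blank] * len(line)
            for i, line in enumerate(frame)]


if __name__ == "__main__":
    # A toy 6-line, 4-sample frame: every sample on line i has value 10*i.
    frame = [[10 * i] * 4 for i in range(6)]
    odd, even = split_fields(frame)
    print([l[0] for l in odd])   # [10, 30, 50]
    print([l[0] for l in even])  # [0, 20, 40]
```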
Video is a linear medium like audio, unlike photography or film. A film camera captures the entire frame of a picture in a single instant. But video was originally designed to be transmitted over the air. Video images must be broken up and transmitted or recorded as a series of lines, one after the other. At any given millisecond, the video image is actually just a dot speeding across the face of the monitor.
One problem with NTSC is that it is an analog system. In digital systems such as computer video, numbers represent color and brightness. With analog television, however, the signal is just voltages, and voltages are affected by wire length, connectors, heat, cold, videotape, and other conditions. Digital data is not degraded in this way. Thus, it would be advantageous to store or transmit video signals in a digital format.
Interlaced NTSC video must be converted to non-interlaced (i.e. progressive) video for display on devices such as computer screens. Because the conversion is generally performed in the digital domain, the NTSC video signal is first converted from analog to digital; the odd and even fields are then combined into one complete non-interlaced video frame so that the complete frame is displayed in one scan of the video signal.
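The combining step can be illustrated with a simple "weave" that interleaves the two fields line by line. This is a sketch only, assuming each field is a list of lines and both fields have the same line count (a simplification, since a real NTSC field spans 262.5 lines), and using the line numbering given earlier; `weave` is a hypothetical helper name.

```python
from typing import List

Line = List[int]
Frame = List[Line]


def weave(odd_field: Frame, even_field: Frame) -> Frame:
    """Interleave two fields into one progressive frame.

    Even-field lines go to rows 0, 2, 4, ...; odd-field lines go to rows 1, 3, 5, ...
    """
    if len(even_field) != len(odd_field):
        raise ValueError("fields must have the same number of lines")
    frame: Frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(list(even_line))   # copy so the frame does not alias the field
        frame.append(list(odd_line))
    return frame


if __name__ == "__main__":
    even = [[0] * 4, [20] * 4, [40] * 4]
    odd = [[10] * 4, [30] * 4, [50] * 4]
    print([l[0] for l in weave(odd, even)])  # [0, 10, 20, 30, 40, 50]
```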
Analog video inputs may be available in any of several color models, such as C-Video, S-Video, or YUV (or YIQ). A color model (also called a color space) facilitates the specification of colors in some standard, generally accepted way (e.g., RGB). In essence, a color model is a specification of a three-dimensional coordinate system and a subspace within that system in which each color is represented by a single point.
C-Video, or Composite Video, is a type of video signal in which all of the information (the red, blue, and green signals, and sometimes audio signals as well) is mixed together. This is the type of signal used by televisions in the United States. S-Video, short for Separate Video (sometimes called Super-Video), is a technology for transmitting video signals over a cable by dividing the video information into two separate signals: one for color (chrominance) and the other for brightness (luminance). When sent to a television, this produces sharper images than composite video, in which the video information is transmitted as a single signal over one wire, because televisions are designed to display separate luminance (Y) and chrominance (C) signals. The terms Y/C video and S-Video are used interchangeably.
The YUV or YIQ color model is used in commercial color TV broadcasting. Y generally stands for intensity (luminance, brightness) and thus provides all the information required by a monochrome television. The other two components carry the color (chrominance) information. Each component may be represented at various bit depths. For example, the brightness component may range from 1 bit (black and white), through 8 bits (the usual depth, representing 256 levels of gray), to 10 or 12 bits. Note that brightness, luminance, and intensity are used interchangeably in this specification.
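As a hedged illustration of how the luminance and chrominance components relate to RGB, the sketch below uses the widely published ITU-R BT.601 luma weights and the conventional analog YUV scale factors (0.492 and 0.877). These particular coefficients are an assumption, not something this specification prescribes, and the YIQ model uses different chrominance axes. The helper `quantize_luma` is hypothetical and assumes full-range coding, which simplifies the reduced code ranges used in studio video.

```python
from typing import Tuple


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Analog YUV from normalized RGB in [0, 1], using BT.601 luma weights (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance: all a monochrome set needs
    u = 0.492 * (b - y)                      # blue color-difference (chrominance)
    v = 0.877 * (r - y)                      # red color-difference (chrominance)
    return y, u, v


def gray_levels(bits: int) -> int:
    """Number of distinct brightness values at a given bit depth."""
    return 2 ** bits


def quantize_luma(y: float, bits: int) -> int:
    """Map luminance in [0, 1] to an integer code of the given bit depth (full range)."""
    max_code = gray_levels(bits) - 1
    return round(min(max(y, 0.0), 1.0) * max_code)


if __name__ == "__main__":
    for bits in (1, 8, 10, 12):
        print(bits, "bits ->", gray_levels(bits), "levels")
    print(rgb_to_yuv(1.0, 1.0, 1.0))  # white: Y close to 1, U and V close to 0
```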
Whatever the color model of the input, the incoming video signal may need to be converted to progressive video for display on non-interlaced devices. Video signals originate from various sources. For example, video material may have originated from a film source or may have been recorded using an interlaced video camera. In recent years there has been a proliferation of film material converted to NTSC video for display on regular television; for example, movies stored on videotape usually originated from a film counterpart. Film is shot at twenty-four frames a second (24 frames/sec) while NTSC video runs at 30 frames a second (i.e. 60 fields/sec), so film data must be converted in frequency from 24 frames/sec to the NTSC rate of 30 frames/sec. To achieve this, a method called 3-2 pulldown is employed to transfer film material at 24 frames per second to NTSC video at 30 frames per second. Fitting 24 film frames into 30 video frames every second requires that every four film frames be converted into five video frames.
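The frame-rate arithmetic behind 3-2 pulldown can be written out as follows, using only the rates stated above:

```latex
\frac{60\ \text{fields/s}}{24\ \text{film frames/s}} = 2.5\ \text{fields per film frame},
\qquad
\frac{3 + 2}{2} = 2.5,
\qquad
4\ \text{film frames} \times 2.5 = 10\ \text{fields} = 5\ \text{video frames}.
```

Alternating three fields and two fields per film frame therefore averages exactly the 2.5 fields per film frame that the rate change requires.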
FIG. 1 is an illustration of the mechanics of 3-2 pulldown. In this illustration, row 100 contains film frames f1-f7, which are mapped into row 106 comprising interlaced video frames v1-v8. Each interlaced video frame comprises an odd and an even field, shown in row 104. For example, interlaced video frame v1 comprises interlaced video fields 1o and 1e, interlaced video frame v2 comprises interlaced video fields 2o and 2e, and so on for all the video frames up to v8. Row 102 represents the film frame numbers that are mapped into the respective video fields. As shown in row 102, film frame 1 (i.e. f1) is mapped into video fields 1o, 1e, and 2o; film frame 2 (i.e. f2) is mapped into video fields 2e and 3o; film frame 3 (i.e. f3) is mapped into video fields 3e, 4o, and 4e; and film frame 4 (i.e. f4) is mapped into video fields 5o and 5e. This process continues, with one film frame mapped into three video fields and the next film frame mapped into the next two video fields. This three-two cycle repeats itself, hence the name 3-2 pulldown.
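The cadence of FIG. 1 can be reproduced with a short generator. This is a sketch, assuming the cadence starts with a three-field film frame on an odd field, as in the figure; the function name `pulldown_32` and the string field labels are hypothetical.

```python
from typing import List, Tuple


def pulldown_32(num_film_frames: int) -> List[Tuple[str, int]]:
    """Assign film frames to video fields with a repeating 3-2 cadence.

    Returns (field_label, film_frame) pairs such as ("1o", 1), ("1e", 1), ...
    Fields alternate odd, even, odd, even, ..., and film frames alternately
    receive three fields and two fields, starting with three.
    """
    fields: List[Tuple[str, int]] = []
    for film in range(1, num_film_frames + 1):
        repeats = 3 if film % 2 == 1 else 2   # f1: 3 fields, f2: 2, f3: 3, ...
        for _ in range(repeats):
            n = len(fields)                    # 0-based position of this field
            video_frame = n // 2 + 1           # two fields per video frame
            parity = "o" if n % 2 == 0 else "e"
            fields.append((f"{video_frame}{parity}", film))
    return fields


if __name__ == "__main__":
    for label, film in pulldown_32(4):
        print(label, "<- f%d" % film)
```

For seven film frames this yields 18 fields (video frames v1 through v9); FIG. 1 shows the first eight video frames.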
Further, in this illustration of 3-2 pulldown, film frames f1-f4 are mapped into video frames v1-v5. Film frames f1-f4 and video frames v1-v5 must occupy the same ⅙ of a second to preserve the length of the material being converted. As shown, film frame f1 is mapped into the odd and even fields of video frame v1 and into the odd field of video frame v2, and film frame f2 is mapped into the even field of video frame v2 and into the odd field of video frame v3. As a result, video frame v2 has film frame f1 in its odd field and film frame f2 in its even field, and video frame v3 has film frame f2 in its odd field and film frame f3 in its even field. Thus video frames v2 and v3 are composed of mixed film frames. The phenomenon known as field motion, indicated by a “Yes” in row 108, occurs in video frames with mixed film frames.
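The “Yes” entries of row 108 and the one-sixth-second interval can both be checked mechanically. In this sketch the film frame feeding each field is taken directly from the FIG. 1 mapping listed above; `field_motion_flags` is a hypothetical helper that flags a video frame whenever its odd and even fields come from different film frames, and exact fractions are used for the timing so no rounding is involved.

```python
from fractions import Fraction
from typing import List


def field_motion_flags(field_sources: List[int]) -> List[bool]:
    """Given the film frame feeding each field in display order
    (odd, even, odd, even, ...), flag video frames whose two fields
    come from different film frames."""
    if len(field_sources) % 2:
        raise ValueError("need a whole number of video frames")
    return [field_sources[i] != field_sources[i + 1]
            for i in range(0, len(field_sources), 2)]


# Film frames feeding fields 1o, 1e, 2o, 2e, ..., 5o, 5e, as listed for FIG. 1.
FIG1_FIELDS = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]

film_time = 4 * Fraction(1, 24)    # four film frames at 24 frames/s
video_time = 5 * Fraction(1, 30)   # five video frames at 30 frames/s

if __name__ == "__main__":
    print(field_motion_flags(FIG1_FIELDS))  # v1..v5
    print(film_time, video_time)            # both 1/6 of a second
```

For `FIG1_FIELDS` the flags come out as `[False, True, True, False, False]`, marking v2 and v3 as in row 108.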
When viewed on an NTSC television, video generated by 3-2 pulldown is visually tolerable because the television displays a single field at a time, so the video appears continuous. However, if NTSC data originating from a film source is subsequently converted to progressive video, for display on a computer monitor for example, a problem known as “field motion” may occur. Field motion occurs because each progressive video frame is displayed in its entirety at once, so both of its fields are visible together.
One method of generating progressive video material is to combine the odd and even fields of interlaced video material to generate each frame of the progressive material. For progressive material generated in this way from film material, for example, progressive video frame v1 comprises film frame f1 in both its odd and even lines, while progressive video frame v2 comprises film frame f1 in its odd lines and film frame f2 in its even lines. If film frames f1 and f2 are shot at different times and an object has moved in between, the object may be at different locations in film frames f1 and f2. If progressive video frame v2 is then viewed as a still frame, the object will appear distorted. This distortion is what is known as “field motion”. The distortion becomes more pronounced as the video material is scaled up to fit higher-resolution display devices.
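To make the distortion concrete, the sketch below weaves two synthetic pictures in which a vertical bar has moved, then measures the resulting line-to-line “combing.” The comb measure is a generic heuristic often used to illustrate interlace artifacts; it is an assumption for illustration and is not presented as the detection method of this invention. All names (`comb_score`, `picture_with_bar`, `weave_from`) are hypothetical.

```python
from typing import List

Frame = List[List[int]]


def comb_score(frame: Frame) -> float:
    """Average inter-field 'combing' energy of a woven frame.

    For every interior line, compare each sample with the mean of the
    samples directly above and below it (which belong to the other field).
    A still scene gives a low score; an object that moved between the two
    fields produces a tooth-like pattern and a high score.
    """
    total, count = 0.0, 0
    for y in range(1, len(frame) - 1):
        for x in range(len(frame[y])):
            neighbours = (frame[y - 1][x] + frame[y + 1][x]) / 2
            total += abs(frame[y][x] - neighbours)
            count += 1
    return total / count if count else 0.0


def picture_with_bar(column: int, height: int = 6, width: int = 8) -> Frame:
    """Hypothetical test picture: black background with one white vertical bar."""
    return [[255 if x == column else 0 for x in range(width)] for _ in range(height)]


def weave_from(odd_src: Frame, even_src: Frame) -> Frame:
    """Progressive frame taking odd lines from one picture and even lines from another."""
    return [even_src[y] if y % 2 == 0 else odd_src[y] for y in range(len(even_src))]


f1 = picture_with_bar(column=2)   # object position in film frame f1
f2 = picture_with_bar(column=5)   # object has moved by film frame f2

still = weave_from(f1, f1)   # like v1: both fields from f1
mixed = weave_from(f1, f2)   # like v2: odd lines from f1, even lines from f2

if __name__ == "__main__":
    print(comb_score(still), comb_score(mixed))
```

Here the still frame scores 0.0 and the mixed frame scores 63.75, because every interior line disagrees with its neighbours at both bar positions.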
Video Scaling
Video scalers are employed to change the size of an original video signal to fit a desired video output device. A scaler changes the size of an image without changing its shape, for instance when the image size does not fit the display device. The main benefit of a scaler is thus its ability to change its output rate to match the capabilities of a display device. This is especially advantageous for digital display devices, because they produce images on a fixed matrix, and for a digital display device to provide optimal light output, the entire matrix should be used.
Since a scaler can scale the output both horizontally and vertically, it can change the “aspect ratio” of an image. The aspect ratio is the ratio of the horizontal dimension of a rectangle to its vertical dimension. Thus, when included as part of a graphics switch, a scaler can adjust horizontal and vertical size and positioning for a variety of video inputs. For example, for viewing screens, the aspect ratio of standard TV is 4:3, or 1.33:1, and that of HDTV is 16:9, or 1.78:1. Sometimes the “:1” is implicit, making TV = 1.33 and HDTV = 1.78. So, in a system with NTSC, PAL, or SECAM inputs and an HDTV display, a scaler can take the standard NTSC video signal and convert it to a 16:9 HDTV output at various resolutions (e.g. 480p, 720p, and 1080p) as required to fit the HDTV display area exactly.
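The aspect-ratio arithmetic can be sketched with exact fractions. The sketch assumes square pixels and shows two choices a scaler could make: preserving the 4:3 shape inside a 16:9 panel (leaving side bars), or applying unequal horizontal and vertical factors so the picture fills the panel exactly and its aspect ratio changes. `fit_inside` and `stretch_factors` are hypothetical names.

```python
from fractions import Fraction
from typing import Tuple

TV = Fraction(4, 3)     # standard TV, about 1.33
HDTV = Fraction(16, 9)  # HDTV, about 1.78


def fit_inside(src_aspect: Fraction, dst_w: int, dst_h: int) -> Tuple[int, int]:
    """Largest picture of the given aspect ratio that fits in dst_w x dst_h
    without changing its shape (the remaining area is left as bars)."""
    if Fraction(dst_w, dst_h) > src_aspect:   # display is wider: bars at the sides
        h = dst_h
        w = round(dst_h * src_aspect)
    else:                                     # display is narrower: bars top and bottom
        w = dst_w
        h = round(dst_w / src_aspect)
    return w, h


def stretch_factors(src_w: int, src_h: int,
                    dst_w: int, dst_h: int) -> Tuple[Fraction, Fraction]:
    """Independent horizontal and vertical scale factors that fill the display
    exactly; the picture's aspect ratio changes unless the two are equal."""
    return Fraction(dst_w, src_w), Fraction(dst_h, src_h)


if __name__ == "__main__":
    print(round(float(TV), 2), round(float(HDTV), 2))  # 1.33 1.78
    for w, h in ((1280, 720), (1920, 1080)):
        print((w, h), "->", fit_inside(TV, w, h))
```

For example, `fit_inside(TV, 1920, 1080)` gives (1440, 1080), while `stretch_factors(720, 480, 1920, 1080)` gives horizontal and vertical factors of 8/3 and 9/4.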
Scaling is often referred to as “scaling down” or “scaling up.” An example of “scaling down” is when a 640×480 TV image is scaled for display as a smaller picture on the same screen so that multiple pictures can be shown at the same time (e.g. as a picture-in-picture, or “PIP”). Scaling the original image down to 320×240 (¼ of the original area) allows four full TV pictures to be shown on the same output TV screen at the same time. An example of “scaling up” is when a lower-resolution image (e.g. 800×600 = 480,000 pixels) is scaled for display on a higher-resolution device (e.g. 1024×768 = 786,432 pixels). Note that the number of pixels is the product of the two resolution numbers (i.e. number of pixels = horizontal resolution × vertical resolution). Thus, when scaling up, additional pixels must be created by some method. There are many different methods of image scaling, and some produce better results than others.
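The pixel-count arithmetic and the need to create pixels when scaling up can be illustrated with nearest-neighbour resampling, the simplest scaling method; it is chosen here only for brevity, and interpolating methods such as bilinear or bicubic filtering generally look smoother. This is a sketch assuming an image stored as a list of rows; `scale_nearest` and `pixel_count` are hypothetical names.

```python
from typing import List

Image = List[List[int]]


def pixel_count(width: int, height: int) -> int:
    """Number of pixels = horizontal resolution x vertical resolution."""
    return width * height


def scale_nearest(img: Image, out_w: int, out_h: int) -> Image:
    """Resize by nearest-neighbour sampling: each output pixel copies the input
    pixel at the proportionally corresponding position (integer floor mapping)."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


if __name__ == "__main__":
    print(pixel_count(800, 600), pixel_count(1024, 768))  # 480000 786432
    print(scale_nearest([[1, 2], [3, 4]], 4, 4))          # each pixel becomes a 2x2 block
```

Scaling a 2×2 image up to 4×4 duplicates each pixel into a 2×2 block, which is the “pixels must be created” step described above done in the crudest possible way; scaling down simply skips input pixels.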
A scan converter is a device that changes the scan rate of a source video signal to fit the needs of a display device. For instance, a “video converter” or “TV converter” converts computer video to NTSC (TV), or NTSC to computer video. Although the concept seems simple, scan converters use complex technology to achieve signal conversion because computer signals and television signals differ significantly. As a result, a video signal with a particular resolution and horizontal and vertical refresh frequency must be converted to another resolution or horizontal and vertical refresh frequency. For instance, a good deal of signal processing is required to scan convert, or “scale,” a 15.75 kHz NTSC standard TV video input (e.g. 640×480) for output at 1024×768 resolution on a computer monitor or large-screen projector, because the input resolution must be enhanced, or added to, to match the higher output resolution of the monitor or projector. Because enhancing or adding pixels to the output involves reading out more frames of video than are being read in, many scan converters use a frame buffer, or frame memory, to store each incoming input frame. Once stored, the incoming frame can be read out repeatedly to add more frames and/or pixels.
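Two quantities from this paragraph can be sketched numerically: the nominal NTSC line rate (525 lines × 30 frames/s = 15,750 Hz, the 15.75 kHz figure above; color NTSC actually runs at about 29.97 frames/s, giving roughly 15.734 kHz) and the schedule by which a frame buffer re-reads stored input frames when the output frame rate is higher. The repeat-by-index scheme is an assumption chosen for simplicity; real scan converters may interpolate instead. `line_rate_hz` and `repeat_schedule` are hypothetical names.

```python
from typing import List


def line_rate_hz(lines_per_frame: int, frames_per_second: float) -> float:
    """Horizontal scan frequency = lines per frame x frames per second."""
    return lines_per_frame * frames_per_second


def repeat_schedule(in_fps: int, out_fps: int, out_frames: int) -> List[int]:
    """Index of the stored input frame read out for each output frame when
    the output runs faster than the input (stored frames are simply re-read)."""
    return [k * in_fps // out_fps for k in range(out_frames)]


if __name__ == "__main__":
    print(line_rate_hz(525, 30))         # 15750
    print(repeat_schedule(30, 60, 6))    # each input frame read out twice
    print(repeat_schedule(30, 72, 12))   # uneven 3-2 style repetition
```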
Similarly, a scan doubler (also called a “line doubler”) is a device used to change composite interlaced video to non-interlaced component video, thereby increasing brightness and picture quality. Scan doubling, also called “line doubling,” is the process of making the scan lines less visible by doubling the number of lines and filling in the blank spaces. For example, a scan doubler can be used to convert an interlaced TV signal to a non-interlaced computer video signal. A line doubler or quadrupler is typically very useful for displaying images on TV video or TFT flat-panel screens.
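One simple way to fill in the blank spaces is to synthesize each missing line as the average of the field lines above and below it, sometimes called line averaging or “bob” deinterlacing. The sketch below assumes a single field stored as a list of lines and repeats the last line at the bottom edge; `line_double` is a hypothetical name, and commercial line doublers may use more elaborate, motion-adaptive methods.

```python
from typing import List

Field = List[List[int]]


def line_double(field: Field) -> Field:
    """Turn one field (every other line) into a full-height frame.

    Each field line is kept, and the missing line below it is synthesized
    as the integer average of that line and the next field line; the last
    field line has no line below it, so it is simply repeated.
    """
    out: Field = []
    for i, line in enumerate(field):
        out.append(list(line))
        below = field[i + 1] if i + 1 < len(field) else line
        out.append([(a + b) // 2 for a, b in zip(line, below)])
    return out


if __name__ == "__main__":
    field = [[0, 0], [100, 100], [200, 200]]
    for row in line_double(field):
        print(row)
```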
Because of the problems inherent in current conversion systems, there is a need for a system that improves the quality of video images by correcting the effects of converting a video signal from one type to another. For instance, current systems lack an effective way to eliminate field motion from interlaced video material during conversion to progressive video.