The present invention relates generally to conversion from one video signal format to another and, more particularly, relates to a method and apparatus for horizontally scaling computer pixel data during encoding of a television-compatible composite video waveform such that the pixel data is encoded into the viewable area of the composite waveform without loss of horizontal resolution.
The expanding use of computers for generation and control of entertainment content is giving rise to a need for a means to effectively display computer video output on a standard television. In the home, the television is usually centrally and comfortably located and typically has a larger display than that of a computer monitor, making it attractive for the display of content such as computer video games, Internet web sites and the like. In a business setting, the larger display area of a television is inviting for the display of content such as computer-generated presentations, slides and the like. The video signal format and display generated by a computer, however, differ significantly from the display format and scheme of a television. Moreover, computer monitors are constantly being improved to permit higher resolutions and display of more image information content. Computer users have grown accustomed to these higher resolutions, and wish to duplicate this experience on their televisions. Televisions, however, typically have lower resolutions and require a video signal format that complies with stringent video broadcast standards developed nearly fifty years ago. These differing video signal formats and display characteristics have made attainment of an acceptable means for displaying a computer video signal on a television very elusive.
In order to display a computer video signal on a television screen, the computer video signal must be transformed or converted into a television-compatible video signal. This is a complex process involving many issues. While the present invention deals with a particular aspect of this process, namely stretching or shrinking (“scaling”) horizontal lines of computer video data to fit within the horizontal display area of a television, it is useful to review the fundamental steps of the conversion process. These steps include color space conversion; scan rate conversion; horizontal and vertical scaling; and encoding of the composite waveform in accordance with the desired television signal format.
Color space conversion is necessary because computers and televisions generally use different “color spaces” for representation and storage of video color data. A color space is a mathematical representation of a set of colors. Computers typically use the RGB (Red, Green, Blue) color space, while televisions use color spaces based on luminance and chrominance values (the YUV and YCbCr color spaces, for example).
The RGB color space is a digital format widely used in computer graphics and imaging. Red, green and blue are the primary additive colors; components of these primary colors can be combined to form any desired color. The RGB color space is the most prevalent choice for computer graphics frame buffers (the memory used to hold images for display) because computer monitors use red, green and blue phosphors to create the desired color. Consequently, using the RGB color space simplifies the architecture and design of the system.
The RGB color space, however, is not always the best choice when dealing with display of “real world” images. The human eye does not always perceive color as a simple addition of red, blue and green, but rather, sometimes as color difference signals. Moreover, processing an image in the RGB color space is not very efficient. In order to modify the intensity or color of a given pixel, for example, all three RGB values must be read from the frame buffer, the intensity or color calculated and the modifications performed, and the new RGB values calculated and written back to the frame buffer. For these and other reasons, broadcast and television standards generally use color spaces employing luminance and chrominance video signals. Luminance, or luma, refers to the black-and-white information in the video signal and chrominance, or chroma, refers to the color information in the video signal. The YUV, YIQ and YCbCr color spaces are luminance and chrominance based.
The YUV color space is the basic analog format used under the NTSC (National Television System Committee) composite color video standard, which is used in North America and other parts of the world, as well as under the PAL (Phase Alternation Line) and SECAM (Séquentiel Couleur à Mémoire) standards, which are used in Europe and elsewhere. The YUV space comprises luma (Y) and chroma (U and V) components. The luma component is made up of portions of all three RGB color signals, and the chroma components are color difference signals developed by subtracting the luma component from the blue signal (U) and from the red signal (V). A set of basic equations is used to convert between the RGB and YUV color spaces:
$$Y = 0.299R' + 0.587G' + 0.114B';$$
$$U = -0.147R' - 0.289G' + 0.436B' = 0.492\,(B' - Y); \text{ and}$$
$$V = 0.615R' - 0.515G' - 0.100B' = 0.877\,(R' - Y).$$
The prime (′) symbols in the above equations indicate that the RGB values are gamma-corrected. Gamma correction is necessary to compensate for the nonlinear characteristics of displays using phosphors, such as cathode ray tubes (CRTs). In CRT displays, a small change in voltage when the voltage level is low will produce a particular change in the output display brightness level, but this same small change in voltage at a higher voltage level will not produce the same magnitude of change in the brightness output level. This effect, or the difference between what should have been measured and what was measured, is known as gamma. Gamma correction adjusts the intensity output of the CRT so that it is roughly linear.
The YUV format is also advantageous in that a black and white display can be driven with just the Y component. For digital RGB values with a range of zero to 255, Y has a range of zero to 255, U a range of zero to ±112 and V a range of zero to ±157.
The YIQ and YCbCr color spaces are derived from the YUV color space and are optionally used by the NTSC composite color video standard. In the YIQ color space, the “I” stands for “in-phase” and the “Q” for “quadrature”, which is the modulation method used to transmit the color information. I and Q are modulated (one modulator is driven by the subcarrier at sine phase; the other modulator is driven by the subcarrier at cosine phase) and added together to form the composite chrominance signal. For digital RGB values with a range of zero to 255, Y has a range of zero to 255, I has a range of zero to ±152, and Q has a range of zero to ±134. The YCbCr color space is a digital component format developed as part of Recommendation ITU-R BT.601 during the development of a worldwide digital component video standard. It is essentially a scaled and offset version of the YUV color space. Y is defined to have a nominal range of 16 to 235; and Cb and Cr are defined to have a range of 16 to 240, with 128 equal to zero.
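By way of illustration only, the RGB-to-YUV equations and the stated component ranges can be checked with a short Python sketch. The function name `rgb_to_yuv` and the test colors are assumptions made for this example and are not part of the disclosure.

```python
def rgb_to_yuv(r, g, b):
    """Convert gamma-corrected R'G'B' values (0-255) to Y, U, V.

    Uses the color-difference forms of the equations:
    U = 0.492 (B' - Y) and V = 0.877 (R' - Y).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


# Illustrative inputs: white has full luma and no chroma, while pure
# blue and pure red reach the positive extremes of U and V respectively.
white = rgb_to_yuv(255, 255, 255)
blue = rgb_to_yuv(0, 0, 255)
red = rgb_to_yuv(255, 0, 0)
```

For these inputs, the U value of pure blue and the V value of pure red fall just inside the ±112 and ±157 limits given above.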
Another fundamental step in conversion of computer video signals to television video signals is scan rate conversion. Both computers and televisions utilize CRTs having electron guns that produce an electron beam. The beam is attracted to phosphors on the face of the CRT, activating the phosphors and causing them to emit red, green or blue light. The electron beam begins at the top left of the CRT and scans from left to right across the screen, illuminating pixels (which are comprised of the activated phosphors) in the process. Hence, the electron beam is effectively drawing horizontal lines of video, one pixel at a time.
The horizontal scan rate is the number of horizontal lines drawn per second by the electron beam. The horizontal scan rate of a computer monitor is often twice as fast as that of a television monitor. The horizontal scan rate of a computer monitor is typically in the range of 24 to 65 kHz (24,000 to 65,000 horizontal lines drawn per second); and workstations used in business and industry may have horizontal scan rates exceeding 100 kHz. The horizontal scan rate of a television monitor, by contrast, is only about 15 kHz (15,000 horizontal lines drawn per second). Television broadcast standards specify exact horizontal scan rates that must be strictly adhered to; the NTSC horizontal scan rate is 15.75 kHz and the PAL horizontal scan rate is 15.625 kHz.
When the electron beam reaches the bottom of the display, one frame of video has been completed. The number of frames completed by the beam per second, or the number of times that the frame has been “refreshed”, is the vertical scan, frame or “refresh” rate. Again, the vertical scan rates of computer monitors are usually much higher than those of television monitors: a television monitor has a frame rate of approximately 30 Hz, while computer monitors have refresh rates of 75 Hz or more.
A VGA (Video Graphics Array, a widely used computer display standard) source, such as a computer, scans out a video display in a “noninterlaced” fashion. That is, all of the lines in a frame are scanned out sequentially, one right after the other. The entire image is drawn on the screen in one pass from top to bottom.
A television display, by contrast, uses an “interlaced” format. Each frame of video is scanned out as two fields that are separated temporally and offset spatially in the vertical direction. Each field is drawn on the screen consecutively and in alternating fashion: first one field, then the other. Essentially, an image is drawn in two top-to-bottom passes: the first pass draws the “odd” lines (the first field) and the second pass draws the “even” lines (the second field). It follows that the number of lines in a field is one-half the number of lines in a frame. In NTSC, there are 262.5 lines per field (525 lines per frame), and in PAL, there are 312.5 lines per field (625 lines per frame).
An interlaced television display format is utilized because of the relatively slow frame rate of a television, which as mentioned above, is approximately 30 Hz. A television screen updated at only 30 frames per second will cause noticeable flicker, that is, the image will begin to fade before the next one is drawn on to the screen. Flicker is similar to the effect produced by an old fluorescent light fixture. By using two interlaced fields, each containing one-half of the information that makes up the frame and each field being drawn on the screen consecutively, the field update rate is 60 fields per second. At this higher rate, the eye blends everything together into a smooth, continuous motion. Again, under the television broadcast standards, there are exact field rates that must be strictly adhered to: 59.94 Hz for NTSC and 50 Hz for PAL.
The RGB pixel data, after being converted into the YUV, YCbCr or YIQ color space, must be encoded into a composite color video waveform recognizable by the television. The composite waveform contains a number of specifically placed and timed video and control signals. These include the active video signal; the color burst waveform; the horizontal and vertical sync pulses; and the horizontal and vertical blanking intervals. The active video signal contains the encoded luminance and chrominance data for the image that is to be displayed on the screen. The color burst waveform provides the decoder with a reference for decoding the chrominance information contained in the active video signal. The horizontal and vertical sync pulses are control signals that signal to the decoder the start of new horizontal lines and new frames. The blanking intervals signal the decoder to shut off the electron beam while it is being retraced from the right edge to the left edge of the display, or from the bottom to the top of the display. Each of these signals is combined into one composite video waveform that is transmitted to the television on a one-wire connection.
The composite video waveform must be encoded in strict accordance with the applicable broadcast standard, such as NTSC or PAL. These standards specify important timing parameters such as the horizontal and vertical sync pulse widths, the rise and fall times of the pulses, and the position and number of cycles in the color burst. These timing parameters must not be altered while encoding the waveform. Numerous problems can result from even slightly inaccurate timing. Errors in the pulse widths can lead to picture break up. Errors in the rise and fall times can make it difficult for the television receiving equipment to lock to the signals.
A composite color video waveform 100 adhering to the NTSC standard is illustrated in FIG. 1. Waveform 100 includes a horizontal blanking interval 102 that extends between active video signals 104 and 106 (only portions of signals 104 and 106 are illustrated). The duration of waveform 100, including one complete active video signal for one horizontal line of video and one horizontal blanking interval, is 63.555 µs. Stated another way, a horizontal sync pulse occurs once every 63.555 µs.
Active video signals 104 and 106 contain the luminance and chrominance display information for adjacent horizontal lines of video N and N+1. The luminance is the monochrome component of the signal containing brightness and contrast information, and chrominance is the color component, containing hue and saturation information. The hue and saturation values are transmitted on a color subcarrier wave located within a specific frequency band (3.58 MHz) of the luminance signal. By arranging scanning frequencies to be rigidly tied to the color subcarrier frequency, the hue and saturation components of the chrominance signal are recoverable from the luminance signal. The active video signals range in amplitude from 100 IRE (white) to 7.5 IRE (black). An IRE unit is an arbitrary unit used to describe the amplitude characteristics of a video signal. One IRE corresponds to approximately 7.14 mV. The duration of the active video signal under NTSC is 52.66 µs. The active video signal starts 9.40 µs after the falling edge of the horizontal sync pulse.
During the horizontal blanking interval, the video signal is at the blank level so as not to display the electron beam as it sweeps back from the right to the left side of the screen. The duration of blanking interval 102 is approximately 10.9 µs. Defined within blanking interval 102 are horizontal sync pulse 108 and color burst waveform 110. Horizontal sync pulse 108 signals the beginning of a new horizontal scan line. It is the maximum negative excursion of the composite waveform; approximately −40 IRE. As with the other portions of the waveform, sync pulse 108 has rigidly defined parameters. It has a length of 4.70 µs, a drop time (from −40 IRE to 0 IRE) of 138 ns and a rise time of 137 ns. The falling edge of the horizontal sync pulse signals to the decoder that a new horizontal line is starting, and the rising edge signals that the color burst waveform is coming. The flat portion 112 of the waveform positioned between the end of active video signal 104 and the beginning of sync pulse 108 is at the blanking level (0 IRE) and is often referred to as the “front porch”. It has a duration of 1.50 µs.
Color burst waveform 110 follows horizontal sync pulse 108. Color burst waveform 110 serves as reference for the chrominance signals that are phase modulated and encoded in the active video signal (at the subcarrier frequency of 3.58 MHz). It consists of a nine-cycle sine wave at the subcarrier frequency and at a specific phase. The decoder determines the proper color of the active video from the phase relationship between the color burst waveform and the modulated chrominance signals. Color burst waveform 110 starts 5.3 µs after the horizontal sync pulse falling edge and ends 7.82 µs after the sync pulse falling edge.
Color burst waveform 110 is positioned on a portion 114 of waveform 100 that extends from the rising edge of horizontal sync pulse 108 to the beginning of active video signal 106. This portion of waveform 100 is often referred to as the “back porch”. Back porch 114 is at the blanking level and has a duration of 4.39 µs. The portion 116 of back porch 114 extending between the rising edge of horizontal sync pulse 108 and the start of color burst waveform 110 is sometimes referred to as the “breezeway” and has a duration of 0.5 µs. This time slot gives the decoder time to recover and prepare for detection of the color burst waveform.
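The NTSC line timings recited above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only; the constant names are hypothetical, and the subcarrier value of 3.579545 MHz is an assumption that refines the rounded 3.58 MHz figure used in this description.

```python
# NTSC horizontal timings taken from the description, in microseconds.
LINE_PERIOD_US = 63.555   # one full horizontal line
ACTIVE_VIDEO_US = 52.66   # active video duration
FRONT_PORCH_US = 1.50     # end of active video to sync falling edge
ACTIVE_START_US = 9.40    # sync falling edge to start of active video
BURST_START_US = 5.3      # sync falling edge to start of color burst
BURST_END_US = 7.82       # sync falling edge to end of color burst
SUBCARRIER_MHZ = 3.579545  # assumed exact value of the "3.58 MHz" subcarrier

# Blanking computed two ways: what is left of the line after active
# video, and front porch plus the delay from sync to active video.
blanking_us = LINE_PERIOD_US - ACTIVE_VIDEO_US
blanking_from_parts_us = FRONT_PORCH_US + ACTIVE_START_US

# Number of subcarrier cycles that fit in the color burst window.
burst_cycles = (BURST_END_US - BURST_START_US) * SUBCARRIER_MHZ
```

Both blanking figures come to approximately 10.9 µs, and the burst window holds approximately nine subcarrier cycles, consistent with the values stated above.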
The final component of the composite waveform is the vertical sync, which is not illustrated in FIG. 1. It occurs every 525 lines for NTSC, as a sequence of pulses, and signals the decoder that a new frame is about to begin.
FIG. 2 is a table that sets forth important timing parameters for NTSC and other common video broadcast formats. These parameters include the subcarrier frequency; the color burst waveform starting and ending times; the width and frequency of the horizontal sync (HSYNC) pulse; the starting time of the active video signal and the timing of the active video image center; and the length of the front porch. FIG. 3 is a table that lists constant values that are dependent on the video output format. These values include the target active lines per output field (ALO); total lines per output field (TLO); the active time per output line (ATO); and the total time per output line (TTO).
The final fundamental issue, horizontal and vertical scaling, arises from the differences in display resolution between computer and television monitors. Resolution is the basic measurement of how much information is on the screen. Resolution is described by a first number representing horizontal resolution (total number of pixels in one horizontal scan line) and a second number representing vertical resolution (total number of horizontal lines down the screen). The typical resolution of a computer monitor is, at a minimum, 800×600, and may be upwards of 1280×1024. The standard NTSC resolution, by contrast, is only 640×480. Hence, vertical scaling is the process of making the 600 lines (or more) displayed by a computer fit within the television vertical line resolution; and horizontal scaling is the process of making the 800 pixels (or more) per horizontal line displayed by a computer fit within the television horizontal resolution.
In addition to resolving resolution differences, horizontal and vertical scaling is necessary to counter overscan in the television display. The electron gun in a television set typically overscans the edges of the viewable display area by five to fifteen percent, causing the image to bleed off of the edges in all directions. Overscan is not typically a significant problem when broadcast or recorded signals are displayed on the television, since the viewer usually has no knowledge of the source material. It can, however, pose serious problems when computer-generated video data is displayed on a television. Critical information, such as menus or tool bars, may be lost outside of the television viewable area.
In determining the amount of vertical filtering or scaling that is necessary, flicker should also be taken into account. In a computer-generated image, there are frequently abrupt transitions from one scan line to the next. Even at the NTSC field rate of approximately 60 Hz, the human eye can detect these transitions and scan lines may be seen flashing individually every 1/30 of a second. Vertical or flicker filtering is a technique employed to remove flicker from computer-generated video displayed on a television. A vertical filter averages adjacent scan lines to soften the transition between dark and light lines and to produce lines with less sharply defined contrasts. One common filter, for example, produces a television line by adding one quarter of the current line, two quarters of the previous line and one quarter of the line before that. This is called a “1-2-1” or a “¼-½-¼” filter.
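The ¼-½-¼ filter described above may be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation; the function name `flicker_filter` and the choice to repeat the first line at the top edge of the image are assumptions.

```python
def flicker_filter(lines):
    """Apply a 1/4-1/2-1/4 vertical filter to a list of scan lines.

    Each output line is 1/4 of the current line, 1/2 of the previous
    line and 1/4 of the line before that. At the top of the image the
    first line is repeated (an assumed edge rule).
    """
    out = []
    for i in range(len(lines)):
        cur = lines[i]
        prev = lines[max(i - 1, 0)]
        prev2 = lines[max(i - 2, 0)]
        out.append([0.25 * c + 0.5 * p + 0.25 * q
                    for c, p, q in zip(cur, prev, prev2)])
    return out


# Alternating black (0) and white (255) one-pixel-wide lines, the worst
# case for interlace flicker.
filtered = flicker_filter([[0], [255], [0], [255]])
```

Once the filter has a full three-line history, the alternating black and white lines settle to a uniform mid-gray value of 127.5, which removes the line-to-line transition that produces flicker.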
Horizontal scaling has traditionally been achieved by adjusting the number of pixels encoded into the active video signal such that for a given clock rate, the pixels exactly fill the viewable image area of the television. The video data must also fit within the time allotted by the applicable broadcast standard for the active video signal. Under the NTSC standard, for example, the video data can occupy a maximum of 52 µs.
As mentioned above, overscan reduces the actual display time to 44-50 µs. The standard encoder clock rate of 13.5 MHz, for a computer monitor resolution of 640 pixels, yields a display time of 47 µs. This is effective so long as the amount of overscan does not exceed ten percent and the horizontal resolution of the computer video is not greater than 640 pixels.
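As a rough check on these figures, the display time of a line is the pixel count divided by the encoder clock rate. The Python sketch below is illustrative; the function names are hypothetical, and it assumes that overscan reduces the 52 µs maximum in direct proportion to the overscan percentage.

```python
MAX_ACTIVE_US = 52.0  # NTSC maximum for video data, per the description


def display_time_us(pixels, clock_mhz):
    """Time needed to clock out one line: pixels / (pixels per microsecond)."""
    return pixels / clock_mhz


def viewable_time_us(overscan_fraction):
    """Viewable part of the active time, assuming proportional overscan loss."""
    return MAX_ACTIVE_US * (1.0 - overscan_fraction)


t_640 = display_time_us(640, 13.5)  # about 47.4 us, fits the viewable area
t_800 = display_time_us(800, 13.5)  # about 59.3 us, exceeds the 52 us limit
viewable_range = (viewable_time_us(0.15), viewable_time_us(0.05))
```

At 13.5 MHz, a 640-pixel line therefore fits the viewable window, whereas an 800-pixel line would overrun even the full 52 µs allotment.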
To display an image having a horizontal resolution of more than 640 pixels, or to accommodate overscan of greater than ten percent, the number of pixels encoded into the composite video waveform is scaled or adjusted to fit within the viewable area. This process is also known as sample rate conversion. An incoming horizontal resolution of 800 pixels, for example, must be scaled down to 640 pixels prior to encoding in order to yield an active video signal having a length of 47 µs at a clock rate of 13.5 MHz. The simplest form of horizontal downscaling is pixel dropping, in which (m) out of every (n) pixels are thrown away. Similarly, horizontal upscaling (in situations where the incoming horizontal resolution is less than the television resolution) is accomplished by duplicating (m) out of every (n) pixels. Pixel dropping (or duplicating) is a crude method of scaling that tremendously impacts resolution and invariably introduces artifacts (blemishes, noise and other physical disruptions) into the image.
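Pixel dropping and duplicating can be illustrated with the following Python sketch. The function names and the choice of which pixels in each group are dropped or duplicated (here, the last m of every n) are assumptions; the description does not specify them.

```python
def drop_pixels(line, m, n):
    """Downscale by discarding the last m pixels of every group of n."""
    return [p for i, p in enumerate(line) if i % n < n - m]


def duplicate_pixels(line, m, n):
    """Upscale by repeating the last m pixels of every group of n."""
    out = []
    for i, p in enumerate(line):
        out.append(p)
        if i % n >= n - m:
            out.append(p)  # emit the same pixel a second time
    return out


# Dropping 1 of every 5 pixels reduces an 800-pixel line to 640 pixels;
# duplicating 1 of every 4 pixels expands a 640-pixel line to 800 pixels.
down = drop_pixels(list(range(800)), 1, 5)
up = duplicate_pixels(list(range(640)), 1, 4)
```

In the downscaled line, every fifth source pixel is simply absent, so any image detail carried only by those pixels is lost, which is the loss of resolution noted above.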
Another approach for accomplishing scaling is through interpolation or averaging of adjacent pixels. Various algorithms for interpolation scaling are well known in the art. Scaling via interpolation or averaging improves the video quality relative to simple pixel dropping, but still involves a loss of resolution and artifact generation. Video quality generally depends on the complexity of the algorithm used. Significant amounts of hardware and/or memory are required to implement the more complex algorithms.
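One of the simplest interpolation schemes is linear interpolation between the two nearest source pixels. The Python sketch below is offered only as an example of this class of algorithm; it is an assumption of this illustration, not a specific prior-art method named in the description, and it expects at least two input pixels.

```python
def resample_linear(line, out_len):
    """Resample a line of pixel values to out_len samples.

    Each output sample is a weighted average of the two source pixels
    that surround its position. Requires len(line) >= 2.
    """
    in_len = len(line)
    if out_len == 1:
        return [line[0]]
    scale = (in_len - 1) / (out_len - 1)
    out = []
    for j in range(out_len):
        x = j * scale                # position in source coordinates
        i = min(int(x), in_len - 2)  # left neighbor, clamped at the end
        frac = x - i                 # weight of the right neighbor
        out.append(line[i] * (1 - frac) + line[i + 1] * frac)
    return out


# A sharp black-to-white edge picks up an intermediate gray value.
edge = resample_linear([0, 0, 255, 255], 3)
```

The sharp edge in the four-pixel input becomes 0, 127.5, 255 in the output: the transition is smoother than with pixel dropping, but a value not present in the source has been created, which illustrates the alteration of pixel data discussed above.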
Accordingly, there is a need for a horizontal scaling method and apparatus that does not physically alter the encoded pixel data or introduce distortion or artifacts into the encoding process.
There is also a need for a horizontal scaling method and apparatus that is not dependent on complex and implementation-costly scaling algorithms.
Objects and advantages of the present invention include any of the foregoing, singly or in combination. Further objects and advantages will be apparent to those of ordinary skill in the art, or will be set forth in the following disclosure.
In accordance with the purpose of the invention as broadly described herein, there is provided a method and apparatus for horizontally scaling pixel data that does not involve averaging or otherwise physically altering the pixel data. Rather, the encoder clock rate is modified such that all incoming pixels are encoded within the time allotted for the active video portion of the composite waveform. The encoded waveform supplied to the television contains the full and unaltered image resolution without introduction of unwanted artifacts. Various overscan ratios are also accommodated by the present invention by modifying the encoder clock rate to compress or expand the viewable area of the display.
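The relationship between horizontal resolution and encoder clock rate may be illustrated as follows. This Python sketch assumes the straightforward relation that the clock rate equals the number of pixels divided by the viewable time; the function name and the example viewable time (the time occupied by 640 pixels at 13.5 MHz) are assumptions of this illustration.

```python
def encoder_clock_mhz(h_pixels, viewable_us):
    """Clock rate (MHz) at which h_pixels exactly fill viewable_us microseconds."""
    return h_pixels / viewable_us


# Viewable time taken from the conventional case: 640 pixels at 13.5 MHz.
VIEWABLE_US = 640 / 13.5  # about 47.4 us

clock_640 = encoder_clock_mhz(640, VIEWABLE_US)  # 13.5 MHz
clock_800 = encoder_clock_mhz(800, VIEWABLE_US)  # 16.875 MHz
```

Under these assumptions, an 800-pixel line is encoded into the same viewable time by raising the clock from 13.5 MHz to 16.875 MHz, with every source pixel encoded unchanged. A larger overscan allowance corresponds to a shorter viewable time and hence a higher clock rate.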
In one embodiment, an apparatus for converting an input video format to an output video format is provided. The input video format comprises pixel data having a predetermined horizontal resolution, and the output video format comprises a composite video waveform having a control portion followed by an active video portion. The apparatus comprises:
input means for receiving the pixel data from a video source;
an encoder for generating the output video format from the input pixel data, the encoder receiving the pixel data from the input means one pixel at a time and encoding each pixel into the active video portion of the waveform at an encoder clock rate, the encoder further comprising control means for generating and encoding control signals into the control portion of the waveform at the encoder clock rate;
an encoder clock generator that can generate a range of encoder clock rates, wherein the generator calculates and generates the encoder clock rate based on the horizontal resolution of the input pixel data, the clock rate being sufficient to encode all pixel data from one line within a viewable area of the active video portion of the waveform without physically scaling or altering the pixels; and
output means for outputting the composite video waveform to a video receiver.
In another embodiment, a method for horizontally scaling input computer pixel data into the active video portion of a television-compatible composite waveform during encoding of the waveform by a video encoder is provided. The method comprises the following steps:
(a) determining the horizontal resolution of the input computer pixel data;
(b) generating an encoder clock rate based on the horizontal resolution that is sufficient to permit all of the pixel data to be encoded within the active video portion of the waveform without loss of resolution;
(c) determining sync and color burst timing parameters based on the generated encoder clock frequency;
(d) inputting the pixel data for one pixel into the encoder on the rising edge of the encoder clock;
(e) incrementing the horizontal line count by one;
(f) adding sync and burst information to the composite waveform based on a comparison of the timing parameters and horizontal line count; and
(g) repeating steps (d)-(f) until all pixel data for a horizontal line is encoded into the active video portion of the waveform.
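Steps (a) through (g) may be loosely sketched in Python as a per-clock loop. This is a simplified software model under several assumptions: the function name, the region labels, the 47.4 µs viewable time and the use of the NTSC timings given earlier (measured from the sync falling edge) are all illustrative, the front porch is omitted, and an actual encoder would generate analog waveform levels rather than labels.

```python
def encode_line(pixels, viewable_us=47.4):
    """Model one horizontal line as a list of (region, value) clock samples."""
    # (a)-(b): choose a clock rate so that every pixel fits the viewable time.
    clock_mhz = len(pixels) / viewable_us

    # (c): express timing parameters as clock counts at that rate.
    def ticks(us):
        return round(us * clock_mhz)

    sync_end = ticks(4.70)
    burst_start = ticks(5.3)
    burst_end = ticks(7.82)
    active_start = ticks(9.40)

    samples = []
    count = 0
    pixel_iter = iter(pixels)
    while count < active_start + len(pixels):
        # (f): compare the counter with the timing parameters.
        if count < sync_end:
            samples.append(("sync", None))
        elif burst_start <= count < burst_end:
            samples.append(("burst", None))
        elif count < active_start:
            samples.append(("blank", None))
        else:
            # (d): one pixel is taken in per clock during active video.
            samples.append(("active", next(pixel_iter)))
        count += 1  # (e): advance the counter
    return clock_mhz, samples


clock, samples = encode_line(list(range(800)))
active = [value for region, value in samples if region == "active"]
```

In this model, all 800 input pixels appear in the active region in their original order and with their original values; only the clock rate, and therefore the clock counts at which the sync and burst regions begin and end, depends on the input resolution.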
In an additional embodiment, an apparatus for horizontally time-scaling input pixel data during encoding of a composite video waveform is provided. The apparatus comprises:
an encoder clock generator for generating an encoder clock rate that varies according to the horizontal resolution of the pixel data, the encoder clock rate being calculated to permit encoding of all pixel data without alteration into the viewable area of the active video portion of the waveform;
a storage medium loaded with timing parameters and control waveform shapes that correspond to the generated encoder clock rate; and
an encoder that encodes the input pixel data into the active video portion of the waveform at the encoder clock rate and that encodes control information into the waveform based on the timing parameters and waveform shapes stored in the storage medium, the control information including a horizontal sync pulse and a color burst signal.
In one implementation example, the storage medium comprises a ROM that stores a plurality of timing parameters and waveform shapes corresponding to a plurality of encoder clock rates, and a plurality of registers that are written by the ROM with the timing parameters and waveform shapes appropriate to the calculated encoder clock rate.
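The ROM-and-register arrangement may be modeled, purely for illustration, as a lookup table keyed by encoder clock rate. In the Python sketch below, the table contents, the two example clock rates and all names are assumptions; the timing values are the NTSC figures given earlier, converted to clock counts, and the stored waveform shapes are omitted for brevity.

```python
# NTSC timing parameters in microseconds (from the description above).
NTSC_TIMING_US = {
    "sync_width": 4.70,
    "burst_start": 5.3,
    "burst_end": 7.82,
    "active_start": 9.40,
    "front_porch": 1.50,
}


def build_rom(clock_rates_mhz):
    """Precompute timing parameters, in clock counts, for each supported rate."""
    return {
        clk: {name: round(us * clk) for name, us in NTSC_TIMING_US.items()}
        for clk in clock_rates_mhz
    }


# Hypothetical ROM holding entries for two encoder clock rates.
ROM = build_rom([13.5, 16.875])


def load_registers(clock_mhz):
    """Copy the ROM entry for the selected clock rate into working registers."""
    return dict(ROM[clock_mhz])


regs_640 = load_registers(13.5)
regs_800 = load_registers(16.875)
```

Because each timing parameter is fixed in microseconds by the broadcast standard, its value in clock counts grows with the clock rate; storing precomputed counts for each supported rate avoids performing the conversion at run time.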
Further features and advantages of the invention, as well as the structure and operation of particular embodiments of the invention, are described in detail below with reference to the accompanying drawings.