1. Field of the Invention
This invention relates to the field of analog and digital signal processing, and more specifically to circuitry and systems for providing switching, scan conversion, scaling, and processing where the output frequency is different from the input frequency.
2. Background Art
Switchers are a means of connecting an input source to an output device or a system. Typically, a switcher allows a user to provide an output derived from a selection between more than one input signal source or connector type. Furthermore, various types of switchers have various components, capabilities, options and accessories.
2.1 Graphics Environment
For digital display technologies, a Graphics Switcher (GS) is a device that enables multiple analog and digital input signals to be selected and sent to various selected output devices, such as presentation displays. FIG. 1 illustrates a typical graphics environment showing various pieces of digital display technology connected by a graphics switcher, in accordance with an embodiment of the present invention.
Hence, a graphics switcher 100 allows source signals derived from inputs such as video cameras 102, VCRs 104, DVDs 106, TV video, audio/video systems, and computers 110, 112, and 114 to be selected and viewed on a presentation display 120 one at a time. For example, when trying to display from two computer inputs 110 and 114 having separate presentations, a graphics switcher 100 can physically connect both of the computers to the display device and allow input selection from the two computers for display on the display device 120. Other uses of graphics switchers include generating special graphics and movie effects and, in industrial settings or security applications, switching between video camera inputs to display certain areas on monitors or systems of display devices.
Typical inputs to a graphics switcher comprise computers, TV video, composite video, red-green-blue (RGB) video, S-Video, D-1 (digital) video, computer input (e.g. VGA, SVGA and Mac video formats), video cameras, VCRs, and various other audio/video inputs as appropriate. Furthermore, inputs may originate from different physical locations. For instance, to form a presentation on a larger screen display, a switch may be used to choose between inputs received from a computer at one end of one room, a computer in another room, a video camera taking video of a performance, and a video conferencing system.
Similarly, a switcher provides output to various destinations or presentation formats. Examples of outputs comprise LCD panels (including high-resolution LCD projectors), DLP displays (including high-resolution DLP projectors), high resolution plasma displays, TV displays, CRT display devices 122 and 124 (e.g. VGA, SVGA and Mac video formats), audio stereo systems, and various other audio/video outputs as appropriate. For instance, digital projectors used for business presentations supply digitally addressed elements to LCD panels, DLP panels, digital light processing devices, and various others.
A TV signal has a set number of horizontal lines. In PAL and SECAM, it's 625; in NTSC, it's 525. However, not all of these lines are visible. In fact, only 576 lines in PAL and SECAM and 483 lines in NTSC are seen by the TV viewer; the remainder are called blanking lines, which contain no picture information and are hidden at the top and bottom of the screen.
By contrast, the number of horizontal lines on a computer display can range dramatically, from lower resolutions of 480 visible horizontal lines or less, up to very high resolutions with 1280 or more lines. Many computers contain video cards that allow the user to choose between several different display resolutions.
The higher the display resolution, the more crisp and clear small details and text become. For example, a computer screen composed of 768 horizontal lines is able to contain and display more detail than a computer picture composed of only 480 lines, or a TV picture composed of 576. The relatively small number of horizontal lines in a TV video picture limits the ability to display very small text or other intricate visual details.
TV video is defined by either the NTSC, PAL or SECAM standard, which dictates the number of lines in the picture, how the color information is defined and the speed with which the lines are painted on the screen from top to bottom (refresh rate). However, within PAL, NTSC, and SECAM, there are actually several signal formats that meet these standards. Composite video is the most commonly used format. In composite video, all the video information (e.g. information for red, green, blue (RGB) and sync) is combined into a single signal. S-Video, which provides a superior picture quality, separates the chrominance (color) from the luminance and sync information. Other variations of PAL and NTSC include RGB at 15 kHz, component video and D-1 (digital) video.
While all of these formats differ in the way the video information is combined into a signal, they still have certain things in common. They are all interlaced, they have either 576 (PAL and SECAM) or 483 (NTSC) visible lines, and they have an established, unvarying refresh rate. For PAL, two interlaced fields, making up a single "frame," are painted onto the screen 25 times each second (a rate of 25 Hz), and for NTSC, this occurs 30 times each second (30 Hz).
Unlike TV video, there is no single standard by which all computer video signals must abide. As discussed earlier, there is a wide range of commonly used display resolutions. There is an equally wide range of refresh rates, most falling between 60 and 85 Hz. And, while almost all computer displays are non-interlaced, some video display cards do offer an interlaced display option. However, what computer video signals do all have in common is the way in which they describe chrominance and luminance information to the monitor. All VGA, SVGA and Mac video formats transmit the red, green and blue information as separate signals. But, there is some variation between computers in the way sync information is combined with the color signals. By keeping red, green and blue separate from each other, computer monitors are able to display a wide range of colors with minimal distortion.
2.2 Types of Switchers
In order to support such a wide variety of analog and digital inputs and outputs, numerous types and "lines" of switchers have been developed. For example, there are audio/video (A/V) switchers; VGA, Mac and RGB switchers; system switchers; and matrix switchers. In addition, the numerous signal characteristics associated with switching mixtures of inputs to outputs have led to a number of switch options and accessories.
For instance, a line of A/V switchers may accept NTSC/PAL/SECAM composite and S-video type video sources, as well as two channels of stereo audio from amongst six selectable inputs. Each model in the line is then differentiated by the type or combinations of video and audio formats that it accepts.
Another line of switchers, VGA, Mac and RGB switchers, are used for simple routing applications. A model of this line can be dedicated to switching signals of only one specific computer type, such as VGA or Mac. Alternatively, another model may provide more input flexibility, by accepting both VGA and Mac video signals.
A more complex switcher type, the system switcher, may be compatible with all types of digitally controlled projectors and accept virtually all source signals. Thus, a system switcher can easily switch between computers, A/V components and audio sources. In addition, an accessory may allow a system switcher to communicate with a projector and be recognized by the projector as if the switcher were the same brand as the projector.
A special type of switcher, the matrix switcher, routes multiple inputs to multiple outputs. For example, input #1 (e.g. camera 102) can be routed to output #1 (e.g. preview monitor 124) or output #2 (e.g. program monitor 122); input #2 (e.g. PC computer 110) can be routed to outputs #3 (e.g. program monitor 122) and #4 (e.g. digital display 120); and so on, in any combination. Thus, a matrix graphics switcher may allow for the switching of multiple inputs and outputs in most video and RGB formats. Matrix switchers are commonly used in applications such as presentations, data display, and entertainment. These applications require multiple input sources (computers, cameras, DVD players, etc.) to be switched to more than one output destination (monitor, projector, videoconferencing CODEC). Addition of an auto-switching accessory allows such switchers to automatically switch between various types of inputs and/or outputs when a change in input signal type is detected. Thus, a switcher may have various signal conversion and processing capabilities depending on switcher type and needs. For example, graphics switchers implement a mixture of scan conversion, scaling, filtering, and other capabilities as needed for their desired performance.
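The matrix routing described above can be pictured as a table that maps each output to the input currently feeding it. The following is only an illustrative sketch: the class name `MatrixSwitcher`, its methods, and the 8-input, 4-output size are assumptions made for the example and do not describe any particular product.

```python
class MatrixSwitcher:
    """Hypothetical model of a matrix switcher's routing table."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.n_outputs = n_outputs
        self.routes = {}  # output number -> input number

    def route(self, input_no, output_no):
        """Connect one input to one output, replacing that output's old source."""
        if not 1 <= input_no <= self.n_inputs:
            raise ValueError("no such input")
        if not 1 <= output_no <= self.n_outputs:
            raise ValueError("no such output")
        self.routes[output_no] = input_no

    def outputs_for(self, input_no):
        """List the outputs currently fed by the given input."""
        return sorted(o for o, i in self.routes.items() if i == input_no)


switcher = MatrixSwitcher(n_inputs=8, n_outputs=4)
switcher.route(1, 1)  # input #1 to output #1
switcher.route(2, 3)  # input #2 to output #3
switcher.route(2, 4)  # input #2 also to output #4
```

One input may feed several outputs at once, while each output carries exactly one input at a time, which is why the table is keyed by output number.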
A scaler changes the size of an image without changing its shape, for instance, when the image size does not fit the display device. Therefore, the main benefit of a scaler is its ability to change its output rate to match the abilities of a display device. This is especially advantageous in the case of digital display devices because digital display devices produce images on a fixed matrix and in order for a digital display device to provide optimal light output, the entire matrix should be used. FIG. 2 illustrates a digital display device showing the pixel matrix for displaying an image, according to an embodiment of the present invention. Thus, the goal of a scaler is to have output flexibility so that the input image can be scaled to an output image 202 that matches the pixel matrix 204 of the display device 206 or the display "sweet spot".
Since a scaler can scale the output both horizontally and vertically, it can change the "aspect ratio" of an image. Aspect ratios are the relationship of the horizontal dimension to the vertical dimension of a rectangle. Thus, when included as part of a graphics switch, a scaler can adjust horizontal and vertical size and positioning, for a variety of video inputs. For example, in viewing screens, the aspect ratio for standard TV is 4:3, or 1.33:1; HDTV is 16:9, or 1.78:1. Sometimes the ":1" is implicit making TV=1.33 and HDTV=1.78. So, in a system with NTSC, PAL or SECAM inputs and a HDTV type of display, a scaler can take the standard NTSC video signal and convert it to a 16×9 HDTV output at various resolutions (e.g. 480p, 720p, and 1080p) as required to fit the HDTV display area exactly.
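The aspect ratio arithmetic above can be checked with a short sketch. The function names are hypothetical, and the 640×480 input and 1280×720 output are assumed example sizes chosen to represent a 4:3 source and a 16:9 display.

```python
from fractions import Fraction


def aspect_ratio(width, height):
    """Aspect ratio as an exact fraction (horizontal over vertical)."""
    return Fraction(width, height)


def scale_factors(in_w, in_h, out_w, out_h):
    """Independent horizontal and vertical scale factors."""
    return Fraction(out_w, in_w), Fraction(out_h, in_h)


tv = aspect_ratio(640, 480)      # 4:3
hdtv = aspect_ratio(1280, 720)   # 16:9
h_scale, v_scale = scale_factors(640, 480, 1280, 720)
```

Because the horizontal and vertical factors differ in this case, filling the display exactly changes the aspect ratio of the image from 4:3 to 16:9.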
Scaling is often referred to as "scaling down" or "scaling up." An example of "scaling down" is when a 640×480 resolution TV image is scaled for display as a smaller picture on the same screen, so that multiple pictures can be shown at the same time (e.g. as a picture-in-picture or "PIP"). Scaling the original image down to a resolution of 320×240 (or 1/4 of the original size) allows four input TV resolution pictures to be shown on the same output TV screen at the same time. An example of "scaling up" is when a lower resolution image (e.g. 800×600=480,000 pixels) is scaled for display on a higher resolution (1024×768=786,432 pixels) device. Note that the number of pixels is the product of the two resolution numbers (i.e. number of pixels=horizontal resolution×vertical resolution). Thus, when scaling up, pixels must be created by some method. There are many different methods for image scaling, and some produce better results than others.
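As one illustration of how pixels can be created or discarded, the sketch below uses nearest-neighbor sampling, which is among the simplest scaling methods and is chosen here only as an example. The function names are hypothetical, and an image is assumed to be a list of rows of pixel values.

```python
def pixel_count(width, height):
    """Number of pixels = horizontal resolution x vertical resolution."""
    return width * height


def scale_nearest(image, out_w, out_h):
    """Scale an image (list of rows) by copying the nearest source pixel."""
    in_h = len(image)
    in_w = len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


small = [[1, 2],
         [3, 4]]
scaled_up = scale_nearest(small, 4, 4)        # each source pixel is repeated
scaled_down = scale_nearest(scaled_up, 2, 2)  # every other pixel is kept
```

Nearest-neighbor sampling repeats pixels when scaling up and skips pixels when scaling down; methods that interpolate between neighboring pixels generally give smoother results at the cost of more processing.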
A scan converter is a device that changes the scan rate of a source video signal to fit the needs of a display device. For instance, a "video converter" or "TV converter" converts computer-video to NTSC (TV), or NTSC to computer-video. Although the concept seems simple, scan converters use complex technology to achieve signal conversion because computer signals and television signals differ significantly. As a result, a video signal that has a particular horizontal and vertical frequency refresh rate or resolution must be converted to another resolution or horizontal and vertical frequency refresh rate. For instance, it requires a good deal of signal processing to scan convert or "scale" a 15 kHz NTSC standard TV video input (e.g. 640×480) for output as 1024×768 lines of resolution for a computer monitor or large screen projector because the input resolution must be enhanced or added to in order to provide the increased capability or output resolution of the monitor or projector. Because enhancing or adding pixels to the output involves reading out more frames of video than what is being read in, many scan converters use a frame buffer or frame memory to store each incoming input frame. Once stored, the incoming frame can be read out repeatedly to add more frames and/or pixels.
Similarly, a scan doubler (also called a "line doubler") is a device used to change composite interlaced video to non-interlaced component video, thereby increasing brightness and picture quality. Scan doubling, or "line-doubling", is the process of making the scan lines less visible by doubling the number of lines and filling in the blank spaces. For example, a scan doubler can be used to convert an interlaced TV signal to a non-interlaced computer video signal. Hence, in order to display TV video on new TFT flat panel screens, a line doubler or quadrupler is indispensable.
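A minimal sketch of line doubling follows. It assumes that a field is a list of scan lines, that each line is a list of integer pixel values, and that the blank spaces are filled by averaging vertically adjacent lines. Real line doublers may use other methods, and the function name is hypothetical.

```python
def line_double(field):
    """Double the line count by inserting an interpolated line after each line."""
    doubled = []
    for idx, line in enumerate(field):
        doubled.append(list(line))
        # The last line has no line below it, so it is simply repeated.
        below = field[idx + 1] if idx + 1 < len(field) else line
        doubled.append([(a + b) // 2 for a, b in zip(line, below)])
    return doubled


field = [[0, 0],
         [10, 20]]
frame = line_double(field)
```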
2.3 Problems with Graphics Switchers
When a graphics switcher switches between input signals having disparate refresh rate frequencies or resolutions, either the switcher or the display needs to lock to the new vertical refresh rate and horizontal refresh rate. As a result, when the input signal is switched and a signal having a new frequency is sent to the output display device, the display has to reacquire and lock up to the new frequency so the new input can be displayed. During the time it takes the display to reacquire the new input signal frequency, the output drifts, leading to picture scrambling and/or noise, which results in a "jitter" in the output display.
Accordingly, in order for a graphics switcher to provide a stable output, it must be capable of switching between multiple analog and digital input formats and resolutions while keeping the output rate and resolution stable. One way to design such a switch is to use signal processing.
2.4 Seamless Graphics Switchers
A switcher that provides such a stable output during switching is generally referred to as a Seamless Graphic Switcher (SGS). The term "seamless" derives from providing a glitch-free "cut" that eliminates the noise and jitter caused by switching between unsynchronized inputs. By using signal processing, the output is kept stable in an SGS while the input is switched between multiple analog and digital formats because the inputs are scan converted to one frequency before being sent to the display. Since the signal processor is doing the "locking" onto the new input rates, the display always sees the same resolution and has the same constant sync. Thus, because the display only receives one frequency, it does not have to reacquire the signal and thereby does not produce the jitter related to switching the input. Therefore, scan conversion signal processing permits the user to switch between inputs, without causing jitter in the output from input switching.
In order to scan convert the inputs to one frequency, an SGS writes the input to and reads the output from a memory buffer. Once stored, the incoming frame can be processed and/or read out repeatedly, to add more frames or pixels. Hence, using a memory buffer also allows an SGS to provide scaling (as previously described). In fact, seamless switching usually involves scaling and seamless switchers are usually comprised of two scalers and a matrix switcher.
For example, referring to FIG. 1, a prior SGS product 100 is capable of handling eight different input signals 130, includes routing and control functions for handling the signals, and provides scaling and synchronization ("sync") of the image to the selected output resolution. Thus, the SGS accepts RGB or component video signals with various scanning rates while the operator seamlessly switches those eight inputs to a fixed output rate that is selectable.
As such, the prior SGS can be used for staging events where high frequency computer video 110, 112, and 114 and standard frequency video from a camera 102 must be seamlessly switched to high frequency and high computer resolution outputs 120, 122 and 124. The prior SGS can accept both interlaced and non-interlaced video formats with resolutions from 560×384 up to 1600×1200 with scan rates of 15 kHz up to 100 kHz and provides two different output signals. The first output is the "program" output for viewing by the audience. The second output is the "preview" output for viewing "next to switch" sources by the switch operator on a local monitor. Thus, the switch operator can seamlessly switch the "preview" to the "program" output or choose a digital transition effect to use when the physical switch is made.
In order to optimize image quality as well as maintain maximum image brightness and detail, all inputs are scaled to resolutions that match the "sweet spot" or native resolution of digital displays. Advanced digital video scaling technologies enable the example SGS to scale RGB inputs to one of eighteen common computer-video, HDTV, or plasma resolutions. These scaled output resolutions for computer-video output rates are 640×480, 800×600, 832×624, 1024×768, 1280×1024, and 1360×1024. For plasma displays, the output resolutions are 848×480, 852×480, 1280×768, and 1360×765. The SGS also provides HDTV 480p, 720p, 1080i, and 1080p output rates.
2.5 Problems with Seamless Graphics Switchers
Nevertheless, although SGSs solve the graphics switcher new input "jitter" problem, there is an inherent problem with SGSs that use a frame buffer or memory to convert an input video signal with one horizontal and vertical frequency refresh rate to an output with another horizontal and vertical refresh rate. As each input frame comes in, the SGS stores that entire frame internally in a box in the memory, which allows the SGS to signal process or read that frame out repeatedly.
However, if the output vertical read rate from the memory buffer is not an integer multiple of the input vertical write rate to the memory, the information in the output frame will contain two different input frames at some point in time. As a result, part of the output display will show the image from one input frame, while the rest of the output display shows the image from the second input frame. If there is motion in the input images, elements in the two input frames will be different and therefore, the output frame will display part of one image (e.g. a portion of the "before" image) and part of a later image (e.g. a portion of the "after" image). Moreover, at the border between the two images, a "tear" will appear in the output. FIG. 3 illustrates a digital display output image having a "tear", according to an embodiment of the present invention.
Thus, for instance, input of a ball that is moving horizontally from right to left 300 will result in the top part of the output frame showing the image from the second input frame 302, while the bottom part of the output frame shows the image from the first input frame 304, and a tear in the image where the two parts of the output frame meet 306. The image in the top portion of the output is shifted to the left of that in the bottom portion of the output because the top portion is an image from later in time while the object moves from right to left. Note that there is also a horizontal pixel shift in the output image at the point where the read and write pointers cross over 310.
Thus, current SGSs have a particular problem when the input images contain "panning." For instance, a lot of camera panning is necessary during an on-stage event where the cameras are tracking someone by following them around on the stage. Then, during the scan converting process, because different input frequencies are coming into the SGS and a different scan converted rate going out of the SGS, there are two different refresh rates for writing to and reading from the memory buffer. As a result, the vertical read and write rates are not locked in synchronization.
Describing what creates the output tear in another way, because each output frame being displayed is read from memory, when frames are being written to and read out of memory at different rates, the write pointer and read pointer move along in memory at different rates. Hence, the read and write pointers will eventually cross, and when they cross, the read pointer will go from new input frame information just behind the write pointer to old input frame information that the write pointer was about to write over. The new and old input frame information will then be combined in the current output frame, and if there is movement in the input (e.g. sideways panning) then the output will include a tear where the read pointer crossed the threshold between the newer and older input frames.
When the vertical frame refresh rate coming in and the vertical refresh rate going out cross, that is, when the "vertical syncs" cross, a tear is formed. When they cross, as the input is being written into the frame memory, the two pointers in the memory actually cross, and as a result a single output frame is displayed having old input frame information and new input frame information.
Note that the tear produced in the output of SGS devices is a bigger problem in Europe where the output frame vertical frequency rate is usually 60 Hz and all of the input source vertical frequencies are usually 50 Hz. Hence, because there is a delta of 10 Hz between the two vertical refresh rates, European SGS applications can encounter the tear up to 10 times a second.
FIG. 4 is a waveform diagram of the input and output vertical sync pulse in an attempt (i.e. because the actual phenomenon can only be captured in a motion picture or a series of frames) to depict the result when the output vertical sync pulse is not locked with the input vertical sync pulse. Referring to FIG. 4, the top trace shows the vertical sync of the input signal 402. The bottom trace shows the vertical sync of the output signal 404. The sync pulses are depicted at points 406 for the output vertical sync and at point 408 for the input vertical sync. In reality, the output and input vertical sync pulses have different frequencies when there is no lock; hence, when viewed together on an oscilloscope, there is a relative motion between the sync pulses.
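The pointer crossing and the resulting tear rate can be illustrated with a simplified simulation. This is a sketch under stated assumptions rather than a model of any actual SGS: it assumes a single frame buffer, lines written and read at uniform rates, a reader that starts half an output frame after the writer (the `phase` parameter), and an arbitrary 12 lines per frame. All names are hypothetical.

```python
import math
from fractions import Fraction


def source_frames(j, n_lines, f_in, f_out, phase=Fraction(1, 2)):
    """For output frame j, return the input frame number seen on each line."""
    frames = []
    for i in range(n_lines):
        line_pos = Fraction(i, n_lines)
        t = (j + phase + line_pos) / f_out  # time at which line i is read
        # Input frame k writes line i at time (k + line_pos) / f_in, so the
        # newest frame already written to line i at time t is:
        frames.append(math.floor(t * f_in - line_pos))
    return frames


def torn_frames_per_second(n_lines, f_in, f_out):
    """Count output frames in one second that mix two input frames."""
    return sum(
        1
        for j in range(f_out)
        if len(set(source_frames(j, n_lines, f_in, f_out))) > 1
    )


torn_unlocked = torn_frames_per_second(12, f_in=50, f_out=60)
torn_locked = torn_frames_per_second(12, f_in=60, f_out=60)
example = source_frames(2, 12, f_in=50, f_out=60)
```

Under these assumptions, a 50 Hz input read out at 60 Hz produces 10 torn output frames per second, matching the 10 Hz delta, while equal input and output rates produce none. In the torn frame `example`, the upper lines come from the newer input frame and the lower lines from the older one, as in the moving ball illustration.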
2.6 Attempted Solutions
One option for solving the SGS output tear is to take the input signal horizontal and vertical rates and exactly duplicate them at the output so that the frame rates are the same and the horizontal and vertical syncs are the same. The problem with this solution is that it prohibits scaling or scan conversion because the input and output pixel counts are exactly equal.
For example, if the input rate is only 15 kHz video, then with the output horizontal and vertical rates locked to the input, the switch can only provide 15 kHz output. Thus, high resolution output video is not possible because 15 kHz interlaced output does not provide enough pixels for big screen projectors.
An additional attempt to solve the tear in SGS output frame during input movement is to use a Phase Locked Loop (PLL) to achieve synchronization between the output vertical sync pulse and the input vertical sync pulse. However, prior implementations of this method fail and the vertical frame rates do not end up synchronized leading to tears in the output. After a certain number of output frames, the output read pointer will cross the input write pointer and when it does, that output frame will still end up containing parts of two input images and a "tear" in between.
Another attempt at solving the SGS output tear is to add frame memory to double buffer the input frames so that there are two input frame buffers containing consecutive input images. Then, whenever the vertical pointers cross, the output frame having two input images can be removed or replaced with the next single image frame. So, at any given instant one of the two buffers has only information from one input frame. Then, when the pointers cross, whichever buffer has information from just one input frame is output. The net effect is that the SGS actually drops a frame or double displays a single frame. The problem is that when an output frame is removed or replaced, the timing of the images is mixed up and any linear motion, such as during panning, will suddenly appear to either hesitate for a frame or jump ahead for a frame because image information is missing.
For example, motion is not smooth anymore. Instead it includes jumps and hesitations. Thus, motion may appear to stop, then repeat, then jump or skip; or stop, then make a big jump, then stop, then make another big jump. Any panning or any motion in any direction on the screen that is moving at a constant velocity will jump. Objects in motion will look like they hopped or stuttered, or stopped for a second and then continued. On a large screen projector, the image has "hiccups", and it appears that something is wrong with the image.
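The repeated and dropped frames can be seen in a small sketch. It assumes an idealized double-buffered converter in which each output frame shows the newest input frame that has started by the time the output frame begins. The function name and this selection rule are illustrative assumptions.

```python
def displayed_input_frames(n_out, f_in, f_out):
    """Input frame number shown by each of the first n_out output frames."""
    return [(j * f_in) // f_out for j in range(n_out)]


# 50 Hz input shown at 60 Hz output: one input frame is shown twice
# in every six output frames (a hesitation).
repeats = displayed_input_frames(12, f_in=50, f_out=60)

# 60 Hz input shown at 50 Hz output: one input frame in every six
# is never shown (a jump).
drops = displayed_input_frames(10, f_in=60, f_out=50)
```

In the first sequence input frames 0 and 5 each appear twice, and in the second sequence input frame 5 never appears, which corresponds to the hesitations and jumps described above.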
Hence, if a user was attempting to record the SGS output at high speed and the SGS is dropping frames to avoid tears, the recording would be full of stutters. Also, if the frames were dropped during a broadcast or high definition display the output would appear unprofessional and sloppy. In addition this method causes variable output audio delay and skipping in parallel to that described for the output image.
Another attempt at solving the SGS output tear is to delay the output so that whenever the vertical pointers cross, signal processing can be performed to somehow "smooth over" the output frame having two input images. However, even with a delay, the output frame with a tear still exists and a frame or frame portion must still be dropped or added. Thus, when the output frame or a portion thereof is removed or replaced, the timing of the images is mixed up and any linear motion will suddenly appear to either hesitate or jump ahead.
An additional problem with adding delays with SGSs is that video is often delayed at several points in a system due to signal processing steps that have frame delays or delays due to recording to memory. For example, in large staging events delays start to accumulate and can actually accumulate to the point where a speaker's or singer's lips are out of sync with the sound provided by the system. Generally, an entire system can get by with up to a one frame delay of audio to video, but past one frame and depending on the circumstances, the timing difference between the audio and output image lip sync is discernible.
Therefore, it is desirable to provide a system capable of locking the output vertical frame sync pulses to the input vertical frame sync pulses, while allowing for a different horizontal frequency in the output rate, and while maintaining a constant seamless output frequency and resolution during switching of inputs. For example, a desirable SGS is one where the read and the write pointers in memory do not cross, and it does not produce an output frame made up of two different input frames.
It is also desirable to provide a system wherein the output and input vertical sync pulses are locked, and the position of the output vertical frame read pointer in memory can be adjusted as compared to the position of the input vertical frame write pointer. For instance, a desirable SGS would allow the output read pointer to be placed at any point in reference to the input vertical pointer, so that for instance, frame rate delay could be adjusted (e.g. to say, half a frame). Thus, frame rate delay for an SGS could be set at, adjusted to, or programmed to change to one or more constant, predictable values as desired.
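To give a sense of scale, the sketch below converts a read pointer offset, expressed as a fraction of a frame, into a time delay and sums the delays of several stages. The function names, the 50 Hz frame rate, and the three-stage chain are assumptions chosen only for illustration.

```python
def video_delay_ms(offset_frames, frame_rate_hz):
    """Delay in milliseconds for a read pointer trailing by offset_frames."""
    return offset_frames * 1000.0 / frame_rate_hz


def accumulated_delay_ms(stage_offsets, frame_rate_hz):
    """Total delay of a chain of stages, each given as an offset in frames."""
    return sum(video_delay_ms(o, frame_rate_hz) for o in stage_offsets)


half_frame = video_delay_ms(0.5, 50)               # half a frame at 50 Hz
chain = accumulated_delay_ms([0.5, 1.0, 1.0], 50)  # three stages in series
```

With the vertical syncs locked, the offset stays at a fixed value, so the audio can be delayed by a matching, constant amount.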
Such an SGS is also desirable because having the video delay locked at a specific value provides a predictable and constant delay for synchronizing the audio to the video. Thus, such an SGS allows for exact, predictable, and more precise video to audio synchronization.
Furthermore, it is desirable to provide a system with an adjustable position of the output vertical frame read pointer in memory as compared to the position of the input vertical frame write pointer so that output frame rate delay can be adjusted to near zero. For instance, a desirable SGS would allow the output vertical frame delay as compared to the input to be reduced as much as possible while still allowing the SGS to function.
Hence, although an SGS cannot have a 0 delay unless it processes and outputs lines as they come in, an SGS can have near 0 delay if the input and output frame rates are locked together and the read and write pointers are adjustable relative to each other. Then, at large staging events where the video is run through several delay-causing processing and recording steps, accumulation of delay can be minimized. As a result, a large stage event system with a very low frame delay does not require an audio delay to line the audio up with the delayed video. Such an SGS is also desirable because the video delay can be minimized, thus allowing the audio offset to be minimal or, if necessary, allowing for a minimal required delay in audio for audio/video output synchronization.