Standard definition (SD) television delivers a full picture to the screen approximately 30 times per second. The North American standard for SD video (NTSC) requires that each video frame be made up of 720 by 480 pixels. Because each pixel is commonly defined by 2 bytes (other sizes can be used), standard definition TV requires a sustained data rate of approximately 20 megabytes per second (MBps) (i.e., 720×480×30×2 bytes/s). In today's computing and disk systems, 20 MBps is only a moderate data rate, achievable without substantial expense or design restriction.
The current standards for high-definition (HD) video contemplate video data rates that are up to six times higher than that of standard definition television. As used herein, high-definition or HD refers to any video format that increases the data rate beyond that of standard definition video including applications for digital film (motion picture) production, which currently use resolutions up to 4,096 by 2,731 pixels at 24 frames per second. One such high-definition video standard (often referred to as 1080i) contains frames of 1920 by 1080 pixels at 30 frames per second. Other HD formats (e.g., 720p) increase frame rates to 60 frames per second at a resolution of 1280 by 720 pixels. Both the common 720p and 1080i HD video formats require sustained data rates over 100 MBps. At a sustained rate of 100 MBps, standard computing solutions are no longer adequate. The common expansion bus within desktop PCs (32 bit PCI) can only sustain a maximum of around 80 MBps. Individual hard drives can only sustain about 30 MBps. Editing of video requires simultaneous processing of multiple 100 MBps streams that can overwhelm the fastest workstation CPUs. Due to these combined limitations, real-time HD editing is limited to the most expensive and custom systems on the market. As video resolutions increase, the demands on the system architecture increase.
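By way of illustration only, the data rates discussed above can be checked with a short calculation. The following Python sketch assumes 2 bytes per pixel, as stated above for SD video, and uses the frame sizes and frame rates quoted in the text; the function name, the table of formats, and the 80 MBps constant are hypothetical names chosen for this example.

```python
# Illustrative only: all names below are hypothetical.

def data_rate_bytes_per_s(width, height, fps, bytes_per_pixel=2):
    """Uncompressed video data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel

# (width, height, frames per second) as quoted in the text above.
FORMATS = {
    "SD (NTSC)": (720, 480, 30),
    "720p": (1280, 720, 60),
    "1080i": (1920, 1080, 30),
}

# Approximate sustained limit of a 32-bit PCI bus cited above, in bytes/s.
PCI_32_BIT_LIMIT = 80e6

if __name__ == "__main__":
    sd_rate = data_rate_bytes_per_s(*FORMATS["SD (NTSC)"])
    for name, (w, h, fps) in FORMATS.items():
        rate = data_rate_bytes_per_s(w, h, fps)
        verdict = "fits within" if rate <= PCI_32_BIT_LIMIT else "exceeds"
        print(f"{name}: {rate / 1e6:.1f} MBps, "
              f"{rate / sd_rate:.1f}x SD, {verdict} 32-bit PCI")
```

Under these assumptions, the calculation gives roughly 20.7 MBps for SD, 110.6 MBps for 720p, and 124.4 MBps for 1080i, the last being six times the SD rate and well above the approximately 80 MBps that a 32-bit PCI bus can sustain.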
Many solutions exist today to simplify the problems presented by video production by reducing the volume of data that needs to be processed. The limitations of drive and bus speeds have commonly been addressed through video compression. Compression allows for a large reduction in data rate while maintaining the visual quality of the original source material. Compression is commonplace in SD video production, yet in HD video production, compression is not typically used in today's editing systems for a variety of reasons. For example, cost-effective production-quality hardware-based compression that allows an editor to compress and decompress video without visual quality loss does not currently exist. Although hardware compression exists for use in distribution systems (e.g., satellite or terrestrial HD broadcasts), these tools do not meet production quality or architectural requirements. In addition, the CPU load for software-based compression using existing technology is very high. Although software decompression can be used for single-stream HD playback, it taxes the CPU, which may already be overloaded by processing video mixes and effects. For example, playback of a single stream of HD MPEG2 will consume 70-80% of the resources of today's fastest CPUs. As a consequence, multi-stream HD decoding and mixing is beyond the capabilities of the standard PC. Further, software encoding (i.e., compression) is typically much more CPU-intensive than decoding (i.e., decompression); therefore, expensive hardware is required for encoding during video acquisition.
As a consequence of the difficulties associated with video compression discussed above, commercial HD-based production systems typically use uncompressed video. This mandates that the common PC be replaced with a workstation-class machine equipped with, for example, a 64-bit PCI bus and a RAID (Redundant Array of Inexpensive Disks) hard drive solution so that the required data rate can be sustained. These setups are expensive, and without compression, large quantities of disk storage are required for any long-form production.
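To give a rough sense of scale, the storage and drive-count implications of uncompressed HD can be estimated as follows. This Python sketch assumes 2 bytes per pixel, the approximately 30 MBps per-drive rate mentioned earlier, and an idealized striped array with no overhead or redundancy; the function names are hypothetical, and real RAID configurations would behave differently.

```python
import math

# Illustrative only: function names are hypothetical, and the striping model
# is idealized (no controller overhead, no parity or mirroring).

def storage_bytes(width, height, fps, seconds, bytes_per_pixel=2):
    """Disk space needed to hold uncompressed video of the given duration."""
    return width * height * fps * bytes_per_pixel * seconds

def min_striped_drives(stream_rate, per_drive_rate):
    """Smallest number of drives whose combined rate covers one stream."""
    return math.ceil(stream_rate / per_drive_rate)

if __name__ == "__main__":
    one_hour = storage_bytes(1920, 1080, 30, 3600)
    stream_rate = one_hour / 3600
    print(f"One hour of 1080i: {one_hour / 1e9:.1f} GB")
    print(f"Drives needed at 30 MBps each: {min_striped_drives(stream_rate, 30e6)}")
```

Under these assumptions, one hour of uncompressed 1080i occupies roughly 448 GB, and at least five striped drives are needed to sustain a single stream.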
Video editing typically involves a process that combines two or more video streams into a single video stream through application of one or more video processing filters (e.g., transitions that individually combine one or more video streams into a single video stream). Each of the video streams may be modified individually by one or more video processing filter effects. Any of the available effects can be applied to any portion or the entire video stream before and/or after a transition.
Mixing operations include a dissolve technique that generates fading from one moving video image into another, or a transitional wipe that displays two or more video images simultaneously on one output. Effects are filters that process a stream in order to change the stream's characteristics in some way. Some types of effect filters include color correction effects, which change any combination of image characteristics such as brightness, contrast, saturation and color tint, and distortion filters, which may blur, sharpen, or otherwise enhance the moving image.
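The dissolve and color correction operations described above can be sketched, under simplifying assumptions, as per-sample arithmetic. In the following Python example a frame is represented as a flat list of 8-bit sample values, the dissolve is a linear blend, and contrast is applied about a mid-gray pivot of 128; the representation, the pivot, and the function names are assumptions made for illustration, not a description of any particular editing product.

```python
# Illustrative only: frames are modeled as flat lists of 8-bit samples (0-255),
# and all function names are hypothetical.

def dissolve(frame_a, frame_b, t):
    """Linear cross-fade: t=0.0 yields frame_a, t=1.0 yields frame_b."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of samples")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie between 0.0 and 1.0")
    return [round((1.0 - t) * a + t * b) for a, b in zip(frame_a, frame_b)]

def adjust_brightness_contrast(frame, brightness=0, contrast=1.0):
    """Scale each sample about mid-gray (128), add an offset, clamp to 0-255."""
    return [
        max(0, min(255, round((p - 128) * contrast + 128 + brightness)))
        for p in frame
    ]

if __name__ == "__main__":
    a = [0, 100, 200]
    b = [200, 100, 0]
    # One quarter of the way through a dissolve from a to b.
    print(dissolve(a, b, 0.25))
    # An effect applied to one stream before the transition.
    print(adjust_brightness_contrast(a, brightness=20))
```

Both operations read every sample of the uncompressed frame, which is consistent with the observation that mixing and effects generally need access to the uncompressed image.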
When mixing multiple streams of video or adding special effects, most operations require access to the uncompressed image. Because there are no known alternatives to using uncompressed content when implementing edits, many editing applications simply do not offer a real-time preview (i.e., before edits are actually carried out) of editing results. Instead, to view the results of effect or transition editing in motion, the video must first be rendered. Rendering is the process of pre-computing video mixes over time (however long the processing takes), and placing the results back on disk. Video rendering performs the same mixing and effect operations as required by real-time playback; however, the results can no longer be viewed live. As a consequence, the resulting video composite must be completely written to disk before it can be viewed at normal playback speeds. In a rendered-only editing environment, the user/editor must wait before being able to view the “edit,” and then decide whether it needs to be changed. If it does need to be changed, then more rendering is required; thus, editing in a rendered-only editing environment can be very time-consuming.
Some video editing applications alternatively scale an image down to a lower resolution during capture. In this scaling-upon-capture approach, lower resolution video previews can be seen in real-time, enabling the editor to quickly preview most editing actions. The drawback of scaling upon capture in this manner, however, is that the video must be recaptured at full resolution before the edits are actually implemented and the final-quality production can be completed. This approach to video production has been around for decades, and it is commonly called “off-line” editing.
Another approach attempts to overcome the limitations of both the rendering-only and scaling-upon-capture approaches by processing full-resolution HD data and then resizing the output to SD for mixing and real-time presentation. Although this approach is intended to make use of existing real-time SD equipment to assist in HD editing, the HD-to-SD resizing introduces an additional processing stage after decompression, making this approach unsuitable for software-only solutions. As a hardware solution, this approach is very costly given that it requires either expensive compression chips or a system architecture with enough bandwidth (e.g., hundreds of MBps) and with enough disk capacity to store very large uncompressed HD video files.
To reduce CPU load, some video compression technologies have limited abilities to decode to a lower resolution. Common compression standards such as MPEG, JPEG, and DV, however, must fetch all data for a frame even when decoding to lower resolutions. Although modifications to the decoding procedure allow some reduction in CPU usage, the results do not offer both good image quality and reduced CPU load.
In the context of video editing, high performance of the decoding operation is important because the user/editor needs to view most editing operations at normal-speed playback. Once the CPU load exceeds the system capability, however, the playback of the video will stutter or stall, preventing audio synchronization or smooth motion. Other qualities of the moving image are also important to the user/editor, such as subtleties of color shading and image definition that are used for scene selection as well as image correction. Any compromise that gains performance at the cost of artifacts, like those seen in quarter-resolution DV decoding, will not be desired by the user/editor.
Conventionally, video previews are rendered by processing only the frames possible with the CPU and bandwidth resources available. In such conventional systems the previews typically stutter (non-smooth motion), and although they are not considered real-time, these systems do preserve audio synchronization by computing and presenting some frames at their correct display time. For example, if the current level of processing takes twice as long as it would in a real-time system, a frame will be skipped so that the next frame is displayed at the correct time. In this situation, playback will occur at half the normal frame rate. Stutter is obvious as the interim frames are not presented to the display; these missing frames contain motion information now missing from the final output. This form of preview introduces temporal artifacting, another undesirable characteristic in video production.
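The frame-skipping behavior described above can be modeled in a few lines. The following Python sketch assumes a constant processing time per frame, assumes that the first frame is ready when playback starts, and assumes that processing of the next frame begins when the previous one is displayed; these assumptions and the function name are hypothetical simplifications, and real systems vary.

```python
import math

# Illustrative only: a simplified model with a hypothetical function name.
# Each frame n is due at time n * frame_interval. After a frame is shown, the
# next frame presented is the earliest one that can still meet its display
# time, so audio stays synchronized while intermediate frames are dropped.

def presented_frames(total_frames, frame_interval, process_time):
    """Return the indices of the frames that reach the display."""
    step = max(1, math.ceil(process_time / frame_interval))
    return list(range(0, total_frames, step))

if __name__ == "__main__":
    # Processing takes twice as long as one frame interval.
    print(presented_frames(10, frame_interval=1.0, process_time=2.0))
    # Processing fits within one frame interval, so no frames are dropped.
    print(presented_frames(5, frame_interval=1.0, process_time=0.5))
```

When processing takes twice the frame interval, every other frame is dropped in this model and playback proceeds at half the normal frame rate, matching the example given above.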
Referring to FIG. 1, shown is a screen capture of a typical desktop editing environment. The environment includes a bin of source material 100 containing many video sequences, titles, graphics and audio; a timeline of edit decisions 102 where the editor places and reorders source material (mixing it with transitions and effects); a control panel 104 for manipulating the parameters for effects and transitions; and one or two preview windows 106 in which the source and edited output material can be viewed. The size of the video window 106 depends on the available screen space, not on the resolution of the source image, because video is scaled to fit comfortably in the computer's display hardware. This window area is typically the same size for HD as it is for SD video production (although the aspect ratio commonly is different: 16:9 vs 4:3). The resolution of most of today's high-definition frames will not fit within the window of this editing environment, however, so the image is typically scaled down by the display device as part of today's editing process.
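As a simple illustration of the scaling described above, the displayed size of a frame in a preview window can be computed from the frame size and the window size alone. The following Python sketch assumes square pixels and aspect-preserving scaling; the 640 by 480 window size and the function name are hypothetical values chosen for the example and are not taken from FIG. 1.

```python
# Illustrative only: hypothetical function name and window size; square pixels
# are assumed, and integer arithmetic is used to avoid rounding surprises.

def fit_to_window(src_w, src_h, win_w, win_h):
    """Largest size with the source aspect ratio that fits inside the window."""
    if win_w * src_h <= win_h * src_w:
        # The window width is the limiting dimension.
        return win_w, src_h * win_w // src_w
    # The window height is the limiting dimension.
    return src_w * win_h // src_h, win_h

if __name__ == "__main__":
    for name, (w, h) in {"1080i": (1920, 1080), "720p": (1280, 720)}.items():
        print(name, "->", fit_to_window(w, h, 640, 480))
```

In this example both HD formats are displayed at 640 by 360 pixels, so the preview shows far fewer pixels than the source frame contains.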