This invention relates generally to video production, photographic image processing, and computer graphics, and, more particularly, to a multi-format digital video production system capable of maintaining the full bandwidth resolution of the subject material, while providing professional quality editing and manipulation of images intended for digital television and other applications, including digital HDTV programs.
As the number of television channels available through various program delivery methods (digital TV (DTV) broadcasting, cable TV, home video, broadcast, etc.) continues to proliferate, the demand for programming, particularly high-quality HDTV-format programming, presents special challenges, both technical and financial, to program producers. While the price of professional editing and image manipulation equipment continues to increase, due to the high cost of research and development and other factors, general-purpose hardware, including personal computers, can produce remarkable effects at a cost well within the reach of non-professionals, even novices. As a result, the distinction between these two classifications of equipment has become less well defined. Although general-purpose PC-based equipment may never allow professional-style rendering of images at full resolution in real-time, each new generation of microprocessors enables progressively faster, higher-resolution applications. In addition, as the price of memory circuits and other data storage hardware continues to fall, the capacity of such devices has risen dramatically, thereby improving the prospects for enhancing PC-based image manipulation systems for such applications.
In terms of dedicated equipment, attention has traditionally focused on the development of two kinds of professional image-manipulation systems: those intended for the highest quality levels to support film effects, and those intended for television broadcast to provide "full 35 mm theatrical film quality," within the realities and economics of present broadcasting systems. Conventional thinking holds that 35 mm theatrical film quality as projected in theaters is equivalent to 1200 or more lines of resolution, whereas camera negatives provide 2500 or more lines. As a result, image formats under consideration have been directed towards video systems having 2500 or more scan lines for high-level production, with hierarchies of production, HDTV broadcast, and NTSC and PAL compatible standards which are derived by down-converting these formats. Most proposals employ progressive scanning, although interlace is considered an acceptable alternative as part of an evolutionary process. Another important issue is adaptability to computer-graphics-compatible formats.
Current technology directions in computers and image processing should allow production equipment based upon fewer than 1200 scan lines, with picture expansions to create a hierarchy of upward-converted formats for theatrical projection, film effects, and film recording. In addition, general-purpose hardware enhancements should be capable of addressing the economic aspects of production, a subject not considered in detail by any of the available references.
For the first fifty years of television in the United States, the history shows continuous development and improvement of a purely analog-based system for video production and broadcasting. The nature of the NTSC system is to limit the video bandwidth to 4.2 MHz, which corresponds to approximately 340 TV-lines of resolution. In countries where PAL or SECAM systems are employed, the bandwidth is 5.5 MHz, which corresponds to approximately 440 TV-lines of resolution.
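The relation between video bandwidth and TV-lines quoted above can be approximated with a standard textbook formula. The sketch below is illustrative only: the function name and the active line durations (about 52.6 microseconds for 525-line systems and about 52 microseconds for 625-line systems) are assumptions not stated in the text, and the results land near, not exactly on, the approximate figures given.

```python
def tv_lines_per_picture_height(bandwidth_hz, active_line_s, aspect_ratio):
    """Approximate horizontal resolution in TV-lines per picture height.

    One cycle of the highest video frequency resolves two picture elements
    (one light, one dark), so an active line carries 2 * B * T elements
    across the full width. Dividing by the aspect ratio normalizes the
    count to a distance equal to the picture height.
    """
    return 2.0 * bandwidth_hz * active_line_s / aspect_ratio


# Assumed active line durations (not taken from the text), 4:3 aspect ratio.
ntsc = tv_lines_per_picture_height(4.2e6, 52.6e-6, 4 / 3)
pal = tv_lines_per_picture_height(5.5e6, 52.0e-6, 4 / 3)
print(round(ntsc), round(pal))
```

Under these assumptions the estimate comes out at roughly 330 TV-lines for 4.2 MHz and roughly 430 TV-lines for 5.5 MHz, in line with the approximate values of 340 and 440 cited above.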
During the past ten years, digital processing has become the standard for video production equipment. However, to preserve compatibility with existing equipment and standards, the video bandwidth typically has been limited to 4-6 MHz (for NTSC and PAL applications, respectively). This also has tended to reduce the apparent generation loss during video production steps.
In the past five years or so, digital image compression technology has matured greatly. However, there are many incompatible standards, such as the different forms of JPEG systems, the Quick-Time system, MPEG-1, and the numerous forms of the MPEG-2 standard. In addition, the latest recording formats for video production have introduced a new set of variations, including the ¼-inch DVC-formats from Sony and Matsushita. While the signal deterioration characteristics of multi-generation analog-based production systems are well known, those imperfections resulting from diverse-format digital video compression and the conversions between these formats can be just as troublesome and unpredictable. In practice, these repeated steps of analog-to-digital (A/D) conversion and digital-to-analog (D/A) conversion, as well as data compression and decompression, introduce many signal artifacts and various forms of signal noise. Although digital video production promises multiple-step production processes free of generation losses, the reality is different, due to the repeated steps of A/D and D/A conversions, as well as data compression and decompression, present when utilizing the various incompatible image data compression formats.
Meanwhile, during the last twenty years, camera technology has advanced to a point far surpassing the performance of traditional production equipment. The video bandwidth capability has increased from 4.2 MHz (corresponding to 340 TV-lines of resolution) to approximately 12 MHz (corresponding to nearly 1000 TV-lines of resolution). Because of the limitations of conventional broadcast and production equipment, most of the detail information produced by today's high-performance camera systems is lost.
For HDTV systems, one goal is to produce images having approximately 1000 TV-lines of resolution per picture height, which requires a bandwidth of approximately 30 MHz. This, in turn, raises a new problem in terms of signal-to-noise ratio. While conventional broadcast cameras can produce signals having a S/N ratio of 65 dB, utilizing 10-bit digital processing, HDTV cameras typically produce signals having a S/N ratio of 54 dB, and utilize only 8-bit digital processing. In addition, the typical HDTV camera utilizes a 2 Megapixel CCD, in which the elements are approximately one-quarter the size of conventional broadcast cameras. This translates into a much lower sensitivity (a loss corresponding to 1-2 lens f-stops), higher levels of "smearing", and lower highlight compression ratios.
Analog-based HDTV systems, such as the Japanese MUSE system, do not approach the design goal of 1000 TV-lines. In reality, only one quarter of the picture information is transmitted. Although the nominal reduced luminance bandwidth of 20 MHz provides approximately 600 TV-lines of resolution per picture height in static program material, this resolution is drastically reduced to only 450 TV-lines where motion is occurring. The chrominance bandwidth is even further reduced by the sub-sampling scheme, to 280 TV-lines for the I-signal and 190 TV-lines for the Q-signal (in static scenes), and to 140 TV-lines for the I-signal and 50 TV-lines for the Q-signal (in moving scenes). Although this system provides a wide-screen aspect ratio of 16:9, it does not really qualify as a High-Definition Television System.
Because of the aforementioned compatibility issues, it is clear that conventional video recorders cannot match the technical performance of modern camera systems. Although "D-6 format" digital recorders are available, the cost and complexity of such equipment place these units beyond the means of the vast majority of broadcast stations. Furthermore, the capability of conventional switchers and other production equipment still fails to match that of available camera systems.
Other recorders have been produced, such as the one-half-inch portable recorder ("Uni-HI"), but this system only achieves 42 dB signal-to-noise ratio, and records in the analog domain. These specifications render this unit unsuitable for multi-generation editing applications. Furthermore, the luminance bandwidth is only 20 MHz, corresponding to approximately 600 TV-lines of resolution.
W-VHS ("Wideband-VHS") recorders provide a wide aspect-ratio image, but only 300 TV-lines of resolution, which also renders this unit unsuitable for any professional applications. Other distribution formats (such as D-VHS) require the application of high compression ratios to limit the data-rate to be recorded, so these formats only achieve W-VHS quality (less than 400 TV-lines of resolution).
The newly-introduced HD Digital Betacam format (HDCAM) video recorder utilizes a 3:1:1 digital processing system rather than 4:2:2 processing. However, it has a 24 MHz luminance bandwidth corresponding to 700 TV-lines of resolution, and a narrower chrominance bandwidth. Although this system is clearly superior to any existing analog HDTV recording system, it still falls short of delivering the full resolution produced by an HDTV digital camera. Because of its proprietary image data compression format, the production process results in repeated data compression and decompression steps, as well as A/D and D/A conversions, which, in turn, result in many signal artifacts and various forms of signal noise.
In summary, the conventional technology for these markets utilizes professional cameras having a 30 MHz bandwidth, and capable of 1000 TV-lines of resolution. However, they produce quality levels more characteristic of consumer-grade equipment (in terms of resolution and signal-to-noise ratio). In addition, the price of these systems is cost-prohibitive both on an absolute and also a cost/benefit basis, employing digital systems which produce only analog-type performance.
The present invention takes advantage of available general-purpose technology, where possible, in order to provide an economical multi-format digital video production system. In the preferred embodiment, specialized graphics processing capabilities are included in a high-performance personal computer or workstation, enabling the user to edit and manipulate an input video program and produce an output version of the program in a final format which may have a different frame rate, pixel dimensions, or both. An internal production format is chosen which provides the greatest compatibility with existing and planned formats associated with HDTV standard 4:3 or widescreen 16:9 high-definition television, and film. For compatibility with film, the frame rate of the internal production format preferably is 24 fps (for program materials originated in film format) and 48 fields-per-second (for live program materials such as sporting events). Images are re-sized horizontally and vertically by pixel interpolation, thereby producing larger or smaller image dimensions so as to fill the particular needs of individual applications. Frame rates are adapted by inter-frame interpolation; by traditional schemes, including "3:2 pull-down" for 24-to-30 fps conversions and simple speed-up (for 24-to-25 conversions) or slow-down (for 25-to-24 conversions) for playback; or by manipulating the frame rate itself using a program storage facility with asynchronous reading and writing capabilities. The step of converting the signal to an HDTV format is performed by a modified upconversion process for wideband signals (utilizing a higher sampling clock frequency) and a resizing to HDTV format frame dimensions in pixels.
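The traditional frame-rate adaptations named above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names are hypothetical, frames are represented by simple labels, and the cadence is assumed to start with a two-field frame (a three-field start is equally common).

```python
def pulldown_3_2(frames):
    """Spread 24 fps frames over 60 fields per second (30 interlaced frames).

    Frames alternately contribute 2 and 3 fields, so every 4 source frames
    yield 10 fields, i.e. 5 interlaced video frames. Field parity simply
    alternates top/bottom along the output sequence.
    """
    fields = []
    for index, frame in enumerate(frames):
        repeat = 2 if index % 2 == 0 else 3  # assumed 2,3,2,3 cadence
        for _ in range(repeat):
            parity = "top" if len(fields) % 2 == 0 else "bottom"
            fields.append((frame, parity))
    return fields


def pair_fields(fields):
    """Group consecutive fields into interlaced video frames."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


def retimed_duration(duration_s, source_fps=24.0, playback_fps=25.0):
    """Running time when every source frame is simply played at a new rate."""
    return duration_s * source_fps / playback_fps


film = ["A", "B", "C", "D"]
fields = pulldown_3_2(film)
video = pair_fields(fields)
print([(top[0], bottom[0]) for top, bottom in video])
print(retimed_duration(3600.0))
```

Four film frames become five video frames, two of which mix fields from adjacent film frames. In the speed-up case, a 60-minute program played at 25 fps instead of 24 fps runs for 57.6 minutes, which is why speed-up is only suitable where a roughly 4% change in running time is acceptable.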
The invention preferably incorporates one or more interface units, including a standard/widescreen interface unit operative to convert the video program in the input format into an output signal representative of a standard/widescreen formatted image, and output the signal to an attached display device. A high-definition television interface unit is operative to convert the video program in the input format into an output signal representative of an HDTV-formatted image, and output the signal to the display device. A centralized controller in operative communication with the video program input, the graphics processor, and an operator interface, enables commands entered by an operator to cause the graphics processor to perform one or more of the conversions using the television interfaces. The present invention thus encourages production at relatively low pixel dimensions to make use of lower-cost general-purpose technology and to maintain high signal-to-noise ratio, and then subsequently expands the resultant image into a so-called up-converted program. This is in contrast to alternative approaches, which recommend operating at HDTV-type resolution, then down-converting, as necessary, to smaller image formats. This has led to the use of expensive dedicated hardware, the need for which the present invention seeks to eliminate. In addition, the flexible storage and playback facilities allow extensive control of the playback of the program material, enabling frame rate adjustments and alterations, and providing for time-shifting of the start and end points of the program reproduction in those cases wherein direct control of the source material frame rate is not practical, due to physical separation of the equipment or multiple reception points simultaneously producing outputs at different frame rates from the same source signal playback data stream. 
In commercial implementations, the invention readily accepts and processes enhanced information, such as pan/scan information or identification information to restrict viewing based on regional or geographical marketing plans.
The method and associated technology provide for maintaining the original high bandwidth of conventional cameras (up to 15 MHz, which corresponds to more than 600 TV-lines of resolution per picture height for 16:9 aspect ratio) and provide optimized compression techniques to fully utilize the available capacity of general storage media, such as the commercially available Panasonic DVCPRO, DVCPRO50, Sony DVCAM, JVC Digital-S, and Sony Betacam SX recorders. The system preferably employs a consistent compression scheme utilizing only intra-frame compression (such as Motion-JPEG-type systems, systems used in DV-format recorders, or MPEG-2 4:2:2P@ML) throughout the entire production process. This avoids many signal artifacts, ensures high signal-to-noise ratios, and provides for editing the program material in data-compressed format. This enables the system to preserve the original camera capability of 600+ TV-lines of resolution per picture height, and with 4:2:2 processing provides a chrominance bandwidth of up to 7.5 MHz. Utilizing 10-bit processing results in 65 dB signal-to-noise performance and improved camera sensitivity (rating of f-11). In contrast, available and proposed systems for HDTV are based on 8-bit processing, and offer performance of less than 54 dB signal-to-noise ratio and camera sensitivity rating of only f-8.
The invention provides for optimization of the available storage media as well. Utilizing hard-disks, optical discs (such as DVD, DVD-R, and DVD-RAM), magneto-optical discs, or digital tapes (such as DAT-format, DVC, DVCPRO, DVCPRO50, DVCAM, Digital-S, or 8-mm format) the data-rate to be recorded is nearly one-quarter that of conventional HDTV systems, and consumes only 20 GB of storage space to record more than 60 minutes in the Production Format compression scheme, which utilizes a data-rate of 50 Mb per second or less, which is well within the capabilities of certain conventional recording devices. Horizontal and vertical pixel-interpolation techniques are utilized to quadruple the image size, preferably resulting in an image frame size of 1920×1080 pixels. The resulting program information may then be distributed in a conventional compression format, such as MPEG-2.
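The relationship between data-rate, running time, and storage capacity can be checked with simple arithmetic. The following sketch is an illustration under stated assumptions: decimal units (1 GB = 10^9 bytes, 1 Mb = 10^6 bits), a constant bit rate, and no allowance for audio or file-system overhead. The function names are hypothetical.

```python
def storage_gigabytes(rate_mbps, minutes):
    """Storage (decimal GB) consumed by a constant-bit-rate recording."""
    total_bits = rate_mbps * 1e6 * minutes * 60
    return total_bits / 8 / 1e9


def max_rate_mbps(capacity_gb, minutes):
    """Highest constant bit rate (Mb/s) that fits the capacity for the time."""
    total_bits = capacity_gb * 1e9 * 8
    return total_bits / (minutes * 60) / 1e6


print(storage_gigabytes(50, 60))
print(max_rate_mbps(20, 60))
```

Under these assumptions, 60 minutes at the full 50 Mb per second would occupy 22.5 GB, whereas fitting 60 minutes into 20 GB corresponds to an average rate of about 44 Mb per second, which falls within the "50 Mb per second or less" range stated above.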
Three alternative image frame sizes preferably are suggested, depending on the intended application. For general usage, an image frame size of 1024×576 is recommended. As an option, a frame size of either 1280×720 or 1920×1080 may be utilized, at 24 frames-per-second. A sampling frequency of up to 74.25 MHz for luminance is utilized for 1920×1080. Sampling frequencies of up to 37 MHz preferably are utilized for 1024×576 and 1280×720. Chrominance components preferably are sampled consistent with a 4:2:2 system, and 10-bit precision is preferred.
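To give a sense of scale for the three frame sizes, the sketch below estimates the uncompressed active-picture data-rate of each at 24 frames-per-second with 4:2:2 sampling and 10-bit precision. It is a rough illustration only: it counts active pixels (no blanking intervals, audio, or ancillary data), the function name is hypothetical, and the 50 Mb per second figure used for comparison is the Production Format ceiling mentioned earlier in the text.

```python
def raw_rate_bps(width, height, fps, bit_depth=10, samples_per_pixel=2):
    """Uncompressed active-picture data-rate in bits per second.

    In a 4:2:2 system every pixel has its own luminance sample while the
    two chrominance components are sampled at half the horizontal rate,
    which averages to 2 samples per pixel.
    """
    return width * height * fps * bit_depth * samples_per_pixel


for width, height in [(1024, 576), (1280, 720), (1920, 1080)]:
    rate = raw_rate_bps(width, height, 24)
    ratio = rate / 50e6  # compression needed to reach 50 Mb/s
    print(f"{width}x{height}: {rate / 1e6:.1f} Mb/s raw, about {ratio:.1f}:1")
```

By this estimate, the 1024×576 format requires roughly 283 Mb per second uncompressed, so a compression ratio of under 6:1 brings it to 50 Mb per second, whereas 1920×1080 requires roughly 995 Mb per second and a ratio near 20:1 for the same target.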
The technology of display devices and methodology has progressed as well, offering alternative features such as conversion of interlaced signals to progressive scan, line doubling, pixel quadrupling, and improved general techniques for horizontal and vertical pixel interpolation. Availability of these features as part of display devices will simplify the process of implementing multi-format digital production.
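Horizontal and vertical pixel interpolation, as referred to above, can take many forms. A minimal sketch using bilinear interpolation on a single plane of samples is shown here; bilinear filtering is chosen only for illustration and is not stated in the text to be the technique used, and the function name and corner-aligned sample mapping are assumptions.

```python
def resize_bilinear(image, out_w, out_h):
    """Resize a 2-D list of samples using bilinear interpolation.

    Output positions are mapped onto the input grid so that the corner
    samples coincide (corner-aligned mapping); each output value is a
    weighted mix of its four nearest input samples.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for y in range(out_h):
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate horizontally on the two neighboring rows,
            # then vertically between those two results.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


small = [[0, 100],
         [100, 200]]
large = resize_bilinear(small, 3, 3)
for row in large:
    print(row)
```

Doubling both the width and the height in this manner quadruples the pixel count; as an illustrative case, a hypothetical 960×540 plane would expand to 1920×1080.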