1. Field of the Invention
This invention relates generally to apparatus, methods, and articles used in recording visual information, either for television broadcast or for widescreen theatre exhibition. In the latter case, the exhibition may be by video projection or by transfer to film and use of conventional film projectors.
More particularly, the invention is of apparatus for making video recordings whose information detail corresponds to visual resolution exceeding that of conventional widescreen film as typically exhibited in theaters. The invention further provides apparatus for using such recordings to generate projection film whose resolution betters that of conventional widescreen film typically exhibited. The invention additionally provides methods for making such video recordings and for using such recordings to generate such projection film; and the invention encompasses recordings and film made by these methods.
In the short run the principal application for the invention lies in the preparation of widescreen cinema film, because of the large number of film projectors now in commercial theater use; however, in the longer term the greater volume of application may be expected to be in theater and closed-circuit video projection units, with some application to video broadcast, as will be explained below.
2. Prior Art
a. UNTAPPED POTENTIAL OF VIDEO CINEMATOGRAPHY--The most pressing demand for development of video systems that are quality-compatible with widescreen cinematographic film arises in the area of special effects. "Matte" technology--the process for superimposing portions of two or more moving picture scenes--is extremely cumbersome and slow in the film medium, but quite sophisticated and convenient in the video medium.
U.S. Pat. No. 4,109,278 to Mendrala and Peterson describes a video traveling matte system which eliminates telltale outlines between the different components of a matte composite, while importing even the shadows from the "foreground" image into the "background" image. Such a composite image can be viewed in "real time" while the action is being shot and recorded. The director can view the composite image while giving instructions to the actors and equipment operators involved in the various images. This technique minimizes or eliminates the need for retakes arising from misalignments of the component images. Film matte techniques, on the other hand, require several film-development and other processing steps between the camera and the first viewing of the composite. A director must therefore wait several hours at the very minimum, and more typically one or more days, to see whether the composite "looks right" or whether it must be set up and shot all over again.
The term "special effects," however, may suggest incorrectly an area of use that is overly limited, for the term is usually heard in connection with, for example, showing a number of moving and talking human figures against an intrinsically fictitious background. The background may, for instance, be one that is provided by a separate shot of a model ship burning or skyscraper toppling, or a small animal made to look large, and so forth. Other examples of stereotypical special effects include people appearing to fly (without support) over a city, or to walk into or upon other people (made to appear much larger).
These uses occur, of course, only in a relatively small number of films, namely those which call for happenings that are essentially impossible in actuality. A much broader category of films, however, could be made by using matte techniques at a small fraction of the cost and in a small fraction of the time associated with conventional techniques. These uses are related to portrayal of happenings that are in actuality possible, but "merely" very inconvenient and expensive.
For example, a film that involves a sizable cast and crew on location in Africa, or a remote Pacific island, or a major European city, can involve monumental problems of logistics--housing, feeding, and transportation alone, plus assembly of set-building personnel and their equipment and materials, if structures are required. Nightmarish complexities are introduced when key members of the cast become ill or other unforeseen contingencies arise. All of these factors contribute heavily to the multimillion-dollar cost of typical major cinema productions.
These difficult factors can be virtually eliminated by use of matte techniques: a small crew makes background scenes on location, using locally available "extras" if required and in some cases shooting merely still views; a special-effects crew at the home studio builds models of structures, if necessary, for superposition on these scenes; and when these preparations are made the cast performs its action and recites its lines before a foreground camera while the director watches the fully assembled scene on a monitor. Even if the background scenes turn out to be unsatisfactory, the investment in them is relatively minimal and they can be reshot by a small camera crew later.
The result might be, for example, several key actors running along a beach on a coral lagoon, with breakers rolling in, with an elaborate temple (or a descending dirigible) on the shore in the background--and crowds of native extras running up the beach behind the lead actors.
It would be virtually unthinkable to make such scenes by cinematographic film matte techniques, because of the difficulty of predicting that the action will "assemble" well into the background--and also because of the extreme difficulty of assembling more than two or three elements (i.e., separately-shot scene components) satisfactorily. By video techniques success is virtually guaranteed, and as many as eight elements can be assembled without significant degradation of image quality.
b. REASONS FOR FAILURE TO TAP THE POTENTIAL--Despite these clear advantages, video cameras are almost completely unused in the preparation of cinema productions--outside the efforts, considered highly experimental and radical, of one major director to use video cameras and monitors for rehearsals, and in parallel with film cameras for the final takes and for editing purposes. In this latter system the film seems almost superfluous, but it is retained as the final production medium. The sole reason for all of the waste motion that is still retained in shooting and processing film, in "location" productions as well as special-effects work, is the relatively poor image quality, heretofore, of the video medium--or, as it is sometimes called, electronic photography.
The reasons and rationales for these conventional image-quality limitations will now be discussed in considerable detail. It is important to an appreciation of my invention to understand how thoroughly the conventional limitations are interwoven with each other, and with the broadcast-medium constraints, and how well entrenched they are.
Although film does of course have grain, it is much finer and less conspicuous than the poor resolution that is apparent even on a contemporary twenty-one-inch home video screen. This poor resolution, or image definition, arises from two phenomena:
(1) in the vertical direction, the image elements are constructed of discrete horizontal "scan lines" or "raster lines," of which there are an inadequate number; and
(2) in the horizontal direction, although the image is reproduced on a continuous basis (there are no discrete image elements as such, above the molecular level), image detail nevertheless is limited by the information-gathering capacity of the video electronics, which is constrained by the signal bandwidth.
To minimize poor horizontal resolution, "image-enhancement" techniques are used--but such techniques themselves produce peculiarities which are evident when the image is viewed relatively closely. In particular, image enhancement often produces a cartoon-like artificial "outlining" effect.
Perhaps more significant than the poor resolution itself is the familiarly "electronic" way in which dots or lines in the video image seem to migrate or "crawl," sometimes almost like worms making their way along and among the raster lines, in the present state of the video art. Yet another familiar "electronic" defect is the appearance of multiple edges--a fringing or feathering effect--along what should be sharp lines.
(In nontechnical terms, these defects arise as follows. "Crawling dots" are produced by imperfections in the separation of color information from black-and-white information, in the video display device. Crawling "worms," or moire patterns, result from imperfections in the isolation of processed signals ready for recording, in a video recorder, from the incoming video signals. Fringing is sometimes caused by an electronic "ringing" effect in the circuits [delay lines] which match the timing of the black-and-white information to the timing of the color information.)
These characteristics are evident particularly in home video receivers, particularly in prerecorded broadcasts, and particularly in matte images, but somewhat less conspicuously even in single, unitary images, live from the camera and viewed on the best studio monitors.
Finally, the conventional video technology is saddled unavoidably with very coarse resolution as to the color constituents of the image. It is very appropriate to describe the color components of the resultant image as "broad brush"--almost like painting in watercolors with a broad brush the tones needed to make a sharply defined black-and-white photo pass under the description of "color picture."
Several constraints are imposed by the broadcast format and have been unquestioningly accepted as necessary by virtually all workers in video technology. For example, the standard number of raster lines composing each frame, in the United States, is 525. In other countries the number is different, but in every case it is unduly limited for the purposes of obtaining finely resolved detail along the vertical dimension in widescreen cinema.
Furthermore, the number of frames per second, in the United States, is thirty--again, different in other countries, but nowhere in the world equal to the frame rate universally used for widescreen cinema. Cinema film is advanced at twenty-four frames per second (though each frame is flashed twice by projection systems, to avoid visible flicker). This mismatch requires, in the video-to-film transfer process, discarding a fifth of all the video data--an obvious waste of recorded information, at the same time the video-to-film transfer process is starved for image resolution, particularly with respect to the color.
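The arithmetic of this frame-rate mismatch can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the evenly spaced drop rule is an assumption made for the example, not a description of any particular transfer equipment.

```python
from fractions import Fraction


def discarded_fraction(video_fps: int, film_fps: int) -> Fraction:
    """Fraction of video frames left unused when each film frame
    is taken from exactly one video frame."""
    if film_fps > video_fps:
        raise ValueError("film rate must not exceed video rate")
    return Fraction(video_fps - film_fps, video_fps)


def kept_frame_indices(video_fps: int, film_fps: int, n_video_frames: int) -> list:
    """Indices of the video frames retained under an evenly spaced
    drop rule (an assumption chosen for illustration)."""
    n_film_frames = n_video_frames * film_fps // video_fps
    return [k * video_fps // film_fps for k in range(n_film_frames)]


if __name__ == "__main__":
    print(discarded_fraction(30, 24))
    print(kept_frame_indices(30, 24, 30))
```

With thirty video frames per second and twenty-four film frames per second, six of every thirty video frames go unused, which is the one-fifth stated above.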
Other format constraints, particularly troublesome ones when some of the most knotty problems of high-image-quality technology are confronted, are the video "interlace" system, and a related phenomenon known as the "phase-advance" color encoding system.
c. INTERLACING, AND PHASE-ADVANCED COLOR--The interlace system involves dividing each frame of (in the United States) 525 raster lines into two "fields" of 262.5 raster lines each. Each of these fields consists of every other line in the image--that is to say, alternate lines. The video camera scans through a complete field of 262.5 lines, while this field is transmitted from the camera (and either recorded or broadcast for viewing); then the camera scans through the next complete field of 262.5 lines, filling in the blank spaces between the lines of the first field. This entire process takes (in the United States) approximately one-thirtieth of a second; thus the individual fields (or "half-frames") are completed in one-sixtieth of a second each.
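The division of a frame into two fields can be sketched as follows. The example is a simplification under stated assumptions: raster lines are modeled as whole list entries, so the 525 lines split into 263 and 262 rather than two fields of 262.5, and the function names are hypothetical.

```python
LINES_PER_FRAME = 525      # United States figure quoted in the text
FRAMES_PER_SECOND = 30     # nominal United States frame rate


def split_into_fields(frame_lines):
    """Return (first_field, second_field): alternate lines of one frame."""
    return frame_lines[0::2], frame_lines[1::2]


def weave_fields(first_field, second_field):
    """Reassemble a full frame by alternating lines from the two fields."""
    frame = []
    for i, line in enumerate(first_field):
        frame.append(line)
        if i < len(second_field):
            frame.append(second_field[i])
    return frame


lines_per_field = LINES_PER_FRAME / 2        # 262.5 lines per field
frame_period_s = 1 / FRAMES_PER_SECOND       # about one-thirtieth of a second
field_period_s = frame_period_s / 2          # about one-sixtieth of a second

if __name__ == "__main__":
    first, second = split_into_fields(list(range(LINES_PER_FRAME)))
    print(len(first), len(second), lines_per_field, field_period_s)
```

Weaving the two fields back together restores the original line order, which is what the eye does, in effect, when the fields are shown at normal speed.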
In the broadcast context this arrangement has various advantages. It produces an effective flicker rate of sixty fields per second, which is fast enough to make effective use of the familiar phenomenon of visual persistence, to blend subsequent images as perceived by the eye and brain; yet it requires forming and transmitting only thirty frames per second. If the thirty frames each second were transmitted as entire frames, not separated into interlaced fields, the flicker would be quite noticeable and objectionable.
Moreover, the eye and brain tend to "integrate" the alternate fields into a single image, alternate fields filling in alternate horizontal raster lines, so that the effective vertical resolution is that of 525 raster lines (except for rapidly moving objects, and particularly those that move vertically); yet only half that number of lines is transmitted in each field. The advantages of this "spatial integration" are so well entrenched in the design philosophy of the video industry that they have been exported from the raster-interlace part of the system to other areas.
In particular, in the standard "NTSC" video system of the United States, the color signals (to be described in more detail below) are advanced in phase by ninety degrees every field. This arrangement tends to take advantage of spatial integration of the color signal to correct some of the "broad-brush" effect mentioned earlier. Unfortunately this repetitive phase advance introduces fresh problems, which will be described shortly.
The interlaced-field system, as previously noted, aggravates some of the knotty problems of achieving high image quality through electronic photography. As is familiar to television viewers who have watched very-slow-motion "instant replays" of fast action, the image sharpness in very-slow-motion replays is very poor relative to that of the action shown at actual speed. The reason is that in very-slow-motion "instant replays" the action must be shown one field at a time, and there are only half as many raster lines per field as there are per full frame. In other words, the advantage of spatial integration is lost when the images must be shown fieldwise rather than framewise.
Yet it is absolutely necessary to show the fields one at a time, for precisely the reason that they represent different time intervals. This statement bears some explanation. If successive fields were shown in very slow motion but electronically assembled together, as the eye assembles them when they are shown at actual speed, each moving object would have two distinctly different (though overlapping) positions. A rapidly moving runner, for instance, or a traveling ball, would have two different "sub-images," shifted along the direction of motion, as traced out by the alternate interlaced raster lines. (This phenomenon will be described in a more graphic way shortly.) Such an image would be badly jumbled and confusing.
For cinematographic use it is necessary to be able to slow down action, or even freeze it, without losing either resolution or the single-image appearance. In film technology this is accomplished by exposing the film at a faster-than-normal frame rate, and playing it back at normal rate, or by repeating a particular frame many times. In video, as will be seen, it may be difficult or impossible to speed up the frame-acquisition rate without sacrificing resolution; however, the playback rate may be controllable in a much more versatile way than for film, provided that one can get around the Hobson's choice of resolution-loss or double-image, which is due to the interlace system.
Moreover, in assembling a cinema production it is often desirable to produce a series of identical frames, by duplicating a particular frame many times. This should be easy to do electronically, but the ninety-degree phase-advance system mentioned above, for color signals, makes it difficult or impossible to splice any field directly after itself--i.e., to replicate any particular frame in self-sequence--without sacrificing the benefits of spatial integration as to the color components of the image. Only every fourth field (every other full frame) can be edited into sequence with any particular field, so the entire editing process (even apart from the desire to replicate a particular field several times in succession) is increased in complexity.
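The editing restriction can be illustrated with a deliberately simplified model, assuming only what is stated above: the color phase advances by ninety degrees at every field, and a splice is acceptable only if the incoming field carries the phase that would naturally come next. The function names are hypothetical, and the model ignores every other detail of the color signal.

```python
PHASE_STEP_DEGREES = 90  # phase advance per field, as stated in the text


def color_phase(field_index: int) -> int:
    """Color phase, in degrees, carried by a given field."""
    return (field_index * PHASE_STEP_DEGREES) % 360


def can_follow(previous_field: int, next_field: int) -> bool:
    """True if next_field carries the phase expected after previous_field."""
    return color_phase(next_field) == color_phase(previous_field + 1)


if __name__ == "__main__":
    # Fields that may be spliced directly after field 0.
    print([n for n in range(12) if can_follow(0, n)])
    # A field cannot follow itself without breaking the phase sequence.
    print(can_follow(0, 0))
```

In this model the phase sequence repeats only after four fields, so the candidates for any given splice point lie four fields (two full frames) apart, and no field can be repeated directly after itself.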
The foregoing discussion of the interlace system will now be rendered more definite by reference to FIG. 2 of the appended drawings. FIG. 2 illustrates, in the right-hand half of the drawing, the pattern of raster lines used in the prior art to create a full image on a screen 111. The prior-art "interlacing" system may be understood by examining closely the pattern of lines in frame 111. As shown there, the deflection system of the electrooptical scanning devices within the camera--essentially duplicated by electrooptical scanning devices within every studio monitor or home video receiver--first makes a complete pass over the image while constructing every other "raster" line (scan line), and then "goes back" and makes a second complete pass over the image while filling in the alternate raster lines missed on the first pass.
More specifically, the scan of frame 111 starts at some point such as 112 at the top of the image, scans along a path 113 to the right side of the image frame, and then returns (without transferring any image information) more swiftly along a path such as 114 to the left side of the image frame. The scan then proceeds along a path 115 to the right side of the frame. It will be noticed that the scan paths (raster lines) 113 and 115 straddle an additional path 125 that has not yet been followed, except to cross over it during the flyback path 114.
Next the scan system traverses another flyback path 116 to the left side of the frame, forms another raster line along a yet lower path 117, flies back along another path 118, and continues in this fashion until the lower portion of the image is reached. Of course the number of raster lines in an actual video system is dozens of times greater than the number illustrated in FIG. 2, which is only intended to exemplify the concepts of the prior-art scanning system (and, in the left side of the drawing, the concepts of the present invention, to be discussed later).
As the scan progression reaches the lower part of the image it traces out a path such as 119 near the lower edge of the frame 111, flies back along a path 120, and then begins a final path 121. This final path ends at 122, in effect defining the lower edge of the image that will be constructed. The process so far has created one video "field." From the end point 122 of this last raster line the scan mechanism follows a path 123 back to the upper edge of the frame 111, where at point 124 a new raster line 125 is begun. This is the first line of the second field.
After following the path 125 for this raster line--which as previously noted lies between the raster lines 113 and 115 of the first field--the scan system deflects the scanning beam back along a path 125' to the left side of the frame, then creates another raster line 126, and proceeds as before. Now the alternate raster lines skipped in the first field are filled in, eventually reaching the next-to-last raster line 127 in the frame. After flyback along a path 128 just above the bottom of the frame, the scanning beam is returned as along path 129 to point 112, at the top of the image, to begin another field. That field will be substantially identical to the first one that has been described here, as far as the pattern of raster lines is concerned.
The content of the image, however, will of course be different if the objects in the image have moved in the meantime. This new "first" field will contain all the image information that has accumulated since the previous "first" field was scanned. Next will come a new "second" field, containing all the image information that has accumulated since the previous "second" field was scanned. The time interval during which information is accumulated for each "first" field thus overlaps (by half) the time interval during which information is collected for the two adjacent "second" fields.
In this way the prior-art system constructs a half image at each pass from top to bottom of the received image, and then proceeds to construct the other half image by means of raster lines that are "interlaced"--i.e., alternated--with the raster lines of the first pass. The two interlaced passes or fields, representing overlapping image-accumulation intervals, have thus been used in every commercial television system, whether broadcast, cable, or closed-circuit, since the earliest days of video technology. The interlace system far antedates the availability of any means for recording video signals; hence every video recorder that has ever appeared in any marketplace has been adapted to record interlaced signals--just as every color video recorder has been adapted to record encoded color (discussed below).
The interlace system is well-ingrained in commercial broadcast technology for good reasons, which have already been briefly outlined. More specifically, the phenomenon of visual persistence--which makes successive images seem to blend together into a simulation of motion--requires a "flicker" (that is, picture-presentation) rate of at least forty to forty-five images per second. Constructing and transmitting that many full frames per second would have required greater bandwidth than would "fit" within the broadcast band to be assigned each station, to achieve moderately acceptable image resolution (detail).
In any event, the designers of early video systems saw a way to get the benefits of sixty-image-per-second "flicker" while completely scanning the image only thirty times per second. The interlace system provides this benefit, and apparently does so satisfactorily for the purposes of home video--with the exception of the very-slow-motion "instant-replay" situations mentioned earlier.
Where an object is proceeding rapidly from right to left across the upper portion of an actual image that is scanned within frame 111, for example, and the object is tall enough to span raster lines 113, 125, 115, 126, and 117, each point of the object appears as a line in each field--the line swept out by the object point during the image-accumulation interval. The line generated by each object point is shifted in position from each field to the next, but the lines in any two consecutive fields overlap, because of the overlap of time periods during which they are constructed. Accordingly the entire object will appear in two different overlapping positions if the two fields are--for slow-motion purposes--displayed together.
For example, let it be supposed that the object is near the right-hand edge of the frame 111 when raster lines 113, 115 and 117 are formed--so that the image portion constructed by these raster lines includes a "sub-image" of the object near the right-hand edge. This sub-image is blurred in the direction of motion, because it represents all the positions the object has moved through since the last previous scan of these same raster lines. These three raster lines are all formed relatively close together in time, so the general shape of the object is not badly distorted even though it is blurred because it is moving rapidly.
By the time the deflection system traces raster lines 125 and 126, however, about a sixtieth of a second has elapsed, and the object has moved a significant distance toward the left edge of the frame. Raster lines 125 and 126 therefore construct an image portion which includes a second blurred sub-image of the same object--whether the object is a baseball, a broken-field runner (and his several pursuers and defenders), or a moving vehicle. Because of the overlapping lines generated by each point of the moving object, the blurs of the two sub-images overlap.
When the two sub-images are shown at normal field rate, the spatial integration performed by eye and brain causes the sub-images to blend--into a generally lateral blur. The later sub-image is displayed after the earlier sub-image, so all that is lost is the (relatively poor) sharpness in the fleeting images of the speeding object. At normal field rate, in short, the effect is not noticeable.
The system breaks down, however, for slow-motion or stop-action purposes. The two fields cannot be shown assembled into a single frame, in very slow-motion or stop-action situations, although that would be desirable to display a high-resolution image of the slowed or stopped action. The two sub-images of the ball, the runner (and his colleagues), or the vehicle would both be present in the assembled image. The result would be an overlapped-double-exposure effect, in many cases hopelessly jumbled. Consequently the two fields must be shown separately, in slow motion or stop action, and of course when they are shown that way the image detail as seen on the screen is severely degraded.
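The double-image effect can be reproduced in a toy simulation. The sketch below is an assumption-laden illustration: the frame size, the object positions, and the function names are all hypothetical, and the motion blur within each field is ignored so that only the displacement between fields is shown.

```python
def render_field(width, height, obj_x, obj_w, parity):
    """Render only the rows of one field (parity 0 or 1), with a
    rectangular object occupying columns obj_x .. obj_x + obj_w - 1."""
    rows = {}
    for row in range(parity, height, 2):
        line = ["."] * width
        for x in range(obj_x, min(obj_x + obj_w, width)):
            line[x] = "#"
        rows[row] = "".join(line)
    return rows


def weave(first_field, second_field):
    """Assemble two fields into one frame, row by row."""
    merged = {**first_field, **second_field}
    return [merged[row] for row in sorted(merged)]


WIDTH, HEIGHT, OBJ_W = 16, 6, 3
# The object moves right to left: it is near the right edge during the
# first field and has shifted left by the time of the second field.
first_field = render_field(WIDTH, HEIGHT, obj_x=10, obj_w=OBJ_W, parity=0)
second_field = render_field(WIDTH, HEIGHT, obj_x=6, obj_w=OBJ_W, parity=1)
woven = weave(first_field, second_field)

if __name__ == "__main__":
    print("\n".join(woven))
```

In the woven frame, alternate rows show the object in two different positions, which is the jumbled double exposure described above. Either field shown alone is free of the double image but has only half the rows.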
This limitation may be acceptable in the narrow confines of sports replays, although many viewers would dispute the acceptability even there. Such a limitation is completely unacceptable, however, in the context of cinema special effects. It is also unnecessary in that context, because in cinema the need for flicker rate exceeding forty or fifty "pictures" per second has been solved by a completely different technique--namely, flashing each film frame onto the viewing screen twice.
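The picture-presentation rates of the systems under discussion can be compared directly. In this sketch the threshold of forty-five presentations per second is an assumption taken from the upper end of the forty-to-forty-five range cited earlier, and the names are hypothetical.

```python
FLICKER_THRESHOLD_PER_S = 45  # assumed: upper end of the range cited in the text


def presentation_rate(pictures_per_second, presentations_per_picture):
    """Number of times per second that some picture is presented to the eye."""
    return pictures_per_second * presentations_per_picture


def avoids_flicker(rate, threshold=FLICKER_THRESHOLD_PER_S):
    """True if the presentation rate meets the assumed flicker threshold."""
    return rate >= threshold


cases = {
    "interlaced video (30 frames, 2 fields each)": presentation_rate(30, 2),
    "noninterlaced video (30 full frames)": presentation_rate(30, 1),
    "cinema film (24 frames, each flashed twice)": presentation_rate(24, 2),
}

if __name__ == "__main__":
    for name, rate in cases.items():
        print(name, rate, avoids_flicker(rate))
```

Both interlaced video and double-flashed film clear the assumed threshold, while thirty full frames per second shown once each does not; the two media simply reach an adequate rate by different means.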
This "double-flash" cinema solution, on the other hand, is incompatible with video broadcast standards. Absent a video frame-storage device in every video display set, or conversion to much greater bandwidth--either of which solutions would require retrofitting every video receiver in the world--the frame-storage-and-double-display solution is impossible in the established video technology. It would be impossible even if all the video cameras, receivers, monitors, recorders and other devices could be somehow converted out of interlace scanning format. There is no apparent feasible "way out," for broadcast purposes, and broadcast applications have imposed their limitations upon the availability of video technology for cinema and other applications.
Yet another drawback of the interlace system resides in the difficulties of transferring image information from video recordings to projectable film. As will be seen, video recording technology has now progressed to the point of impressing information upon magnetic tape or discs in extremely narrow tracks, that are spaced apart only by the most minute of distances. When two pieces of image information, for two particular very localized adjacent parts of an image (e.g., the top two raster lines of a full frame) are immediately adjacent each other on a recording medium, accurate construction of the two adjacent image parts from the two immediately adjacent pieces of recorded information is difficult, but relatively feasible.
If, however, the information for the top two raster lines is separated by a significant distance in a tape recording--being spaced apart along the length of the tape by the information required to construct the intervening raster lines of the first field, as in conventional video--then accurate assembly of the top two raster lines with respect to each other poses an additional, artificially created, technological challenge. This difficulty is nonexistent or at least unimportant in the context of video-screen display, where it is natural to reassemble the two fields in time sequence. In transfer of image information to film, however, the interlaced-field technique requires that either (1) the film must be held stationary while the entire two-field frame is reconstructed on the film or (2) at least one-half frame must be held in a storage device so that the film can be moved continuously while the exposing beam is moved crosswise of the film, in noninterlaced format.
The first solution has the disadvantage that the deflection system must operate in both orthogonal directions and the film-advance system must provide extremely precise pin-registered framewise positioning--especially in electron-beam recording, where (as will be explained later) three separate film frames, one for each color, must be superposed in accurate register later to obtain a three-color negative or projectable three-color positive. The second solution has the disadvantage of requiring additional storage and logic devices.
In short, the ubiquitous interlace system of the prior art imposes several disadvantages that hamper application of video technology to cinema generally, and cinema special effects particularly.
d. COLOR ENCODING, AND FREQUENCY ALLOCATIONS--In broadcast video, color is "encoded." That is, the color is not transmitted directly in the form of information elements for the three primary colors, but rather in a code form: one piece of information gives the saturation or intensity of color at each moment, and a second piece of information specifies the hue, or dominant color, at each corresponding moment. Moreover, these two color specifications are not transmitted on two completely separate radio-frequency bands. The color is transmitted by amplitude modulation of a separate "subcarrier" from the black-and-white image--but the color subcarrier and the black-and-white carrier share the same, relatively limited, band of frequencies; and to avoid major interference the color subcarrier itself is suppressed, and the frequency spectrum of its sidebands is limited. It is this limitation which causes the "broad-brush" color effect described earlier.
All of this information in the color-encoded signal, as well as the information in the sound carrier, is constrained within a six-megahertz frequency band for each video broadcast channel. In the pre-color era, black-and-white video transmissions used the entire six-megahertz band for luminance and sound. During the development of the various color video systems in the United States and elsewhere in the world, it was decided to maintain the pre-color-era usability of black-and-white television equipment (that is, primarily, home receivers). To accomplish this, the entire six-megahertz band continues to be used for luminance and sound, and the color information is "interleaved" within the fine-detail (i.e., high-frequency) information of the luminance signal (specifically, between harmonics of the raster-line frequency, near the upper end of the six-megahertz band). The "interleaved" color signal is created in such a way that black-and-white receivers are affected as little as possible by the presence of the color information. Unfortunately there is, however, some residual interference between the color information and the black-and-white information: it is mainly this interference that produces the "crawling-dot" effect (in either color or black-and-white pictures, if color has been transmitted) mentioned earlier.
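The placement of color information "between harmonics of the raster-line frequency" can be illustrated numerically. The sketch uses the nominal figures quoted in this discussion (525 lines, thirty frames per second). The multiplier 455 is the odd multiple of half the line frequency used for the NTSC color subcarrier; it is not stated in the text above and is introduced here only as an illustration, as are the function names. The actual NTSC color line rate is slightly lower than the nominal value, so the resulting frequency is approximate.

```python
from fractions import Fraction

LINES_PER_FRAME = 525      # United States figure quoted in the text
FRAMES_PER_SECOND = 30     # nominal United States frame rate


def line_frequency_hz(lines=LINES_PER_FRAME, fps=FRAMES_PER_SECOND):
    """Raster lines scanned per second."""
    return lines * fps


def interleaved_frequency_hz(line_freq, odd_multiple):
    """A frequency at an odd multiple of half the line frequency,
    which falls midway between two adjacent line-frequency harmonics."""
    if odd_multiple % 2 == 0:
        raise ValueError("multiple must be odd")
    return Fraction(odd_multiple, 2) * line_freq


def offset_from_nearest_harmonic(freq, line_freq):
    """Distance in hertz from freq to the closest line-frequency harmonic."""
    remainder = Fraction(freq) % line_freq
    return min(remainder, line_freq - remainder)


if __name__ == "__main__":
    f_line = line_frequency_hz()
    f_sub = interleaved_frequency_hz(f_line, 455)
    print(f_line, float(f_sub), offset_from_nearest_harmonic(f_sub, f_line))
```

A frequency chosen this way sits exactly half a line frequency away from the nearest harmonic, which is as far from the luminance energy clustered at those harmonics as it can be placed.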
The video signal direct from the sensing circuits in a video camera is directly related to the brightness of light entering the camera from the portion of the object (the "live" image) being scanned. In a color video camera, three separate video signals are produced--one for each of the independent primary colors to which the human eye responds. The three signals are mutually synchronized, and are simultaneous, and they do correspond to the different color constituents of a single image from which they are derived in common. The term "independent" is not to be interpreted as negating these basic interrelationships between the three signals.
The word "independent" is to be understood, however, as meaning that the modulation content, the information content, of each of the three signals does not depend on the modulation content of the other two; and that, rather, the modulation content of each signal stems only from the respective color content of the received image. Thus, for example, the information derived within the camera by electronic scanning of the received image as viewed through a "primary-blue-color"-transmitting optical filter is used to generate a "blue-information" signal; and similarly for the other two primary colors.
It is of course true that the three optical filters used to generate three such "independent" signals have substantial areas of color overlap, so that portions of the image that are colored a certain particular color may in fact "show up" through all three of the filters, to varying extents, and so generate some signal in response in all three video-camera color channels. It is nevertheless customary in describing color phenomena to refer to such image information, and to the corresponding electrical signals, as "independent primary" colors and signals--provided that the filters are chosen as practical approximations to the color sensitivities of the three independent sets of color receptors believed to operate in the human eye.
When the color sensitivities of the three optical filters, chosen in this way, are also roughly matched by the color outputs of phosphors used in video display screens, then from the "viewpoint" of the human observer the independent color receptors of the eye have been, in effect, extended through the video process and exposed to the scene that is before the video camera. (Unfortunately, in conventional video technology this effective "extension" is very imperfect. Certain parts of my invention may be regarded as improving the accuracy of the three-color video "extension" process.)
The three video signals within the camera are "independent" in the sense that they correspond to the independent mechanisms within the human observer for sorting out color information, and in the customary sense that color is therefore a phenomenon measurable in three independent "dimensions" (e.g., red, blue and green).
It is also important to understand the distinction between the high-resolution, true "primary color" video signals that exist only within the video camera, in the conventional system, and the "transmission primary" signals that are derived from the true "primary color" signals before the image leaves the camera. Only the true primary color signals are derived respectively from primary color information, as described in the preceding paragraphs.
In a conventional video camera, the red, green and blue signals are combined in certain specified proportions to form a luminance (black-and-white brightness) signal, usually assigned the algebraic letter Y; and also are used in conjunction with the luminance to generate certain complex color-information signals. In particular, the luminance signal, the red signal, and the blue signal are combined in certain proportions to obtain a variable I, known as the "in-phase" color transmission primary signal, and another variable Q, known as the "quadrature" (ninety-degrees-delayed) color transmission primary signal. The proportions are such that color saturation information is carried as the length of the vector sum of I and Q; and hue is carried as the angle of that same vector sum (relative to a reference zero angle).
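By way of illustration only, the formation of Y, I and Q described above may be sketched numerically as follows. The weighting factors are the commonly published NTSC values and are an assumption here, since the text states only that "certain proportions" are used; the function names are hypothetical, and the hue angle is measured from the I axis as an arbitrary reference chosen for the sketch.

```python
import math

# Commonly published NTSC luminance weights (assumed; the text gives no figures).
Y_R, Y_G, Y_B = 0.299, 0.587, 0.114


def encode_yiq(r, g, b):
    """Form luminance Y and the transmission primaries I and Q from R, G, B (each 0..1)."""
    y = Y_R * r + Y_G * g + Y_B * b
    # I and Q combine luminance with the red and blue signals,
    # through the color differences (R - Y) and (B - Y).
    i = 0.736 * (r - y) - 0.268 * (b - y)
    q = 0.478 * (r - y) + 0.413 * (b - y)
    return y, i, q


def saturation_and_hue(i, q):
    """Saturation is the length of the (I, Q) vector; hue is its angle in degrees."""
    return math.hypot(i, q), math.degrees(math.atan2(q, i))


if __name__ == "__main__":
    y, i, q = encode_yiq(1.0, 0.0, 0.0)  # a fully saturated red
    print(y, i, q, saturation_and_hue(i, q))
```

With these assumed weights a uniform grey input yields I and Q of essentially zero, so that only the luminance signal carries information, while any colored input yields a nonzero I/Q vector.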
These two signals I and Q are used in conventional video technology to amplitude modulate a video "subcarrier." The modulation produces sidebands, which are added to the luminance signal (while the central frequency of the subcarrier itself is discarded). The resulting video compound signal is handled together, as a superimposed unit, throughout the entire video process downstream from the camera head--until the video monitor or home receiver is reached. The combined signal is described conventionally as one in which the color is "encoded" onto or into the luminance signal, by means of the color transmission primary signals I and Q.
The video monitor or receiver decodes the I and Q signals, and then proceeds to reconstruct from them, in combination with the luminance signal Y, a "broad-brush" approximation to the original separate red, green and blue color primary signals originally generated within the camera. The approximation signals obtained by the reconstruction are used to excite the correspondingly colored phosphors on the video screen.
The discussion now returns to the specifics of the three color primary signals within the video camera. If the object is bright, in a particular primary color, at the particular point being scanned, the resulting signal level (voltage or current) for that primary color is high; at dark points the resulting signal is low. If the entire object is a uniform grey, or a uniform color, then each of the resulting primary signals is d.c.--that is to say, unmodulated, but of a voltage or current level corresponding to the brightness, in different color constituents, of the particular shade that is before the camera. (This discussion of "d.c." signals of course refers only to the portion of the video waveform which carries image information--not to the complicated pulses, occurring between the raster lines, that convey synchronization and phase-calibration information.)
Most of the time, of course, video cameras are not used to make pictures of uniformly colored fields. There are variations in the object field--the visual field at which the camera is pointed--and these can range from coarse gradations of shading to extremely fine, intricate details of complex objects or patterns.
The primary sensor in a video camera is potentially extremely fast in its response to changes in input brightness--although, as will be seen, in practice some of the potential response speed is discarded because it cannot be used by the other equipment in a conventional video system. The usual sensor is an electron beam that scans a photosensitive surface and varies in current intensity in response to changes in accumulated charge on the photosensitive surface.
The photosensitive surface becomes charged in response to impingement of light, and is discharged--in preparation for further optical signals--by the scanning electron beam. The inherent response speed of such a device is limited primarily by such factors as the internal capacitances within the relatively large vacuum tube that houses the scanning beam. Such capacitances are quite small. Accordingly the current in the electron beam is capable of changing with extreme rapidity when it traverses a very fine pattern of illumination on the photosensitive surface.
Another way of expressing this is to say that the signal from the primary sensor is very "wideband"--that is, it can be conceptualized as made up of many sinusoidal signals whose frequencies extend over an extremely wide band or range of frequencies. When the primary video "level" (current or voltage) changes in response to a very fine, intricate object or pattern, the frequency content of the signal level is extremely high; it can, in fact, contain signal components whose frequency greatly exceeds fifteen or twenty megahertz. Yet, as stated above, when the primary video "level" (current or voltage) is derived in response to a completely uniformly colored object field, the image portion of the signal is d.c. All frequencies between these two extremes are possible in a video sensing tube.
In conventional preparation for broadcasting or recording video signals, the primary color signals are encoded, bandwidth-limited, and applied to modulate at least one locally generated "carrier" signal. The first of these steps, creating the three signals Y, I and Q, has already been described; the other two steps will be briefly explained now.
The two color signals are separately limited in frequency bandwidth, or range, by passing them through two different low-pass filters, respectively. The two filters cut off at about 1.3 and 0.5 megahertz, respectively. As will be seen, all three signals Y, I and Q are further bandwidth-limited in the broadcasting process.
The upper two diagrams in FIG. 3 illustrate the frequency response, or the frequency range, of certain video signals involved in the prior art. All five of the diagrams in FIG. 3 are drawn with reference to the frequency scale 156 which appears across the very bottom of the drawing. The numerical values forming this scale are in megahertz. With respect to the topmost drawing, the frequency values represent frequency above the bottom of an assigned television broadcast channel. With respect to the other four drawings in FIG. 3, the frequency values represent absolute frequency, zero being d.c.
The uncalibrated vertical scale 157 represents carrier amplitude, as a function of frequency, for the amplitude-modulated signals shown in the topmost diagram, and for the four frequency-modulated carrier curves 168, 174, 184 and 194, which appear at the right ends of the lower four diagrams in FIG. 3. The same uncalibrated vertical scale 157 represents video "signal level" (rather than carrier amplitude) for the other four curves 158/159, 171/172, 181/182 and 191/192, at the left ends of the lower four diagrams.
When the immediate objective is to broadcast, rather than record, video information, the luminance signal is used to amplitude modulate the main video carrier. The carrier is a locally generated a.c. signal, 152 in FIG. 3, whose frequency (in the absence of modulation) is 1.25 megahertz above the low end of the frequency band assigned to the particular broadcast channel in use. By "amplitude modulate" is meant that the luminance signal voltage is made to vary the amplitude--the amount of overall voltage swing, or the height of the "envelope"--of the initially sinusoidal carrier. This is precisely the same kind of process that is used in the familiar radio "AM" broadcast band.
The color signals I and Q are both similarly applied to amplitude modulate another locally generated a.c. signal, 153 in FIG. 3, whose frequency (in the absence of modulation) is approximately 3.58 megahertz above the main carrier 152--that is, roughly 4.83 megahertz above the low end of the broadcast channel. This second a.c. signal, known as the "color subcarrier," is amplitude modulated by both the I and Q signals--but these two signals are applied ninety degrees out of phase with each other, and can be separated out again later by appropriate equipment in video monitors and receivers.
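The ninety-degree relationship just described, and the later separation of the two signals, may be illustrated by the following minimal sketch. It assumes that I and Q are held constant over a few subcarrier cycles and that the receiver has an exact phase reference; the sampling density and the function names are hypothetical choices made only for the example.

```python
import math

SAMPLES_PER_CYCLE = 64  # sampling density chosen for this sketch (assumption)


def modulate(i_level, q_level, cycles=4):
    """Amplitude modulate one subcarrier with I and Q, ninety degrees apart."""
    samples = []
    for k in range(cycles * SAMPLES_PER_CYCLE):
        phase = 2 * math.pi * k / SAMPLES_PER_CYCLE
        samples.append(i_level * math.cos(phase) + q_level * math.sin(phase))
    return samples


def demodulate(samples):
    """Recover I and Q by multiplying with the two reference phases and averaging."""
    n = len(samples)
    i_sum = q_sum = 0.0
    for k, s in enumerate(samples):
        phase = 2 * math.pi * k / SAMPLES_PER_CYCLE
        i_sum += s * math.cos(phase)
        q_sum += s * math.sin(phase)
    # Averaging cos^2 or sin^2 over whole cycles gives 1/2, hence the factor of 2.
    return 2 * i_sum / n, 2 * q_sum / n


if __name__ == "__main__":
    print(demodulate(modulate(0.3, -0.2)))
```

Because the two reference phases average to zero against each other over whole cycles, each signal is recovered without contribution from the other, provided the phase reference is accurate.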
In addition to the filters used to prepare the color signals for modulating the color subcarrier, the overall frequency range in a broadcast signal is also limited by filters applied to the signal before it is broadcast. One of these cuts off the low-frequency end of the modulated main carrier, starting about 0.5 megahertz above the low end of the broadcast channel--as indicated at 151 in FIG. 3. The other end of the main-carrier frequency band is cut off starting at about 5.25 megahertz above the low end of the channel, as indicated at 154 in FIG. 3.
Although the main carrier is amplitude modulated, the effect of amplitude modulating a carrier of fixed frequency with a wideband video signal level is to inject into the carrier a broad range of signal frequencies corresponding to the sums and differences of the carrier frequency and the video signal-level instantaneous frequencies. As drawn in FIG. 3, the difference-frequency signal components extend toward the left from the carrier frequency 152, and the sum-frequency signals extend toward the right. At least one of these two "sidebands" must be transmitted relatively intact, to convey the detailed picture information. It is redundant to transmit both, however, so the low-frequency end is discarded by the cutoff function which starts at 151 in FIG. 3 and proceeds along the curve 161. This cutoff and the carrier frequency 152 may then be placed near the low end of the broadcast channel, allowing a relatively wide range of frequencies to be included in the amplitude-modulated carrier in the sum-frequency sideband--extending to 154 in FIG. 3.
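The origin of the two sidebands may be checked numerically with a short sketch such as the following, which modulates a carrier with a single video tone and compares the result against the sum of the carrier and its difference-frequency and sum-frequency components. The particular frequencies and modulation depth are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def am_sample(t, fc, fm, depth):
    """Carrier at fc whose amplitude is varied by a single video tone at fm."""
    return (1 + depth * math.cos(2 * math.pi * fm * t)) * math.cos(2 * math.pi * fc * t)


def sideband_sample(t, fc, fm, depth):
    """The same waveform written as carrier plus difference and sum components."""
    carrier = math.cos(2 * math.pi * fc * t)
    lower = 0.5 * depth * math.cos(2 * math.pi * (fc - fm) * t)
    upper = 0.5 * depth * math.cos(2 * math.pi * (fc + fm) * t)
    return carrier + lower + upper


if __name__ == "__main__":
    # Frequencies in megahertz and time in microseconds (illustrative values only).
    for step in range(5):
        t = 0.013 * step
        print(am_sample(t, 10.0, 3.0, 0.5), sideband_sample(t, 10.0, 3.0, 0.5))
```

The two expressions agree at every instant, which is why either sideband alone carries the full picture detail and one of them may be discarded.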
The amplitude-modulated main carrier and the sidebands of the color subcarrier are added together, and the high-frequency ends of both signals are cut off before (or in the process of) broadcasting--starting at point 154 in FIG. 3 and following curve 166, in effect to zero before the audio-channel frequency 155 is reached. This leaves roughly a 0.25-megahertz guard band before the start of the next broadcast channel at 6 megahertz on the FIG. 3 scale.
In the encoded-color broadcast system just discussed, approximately four-megahertz upper sidebands are available in each video channel for conveying image detail. As to the transmission of color information, not only is signal I limited to about a 1.3-megahertz bandwidth, and Q to about a half-megahertz bandwidth, but in addition both are subject to some distortion wherever there is fine detail in the black-and-white image (producing black-and-white signal frequency components near the color subcarrier).
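The four-megahertz figure follows from simple arithmetic on the channel positions quoted above, as the following sketch shows. The audio position is not stated numerically in the text; it is inferred here from the quoted 0.25-megahertz guard band and should be read as an assumption, as should the names used.

```python
# Positions in megahertz above the low end of the channel, as quoted in the text.
CHANNEL_WIDTH = 6.0
LOWER_CUTOFF = 0.5   # point 151
MAIN_CARRIER = 1.25  # point 152
UPPER_CUTOFF = 5.25  # point 154
GUARD_BAND = 0.25    # between the audio frequency 155 and the next channel


def channel_budget():
    """Return widths and positions (MHz) implied by the quoted channel layout."""
    return {
        "lower_vestige": MAIN_CARRIER - LOWER_CUTOFF,
        "upper_sideband": UPPER_CUTOFF - MAIN_CARRIER,
        # Inferred from the guard band, not stated directly in the text.
        "audio_position": CHANNEL_WIDTH - GUARD_BAND,
    }


if __name__ == "__main__":
    for name, value in channel_budget().items():
        print(name, value)
```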
Even in the absence of the color bandwidth limitations, and even if the color subcarrier were placed further from the primary carrier, the processing and reprocessing of the primary color signals in the formation of transmission primaries and the later decoding would still give rise to some color distortion and other adverse phenomena. A certain amount of nonlinearity is unavoidable, and radio-frequency feedback, crosstalk and other forms of interference may be expected in any practical circuit.
All of these various phenomena are particularly undesirable in the context of special effects and/or widescreen cinema, where it is desirable to have crisp, high-resolution, accurate-color images at all stages of the production process. This is particularly essential when two or more image elements are to be combined to make a composite. In that particular context, "broad-brush" color in any of the component images can produce very conspicuous peculiarities in the composite, at the boundaries between the component images.
The second drawing in FIG. 3 illustrates the corresponding conventional technology used in so-called "high band" video recording. The illustration applies directly to monochrome recording, with elaborations relating to color encoding for recording that need not be described here. In the prior art, frequency responses of camera chains and recorder are essentially uniform or "flat" from d.c. to approximately four megahertz--as shown at 158 in FIG. 3. Just above that frequency a low-pass filter in the input stages of the recorder cuts off the frequency response of the apparatus along curve 159.
The recorder uses an internally generated carrier of frequency approximately 7.85 megahertz, shown at 167 in FIG. 3. The video signal constrained within frequency-response curve 158/159 is used to frequency modulate the carrier, producing a frequency-versus-signal-level relationship that is standard in the industry. Very recent experimental results have been reported involving video recorders or recorder components operating at much higher frequencies; however, these devices are not perfected, and their prospective applications differ from those of my invention.
The so-called "sync tip" part of the video-signal waveform is caused to modulate the carrier frequency downward to seven megahertz, as shown at 160 in FIG. 3: this corresponds to the synchronization signals superimposed within the video camera after the end of each raster-line signal waveform. The so-called "blanking" part of the waveform corresponds to no frequency modulation at all--that is, the original frequency of the carrier is undisturbed. This represents "black" image regions. The so-called "peak white" or maximum-brightness signal level in the video signal is caused to modulate the carrier frequency upward to ten megahertz, indicated at 169 in FIG. 3. The frequency-modulated carrier thus ranges along the frequency envelope 168 in FIG. 3, between sync tip at 160 and peak white at 169.
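The relationship between signal level and carrier frequency may be sketched as a straight line through the sync-tip and blanking points given above. The sketch assumes that the frequency deviation is linear in signal level and adopts an arbitrary level scale on which blanking is 0 and sync tip is -40 units; neither the scale nor the function name comes from the text.

```python
SYNC_TIP_MHZ = 7.0   # carrier frequency at sync tip (point 160)
BLANKING_MHZ = 7.85  # unmodulated carrier frequency (point 167)

# Signal levels on an assumed scale: blanking at 0 and sync tip at -40 units.
BLANKING_LEVEL = 0.0
SYNC_TIP_LEVEL = -40.0


def carrier_frequency_mhz(level):
    """Map a video signal level to carrier frequency, assuming a linear deviation."""
    slope = (BLANKING_MHZ - SYNC_TIP_MHZ) / (BLANKING_LEVEL - SYNC_TIP_LEVEL)
    return BLANKING_MHZ + slope * (level - BLANKING_LEVEL)


if __name__ == "__main__":
    for level in (SYNC_TIP_LEVEL, -20.0, BLANKING_LEVEL):
        print(level, carrier_frequency_mhz(level))
```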
The video signal cutoff at 159 prevents high-frequency components of the video signal, in the same frequency range as the modulated carrier 160/168/169, from interfering with the latter, heterodyning to cause conspicuous "crawling-worm" interference patterns or moire patterns in the displayed video image. Only the upper sidebands (not illustrated) of the frequency-modulated carrier are recovered for retrieval of the recorded signal information, again to minimize the possibility of reproducing interference patterns due to interaction between the lower sidebands of the modulated carrier and the original video signal.
Comparison of the first two portions of FIG. 3 that have now been discussed reveals an interesting fact of the prior-art approach to video recording technology. The passband of the raw video signal is about four megahertz. This value was chosen undoubtedly because there was no point in incurring the added expense of high-frequency circuits whose upper-frequency components would never be "seen"--since the broadcast cutoff curves 165 and 166 would discard any components more than 4.0 to 4.5 megahertz above the carrier frequency 152. Thus the broadcast channel-width limitations, translated into standard transmitter bandpass, have been exported into the video recording field to impose an arbitrary constraint that interferes with feasibility in video cinematography applications.
e. SUMMARY; AREAS OF APPLICATION--The foregoing discussion has identified various imperfections in the prior-art video technique--poor resolution (especially with respect to color), electronic "crawl," and in very-slow-motion work the impossible choice between double-imaging and half-resolution.
Such imperfections may readily be disregarded in watching the evening news on a home video receiver, particularly if a twenty-one-inch screen is viewed from across the room (six to ten feet away, for instance, yielding a subtended visual angle of perhaps ten to fourteen degrees). In a motion-picture theater, however, such imperfections are not likely to be ignored on a ninety-foot screen viewed from a mid-orchestra seat (50 to 150 feet away, for instance, yielding a visual angle of 35 to 100 degrees).
It is the conventional wisdom of the television industry that these limitations are inherent in and fundamental to the video technology. Some industry leaders may feel that the way to increased acceptance of video techniques in the making of motion pictures is simply patience--while the lowering expectations of the viewing public and the skyrocketing costs of cinematographic techniques approach each other, to finally meet in a state of mind where a somewhat-improved video image will be considered acceptable for some special-effects use in widescreen cinema.
Be that as it may, these conventional wisdoms have prevailed to the extent that video signals are now formatted to broadcast standards in the cameras, and are processed through all subsequent stages of recording, fades, dissolves, title superposition, and all editing, in these very limited broadcast-imposed formats. The single exception to this general practice is matte-compositing--but only because that process, by its nature, requires independent color input signals. Matte outputs are promptly encoded.
The broadcast-imposed standards of the television industry have tended to thwart application of the time-saving and money-saving video technology to a major potential field of application--namely, cinema productions. The incompatible frame rate, inadequate bandwidth, inadequate number of raster lines, interlaced fields, phase-shifting of color signals, and most of all the within-the-camera encoding of color information, interleaved within the upper registers of the luminance information, all combine to stymie efforts at real qualitative improvement in image quality for widescreen cinema--or even for home viewing.
Despite all these disadvantages, at least one service firm is involved in a commercial venture of converting encoded-color video to cinema film. That firm is believed to use an electron-beam recorder for transforming information from encoded-color video tape to a black-and-white film intermediary, one film frame for each primary color of each image. Each set of three frames of the intermediate is then printed with appropriate colored lights onto color negative stock. This system presumably involves backing up the video tape and playing each frame three times--each time decoding a different primary color signal--though other systems, involving intermediate video records or temporary frame storage, can be imagined.
In any event, the image quality is severely limited--by all of the factors already discussed, plus problems of registration on the film as between the different colors, and the discarding of one-fifth of such information-gathering capability as does exist. The latter wastage of information arises from the need to disregard every fifth frame, to accommodate the different frame rates of video and cinema.
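The one-fifth wastage may be illustrated by the following short sketch, which assumes nominal rates of thirty video frames and twenty-four film frames per second (figures not restated in this passage) and a hypothetical function name.

```python
def drop_every_fifth(frames):
    """Keep four of every five video frames, discarding the fifth."""
    return [frame for index, frame in enumerate(frames) if index % 5 != 4]


if __name__ == "__main__":
    one_second_of_video = list(range(30))  # thirty numbered frames (assumed rate)
    film_frames = drop_every_fifth(one_second_of_video)
    print(len(film_frames), film_frames)
```

Under these assumed rates, thirty video frames reduce to twenty-four film frames, and the six discarded frames represent one-fifth of the recorded information.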
At least some commercial electron-beam recorders have been provided with a "spot-wobbler" circuit that superimposes a small, very high-frequency a.c. signal on the vertical deflection signal. The result is to blend, to some extent, adjacent raster lines in the film image.
The activity of the above-mentioned service-firm venture despite its technical limitations demonstrates the enormous untapped potential for truly high-image-quality electronic photography.
These artificial, broadcast-imposed limitations also have other unnecessary adverse effects. For one, they impose image-quality deficiencies upon the viewing of commercially prepared video tapes on home viewers, and in commercially operated video-projection theaters. There is clearly no reason for such viewing to suffer the frequency-band limitations imposed by the limited number of frequencies in the broadcast spectrum, any more than the cinematographic industry should suffer such limitations.
For another, the broadcast-imposed limitations impose image-quality deficiencies upon the viewing of movies and other special features in homes that receive cable television service.
Thirdly, the broadcast standards impose image-quality deficiencies upon the preliminary, prebroadcast microwave-link transmission of television programs between television stations, between station and transmitter, and so on. Likewise they impose deficiencies upon the handling of television images in studio processing--that is, in recording, matte compositing, editing, title superposition, and the like. Home television images would be somewhat improved if the imposition of these deficiencies could at least be avoided until the final stage of signal handling just before general broadcast.
Fourthly and finally, the broadcast standards impose these same image-quality deficiencies upon video signals that are broadcast to individual home receivers. There is great debate at present over the formatting of television signals for broadcast via satellite direct to one-meter "dish" home antennas in very large regions. These developments, some proposed and some apparently under way, offer what may be a final opportunity to rethink both broadcast standards and the design of a new generation of home reception equipment. With wall-sized television screens in the offing, and an unallocated band of frequencies available for wide regional use by satellite transmission, the technical requirements for realization of qualitative improvements in image quality should be given a careful examination.
The present invention is addressed to cinema applications and also to the four video applications enumerated just above. It will be apparent that some of the features of the invention are inapplicable to certain of these applications, but in general nearly all the features of my invention do have broad applicability in all of these areas. The primary and broadest object of my invention is to produce video image quality that is essentially equal to or better than what is now shown on motion-picture wide screens.