Manipulation of video data is often employed in producing commercial films, but is becoming increasingly important in other applications, including video available via the Internet. One form of video manipulation is so-called blue screen substitution, which motion picture and television producers use to create composite image special effects. For example, actors or other objects may be filmed in the foreground of a scene that includes a uniformly lit flat screen background having a pure color, typically blue (but sometimes green). A camera using conventional color film, or a solid state camera with a sensor array of red, green, blue (RGB) pixels, captures the entire scene. During production, the background blue is eliminated based upon its luminance characteristic, and a new backdrop is substituted, perhaps a blue sky with wind blown white clouds, a herd of charging elephants, etc. If the background image to be eliminated (the blue screen) is completely known to the camera, the result is a motion picture (or still picture) of the actors in the foreground superimposed almost seamlessly in front of the substitute background. When done properly, the foreground images appear to superimpose over the substitute background, and in general there is good granularity at the interface between the edges of the actors or objects in the foreground and the substitute background. By good granularity it is meant that the foreground actors or objects appear to meld into the substitute background as though the actors had originally been filmed in front of it. Successful blue screen techniques require that the blue background appear static to the camera, e.g., that there be no discernible pattern on the blue background, such that any movement of the background relative to the camera would go undetected. For backgrounds that do have a motion-discernible pattern, the relationship between camera and background must remain static.
If this static relationship between camera and background is not met, undesired fringing can result, where perimeter portions of the foreground actors or objects appear to be traced with color(s) at the interface with the substitute background.
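The color-based elimination described above can be sketched as a simple chroma-key pass: pixels within some color distance of the key color are replaced by the substitute backdrop. The function name, the Euclidean color-distance metric, and the threshold value below are illustrative assumptions, not the production technique itself:

```python
import numpy as np

def chroma_key_composite(frame, backdrop, key_rgb=(0, 0, 255), threshold=100):
    """Replace pixels close to the key color with the backdrop.

    frame, backdrop: H x W x 3 uint8 arrays. key_rgb is the screen color
    (pure blue here); threshold is a hypothetical tuning parameter.
    """
    diff = frame.astype(np.int32) - np.array(key_rgb, dtype=np.int32)
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # per-pixel distance to key color
    mask = dist < threshold                    # True where the blue screen shows
    out = frame.copy()
    out[mask] = backdrop[mask]                 # substitute the new backdrop
    return out

# 2 x 2 frame: left column is (near-)pure blue screen, right column red foreground
frame = np.array([[[0, 0, 255], [200, 0, 0]],
                  [[10, 5, 250], [190, 10, 10]]], dtype=np.uint8)
backdrop = np.full((2, 2, 3), 128, dtype=np.uint8)  # gray substitute background
result = chroma_key_composite(frame, backdrop)
```

A hard threshold like this is exactly what produces the fringing and jagged boundaries the text discusses: an edge pixel that mixes foreground and screen color falls on one side of the threshold or the other, with no in-between.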
Blue screen composite imaging is readily implemented in a large commercial production studio, but can be costly and require a large staging facility, in addition to special processing equipment. In practice such imaging effects are typically beyond the reach of amateur video producers and still photographers.
It is also known in the art to acquire images using three-dimensional cameras to ascertain Z depth distances to a target object. Camera systems that acquire both RGB images and Z-data are frequently referred to as RGB-Z systems. With respect to systems that acquire Z-data, e.g., depth or distance information from the camera system to an object, some prior art depth camera systems approximate the distance or range to an object based upon luminosity or brightness information reflected by the object. But Z-systems that rely upon luminosity data can be confused by reflected light from a distant but shiny object, and by light from a less distant but less reflective object. Both objects can erroneously appear to be the same distance from the camera. So-called structured light systems, e.g., stereographic cameras, may also be used to acquire Z-data, but in practice such geometry-based methods require high precision and are often fooled.
A more accurate class of range or Z distance systems is the so-called time-of-flight (TOF) systems, many of which have been pioneered by Canesta, Inc., assignee herein. Various aspects of TOF imaging systems are described in the following patents assigned to Canesta, Inc.: U.S. Pat. No. 7,203,356 “Subject Segmentation and Tracking Using 3D Sensing Technology for Video Compression in Multimedia Applications”, U.S. Pat. No. 6,906,793 “Methods and Devices for Charge Management for Three-Dimensional Sensing”, U.S. Pat. No. 6,580,496 “Systems for CMOS-Compatible Three-Dimensional Image Sensing Using Quantum Efficiency Modulation”, and U.S. Pat. No. 6,515,740 “Methods for CMOS-Compatible Three-Dimensional Image Sensing Using Quantum Efficiency Modulation”.
FIG. 1 depicts an exemplary TOF system, as described in U.S. Pat. No. 6,323,942 entitled “CMOS-Compatible Three-Dimensional Image Sensor IC” (2001), which patent is incorporated herein by reference as further background material. TOF system 100 can be implemented on a single IC 110, without moving parts and with relatively few off-chip components. System 100 includes a two-dimensional array 130 of Z pixel detectors 140, each of which has dedicated circuitry 150 for processing detection charge output by the associated detector. In a typical application, array 130 might include 100×100 pixels 140, and thus include 100×100 processing circuits 150. IC 110 preferably also includes a microprocessor or microcontroller unit 160, memory 170 (which preferably includes random access memory or RAM and read-only memory or ROM), a high speed distributable clock 180, and various computing and input/output (I/O) circuitry 190. Among other functions, controller unit 160 may perform distance to object and object velocity calculations, which may be output as DATA.
Under control of microprocessor 160, a source of optical energy 120, typically of IR or NIR wavelengths, is periodically energized and emits optical energy S1 via lens 125 toward an object target 20. Typically the optical energy is light, for example emitted by a laser diode or LED device 120. Some of the emitted optical energy will be reflected off the surface of target object 20 as reflected energy S2. This reflected energy passes through an aperture field stop and lens, collectively 135, and falls upon two-dimensional array 130 of pixel detectors 140, where a depth or Z image is formed. In some implementations, each imaging pixel detector 140 captures the time-of-flight (TOF) required for optical energy transmitted by emitter 120 to reach target object 20 and be reflected back for detection by two-dimensional sensor array 130. Using this TOF information, distances Z can be determined as part of the DATA signal that can be output elsewhere, as needed.
Emitted optical energy S1 traversing to more distant surface regions of target object 20, e.g., Z3, before being reflected back toward system 100 will define a longer time-of-flight than radiation falling upon and being reflected from a nearer surface portion of the target object (or a closer target object), e.g., at distance Z1. For example, the time-of-flight t1 for optical energy to traverse the roundtrip path to distance Z1 is given by t1 = 2·Z1/C, where C is the velocity of light. TOF sensor system 100 can acquire three-dimensional images of a target object in real time, simultaneously acquiring both luminosity data (e.g., signal brightness amplitude) and true TOF distance (Z) measurements of a target object or scene. Most of the Z pixel detectors in Canesta-type TOF systems have additive signal properties in that each individual pixel acquires vector data in the form of luminosity information and also in the form of Z distance information. While the system of FIG. 1 can measure Z, the nature of Z detection according to the first described embodiment of the '942 patent does not lend itself to use with the present invention because the Z-pixel detectors do not exhibit a signal additive characteristic. A more useful class of TOF sensor systems whose Z-detection does exhibit a signal additive characteristic are so-called phase-sensing TOF systems. Most current Canesta, Inc. Z-pixel detectors operate with this characteristic.
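The roundtrip relationship t1 = 2·Z1/C can be verified with a line of arithmetic; the 1.5 m example distance below is an arbitrary illustrative choice:

```python
C = 3.0e8  # speed of light, m/s

def roundtrip_time(z_meters):
    """Time-of-flight for the path emitter -> target -> sensor: t = 2*Z/C."""
    return 2.0 * z_meters / C

# A target 1.5 m away yields a 10 ns roundtrip: 2 * 1.5 / 3e8 = 1e-8 s
t1 = roundtrip_time(1.5)
```

Note how short these intervals are: resolving centimeter-scale depth differences directly in the time domain requires sub-nanosecond timing, which motivates the phase-sensing approach described next.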
Many Canesta, Inc. systems determine TOF and construct a depth image by examining relative phase shift between the transmitted light signals S1 having a known phase, and signals S2 reflected from the target object. Exemplary such phase-type TOF systems are described in several U.S. patents assigned to Canesta, Inc., assignee herein, including U.S. Pat. Nos. 6,515,740 “Methods for CMOS-Compatible Three-Dimensional Image Sensing Using Quantum Efficiency Modulation”, 6,906,793 “Methods and Devices for Charge Management for Three-Dimensional Sensing”, 6,678,039 “Method and System to Enhance Dynamic Range Conversion Useable With CMOS Three-Dimensional Imaging”, 6,587,186 “CMOS-Compatible Three-Dimensional Image Sensing Using Reduced Peak Energy”, and 6,580,496 “Systems for CMOS-Compatible Three-Dimensional Image Sensing Using Quantum Efficiency Modulation”.
FIG. 2A is based upon above-noted U.S. Pat. No. 6,906,793 and depicts an exemplary phase-type TOF system in which phase shift between emitted and detected signals, respectively S1 and S2, provides a measure of distance Z to target object 20. Under control of microprocessor 160, optical energy source 120 is periodically energized by an exciter 115, and emits output modulated optical energy S1=Sout=cos(ωt) having a known phase towards object target 20. Emitter 120 preferably is at least one LED or laser diode emitting a low power (e.g., perhaps 1 W) periodic waveform, producing optical energy emissions of known frequency (perhaps a few dozen MHz) for a time period known as the shutter time (perhaps 10 ms).
Some of the emitted optical energy (denoted Sout) will be reflected (denoted S2=Sin) off the surface of target object 20, and will pass through aperture field stop and lens, collectively 135, and will fall upon two-dimensional array 130 of pixel or photodetectors 140. When reflected optical energy Sin impinges upon photodetectors 140 in array 130, photons striking the photodetectors release electrons, which are converted into tiny amounts of detection current. For ease of explanation, incoming optical energy may be modeled as Sin=A·cos(ω·t+θ), where A is a brightness or intensity coefficient, ω·t represents the periodic modulation frequency, and θ is phase shift. As distance Z changes, phase shift θ changes, and FIGS. 2B and 2C depict a phase shift θ between emitted and detected signals S1, S2. The phase shift θ data can be processed to yield desired Z depth information. Within array 130, pixel detection current can be integrated to accumulate a meaningful detection signal, used to form a depth image. In this fashion, TOF system 100 can capture and provide Z depth information at each pixel detector 140 in sensor array 130 for each frame of acquired data.
In preferred embodiments, pixel detection information is captured at at least two discrete phases, preferably 0° and 90°, and is processed to yield Z data.
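The two-phase capture can be sketched as follows. For Sin = A·cos(ω·t+θ), correlating against the reference waveform at 0° and 90° yields samples proportional to A·cos θ and A·sin θ, from which θ is recovered with an arctangent. The sample names and scaling below are illustrative assumptions, not the patent's exact pixel-level demodulation scheme:

```python
import math

def phase_from_two_samples(d0, d90):
    """Recover phase shift theta from correlation samples at 0 and 90 degrees.

    Assumes d0 is proportional to A*cos(theta) and d90 to A*sin(theta) with
    the same proportionality constant, so theta = atan2(d90, d0); the common
    brightness factor A cancels out of the ratio.
    """
    return math.atan2(d90, d0) % (2 * math.pi)

# Synthetic ground truth: brightness A and phase shift theta
A, theta = 2.0, math.pi / 3
d0, d90 = A * math.cos(theta), A * math.sin(theta)
recovered = phase_from_two_samples(d0, d90)
```

Because the brightness coefficient A cancels, this phase estimate is insensitive to target reflectivity, which is exactly the weakness of the luminosity-only ranging schemes criticized earlier.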
System 100 yields a phase shift θ at distance Z due to time-of-flight given by:

θ = 2·ω·Z/C = 2·(2·π·f)·Z/C  (1)
where C is the speed of light, 300,000 km/sec. From equation (1) above it follows that distance Z is given by:

Z = θ·C/(2·ω) = θ·C/(2·2·f·π)  (2)
And when θ = 2·π, the aliasing interval range associated with modulation frequency f is given as:

Z_AIR = C/(2·f)  (3)
In practice, changes in Z produce changes in phase shift θ, although eventually the phase shift begins to repeat, with period 2·π. Thus, distance Z is known only modulo 2·π·C/(2·ω) = C/(2·f), where f is the modulation frequency.
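Equations (2) and (3) can be checked numerically; the 50 MHz modulation frequency below is an arbitrary illustrative choice, not a value taken from the patents:

```python
import math

C = 3.0e8  # speed of light, m/s

def z_from_phase(theta, f_mod):
    """Equation (2): Z = theta * C / (2 * 2*pi*f)."""
    return theta * C / (2.0 * 2.0 * math.pi * f_mod)

def aliasing_interval(f_mod):
    """Equation (3): Z_AIR = C / (2*f), the unambiguous range."""
    return C / (2.0 * f_mod)

f = 50e6                          # 50 MHz modulation (illustrative)
z_air = aliasing_interval(f)      # unambiguous range: 3 m
z = z_from_phase(math.pi, f)      # half-cycle phase maps to half the interval
```

A target at 1.5 m and one at 4.5 m both produce θ = π at 50 MHz, illustrating why Z is only known modulo the aliasing interval.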
Canesta, Inc. has also developed a so-called RGB-Z sensor system, a system that simultaneously acquires both red, green, blue visible data and Z depth data. FIG. 3A is taken from Canesta U.S. patent application Ser. No. 11/044,996, publication no. US 2005/0285966, entitled “Single Chip Red, Green, Blue, Distance (RGB-Z) Sensor”, and discloses an RGB-Z system 100′. System 100′ includes an RGB-Z sensor 110 having an array 230 of Z pixel detectors, and an array 230′ of RGB detectors. Other embodiments of system 100′ may implement an RGB-Z sensor comprising interspersed RGB and Z pixels on a single substrate. In FIG. 3A, sensor 110 preferably includes optically transparent structures 220 and 240 that receive incoming optical energy via lens 135 and split the energy into IR-NIR or Z components and RGB components. In FIG. 3A, the incoming IR-NIR Z components of optical energy S2 are directed upward for detection by Z pixel array 230, while the incoming RGB optical components pass through for detection by RGB pixel array 230′. Detected RGB data may be processed by circuitry 265 to produce an RGB image on a display 70, while Z data is coupled to an omnibus block 235 that may be understood to include elements 160, 170, 180, 290, 115 from FIG. 2A.
System 100′ in FIG. 3A can thus simultaneously acquire an RGB image, preferably viewable on display 70. While the embodiment shown in FIG. 3A uses a single lens 135 to focus incoming IR-NIR and RGB optical energy, other embodiments depicted in the Canesta '966 disclosure use a first lens to focus incoming IR-NIR energy, and a second lens, closely spaced near the first lens, to focus incoming RGB optical energy.
FIG. 3B depicts a single Z pixel 240, while FIG. 3C depicts a group of RGB pixels 240′. While FIGS. 3B and 3C are not to scale, in practice the area of a single Z pixel is substantially greater than the area of an individual RGB pixel. Exemplary sizes might be 15 μm×15 μm for a Z pixel, and perhaps 4 μm×4 μm for an RGB pixel. Thus, the resolution or granularity of information acquired by RGB pixels is substantially better than that of information acquired by Z pixels. This disparity in resolution characteristics substantially affects the ability of an RGB-Z system to be used successfully to provide video effects.
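The resolution disparity can be modeled by mapping each fine RGB pixel coordinate to the coarse Z pixel that covers it, so every RGB pixel in a block inherits the same single depth value. The 3:1 linear ratio below is an assumption for illustration (the actual ratio depends on the pixel geometries):

```python
def parent_z_pixel(rgb_x, rgb_y, ratio=3):
    """Map an RGB pixel coordinate to the coarser Z pixel covering it.

    ratio is the assumed linear size ratio between a Z pixel and an RGB
    pixel; with ratio=3, one Z pixel covers a 3x3 block of RGB pixels.
    """
    return rgb_x // ratio, rgb_y // ratio

# All nine RGB pixels in the first 3x3 block share the single Z pixel (0, 0),
# so all nine receive one and the same depth value.
owners = {parent_z_pixel(x, y) for x in range(3) for y in range(3)}
```

This many-to-one mapping is the root of the boundary problem discussed next: a single Z value cannot say where, within its block of RGB pixels, a foreground/background edge actually lies.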
FIG. 4A is a grayscale version of an image acquired with an RGB-Z system, and shows an object 20 that is a person whose right arm is held in front of the person's chest. Let everything that is “not” the person be deemed background 20′. Of course the problem is to accurately discern where the edges of the person in the foreground are relative to the background. Arrow 250 denotes a region of the forearm, a tiny portion of which is shown at the Z pixel level in FIG. 4B. The diagonal line in FIG. 4B represents the boundary between the background (to the left of the diagonal line), and an upper portion of the person's arm, shown shaded to the right of the diagonal line. FIG. 4B represents many RGB pixels, and fewer Z pixels. One Z pixel is outlined in phantom, and the area of the one Z pixel encompasses nine smaller RGB pixels, denoted RGB1, RGB2, . . . RGB9.
In FIG. 4B, each RGB pixel will represent a color. For example if the person is wearing a red sweater, RGB3, RGB5, RGB6, RGB8, RGB9 should each be red. RGB1 appears to be nearly all background and should be colored with whatever the background is. But what color should RGB pixels RGB2, RGB4, RGB7 be? Each of these pixels shares the same Z value as any of RGB1, RGB2, . . . RGB9. If the diagonal line drawn is precisely the boundary between foreground and background, then RGB1 should be colored mostly with background, with a small contribution of foreground color. By the same token, RGB7 should be colored mostly with foreground, with a small contribution of background color. RGB4 and RGB2 should be fractionally colored about 50% with background and 50% with foreground color. But the problem is knowing where the boundary line should be drawn. Unfortunately prior art techniques make it difficult to intelligently identify the boundary line, and the result can be a zig-zag boundary on the perimeter of the foreground object, rather than a seamlessly smooth boundary. If a background substitution effect were to be employed, the result could be a foreground object that has a visibly jagged perimeter, an effect that would not look realistic to a viewer.
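The fractional coloring described above amounts to alpha blending each boundary RGB pixel between foreground and background colors. A minimal sketch follows, assuming the coverage fraction alpha were somehow known; estimating alpha from the boundary line is precisely the hard problem the text identifies, so here it is simply given:

```python
def blend_edge_pixel(fg_rgb, bg_rgb, alpha):
    """Color a boundary RGB pixel as alpha*foreground + (1 - alpha)*background.

    alpha is the fraction of the pixel's area covered by the foreground
    object (0.0 = all background, 1.0 = all foreground).
    """
    return tuple(round(alpha * f + (1.0 - alpha) * b)
                 for f, b in zip(fg_rgb, bg_rgb))

red, sky = (200, 0, 0), (0, 0, 128)       # foreground sweater, substitute backdrop
edge = blend_edge_pixel(red, sky, 0.5)    # a pixel split 50/50, like RGB2 or RGB4
```

With a well-estimated alpha along the boundary, the foreground perimeter shades smoothly into the substitute background; with a poor estimate, the hard transitions produce the jagged, zig-zag perimeter described above.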
Thus there is a need for video processing techniques that can employ relatively inexpensive arrays of RGB and Z pixels, yet provide the video manipulation quality generally associated with larger, higher density arrays. Further there is a need for such techniques to operate well in the real world, even if some Z data is erroneous or not present. Finally, such techniques should operate substantially in real time.
The present invention provides such techniques for manipulating RGB-Z data, including segmentation, up-sampling, and background substitution.