1. Field of the Invention
This invention relates to the improved performance of occlusion processing devices for inserting realistic indicia into video images.
2. Description of the Related Art
Electronic devices for inserting electronic images into live video signals, such as described in U.S. Pat. No. 5,264,933 by Rosser, et al., have been developed and used for the purpose of inserting advertising and other indicia into video, such as sports events. These devices are capable of seamlessly and realistically incorporating logos or other indicia into the original video in real time, even as the original scene is zoomed, panned, or otherwise altered in size or perspective.
Making the inserted indicia look as if they are actually in the scene is an important, but difficult, aspect of implementing the technology. In particular, if an object in the foreground of the actual scene moves in front of the plane in which the inserted object is being positioned, the foreground object must be made to obscure, or partially obscure, the insertion in a realistic manner. This aspect of live video insertion technology is termed occlusion processing and has been the subject of previous patent applications.
In U.S. Pat. No. 5,264,933, Rosser et al. assumed that a modified version of existing chroma-key or blue screen technology, as described by Kennedy and Gaskins of NBC in J. Soc. Motion Picture and Television Engineers, December 1959, pp. 804-812 and in U.S. Pat. Nos. 4,100,569 and 4,344,085 and commonly used in television weather forecast programs, would be sufficient to deal with this problem. For a small subset of the intended applications, such as indoor tennis, this is correct. However, chroma-key technology has limitations which restrict the range of applications.
Existing chroma-key or blue screen technology requires a specific range of chroma values to key off, usually a very specific range of blues, and requires that the foreground object does not have any portion with that range of blue values. Inside a studio, with controlled lighting, the range of colors that are keyed off can be very limited. For optimum performance, very specific blues are used in the background and the lighting has to be well controlled. In order to be able to compensate for non-uniformities in the backing color and luminance, the lighting has to be kept constant, as discussed in detail in Vlahos' U.S. Pat. No. 5,424,781 "Backing Color and Luminance Nonuniformity Compensation for Linear Image Compositing", which uses stored correction factors developed by comparing the RGB video obtained from the backing before the subject is put in place, with the ideal values of the RGB that would have been obtained from a perfect backing. These correction factors correct the RGB video when the scene is scanned with the subject in place. The technology is usable in the controlled light environment of an indoor sports event, especially if there is freedom to paint the insertion region an exact color. However, in an outdoor sports event changing light conditions are inevitable and simple chroma-key technology would require a wide range of excluded colors. One would both have to alter the color of the insert region of the stadium and make sure that the players involved in the game were not wearing any of the broad range of colors needed to key off. These restrictions severely limit the outdoor applications of a video insertion system relying solely on chroma-key technology for occlusion. The improvements over simple chroma-key technology that are required to make the system robust and usable in a wide variety of variable lighting conditions are the subject of this patent.
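The trade-off described above can be illustrated with a minimal sketch of a fixed-range key. It is not the method of any cited patent: the function names, the RGB box test, and every pixel value and key range are hypothetical, chosen only to show that a narrow range fails when the light drops while a widened range starts keying off foreground clothing.

```python
# Illustrative only: a fixed-range key on RGB triples. The ranges and pixel
# values below are hypothetical and are not taken from any cited patent.

def is_key_color(pixel, key_lo, key_hi):
    """Return True if every channel of pixel lies inside [key_lo, key_hi]."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, key_lo, key_hi))


def chroma_key(scene, insert, key_lo, key_hi):
    """Show the insert where the scene pixel is keyed; keep the scene elsewhere."""
    return [
        [ins if is_key_color(px, key_lo, key_hi) else px
         for px, ins in zip(scene_row, insert_row)]
        for scene_row, insert_row in zip(scene, insert)
    ]


LOGO = (255, 255, 255)
BACKING_STUDIO = (20, 40, 200)   # backing under controlled light
BACKING_SHADE = (10, 20, 100)    # same backing after the light drops
SKIN = (200, 150, 120)           # foreground, far from the key range
BLUE_SHIRT = (15, 30, 120)       # foreground that happens to be blue

NARROW = ((0, 20, 180), (40, 60, 255))   # (key_lo, key_hi)
WIDE = ((0, 10, 90), (40, 60, 255))

insert = [[LOGO, LOGO, LOGO]]
studio = chroma_key([[BACKING_STUDIO, SKIN, BLUE_SHIRT]], insert, *NARROW)
shade_narrow = chroma_key([[BACKING_SHADE, SKIN, BLUE_SHIRT]], insert, *NARROW)
shade_wide = chroma_key([[BACKING_SHADE, SKIN, BLUE_SHIRT]], insert, *WIDE)
```

With these assumed values, `studio` shows the logo only on the backing, `shade_narrow` shows no logo at all because the darkened backing has left the narrow range, and `shade_wide` restores the logo on the backing but also paints it over the blue shirt.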
Rosser et al. also discuss the possibility of motion tracking of objects or parts of objects. This is possible in events such as auto-racing where the foreground objects are in constant motion. A major difficulty with this approach arises in events in which the foreground objects are stationary for any length of time, as occurs in a large number of sports.
Other parties have tackled the occlusion problem in other, less robust, ways.
In U.S. Pat. No. 5,353,392, Luquet et al. discuss a method and device for modifying a zone in successive images in which the occlusion is accomplished by having a stored representation of the target panel. This stored representation is then subjected to three transforms: geometric, color/shade, and optionally one related to the modulation transfer function. They take an ideal case of the target zone and reposition it to have the correct pose in the current image by a geometrical transform, then make adjustments to the color of the repositioned target panel based on a mathematical color transform. The color transform is not clearly described in the patent, but appears to consist of observing the color changes of a selected number of well defined points and then doing a linear interpolation in color space to alter the color values of all other pixels. The decision on whether the insertion is occluded or not is then made on a pixel by pixel basis, where it is assumed that any pixels of the transformed stored representation that differ from the current scene represent something other than the target panel, i.e. they are an obstacle interposed between the camera and the panel. The set of points that differ constitutes a mask associated with the obstacle. For opaque objects, the inlay or insertion is then not done in the region of that set of points, on a pixel by pixel basis; for semi-opaque objects, that set is used to modify the pattern before inlaying, again on a pixel by pixel basis.
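The pixel by pixel decision described above can be sketched as follows. This is a simplified reading, not the procedure of U.S. Pat. No. 5,353,392: it assumes single-channel (gray-level) images, assumes the geometric transform has already been applied so that the stored representation is aligned with the current image, and stands in for the unclear color transform with a gain and offset fitted from two landmark points. All function names and values are hypothetical.

```python
def fit_linear_color(ref_a, ref_b, cur_a, cur_b):
    """Fit current = gain * stored + offset from two landmark points.

    Assumption: two well defined points with distinct stored values
    (ref_a != ref_b) are enough to describe the illumination change.
    """
    gain = (cur_b - cur_a) / (ref_b - ref_a)
    offset = cur_a - gain * ref_a
    return gain, offset


def obstacle_mask(stored, current, gain, offset, threshold):
    """Mark pixels where the color-corrected stored image differs from the scene."""
    return [
        [abs(gain * s + offset - c) > threshold for s, c in zip(s_row, c_row)]
        for s_row, c_row in zip(stored, current)
    ]


def inlay(current, pattern, mask):
    """Opaque-obstacle case: keep the scene where masked, else show the pattern."""
    return [
        [c if m else p for c, p, m in zip(c_row, p_row, m_row)]
        for c_row, p_row, m_row in zip(current, pattern, mask)
    ]


# Hypothetical data: the panel has dimmed (gain 0.8, offset 10) and two
# pixels are covered by an obstacle.
stored = [[100, 100, 100], [50, 50, 50]]
current = [[90, 200, 90], [50, 50, 30]]
pattern = [[1, 1, 1], [1, 1, 1]]

gain, offset = fit_linear_color(100, 50, 90, 50)
mask = obstacle_mask(stored, current, gain, offset, threshold=10)
result = inlay(current, pattern, mask)
```

In this sketch the mask depends entirely on the corrected stored image matching the scene pixel for pixel, which is the property criticized in the discussion that follows.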
Luquet et al. also discuss the possibility of time filtering the color transform to reduce noise, and the possibility of using cameras with at least four spectral bands, rather than the conventional three. The fourth band they propose would be in the near infrared frequency spectrum.
The main problem with the approach suggested by Luquet is that it relies on pixel accurate warping, both for the geometric image and as the starting point for the color transform. In any practical system there are bound to be errors, especially if the insertion system is well down stream of the originating camera, due to noise in the video and other practical issues such as lens distortions. The method of adaptive occlusion with a simple synthetic reference image that is the subject of the present patent application offers a robust alternative that does not require special cameras or the delays associated with time averaging.
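The sensitivity to warping errors can be illustrated with a small one-dimensional sketch. It is an assumption-laden toy, not an analysis of the Luquet system: a one-pixel misregistration is modeled as a shift of a single row of gray levels, and the helper names and values are hypothetical.

```python
def shift_row(row, dx, fill):
    """Shift a row right by dx pixels (dx >= 0), padding on the left with fill."""
    return [fill] * dx + row[:len(row) - dx]


def differing_pixels(reference, current, threshold):
    """Indices where the reference and the current row differ by more than threshold."""
    return [
        i for i, (r, c) in enumerate(zip(reference, current))
        if abs(r - c) > threshold
    ]


# A high-contrast panel (dark border, bright interior) seen with a
# one-pixel registration error and no obstacle present.
panel = [0, 0, 255, 255, 255, 0, 0]
misregistered = shift_row(panel, 1, fill=0)
false_hits = differing_pixels(panel, misregistered, threshold=30)

# A uniform surface under the same one-pixel error.
uniform = [128] * 7
uniform_hits = differing_pixels(uniform, shift_row(uniform, 1, fill=128), threshold=30)
```

Here `false_hits` contains the two edge positions, which a pixel by pixel comparison would treat as an obstacle even though nothing is in front of the panel, while `uniform_hits` is empty.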
Sharir and Tamir, in their PCT application WO 95/10919, discuss an apparatus and method for detecting, identifying and incorporating advertisements in video in which the occlusion is done in the following steps:
1. Subtract sign image in the video field from its perspective transformed model
2. Filter internal edge effects from difference image
3. Identify large non-black areas in difference image as occlusion areas
4. Temporally smooth occlusion map
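The four steps listed above can be sketched as follows. WO 95/10919 is not quoted here and the specific operators are assumptions: a 3x3 median filter stands in for the edge-effect filter, 4-connected regions above a minimum area stand in for "large non-black areas", and a running average stands in for the temporal smoothing. Images are gray-level lists of lists, and the model is assumed to be already perspective transformed.

```python
def difference_image(model, field):
    """Step 1: absolute difference between the transformed model and the field."""
    return [[abs(m - f) for m, f in zip(m_row, f_row)]
            for m_row, f_row in zip(model, field)]


def median3x3(img):
    """Step 2 (assumed operator): 3x3 median to suppress thin edge residue."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = sorted(img[yy][xx]
                            for yy in range(max(0, y - 1), min(h, y + 2))
                            for xx in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = window[len(window) // 2]
    return out


def large_areas(diff, threshold, min_area):
    """Step 3: keep 4-connected non-black regions of at least min_area pixels."""
    h, w = len(diff), len(diff[0])
    mask = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if seen[y][x] or diff[y][x] <= threshold:
                continue
            seen[y][x] = True
            stack, region = [(y, x)], []
            while stack:  # flood fill one region
                cy, cx = stack.pop()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and diff[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(region) >= min_area:
                for ry, rx in region:
                    mask[ry][rx] = True
    return mask


def temporal_smooth(previous, mask, alpha):
    """Step 4 (assumed operator): running average of the occlusion map in [0, 1]."""
    return [[(1 - alpha) * p + alpha * (1.0 if m else 0.0)
             for p, m in zip(p_row, m_row)]
            for p_row, m_row in zip(previous, mask)]
```

A caller would run the steps in order for each field, feeding the previous smoothed map back into `temporal_smooth`. Note that nothing in this sketch compensates for a change in illumination between the model and the field.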
Additionally, in replacing the image, the application proposes implementing anti-aliasing procedures. Sharir and Tamir also talk of using motion detection to identify objects that move from the background, or of using texture and geometric shape to distinguish objects from the background. Their proposed method not only requires pixel perfect warping, which is unlikely in a real, noisy video stream, but also makes no adjustment for illumination changes between their reference image and the current image. The motion detection will suffer from the same drawbacks as discussed above with respect to the Rosser patent. Sharir and Tamir do not give any details of how they propose, in practice, to make use of texture or geometry to compensate for occlusion.
In sum, the existing state of the art either uses a simplistic modification of the blue screen technique, which would only be acceptable in a limited sub-set of controlled illumination environments, or a more sophisticated system of comparing the image to a suitably warped and color corrected reference image. The practical problem of comparing the current image to a transformed reference image is that slight errors in the warping and color correction can result in unacceptable performance of the occlusion processing system.
The present invention overcomes these difficulties by using adaptive occlusion with a synthetic reference image. This system is usable in varying light conditions, and allows insertion with occlusion on surfaces which are reasonably uniform in color and texture. Such surfaces include, for example, tennis courts, vertical walls, e.g. the padding behind home plate on a baseball field, or even moderately textured, reasonably flat surfaces like grass or artificial turf in a soccer or football stadium.
With modifications, which will be discussed in detail in the body of this application, adaptive occlusion with a synthetic reference image can handle occlusion on stationary-pattern backgrounds with complex patterns of both low and high contrast, such as advertising billboards in a stadium. By stationary-pattern, it is meant that there is no internal motion within the background region, not that the background is stationary. An example of a stationary-pattern background is a camera panning past an existing advertising billboard.