1. Field of the Invention
This invention relates to the performance of occlusion processing for inserting realistic indicia into video images.
2. Background Art
The present invention represents an improvement over the technology described in U.S. Pat. No. 5,953,076 to Astle and Das (which is incorporated by reference in its entirety) and the other references cited therein.
One of the key elements used to make inserted indicia (inserts) look realistic in motion video systems such as broadcast television, Internet streaming video systems, motion pictures, DVD discs, etc., is to make objects that move in front of the inserts appear to pass over the top of or “occlude” the insert. The majority of existing processes for providing this capability utilize color differencing techniques which are sometimes called chroma-keying. This term generally implies that the system effectively uses a single color or a limited continuous range of color, sampled before and/or during the video production, as a reference color at the desired location of the insert. A difference “mask” is then created by subtracting the pixels existing in the live image at the insert location from the reference color pixel values. Wherever the result of the subtraction is at or near zero, pixels from the insert are included in the resulting image. Wherever the subtraction result has a large magnitude, the pixels in the live image are retained.
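The single-color differencing described above can be sketched as follows. This is a minimal illustration only, not the method of any particular system: the names `chroma_key_mask` and `composite`, the RGB pixel representation, the per-channel absolute-difference measure, and the threshold of 30 are all assumptions chosen for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255; the color space is an assumption


def chroma_key_mask(live: List[Pixel], reference: Pixel, threshold: int = 30) -> List[bool]:
    """Return True where the insert should be shown (live pixel is near the reference color)."""
    mask = []
    for pixel in live:
        # Largest per-channel difference between the reference color and the live pixel.
        diff = max(abs(r - p) for r, p in zip(reference, pixel))
        # At or near zero: show the insert. Large magnitude: keep the live pixel.
        mask.append(diff <= threshold)
    return mask


def composite(live: List[Pixel], insert: List[Pixel], mask: List[bool]) -> List[Pixel]:
    """Take the insert pixel where the mask is True, otherwise keep the live pixel."""
    return [ins if m else lv for lv, ins, m in zip(live, insert, mask)]


grass = (40, 140, 50)  # hypothetical reference color sampled at the insert location
live_row = [(42, 138, 55), (200, 30, 30), (38, 145, 48)]  # middle pixel: a red jersey
insert_row = [(255, 255, 255)] * 3  # plain white insert

mask = chroma_key_mask(live_row, grass)
result = composite(live_row, insert_row, mask)
```

In this sketch the red pixel differs strongly from the reference color, so it is retained from the live image and appears to pass in front of the insert.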
The single color range differencing technique described above can work well if the insert location(s) on the live image is indeed within a single color range at the beginning and throughout the video broadcast. However, there are many circumstances where this single color criterion may not be met. For example, the insert could be placed on the playing field in an outdoor sporting event that is held in a stadium or a location surrounded by large structures or buildings. In this case, there is a significant chance that the insert location will have shadows cast on portions of the field by a structure as the sun's position changes throughout the game. If the insert location has areas that are both sunlit and in shadow, a minimum of 2 distinct color ranges are introduced that must be supported by the system. The occlusion processing system using a single color range described above will only be able to provide occlusion on either the sunlit or shadowed area but not both, making the insert look much less realistic. Another example is where the insert location includes a large multi-colored team logo painted in the center of a playing field. If the insert were intended to cover or partially cover this logo then the single color system would fail again, only being able to cover a single color in the logo, and again the resulting insert would look much less realistic.
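The sunlit/shadow failure can be illustrated with a small sketch. The RGB values, the threshold, and the helper name `matches_reference` are hypothetical and chosen only to make the effect visible.

```python
def matches_reference(pixel, reference, threshold=30):
    """True when every channel of the pixel is within `threshold` of the reference color."""
    return all(abs(p - r) <= threshold for p, r in zip(pixel, reference))


sunlit_grass = (40, 140, 50)    # hypothetical RGB values
shadowed_grass = (15, 60, 20)   # the same grass in the shadow of a structure

reference = sunlit_grass        # single reference color sampled in the sunlit area

# Unoccluded grass should accept the insert in both regions.
shows_insert_in_sun = matches_reference(sunlit_grass, reference)
shows_insert_in_shadow = matches_reference(shadowed_grass, reference)
print(shows_insert_in_sun, shows_insert_in_shadow)
```

With a single reference color, the shadowed grass is treated as if it were an occluding object, so the insert would disappear wherever the shadow falls.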
U.S. Pat. No. 5,953,076 to Astle and Das describes an alternative to a single color differencing scheme for overcoming some of the difficulties discussed above. The technique discussed therein relates to the use of a synthetic reference image that is captured during setup prior to the live video production going to air. While that technique provides for some measure of multiple color handling, it has some drawbacks. For example, although the system is designed to handle global changes in lighting conditions by updating, it fails to handle situations such as a shadow creeping across a field, which changes only part of the occluding region. Furthermore, since the reference image may be highly filtered as described, it may by its nature introduce new artifacts that will be particularly evident in areas of color transition and will, in many cases, give the insert a non-realistic look.
In addition, the required use of image warping capability by Astle and Das puts a very significant and potentially high-cost processing burden on the occlusion processing system. Astle and Das describe a method to reduce or eliminate the warping cost which essentially describes a simplified version of their system using single color processing. Also, Astle and Das describe a fairly complex and computationally intensive method of processing and mixing Y, U, and V component values on a per pixel basis to determine the appropriate mask value for a pixel. Briefly, the method involves subtracting each of the Y, U, and V reference image values from the positionally corresponding Y, U, and V live image values and then taking the square root of the weighted sum of the squares of the differences, as shown in the following formula:

$$S = \left( w_Y (Y_L - Y_R)^2 + w_C \left( (U_L - U_R)^2 + (V_L - V_R)^2 \right) \right)^{1/2}$$

where w_Y and w_C are weighting factors for the Y (luma) and C (chroma) values, and the subscripts L and R denote the live and reference images, respectively.
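For illustration, the weighted difference in the formula can be computed per pixel as in the sketch below. The function name `color_distance`, the tuple layout (Y, U, V), and the default weights of 1.0 are assumptions made for the example; the source does not state what weight values are used.

```python
import math


def color_distance(live, ref, w_y=1.0, w_c=1.0):
    """Weighted luma/chroma difference S between a live pixel and a reference pixel.

    `live` and `ref` are (Y, U, V) tuples; `w_y` and `w_c` are the luma and chroma
    weights (the defaults are placeholders, not values from the source).
    """
    y_l, u_l, v_l = live
    y_r, u_r, v_r = ref
    luma = (y_l - y_r) ** 2
    chroma = (u_l - u_r) ** 2 + (v_l - v_r) ** 2
    return math.sqrt(w_y * luma + w_c * chroma)


# Hypothetical pixel values: Y differs by 3, U by 4, V matches.
s = color_distance((120, 100, 140), (117, 96, 140))
print(s)  # 5.0
```

Each pixel requires three subtractions, three multiplications for the squares, the weighting, and a square root, which gives a sense of the per-pixel cost referred to above.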
Not only is the formula complicated and computationally intensive, but the method by which the result is used to distinguish foreground pixels from background pixels (i.e., the pixels that must not be occluded from the pixels that must be occluded) causes an increased number of pixels to receive the wrong foreground/background designation. For example, if the differences for two of the color components evaluate to zero, meaning that those components match the reference color, but the third has a notable though not especially large difference, the result will likely indicate that the pixel is part of the background. The difference in that third component alone, however, indicated a mismatch that may have been erroneously suppressed in the combined calculation.
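A small numeric sketch can make this suppression effect concrete. All numbers here are hypothetical: the per-component tolerance `NOISE`, the way the combined threshold is derived from it, and the per-component comparison used as a contrast are assumptions for illustration and are not taken from Astle and Das.

```python
import math


def combined_difference(live, ref, w_y=1.0, w_c=1.0):
    """Weighted root-sum-of-squares difference over (Y, U, V); weights are placeholders."""
    dy, du, dv = (l - r for l, r in zip(live, ref))
    return math.sqrt(w_y * dy ** 2 + w_c * (du ** 2 + dv ** 2))


NOISE = 8  # hypothetical per-component tolerance for noise

# A combined threshold loose enough to accept NOISE in all three components at once.
combined_threshold = math.sqrt(3 * NOISE ** 2)  # about 13.86

ref = (117, 96, 140)
live = (117, 96, 152)  # Y and U match the reference; V differs by 12

s = combined_difference(live, ref)  # 12.0

# The combined test accepts the pixel as background ...
combined_says_background = s <= combined_threshold
# ... while checking each component separately would flag the V difference.
per_component_says_background = all(abs(l - r) <= NOISE for l, r in zip(live, ref))
```

Under these assumed values the combined measure labels the pixel as background even though one component differs from the reference by more than the per-component tolerance.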