A common tool used in digital image compositing is a silhouette tool. It enables a graphic designer to cut out a desired portion of a photograph, such as a subject in the picture, for use with an image under design. Alternatively, the desired portion can be cut out of the photograph so as to leave a transparent hole. The shape of the portion being cut out is typically non-rectangular, so that more processing is involved than that required to provide simply a rectangular clipping. The objective of a silhouette tool is thus to find (and represent) the shape of the portion to be cut out as automatically as possible. Several different tools are currently in use by various companies.
For example, a designer could begin with a photograph of an entire family, and desire to cut out the father's face from the picture. The silhouette tool would have to remove everything from the photograph except the father's face. In another example, the photograph might have a picture of a woman speaking on a telephone, and the designer would want to cut the telephone out of the picture. In this case, the silhouette tool would leave everything in the photograph intact, except for the telephone which is removed.
In the world of digital images, every image is a rectangular array of pixels. A non-rectangular shape, such as a man's face, cannot be stored as an independent image in itself. Instead, it must be part of a larger rectangular image. What is commonly done to delineate the face is to assign to each pixel location an opacity value, indicating the degree of transparency of the image color at that pixel location. An opacity of 0% indicates complete transparency--so that the image color at that pixel location is not seen at all. An opacity of 100% indicates that the image color is opaque, and is seen over any background. Opacity values between 0% and 100% are used to indicate partial transparency. Opacity is a way of enabling a designer to superimpose images on top of one another, and indicate how the composite image appears.
Thus, to create an image of a man's face alone, one can identify the pixel locations corresponding to the man's face and label them as opaque, with an opacity of 100%, and label all the other pixel locations as transparent, with an opacity of 0%. More generally, one would prefer to taper the opacity values, so that artifacts such as fringes around the edges are diminished.
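By way of illustration only, the role of opacity in superimposing one image on another can be sketched as follows. The sketch assumes 8-bit RGB pixels and opacities expressed as fractions between 0 and 1 rather than percentages; the names `composite_over` and `composite_image` are hypothetical and are not taken from any product discussed herein.

```python
def composite_over(foreground, background, opacity):
    """Blend one RGB pixel over another; opacity is a fraction in [0, 1]."""
    return tuple(
        round(opacity * f + (1.0 - opacity) * b)
        for f, b in zip(foreground, background)
    )


def composite_image(fg, bg, mask):
    """Composite fg over bg pixel by pixel, using an opacity mask of equal size."""
    return [
        [composite_over(f, b, a) for f, b, a in zip(fg_row, bg_row, mask_row)]
        for fg_row, bg_row, mask_row in zip(fg, bg, mask)
    ]


# A 1 x 3 strip: a face pixel, an edge pixel, and a pixel outside the face.
face = [[(200, 150, 120), (200, 150, 120), (200, 150, 120)]]
new_background = [[(0, 0, 240), (0, 0, 240), (0, 0, 240)]]
mask = [[1.0, 0.5, 0.0]]  # opaque, tapered edge, transparent

result = composite_image(face, new_background, mask)
```

In this strip the opaque pixel keeps the face color, the transparent pixel shows only the new background, and the tapered edge pixel is an even mixture of the two, which is what softens fringes at the boundary.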
One type of silhouette tool commonly used employs an image mask. In its barest form, a mask is simply an array of bits used to indicate which pixels are within the desired portion being silhouetted. More generally, a mask is an array of opacity values, used to mark the desired portion of the image as opaque, and the undesired portion as transparent, with some transition opacity values between the two extremes. Still more generally, a mask could be represented in resolution-independent coordinates, so that it can be digitized to a pixel mask at whatever sampling resolution the image is being displayed or printed.
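The most general form of mask mentioned above, one held in resolution-independent coordinates and digitized on demand, can be sketched as follows. This is a minimal illustration under stated assumptions: the shape is supplied as a hypothetical function `inside(u, v)` over the unit square, and transition opacities are obtained by counting sub-samples within each pixel, which is only one of several possible digitizing schemes.

```python
def digitize_mask(inside, width, height, samples=4):
    """Sample a resolution-independent shape into a width x height opacity mask.

    inside(u, v) takes coordinates in the unit square and returns True inside
    the shape. Each pixel's opacity is the fraction of its sub-samples that
    fall inside the shape, so boundary pixels receive transition values.
    """
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            hits = 0
            for sy in range(samples):
                for sx in range(samples):
                    u = (x + (sx + 0.5) / samples) / width
                    v = (y + (sy + 0.5) / samples) / height
                    hits += inside(u, v)
            row.append(hits / (samples * samples))
        mask.append(row)
    return mask


def disc(u, v):
    """Hypothetical shape: a disc of radius 0.4 centred in the unit square."""
    return (u - 0.5) ** 2 + (v - 0.5) ** 2 <= 0.4 ** 2


coarse = digitize_mask(disc, 8, 8)    # e.g. for a screen preview
fine = digitize_mask(disc, 64, 64)    # e.g. for print output
```

The same shape description yields both masks; only the sampling resolution differs.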
In many applications silhouetting is accomplished by background elimination. For professional photographs, the subject being silhouetted is typically the portion of the photograph which is not background. Conventional techniques such as "blue screen" generate masks by accepting as input the background color and eliminating all pixels with colors close to the background color. This works well, provided the background color is not also an integral color within the subject of the photograph. Saturated blue, for example, is typically used since it is the least dominant color in human skin. The background color can be keyed to chrominance or to luminance, hence the use of "luma key" and "chroma key" for masking. The luma key process correlates image luminance over an entire image to an opacity map, and the more general purpose chroma key process relies on global color differences in an image.
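The two keying approaches can be illustrated with the following sketch, which is not the implementation of any particular product. It assumes 8-bit RGB pixels; the chroma key is simplified to a Euclidean distance in RGB space with a hypothetical `tolerance` parameter, and the luma key uses the Rec. 601 luminance weights as one common choice.

```python
def chroma_key_mask(image, key_color, tolerance):
    """Opacity 0 for pixels within `tolerance` of the key color, 1 otherwise.

    Distance is measured in RGB space here for simplicity; this is an
    assumption of the sketch, not a requirement of chroma keying.
    """
    limit = tolerance ** 2
    return [
        [
            0.0 if sum((c - k) ** 2 for c, k in zip(pixel, key_color)) <= limit else 1.0
            for pixel in row
        ]
        for row in image
    ]


def luma_key_mask(image):
    """Map luminance over the whole image to an opacity in [0, 1]."""
    return [
        [(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
        for row in image
    ]


# Two near-blue background pixels followed by a skin-toned subject pixel.
strip = [[(0, 0, 255), (10, 5, 250), (210, 160, 130)]]
blue_screen = chroma_key_mask(strip, key_color=(0, 0, 255), tolerance=30)
```

A subject pixel whose color happens to fall within the tolerance of the key color would also be made transparent, which is the limitation noted above.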
Another commonly used tool employs curves constituting a boundary path of the desired portion. Whereas the natural representation for a mask is in raster format, the natural representation for a boundary path is in vector format. Of course, the mask and the boundary curves can be converted from one to the other.
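One direction of this conversion, from a vector boundary path to a raster mask, can be sketched as follows. The sketch assumes the path has already been approximated by a closed polygon and uses an even-odd crossing test at each pixel center; the function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd test: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def path_to_mask(polygon, width, height):
    """Rasterize a closed polygon into a bit mask, testing each pixel center."""
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, polygon) else 0 for x in range(width)]
        for y in range(height)
    ]


triangle = [(1, 1), (7, 1), (1, 7)]
bit_mask = path_to_mask(triangle, 8, 8)
```

The reverse conversion, tracing the outline of a mask to recover a path, is not shown.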
Several commercial products use the techniques described above, among them PHOTOSHOP.RTM., MASK and PHOTOFUSION.RTM.. PHOTOSHOP.RTM. uses both the image mask and the boundary path approaches to silhouetting. Its pixel mask approach uses alpha channels. To define an alpha channel, the user creates a selection, which can then be saved into a new alpha channel, or else added to or removed from an existing channel. The selection is created by using MA style tools such as rectangle, ellipse, and lasso for freehand drawing and by using a magic wand. The magic wand operates by selecting all pixels connected to a seed pixel within a pre-determined color range. PHOTOSHOP.RTM. also uses a path approach, using Bezier curves with an Illustrator.RTM.-type interface. Bezier curves are constructed by using multiple control points as guides to determine location and slope along the curve, and are well known in the art.
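The magic wand and Bezier curve operations described above can be illustrated generically as follows. This sketch makes no claim about how any commercial product implements them: it assumes 4-connected neighbors, a per-channel `tolerance` measured against the seed color, and a cubic curve with four control points, and the names `magic_wand` and `cubic_bezier` are hypothetical.

```python
from collections import deque


def magic_wand(image, seed, tolerance):
    """Select all pixels connected to the seed whose color is close to the seed's."""
    height, width = len(image), len(image[0])
    sx, sy = seed
    seed_color = image[sy][sx]

    def close(color):
        return all(abs(a - b) <= tolerance for a, b in zip(color, seed_color))

    selected = [[False] * width for _ in range(height)]
    selected[sy][sx] = True
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and not selected[ny][nx] and close(image[ny][nx])):
                selected[ny][nx] = True
                queue.append((nx, ny))
    return selected


def cubic_bezier(p0, p1, p2, p3, t):
    """Point on a cubic Bezier curve at parameter t in [0, 1].

    p0 and p3 fix the end locations; p1 and p2 set the slope at each end.
    """
    s = 1.0 - t
    return tuple(
        s ** 3 * a + 3 * s * s * t * b + 3 * s * t * t * c + t ** 3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


# Single-channel pixels; the bright column separates two similar dark areas.
image = [
    [(10,), (12,), (200,), (11,)],
    [(11,), (13,), (200,), (10,)],
]
selection = magic_wand(image, seed=(0, 0), tolerance=5)
midpoint = cubic_bezier((0, 0), (0, 1), (1, 1), (1, 0), 0.5)
```

In the example, the dark pixels to the right of the bright column match the seed color but are not selected, because they are not connected to the seed.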
MASK provides both channel and path modes. By using a straight line, B-spline and edge finding tool in alternation, a user marks a series of anchor points to define a shape which, once closed, is added to or subtracted from the mask. The edge finding tool operates by detecting an edge in a neighborhood of a segment. Automatic edge detection identifies edges as groups of pixel locations where there are abrupt changes in image color. Techniques for automatic edge detection are described in Gonzalez, R. C., Digital Image Processing, Addison-Wesley, Reading, Mass., 1992, pgs. 418-420, the disclosure of which is hereby incorporated by reference.
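As a generic illustration of identifying abrupt changes in image color, the following sketch marks pixels whose gradient magnitude exceeds a threshold. It assumes a grayscale image and uses the Sobel operator, which is one common gradient estimate and is not asserted to be the specific technique of the cited reference or of any product; `sobel_edges` and `threshold` are hypothetical names.

```python
def sobel_edges(gray, threshold):
    """Mark interior pixels whose Sobel gradient magnitude is >= threshold."""
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal change: right column minus left column, center-weighted.
            gx = ((gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1])
                  - (gray[y - 1][x - 1] + 2 * gray[y][x - 1] + gray[y + 1][x - 1]))
            # Vertical change: bottom row minus top row, center-weighted.
            gy = ((gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1])
                  - (gray[y - 1][x - 1] + 2 * gray[y - 1][x] + gray[y - 1][x + 1]))
            edges[y][x] = (gx * gx + gy * gy) ** 0.5 >= threshold
    return edges


# Dark left half, bright right half: the only edge is the vertical seam.
gray = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = sobel_edges(gray, threshold=200)
```

Pixels on the image border are left unmarked in this sketch because the operator needs a full 3 x 3 neighborhood.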
PAINTER.RTM. is similar to PHOTOSHOP.RTM., and allows the user to base the mask on luminance or chrominance. PHOTOFUSION.RTM. eliminates background using "blue screen" techniques, whereby the background pixel locations are identified by their uniform, or nearly uniform color, thereby determining the desired subject as those pixel locations which are not background.
Unless the user specifies the silhouette portion of the image with perfect or near-perfect accuracy, determining the mask or boundary path involves automated image processing--usually edge detection. Since there can be many edges within the subject being silhouetted, and only the boundary edges are of significance, this can be a formidable task. One approach to finding the boundary edges, referred to as "shrink-wrap," operates by having the user first designate a region which encompasses the desired portion to be extracted. The outline of the region is then shrunk inwards until it impinges upon edges. This technique works well when dealing with images with a featureless background, but for images with a complex background it can easily be fooled by spurious edges in the background.
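The shrink-wrap idea can be sketched in a simplified form as follows. This is an assumption-laden illustration rather than the method of any particular tool: the region and the detected edges are both given as boolean grids, the outline is taken to be region pixels with a 4-connected neighbor outside the region or outside the image, and the hypothetical function `shrink_wrap` peels away outline pixels that are not edge pixels until nothing more can be removed.

```python
def shrink_wrap(region, edges):
    """Shrink a region inwards until its outline rests on edge pixels."""
    h, w = len(region), len(region[0])
    region = [row[:] for row in region]  # work on a copy
    changed = True
    while changed:
        changed = False
        to_remove = []
        for y in range(h):
            for x in range(w):
                if not region[y][x] or edges[y][x]:
                    continue  # outside the region, or held in place by an edge
                neighbors = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
                on_outline = any(
                    not (0 <= nx < w and 0 <= ny < h) or not region[ny][nx]
                    for nx, ny in neighbors
                )
                if on_outline:
                    to_remove.append((x, y))
        for x, y in to_remove:
            region[y][x] = False
            changed = True
    return region


# User-designated region: the whole 7 x 7 image.
start = [[True] * 7 for _ in range(7)]
# Detected edges: a closed square ring around the center pixel (3, 3).
ring = [[max(abs(x - 3), abs(y - 3)) == 1 for x in range(7)] for y in range(7)]
wrapped = shrink_wrap(start, ring)
```

Because every edge pixel halts the shrinking, an isolated edge pixel in the background would also survive in the result, which mirrors the sensitivity to spurious background edges noted above.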
Silhouetting tools which operate by edge detection are only successful if there is a substantial contrast between the subject being silhouetted and the rest of the image, and if the subject is opaque. Under low contrast conditions, or if the subject is transparent, the edges produced by these tools as a boundary path will suffer from error, and as a result, either too much or too little of the photograph will be included in the silhouette.
Thus it can be appreciated that a key weakness of present edge detection is its failure to produce good results when there is no artificial "blue screen" background and when there is weak contrast between the subject and the background. For example, this is the case when trying to cut out a father's face from a family portrait. The father's face may be positioned next to other faces, and current edge detectors would either cut too much out, by including portions of other faces, or cut too little out, by cutting into the father's face.