This invention relates generally to computing systems and more specifically to systems and methods for detecting a border in a digital image.
A digital image is a collection of digital information that may be cast into the form of a visual image. Digital images may include, for example, photographs, artwork, documents, and web pages. Digital images may be obtained, for example, from digital cameras, digital video, scanners, and facsimile. The images may be two-dimensional or multi-dimensional. For example, three-dimensional images may include representations of three-dimensional space, or of two-dimensional movies, where the third dimension is time.
The fundamental element of a digital image is a pixel. Referring to FIG. 1, a digital image 100 is shown which is 10 pixels wide and 10 pixels high. A single pixel 105 in this image 100 is represented by a square. Generally, a pixel has a specific location (designated in two-dimensional space as $\vec{r} = (x, y)$) in the digital image and it contains color information for that location. Color information represents a vector of values, the vector characterizing all or a portion of the image intensity information. Color information could, for example, represent red (R), green (G), and blue (B) intensities in an RGB color space. Or, as shown in FIG. 1, color information may represent a single luminosity in a grayscale color space.
A color space is a multi-dimensional space in which each point in the space corresponds to a color. For example, RGB color space is a color space in which each point is a color formed of the additive amounts of red, green and blue colors. As another example, color information could represent information such as cyan-magenta-yellow (CMY), cyan-magenta-yellow-black (CMYK), Pantone, Hexachrome, x-ray, infrared, and gamma ray intensities from various spectral wavelength bands. Thus, for example, in CMY color space, each point is a color formed of the combination of cyan, magenta, and yellow colors. Color information may, in addition, represent other modalities of information, such as acoustic amplitudes (sonar, ultrasound) or magnetic resonance imaging (MRI) amplitudes.
In RGB color space, levels of red, green, and blue can each range from 0 to 100 percent of full intensity. Each level can be represented by a range of decimal numbers from, for example, 0 to 255 (256 levels for each color), with an equivalent range of binary numbers extending from 00000000 to 11111111. The total number of available colors would therefore be 256×256×256, or 16,777,216 possible colors.
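As a non-limiting illustration, the arithmetic above may be checked with a short Python sketch. The names LEVELS_PER_CHANNEL, total_colors, and level_to_binary are hypothetical and are introduced here only for the example.

```python
# Hypothetical names; an illustration of the 8-bit-per-channel arithmetic only.
BITS_PER_CHANNEL = 8
LEVELS_PER_CHANNEL = 2 ** BITS_PER_CHANNEL   # 256 levels, decimal 0 to 255

# One independent level for each of red, green, and blue.
total_colors = LEVELS_PER_CHANNEL ** 3


def level_to_binary(level):
    """Return the 8-digit binary string for a channel level in 0..255."""
    if not 0 <= level < LEVELS_PER_CHANNEL:
        raise ValueError("level must be between 0 and 255")
    return format(level, "08b")
```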
One way to express the color of a particular pixel relative to other pixels in an image is with a gradient vector (designated as $\vec{G}$). The gradient vector at the particular pixel indicates the direction and magnitude of change in colors of pixels relative to the particular pixel, and it may be calculated using the color and position information inherently associated with the pixels in the image. Generally, the gradient vector for a pixel at a position $\vec{r}$ may be designated as $\vec{G}(\vec{r}) = (g(\vec{r}) \cos \omega(\vec{r}),\ g(\vec{r}) \sin \omega(\vec{r}))$, where $g(\vec{r}) = |\vec{G}(\vec{r})|$ is the magnitude of the gradient vector at the pixel located at position $\vec{r}$ and $\omega(\vec{r})$ is the angle or direction of the gradient vector at the pixel located at position $\vec{r}$. An example of a gradient vector is shown schematically by vector 110, which points in the direction of the greatest change in color, and whose magnitude indicates the amount of color change. In the case of a linear boundary that separates a first region that is white from a second region that is black, the gradient vector at each pixel along the boundary would be of the same magnitude and angle (the direction being perpendicular to the linear direction of the boundary). Moreover, the gradient magnitude at a pixel outside of the linear boundary and distinctly within one of the white or black regions would be zero because the surrounding pixels have the same color as that pixel.
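As a non-limiting illustration, a gradient magnitude and angle may be approximated for a grayscale image as sketched below in Python. The central-difference approximation, the row-major image[y][x] layout, and the name gradient_at are assumptions made for this example; the description above does not prescribe a particular way of calculating the gradient.

```python
import math


def gradient_at(image, x, y):
    """Approximate (magnitude, angle) of the gradient at pixel (x, y).

    Assumes image is a list of rows of grayscale values, indexed image[y][x],
    and uses central differences (one-sided at the image edges).
    """
    height, width = len(image), len(image[0])
    x0, x1 = max(x - 1, 0), min(x + 1, width - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, height - 1)
    gx = (image[y][x1] - image[y][x0]) / max(x1 - x0, 1)
    gy = (image[y1][x] - image[y0][x]) / max(y1 - y0, 1)
    return math.hypot(gx, gy), math.atan2(gy, gx)


# A black region (0) on the left and a white region (1) on the right,
# separated by a vertical linear boundary.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
on_boundary_a = gradient_at(image, 2, 1)
on_boundary_b = gradient_at(image, 3, 2)
inside_black = gradient_at(image, 0, 1)
```

In this example the two pixels adjacent to the boundary receive the same magnitude and the same angle (pointing across the boundary), while the pixel well inside the black region receives a magnitude of zero, in agreement with the description above.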
It is common for one working with a digital image to cut or separate a foreground region of the image from a background region of the image. The foreground region often corresponds to an object or set of objects in the image. Alternatively, the foreground region may correspond to a region outside of an object or set of objects. In any case, regions of the image that are not part of the desired foreground region may be referred to as background regions.
Referring to the example of FIGS. 2a and 2b, digital image 200 includes a foreground region 202 (the chair) and a background region 204 (the hallway, doors, floor, windows, and walls). While foreground region 202 only includes a single object (the chair) that is highlighted in FIG. 2b, foreground region 202 can include plural objects some of which may overlap. For example, in FIG. 2a, the user may have designated one of the doors as the foreground region. Or, the user may have designated the combination of the floor and the chair as the foreground region.
In a method for identifying the foreground region 202 in the digital image 200, the user can select, using a graphical interface device (or brush) 207, boundary 210 (shown in FIG. 2b) in the digital image 200 that encompasses or traces the chair and then designates the chair as the foreground region 202. The graphical interface device is a mechanism that enables the user to indicate or “paint” the boundary, much like a brush is used by a painter.
Boundary 210 bounds the chair and can also include portions of other objects. For example, boundary 210 may include portions of a door or the floor if the user wants to include those objects in the foreground region. FIG. 2b shows a highlighted portion of the boundary of the chair.
Defining a boundary 210 that only encompasses the chair can be difficult. For example, the user can trace with a relatively larger brush around the top of the chair, but a relatively smaller brush is required near the wheels and the arm rests. The user may select different sized brushes depending on the region that will be traced. However, in order to ensure that the region to be highlighted actually covers the boundary to be traced, a larger brush is typically selected. Moreover, even when the user traces with a relatively narrower brush around the wheels and arm rests, the narrow brush may still cover many features of the chair and the background region.
In addition to being time consuming, tracing along the boundary won't resolve how much of the pixel color for a pixel in the boundary came from the object (for example, the chair) and how much came from the background region. The process of characterizing individual pixels in the boundary is difficult because their data is a blend of both the object data and the background region data.
A portion or object of a digital image may be identified for further processing using an identification or selection operation. An example of such operation is a masking operation in which an object in a digital image is cut so that the object can be manipulated (for example, blended into another region or otherwise manipulated). Masking typically includes defining an opacity (conventionally represented by alpha, α) for pixels in the masked and unmasked regions, where the opacity specifies the degree to which an associated pixel is selected (for example, identified or masked). A value of 1 can be used to indicate that the pixel belongs completely to the object or foreground region. A value of 0 can be used to indicate that the pixel belongs completely to the background region. Values between 0 and 1 indicate partial membership in both.
Referring also to the digital image of FIG. 3a, a foreground region 202 including the chair can be masked from the background region by clicking on the chair using a cursor 300. A masking technique is described in “IDENTIFYING INTRINSIC PIXEL COLORS IN A REGION OF UNCERTAIN PIXELS,” application No. 09/298,872, filed Apr. 26, 1999, which is incorporated herein by reference. In that technique, a linear blend model for a particular pixel is determined by, roughly, observing a color $c_f$ of a pixel in the foreground region closest to the particular pixel and a color $c_b$ of a pixel in the background region closest to the particular pixel. An opacity α for the particular pixel with color c is determined using a computation in color space that chooses 0 < α < 1 so that c is as close as possible to the color given by the linear blend model $\alpha \times c_f + (1 - \alpha) \times c_b$.
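As a non-limiting illustration, one way to choose an opacity so that the observed color is as close as possible to the linear blend is a least-squares projection in color space, sketched below in Python. The projection formula, the clamping to the closed range 0 to 1, the fallback value of 0.5 for identical foreground and background colors, and the name blend_opacity are assumptions made for this example; the referenced application may compute the opacity differently.

```python
def blend_opacity(c, cf, cb):
    """Opacity that brings alpha*cf + (1 - alpha)*cb closest to c.

    c, cf, and cb are color tuples of equal length (for example, RGB).
    Assumed method: project (c - cb) onto (cf - cb), then clamp to [0, 1].
    """
    diff = [f - b for f, b in zip(cf, cb)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        # Foreground and background colors are identical, so color alone
        # cannot determine the opacity; 0.5 is an arbitrary placeholder.
        return 0.5
    alpha = sum((ci - bi) * d for ci, bi, d in zip(c, cb, diff)) / denom
    return min(1.0, max(0.0, alpha))


white, black = (255, 255, 255), (0, 0, 0)
dark_gray_alpha = blend_opacity((51, 51, 51), white, black)
```

The fallback branch corresponds to the difficulty a purely color-based computation has when the foreground and background region colors are similar: as the two colors approach each other, the denominator approaches zero and the estimate becomes unreliable.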
FIG. 3b shows the result of a masking operation in which the chair has been masked from the digital image 200. All pixels in the background region 204 are assigned an opacity value of 0 whereas all pixels in the foreground region 202 are assigned an opacity value of 1. Pixels lying between the clearly distinguished regions may have opacity values between 0 and 1 that are based on the linear blend model technique discussed above.
The color-based detection technique described above and shown in FIGS. 2b, 3a, and 3b may make incorrect decisions about those pixels lying between the clearly distinguished regions if the foreground and background region colors are similar. For example, in FIG. 3b, the color-based technique has difficulty discerning an appropriate value for the opacity of some pixels that are clearly in the background region (for example, the pixels in the door frames). Thus, for example, pixels in region 305 were given opacity values near 1 even though the pixels in that region are clearly in the background region (that is, in the rear window area). As another example, the color-based technique incorrectly assigns portions of the dark carpet triangles as belonging to the foreground region or chair (as shown in region 310). In contrast, the color-based technique incorrectly assigns portions of the dark left arm of the chair (region 315) as belonging to the background region as evidenced by the holes in the left arm of the chair. This problem occurs because the chair's left arm overlaps some black trim on the wall behind the chair and it is difficult for the color-based technique to distinguish the chair's black color from the background's black color.
In addition to masking an object, the intrinsic color of each pixel in the image may be determined in a process referred to as color decontamination. Each pixel has associated with it an intrinsic color, which is the color the pixel would have in the image if the pixel were not blended with the background region. The intrinsic color may differ from the observed color because regions of the digital image may be blended. The blending can arise either from imaging optics in the process of capturing a digital image or from the composition of multiple image layers. Blends can also arise from “color spill,” in which light from one portion of a scene reflects to another. For pixels that are not blended, the intrinsic color is the observed color. For pixels that are blended (including blending due to color spill), the intrinsic color is a color that differs from the observed color. After the object is masked and the intrinsic colors of pixels in the object are determined, the object has been color extracted.
The masked object may be blended with another background region in a process referred to as compositing. When the masked object is blended with the other background region, a resultant image is produced including pixels having values derived from the blending of the object with the other background region. In the composition process, opacity values for the masked object are used to determine the blend. Where the opacity values are 0, pixel data is derived from the other background region and where the opacity values are 1, pixel data is derived from the masked object.
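As a non-limiting illustration, the compositing step may be sketched in Python as a per-pixel linear blend. The linear form is an assumption chosen to be consistent with the linear blend model discussed above, and the name composite is hypothetical.

```python
def composite(alpha, object_color, new_background_color):
    """Blend one masked-object pixel over a new background pixel.

    alpha is the opacity of the masked object at this pixel (0 to 1);
    colors are tuples of equal length (for example, RGB).
    """
    return tuple(
        alpha * o + (1.0 - alpha) * b
        for o, b in zip(object_color, new_background_color)
    )


red, blue = (255, 0, 0), (0, 0, 255)
fully_background = composite(0.0, red, blue)   # pixel data from the background
fully_object = composite(1.0, red, blue)       # pixel data from the object
half_blend = composite(0.5, red, blue)
```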
Traditional masking techniques work best when the user manually specifies a width that describes the boundary; thus, no automation is provided. Some masking techniques determine a background region color by requiring the user to position the center of the brush (referred to as a brush tip) in the background region.
Often, poor results have been obtained in previous masking techniques because the size of the user's brush may be too large or too wide, which can cause it to cross or intersect another boundary of the image. Poor results in previous masking techniques may also occur when the boundary is near to or coincident with fine texture or when foreground and background region colors are similar, as discussed above.
In one aspect, a border is identified in a digital image defined by a plurality of pixels, each pixel being defined by a pixel color and a pixel position indicating a location of the pixel in the digital image. User inputs are received that include an area of interest that includes at least a portion of the border to be identified. Identification of the border includes estimating information about an edge zone that models the border portion including estimating a position and width of the edge zone. The position of the edge zone is estimated by calculating a weighted average value of pixel positions of each pixel in the area of interest. A measure of confidence in the edge zone information is calculated. A width of the edge zone is estimated at which the calculated measure of confidence decreases appreciably if the estimated edge zone width increases. The border is identified based on the estimated edge zone information.
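As a non-limiting illustration, the width estimate described above may be sketched in Python as a scan over candidate widths that stops at the last width before the measure of confidence drops appreciably. The candidate list, the callable confidence_for_width, the fixed drop threshold, and the name estimate_edge_width are assumptions made for this example; the summary does not specify how an appreciable decrease is detected.

```python
def estimate_edge_width(candidate_widths, confidence_for_width, drop_threshold=0.1):
    """Return the largest candidate width reached before confidence drops.

    confidence_for_width is a hypothetical callable that returns the measure
    of confidence for a given edge zone width. A drop larger than
    drop_threshold between consecutive widths is treated as "appreciable".
    """
    widths = sorted(candidate_widths)
    if not widths:
        raise ValueError("at least one candidate width is required")
    best = widths[0]
    previous = confidence_for_width(best)
    for width in widths[1:]:
        confidence = confidence_for_width(width)
        if previous - confidence > drop_threshold:
            break  # widening further would lose confidence appreciably
        best, previous = width, confidence
    return best


# Hypothetical confidence values for widths of 1 to 5 pixels.
sample_confidence = {1: 0.95, 2: 0.94, 3: 0.93, 4: 0.60, 5: 0.50}
estimated_width = estimate_edge_width(sample_confidence, sample_confidence.get)
```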
Aspects of the method may include one or more of the following features. For example, receiving the area of interest may include receiving a user input indicating the area of interest. The weighted average value of pixel positions may be calculated by weighting each position of a pixel in the area of interest by a first function of a gradient magnitude at the pixel. The weighted average value of pixel positions may be calculated by weighting each position of a pixel in the area of interest by a second function that is a difference between the pixel gradient direction and a predetermined bias direction.
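As a non-limiting illustration, the weighted average of pixel positions may be sketched in Python as follows. The specific first function (the squared gradient magnitude), the specific second function (the cosine of the angular difference from the bias direction, floored at zero), and the name estimate_edge_position are assumptions made for this example; the summary states only that such weighting functions may be used.

```python
import math


def estimate_edge_position(pixels, bias_angle=None):
    """Weighted average of pixel positions in an area of interest.

    pixels is a list of (x, y, gradient_magnitude, gradient_angle) tuples.
    Returns (x, y), or None when every weight is zero.
    """
    total = sum_x = sum_y = 0.0
    for x, y, magnitude, angle in pixels:
        weight = magnitude ** 2  # assumed "first function" of the magnitude
        if bias_angle is not None:
            # Assumed "second function": favor gradients that agree with
            # the bias direction and ignore those pointing away from it.
            weight *= max(0.0, math.cos(angle - bias_angle))
        total += weight
        sum_x += weight * x
        sum_y += weight * y
    if total == 0.0:
        return None
    return sum_x / total, sum_y / total


# Two strong gradients at x = 2 and x = 4, flanked by flat pixels.
row = [(0, 0, 0.0, 0.0), (2, 0, 1.0, 0.0), (4, 0, 1.0, 0.0), (6, 0, 0.0, 0.0)]
position = estimate_edge_position(row)
```

Weighting by direction in this way lets an estimate ignore a nearby, unrelated edge whose gradient points away from the bias direction.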
The bias direction may be a previously estimated edge zone direction. The bias direction may be derived by comparing a location of a previous area of interest to the area of interest.
Identification of the border may include receiving, for each pixel, a pixel gradient indicating a direction and magnitude of change in color.
Estimating the weighted average value of pixel positions may include estimating a center of the area of interest and weighting each pixel position in the area of interest by a third function. The third function is a difference between the pixel position and the estimated center of the area of interest. The area of interest center may be estimated by accessing a previously estimated edge zone position.
Receiving the area of interest may include receiving information relating to the area of interest through a user controlled graphical interface device. Moreover, estimating the center of the area of interest may include estimating one or more local maxima of a function of the gradients along a path that intersects the graphical interface device and lies parallel to a predetermined direction relating to the edge zone.
The predetermined direction relating to the edge zone may be a previously estimated normal direction of the border.
A local maximum may be selected by finding a distance between the local maximum and a previously estimated edge zone position, and calculating a gradient value of the pixels that are crossed along a path connecting the previously estimated edge zone position and the local maximum. A local maximum may be selected by calculating an average (per pixel) amount of agreement in gradient angle with the predetermined direction relating to the border along a path connecting the previously estimated edge zone position and the local maximum. A local maximum may be selected using any one or more of the above described methods in addition to using other gradient magnitude- and gradient direction-based criteria that seek to identify a candidate that lies along the same border as a previously estimated edge zone position.
Information about the edge zone may be estimated by estimating a direction of the edge zone to calculate a weighted average value of pixel gradient directions at each pixel in the area of interest. The measure of confidence in the edge zone information may be calculated by calculating an average value of a difference between the estimated edge zone direction and the pixel gradient direction at each pixel over an estimated edge zone area.
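As a non-limiting illustration, the direction estimate and the measure of confidence may be sketched in Python as follows. Averaging the directions as magnitude-weighted unit vectors (to avoid wrap-around problems near ±π), expressing agreement as the cosine of the angular difference so that 1 indicates perfect agreement, and the names estimate_edge_direction and direction_confidence are assumptions made for this example.

```python
import math


def estimate_edge_direction(gradients):
    """Magnitude-weighted average of gradient directions.

    gradients is a list of (magnitude, angle) pairs for the pixels in the
    area of interest. The average is taken over unit vectors, not raw angles.
    """
    sum_x = sum(m * math.cos(a) for m, a in gradients)
    sum_y = sum(m * math.sin(a) for m, a in gradients)
    return math.atan2(sum_y, sum_x)


def direction_confidence(gradients, edge_angle):
    """Average agreement between pixel gradient directions and edge_angle.

    Assumed measure: the mean cosine of the angular difference, taken over
    the pixels of the estimated edge zone area.
    """
    if not gradients:
        return 0.0
    return sum(math.cos(a - edge_angle) for _, a in gradients) / len(gradients)


consistent = [(1.0, 0.1), (1.0, -0.1)]
edge_angle = estimate_edge_direction(consistent)
confidence = direction_confidence(consistent, edge_angle)
```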
Identifying the border may include comparing the position of a pixel in the area of interest to the estimated edge zone position to determine a relative position of the pixel, and calculating an opacity for a pixel in the area of interest based on the relative position. The position of the pixel in the area of interest may be compared to the estimated edge zone position along the estimated edge zone direction. The opacity may be calculated by calculating a value based on the comparison and the estimated edge zone width.
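As a non-limiting illustration, an opacity may be derived from the relative position of a pixel as sketched below in Python. The linear ramp across the edge zone width, the convention that the edge zone direction points toward the foreground region, and the name border_opacity are assumptions made for this example.

```python
import math


def border_opacity(pixel_pos, edge_pos, edge_angle, edge_width):
    """Opacity from the signed distance of a pixel to the edge zone center.

    The distance is measured along the estimated edge zone direction
    (edge_angle, in radians). Assumed profile: a linear ramp from 0 to 1
    across the edge zone width, equal to 0.5 at the edge zone position.
    """
    dx = pixel_pos[0] - edge_pos[0]
    dy = pixel_pos[1] - edge_pos[1]
    signed_distance = dx * math.cos(edge_angle) + dy * math.sin(edge_angle)
    if edge_width <= 0:
        # Degenerate edge zone: treat the border as a hard step.
        return 1.0 if signed_distance >= 0 else 0.0
    return min(1.0, max(0.0, 0.5 + signed_distance / edge_width))


at_center = border_opacity((0, 0), (0, 0), 0.0, 4.0)
quarter_in = border_opacity((1, 0), (0, 0), 0.0, 4.0)
far_outside = border_opacity((-10, 0), (0, 0), 0.0, 4.0)
```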
Identification of the border may include receiving one or more other areas of interest. Each of the one or more other areas of interest includes at least a portion of the border to be identified. For each of the one or more other areas of interest, a position of an edge zone may be estimated by calculating a weighted average value of pixel positions of each pixel in the other area of interest. Moreover, a direction of the edge zone for the other area of interest may be estimated by calculating a weighted average value of gradient directions at each pixel in the other area of interest. A width of the edge zone for the other area of interest may be estimated. Accordingly, a measure of confidence in the edge zone direction, position, and width for the other area of interest may be calculated. Additionally, identification of the border may include estimating a width of the edge zone at which the calculated measure of confidence decreases appreciably if the estimated edge zone width increases. An opacity for a pixel in the other area of interest may be calculated by comparing the position of the pixel in the other area of interest to the estimated edge zone position for the other area of interest.
Identification of the border may also include analyzing the calculated measures of confidence for those areas of interest in which a pixel is included and calculating a border-derived opacity for the pixel based on the analyzed measures of confidence. The calculated measures of confidence may be analyzed by calculating a measure of confidence that corresponds to an acceptable measure of confidence. Calculating the border-derived opacity may include selecting an opacity corresponding to an acceptable measure of confidence.
Calculating the border-derived opacity may include calculating a weighted average value for the opacities for each pixel by weighting each opacity by the corresponding calculated measure of confidence. The border-derived opacity may be calculated by selecting an opacity corresponding to a most recently calculated opacity for a given pixel.
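As a non-limiting illustration, the two alternatives described above for combining the opacities calculated for a pixel that lies in several areas of interest may be sketched in Python as follows. The list-of-pairs representation and the names weighted_border_opacity and most_recent_opacity are assumptions made for this example.

```python
def weighted_border_opacity(estimates):
    """Confidence-weighted average of the opacities calculated for a pixel.

    estimates is a list of (opacity, confidence) pairs, one for each area of
    interest that includes the pixel, in the order they were calculated.
    Returns None when no estimate carries positive total confidence.
    """
    total = sum(confidence for _, confidence in estimates)
    if total <= 0:
        return None
    return sum(opacity * confidence for opacity, confidence in estimates) / total


def most_recent_opacity(estimates):
    """Alternative: use the most recently calculated opacity for the pixel."""
    if not estimates:
        return None
    return estimates[-1][0]


pixel_estimates = [(1.0, 3.0), (0.0, 1.0)]
averaged = weighted_border_opacity(pixel_estimates)
latest = most_recent_opacity(pixel_estimates)
```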
Identification of the border may also include masking a region of the digital image by receiving input from a user indicating a region to mask. The masking may include calculating a second opacity for a pixel, comparing the border-derived opacity to the second opacity, and calculating a final opacity for the given pixel by analyzing the comparison. The second opacity for the pixel is calculated using a color space computation based on a linear blend model. Calculating the final opacity may include analyzing one or more of the measures of confidence in the areas of interest in which the pixel is included.
Calculating the final opacity may include estimating an error in calculating the border-derived opacity, estimating an error in calculating the second opacity, and selecting as the final opacity the border-derived opacity, the second opacity, or a composite opacity based on the estimated errors. The composite opacity depends on the border-derived opacity and the second opacity.
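As a non-limiting illustration, the selection among the border-derived opacity, the second opacity, and a composite opacity may be sketched in Python as follows. The ratio test used to decide that one estimate is clearly more reliable, the inverse-error weighting used for the composite, and the name final_opacity are assumptions made for this example; the summary states only that the selection is based on the estimated errors.

```python
def final_opacity(border_alpha, border_error, second_alpha, second_error, ratio=2.0):
    """Select or blend two opacity estimates using their estimated errors.

    Errors are assumed to be non-negative. When one error is at least
    `ratio` times smaller than the other, that estimate is used alone;
    otherwise a composite weighted inversely to the errors is returned.
    """
    if border_error * ratio <= second_error:
        return border_alpha
    if second_error * ratio <= border_error:
        return second_alpha
    # Both errors are positive here, because a zero error would have been
    # selected by one of the two tests above.
    total = border_error + second_error
    border_weight = second_error / total
    return border_weight * border_alpha + (1.0 - border_weight) * second_alpha


prefers_border = final_opacity(0.9, 0.1, 0.2, 0.5)
prefers_second = final_opacity(0.9, 0.5, 0.2, 0.1)
blended = final_opacity(1.0, 0.2, 0.0, 0.2)
```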
Identification of the border may also include automatically indicating to a user the estimated edge zone information. The user may be enabled to identify the area of interest using the indicated edge zone information. The method may also include automatically indicating to the user the estimated edge zone position, the estimated edge zone direction, the estimated edge zone width, or the calculated measure of confidence.
The method may include receiving as an input another area of interest that includes at least a portion of the border to be identified. The portion is modeled by another edge zone. Identification of the border may include combining the edge zone with the other edge zone.
A position, direction, and width of the other edge zone may be estimated. Estimating the position of the other edge zone may include calculating a weighted average value of pixel positions of each pixel in the other area of interest. Estimating the direction of the other edge zone may include calculating a weighted average value of gradient directions at each pixel in the other area of interest. A measure of confidence in the direction, position, and width of the other edge zone may be calculated. A width of the other edge zone may be estimated to be that estimated other edge zone width at which the calculated measure of confidence decreases appreciably if the estimated other edge zone width increases.
The border may include one or more pixels along a direction that is normal to a general direction of the border. For example, the border may be one pixel wide or the border may be multiple-pixels wide.
Aspects of the methods and systems described herein can include one or more of the following advantages. For example, the techniques described here allow applications to detect and use borders to extract and/or mask an object from a digital image. In general, the detection methods permit a user to easily and accurately mark a border of an object by snapping a brush tool (which is a cursor indication within a graphical user interface) to a border center and by establishing a width of the border at the brush tool. Because of this, less highlighted region is applied with the brush tool relative to the size of the brush tool. Such ease and accuracy improve the results of known masking techniques. Moreover, the techniques provide reliable estimates of foreground and background region color mixtures by fitting a border profile along a normal direction of the border across the border width. The techniques also greatly reduce extraneous pixels in the background region and/or fill in the pixel holes in the foreground region that occur in known color-based techniques. Unwanted effects such as extraneous pixels in the background region or pixel holes in the foreground region may be present when texture is very near to the border and/or when the foreground and background region colors are similar.
Other advantages and features will become apparent from the following description and from the claims.