This specification relates to selecting parts of images.
Typical pixel classification techniques allow users to provide input (e.g., with a mouse) that paints or selects some of the pixels of an object in an image that they want to select. The classification techniques can then classify all other pixels in the image based on the received user input, ideally selecting all pixels of the desired object.
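As a minimal illustration of classifying unpainted pixels from user input, the sketch below labels each pixel of a one-dimensional row of grayscale values by whichever seed mean it is closer to. The nearest-mean rule, the grayscale row, and the name `classify_by_seed_color` are assumptions made for illustration only; they are not taken from any particular technique.

```python
def classify_by_seed_color(pixels, fg_seeds, bg_seeds):
    """Label each pixel True (foreground) or False (background).

    pixels   -- list of grayscale values in [0, 1] (assumed representation)
    fg_seeds -- indices the user painted as foreground
    bg_seeds -- indices the user marked as background
    """
    # Build a trivial color model from the user-selected pixels:
    # one mean intensity per class (an assumption for this sketch).
    fg_mean = sum(pixels[i] for i in fg_seeds) / len(fg_seeds)
    bg_mean = sum(pixels[i] for i in bg_seeds) / len(bg_seeds)

    # Each pixel is classified independently of its neighbors.
    return [abs(v - fg_mean) <= abs(v - bg_mean) for v in pixels]


row = [0.9, 0.8, 0.3, 0.85, 0.2, 0.1]
labels = classify_by_seed_color(row, fg_seeds={0}, bg_seeds={5})
print(labels)
```

Because every pixel is judged on its own, the dark pixel at index 2 is labeled background even though both of its neighbors are foreground, which leaves a hole in the selection.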
Graph cut techniques, in particular, are pixel classification techniques that lend spatial coherence to the pixel classification, which can be achieved by minimizing an objective cost function with two types of terms: regional costs and boundary costs. Regional costs reflect the cost of labeling a particular pixel as foreground or background. Boundary costs reflect the cost of labeling a pair of neighboring pixels as both foreground, both background, or one foreground and one background. The boundary cost of labeling a pair of pixels with different classifications is high when the two pixels have similar color, since it is likely that two pixels with similar color should either both be in the selection or both be outside it. Regional costs can be determined by comparing the color of a given pixel to a model of the colors expected in the foreground and the background. Generally, this model is derived from the selected pixels provided by the user. The classification of pixels can be determined by finding a labeling of pixels that minimizes the cost function, constrained by the user-selected pixels within the image. There are many techniques that can be used to find a pixel classification by minimizing a cost function. One effective approach achieves this by mapping the pixel classification problem onto a graph and solving a minimum graph cut problem or its equivalent maximum graph flow problem.
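The mapping onto a minimum cut problem can be sketched on a one-dimensional row of grayscale pixels. This is a minimal illustration and not a definitive implementation: the regional cost (absolute distance to the mean seed intensity), the Gaussian boundary weight with parameter `sigma`, and all function names are assumptions chosen for brevity. The maximum flow is computed with a plain breadth-first augmenting-path search (Edmonds-Karp), which is only one of many possible solvers.

```python
import math
from collections import deque

INF = float("inf")


def regional_cost(value, model_mean):
    # Assumed color model: distance to the mean intensity of the seeds.
    return abs(value - model_mean)


def boundary_weight(a, b, sigma=0.1):
    # Cost of giving two neighbors different labels:
    # close to 1 for similar pixels, close to 0 across a strong edge.
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def segment_row(pixels, fg_seeds, bg_seeds, sigma=0.1):
    """Return True (foreground) / False (background) per pixel.

    Assumes fg_seeds and bg_seeds are non-empty and disjoint.
    """
    n = len(pixels)
    src, snk = n, n + 1  # source = foreground terminal, sink = background
    fg_mean = sum(pixels[i] for i in fg_seeds) / len(fg_seeds)
    bg_mean = sum(pixels[i] for i in bg_seeds) / len(bg_seeds)

    # Residual capacities stored as a dense matrix (fine for a tiny example).
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i, v in enumerate(pixels):
        if i in fg_seeds:
            cap[src][i] = INF  # hard constraint: must stay foreground
        elif i in bg_seeds:
            cap[i][snk] = INF  # hard constraint: must stay background
        else:
            cap[src][i] = regional_cost(v, bg_mean)  # paid if i is background
            cap[i][snk] = regional_cost(v, fg_mean)  # paid if i is foreground
        if i + 1 < n:
            w = boundary_weight(v, pixels[i + 1], sigma)
            cap[i][i + 1] = cap[i + 1][i] = w

    def reachable_from_source():
        # Breadth-first search over edges with remaining capacity.
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable_from_source()
        if snk not in parent:
            break  # no augmenting path left: the flow is maximal
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

    # Pixels still reachable from the source lie on the foreground side
    # of the minimum cut.
    return [i in parent for i in range(n)]


row = [0.9, 0.85, 0.8, 0.2, 0.15, 0.1]
labels = segment_row(row, fg_seeds={0}, bg_seeds={5})
print(labels)
```

In this example the cut falls between indices 2 and 3, where the intensity jump makes the boundary weight nearly zero, so the first three pixels are labeled foreground and the last three background.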
Graph cut techniques typically provide a contiguous selection of one or more discrete objects depicted within the image without selecting undesirable or disconnected areas outside the one or more desired objects. However, these techniques typically require extensive computation, making them unresponsive to live user input.
Live classification techniques classify the pixels in an image so that the classification can be presented to users as they provide input (e.g., paint the desired object in the image). However, live classification techniques can suffer from poor, incoherent classification, which is typically characterized by partial, rather than complete, object selection (e.g., small areas of misclassification, or ‘holes’ in the resultant classification). Moreover, as input is received, live classification can exhibit erratic or unstable behavior that users find difficult to predict (e.g., the classification leaks through relatively small gaps in the boundary of an object depicted in the image). Unintended and undesirable selections require the user to make time-consuming corrections. Furthermore, live classification techniques have previously been demonstrated only on small images because of the slowness of the algorithms employed.