Object selection is one of the most commonly used tools in digital image processing. For instance, to visually edit a portion of a digital image without altering visual characteristics of other portions of the digital image, users can select a region of interest by using an input device to define a border of the region. Once selected, image processing applications enable visual properties of the region of interest to be modified, such as by increasing a contrast of the selected region, by altering one or more colors of the selected region, and so forth. However, even for skilled image editing professionals, the manual input required to precisely select a region of interest is tedious and cumbersome, particularly for professional, high-resolution digital images.
To reduce the amount of manual input required for segmenting a region of interest from a remainder of the image (e.g., segmenting a foreground of the image from a background of the image), conventional approaches implement machine learning to automatically classify each pixel in an image as either a foreground pixel or a background pixel. For example, given a picture of a dog (foreground object) in a park (background), a neural network can generate a probability map for the image, where each pixel of the probability map indicates a probability that the pixel corresponds to the dog (foreground object).
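As a minimal illustration of how such a probability map can be turned into a per-pixel classification, the following sketch thresholds each probability. The function name `classify_pixels`, the 0.5 threshold, and the 3×3 example values are assumptions chosen for illustration and do not come from any particular conventional system.

```python
def classify_pixels(prob_map, threshold=0.5):
    """Label each pixel True (foreground) or False (background).

    prob_map is a list of rows, where each value is the probability that
    the pixel belongs to the foreground object. The threshold is an
    assumed value, not one prescribed by any specific approach.
    """
    return [[p >= threshold for p in row] for row in prob_map]


# Hypothetical 3x3 probability map standing in for a neural network output.
prob_map = [
    [0.05, 0.10, 0.20],
    [0.30, 0.90, 0.85],
    [0.10, 0.95, 0.60],
]

mask = classify_pixels(prob_map)
foreground_count = sum(value for row in mask for value in row)
print(foreground_count)  # number of pixels classified as foreground
```

Plain Python lists are used only to keep the sketch free of dependencies; an image processing application would typically operate on array types instead.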
Conventional approaches use these neural network probability maps as a baseline for image segmentation and object selection. However, due to hardware computation and memory resource constraints, a neural network can output probability maps of only limited size. Thus, conventional approaches must first downsample a high-resolution image (e.g., 7000×5000 pixels) to an output size of the neural network (e.g., 256×256 pixels). After downsampling, the neural network is run to generate a probability map for the downsampled image, and the probability map is then upsampled to the original resolution of the image. However, conventional upsampling approaches generate an upsampled probability map that does not accurately account for curved object edges in the full-resolution image. To account for curved edges, conventional approaches require users to manually define a trimap for the image, which segments the image into three partitions: definite foreground, definite background, and unknown. Post-processing is then used to assign each pixel in the unknown region to either foreground or background, and the resulting set of foreground or background pixels may be selected as the object or region of interest for editing.
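The downsample, predict, and upsample sequence described above can be sketched as follows, under several stated assumptions: block averaging is used for downsampling, nearest-neighbor repetition is used for upsampling, and `toy_network` is a hypothetical stand-in that treats brightness as foreground probability rather than a real neural network. An 8×8 image with a diagonal edge stands in for a high-resolution image with fine boundary detail.

```python
def downsample(image, factor):
    """Shrink a 2D grid by averaging each factor x factor block."""
    height, width = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(
                image[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ) / factor ** 2
            for x in range(width)
        ]
        for y in range(height)
    ]


def upsample_nearest(grid, factor):
    """Enlarge a 2D grid by repeating each value factor times per axis."""
    return [
        [grid[y // factor][x // factor] for x in range(len(grid[0]) * factor)]
        for y in range(len(grid) * factor)
    ]


def toy_network(small_image):
    """Hypothetical stand-in for a neural network.

    Clamps brightness to [0, 1] and treats it as foreground probability.
    """
    return [[min(max(v, 0.0), 1.0) for v in row] for row in small_image]


# 8x8 image: pixels on or below the diagonal are foreground (1.0).
full = [[1.0 if x <= y else 0.0 for x in range(8)] for y in range(8)]

small = downsample(full, 4)                    # 2x2, the assumed network size
small_probs = toy_network(small)               # 2x2 probability map
full_probs = upsample_nearest(small_probs, 4)  # back to 8x8

# Compare the thresholded upsampled map against the true foreground.
mask = [[p >= 0.5 for p in row] for row in full_probs]
mismatches = sum(
    mask[y][x] != (full[y][x] >= 0.5) for y in range(8) for x in range(8)
)
print(mismatches)  # pixels misclassified along the diagonal edge
```

In this sketch every misclassified pixel lies in a block that the diagonal edge passes through, which illustrates why an upsampled low-resolution probability map alone tends to lose boundary detail.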
Manually defining a trimap using these conventional approaches, however, is cumbersome and tedious, particularly for high-resolution images where foreground pixel colors are similar to background pixel colors.
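For illustration, a trimap can be represented as a per-pixel label grid, and the post-processing step can be sketched as a function that resolves each unknown pixel. In the sketch below, the `Region` labels, the `resolve_unknown` helper, and the example values are hypothetical, and simple thresholding of a probability map is an assumed placeholder for whatever post-processing a given conventional approach actually applies.

```python
from enum import IntEnum


class Region(IntEnum):
    """The three trimap partitions (the numeric values are arbitrary)."""
    BACKGROUND = 0
    UNKNOWN = 1
    FOREGROUND = 2


def resolve_unknown(trimap, prob_map, threshold=0.5):
    """Return a boolean foreground selection from a trimap.

    Definite labels are kept as given. Each UNKNOWN pixel is assigned by
    thresholding the probability map, which is an assumed placeholder for
    real post-processing.
    """
    selection = []
    for labels, probs in zip(trimap, prob_map):
        row = []
        for label, p in zip(labels, probs):
            if label == Region.UNKNOWN:
                label = Region.FOREGROUND if p >= threshold else Region.BACKGROUND
            row.append(label == Region.FOREGROUND)
        selection.append(row)
    return selection


B, U, F = Region.BACKGROUND, Region.UNKNOWN, Region.FOREGROUND

# Hypothetical 3x3 trimap with a band of unknown pixels along the boundary.
trimap = [
    [B, B, U],
    [B, U, F],
    [U, F, F],
]
prob_map = [
    [0.1, 0.2, 0.4],
    [0.1, 0.7, 0.9],
    [0.6, 0.8, 0.3],
]

selection = resolve_unknown(trimap, prob_map)
```

Note that the bottom-right pixel stays in the selection despite its low probability, because the user marked it as definite foreground; only the unknown band is decided by the post-processing step.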