With advancements in digital cameras, smartphones, and other technology, the ability to capture, access, and utilize images and video has steadily increased. For instance, businesses now routinely utilize digital visual media for presentations, advertising, recruiting, merchandising, and other purposes, particularly for online platforms. Similarly, individuals now routinely utilize digital visual media for communication, entertainment, or employment purposes.
The proliferation of digital visual media has, in turn, increased the use of systems and methods for processing such media (e.g., modifying or manipulating a digital image or video). For instance, a digital image may include features or objects (e.g., a person, pet, or car) that a user seeks to select and then move, modify, copy, paste, or resize. In response to this demand, some digital object selection systems have been developed that permit a user to identify, select, and label objects within a digital visual medium (e.g., a digital image). In some such systems, identifying an object in a digital visual medium may involve a process that is at least partially automated.
In one example, a conventional system employs methods for detection (i.e., identifying an object in a medium) and segmentation (e.g., partitioning the medium into segments corresponding to the object) that include generating a number of bounding boxes for an instance of an object. Each bounding box is a rectangle (or square) defining a set of pixels that corresponds to the location of at least a portion of the object. The segmentation is then computed from the bounding boxes. This approach allows for multiple segmentations of a given object instance. It also often returns multiple overlapping candidates for a single object instance, with different class labels applied to the candidates determined by the segmentation. These differing class labels can result in the object being mislabeled. Thus, this approach falls short of producing an actual instance-based segmentation of an image.
In another example, a conventional system employs a semantic segmentation method that labels all pixels of a given object class. For example, given a digital image including three people, the system labels all of the pixels corresponding to the three people as the class “person,” without distinguishing one person object from another. Similarly, if one person in the image is touching another person (e.g., the two person objects overlap), conventional systems provide no way to separate them, yielding a representation (e.g., an image mask) that corresponds to both people rather than to each person as an individual target object.
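To illustrate why a class-level label cannot separate touching instances, the following minimal Python sketch assumes a toy label map in which every person pixel carries the same hypothetical class id (`PERSON = 1`). It also applies a simple 4-connected component count to the resulting semantic mask; this post-processing step is an illustrative assumption and is not part of the conventional system described above. The count recovers two regions only when the people are apart; when they touch, the mask collapses into a single region.

```python
from collections import deque

# Hypothetical class id chosen for illustration; 0 denotes background.
PERSON = 1


def semantic_mask(label_map, class_id):
    """Return a binary mask marking every pixel of the given class."""
    return [[1 if v == class_id else 0 for v in row] for row in label_map]


def count_components(mask):
    """Count 4-connected regions of 1s using breadth-first search."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Found an unvisited foreground pixel: start a new region.
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Two people standing apart: the class mask happens to split into two regions.
apart = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
]
# Two people touching: the same class labels now form one merged region.
touching = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
]

print(count_components(semantic_mask(apart, PERSON)))     # 2
print(count_components(semantic_mask(touching, PERSON)))  # 1
```

Because the semantic mask stores only "person" or "not person" for each pixel, no information remains that would tell the two touching people apart; separating them requires per-instance labels rather than per-class labels.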
Thus, existing segmentation algorithms may present disadvantages such as (but not limited to) those described above.