Semantic image segmentation is a pixel-level labeling task to divide an image into meaningful, non-overlapping regions. In complex images, whether the segmentation is meaningful or not depends on the user's intention of what he wants to extract from the image. It is extremely challenging, if not impossible, to devise a universal approach to segment images as accurately as one would expect. This makes the problem highly ill-posed, thus user interaction is indispensable, which increases user's interaction workload.
Recent approaches have attempted to harness the capabilities of deep learning techniques for image recognition to tackle pixel-level labelling tasks. One central issue in this methodology is the limited capacity of deep learning techniques to delineate visual objects that commonly result in non-sharp boundaries and blob-like shapes in semantic segmentation tasks. Convolutional Neural Networks (CNNs) lack smoothness constraints that encourage label agreement between similar pixels, and spatial and appearance consistency of the labelling output. Such smoothness constraints can be incorporated by formulating the mean-filed approximate inference for dense Conditional Random Field (CRF) as Recurrent Neural Network (RNN) which can refine the coarse outputs from the traditional CNN in forward pass while passing error derivatives back to the CNN during training. However, such Deep Neural Networks (DNN) are mostly refined on the benchmarking datasets without considering any user interactions.
The disclosed methods and systems are directed to solve one or more problems set forth above and other problems.