Category-level and instance-level semantic segmentation are two fundamental computer vision tasks for scene understanding (e.g., determining properties of one or more objects in a scene represented in an image). Category-level semantic segmentation aims to assign a label for each pixel, which may be indicative of an object type or category. Instance-level semantic segmentation aims to localize and recognize objects using masks. A similar task to instance-level semantic segmentation is object detection, which uses bounding boxes instead of masks to localize objects. These tasks are performed independently to provide respective category and instance information for the image. Recent progress in category-level and instance-level semantic segmentation has made tremendous improvements due to the success of deep convolutional neural networks. On the other hand, instance-level semantic segmentation has recently become a core challenge in scene understanding.