Both high-level and low-level computer vision problems such as semantic image segmentation or depth estimation often involve assigning a label to each pixel in an image. While the feature representation used to classify individual pixels plays an important role in this task, it is similarly important to consider factors such as image edges, appearance consistency and spatial consistency while assigning labels in order to obtain accurate and precise results. It is not a surprise that several computer-vision tasks can be beneficially used together or even merged into one joint task.
For example, the semantic segmentation aims to predict a category label for every pixel in the image, while boundaries or edge detection aims to determine boundary pixels in the images that are highly beneficial in improving a wide variety of vision tasks including the semantic segmentation. To that end, those two problems can be merged together into category-aware semantic boundary detection as a separate problem in computer vision. However, while classical boundary detection is a challenging binary problem in itself, the semantic boundary detection by nature is an even more challenging problem.
Recently, the problem of boundary detection has been addressed with deep learning, and some neural networks directly combine semantic segmentation results and edge detection results to perform the semantic boundary detection, instead of more systematically combining the network architectures.
Thus, such a combination of semantic segmentation and edge detection is not always efficient due to the requirement of multiple neural networks.