Perception is a process of understanding an environment where a robot moves about. An example of an application where perception is used is with autonomous driving (AD) cars. AD vehicles may need to understand the environment, which may include one or more obstacles, driving lanes, driving rules pertaining to a specific location, etc., in order to freely move around. AD vehicles may also need to understand the environment in order to classify different obstacles encountered, for example, a pedestrian, a bicycle, etc., for prediction and high level reasoning.
Conventional methods use either engineered features or small convolutional networks to address the problem of perception. Convolutional neutral networks (CNNs) and in general artificial neural networks (NNs) provide conventional performance in almost all image processing tasks. They compute features directly from data, but are limited to small image patches. Mask R-CNN has been used to show the potential of application of NNs to larger images, but it has only been applied for image classification and detection.