Autonomous robotic systems typically use dedicated haptic or tactile sensors for interacting with objects. These sensors are required to determine exact contact points and forces when grasping or pushing an object. However, before a robot manipulates an object, it usually locates and tracks the object using a visual sensor, such as a camera or a laser scanner. Robots therefore often rely on two separate sets of sensors, depending on how close an object is.
The use of two sensors based on different modalities causes a number of problems. First, a handover point between the sensor subsystems must be determined, and coherency between haptic and visual perception must be ensured; incoherent measurements may lead to failures, such as incorrect grasps. Furthermore, system complexity and costs are higher due to the additional sensors. For example, haptic sensors require extensive cabling if they cover larger surface areas of the robot. Finally, both the visual and haptic modalities have specific shortcomings. Visual methods, for instance, often fail for transparent or specular objects and cannot provide any information about the weight or deformability of an object, while haptic sensors provide only sparse information about an object and require time-consuming exploration steps.