(1) Field of Invention
The present invention relates to a system for identifying functional parts of objects for robotic manipulation and, more particularly, to a system for identifying functional parts of objects for robotic manipulation using tactile and auditory sensory feedback.
(2) Description of Related Art
Successful robotic use of hand-held tools (e.g., drills, staplers, flashlights) requires that a robot be capable of detecting specific object features (e.g., buttons that must be actuated to turn on the tool). The vast majority of two-dimensional (2D) and three-dimensional (3D) point cloud object representations used in the robotics industry are solely vision based (see the List of Incorporated Cited Literature References, Literature Reference Nos. 1 and 2). While such 3D models capture what an object looks like through a 3D sensor, they do not encode multi-modal information (e.g., how an object feels or sounds) that may be relevant to solving a task. Because of this limitation, when robots are tasked with manipulating objects (e.g., pressing a button), they are typically pre-programmed by the human user to apply the behavior at a hard-coded location on the object. Such an approach suffers from the obvious problem that the robot cannot adapt to novel objects, ones for which a hard-coded target location is not available.
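The limitation described above can be illustrated with a minimal sketch, assuming a hypothetical data structure in which each 3D point carries tactile and acoustic channels alongside its geometry; the names, thresholds, and detection rule below are illustrative assumptions, not the disclosed system:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a vision-only point cloud stores geometry alone,
# whereas a multi-modal point also carries tactile and acoustic features.
@dataclass
class MultiModalPoint:
    xyz: tuple                                     # 3D position from a depth sensor
    tactile: list = field(default_factory=list)    # e.g., fingertip pressure samples
    acoustic: list = field(default_factory=list)   # e.g., audio response after contact

def is_functional_candidate(point, tactile_thresh=0.5, acoustic_thresh=0.5):
    """Flag a point as a candidate functional feature (e.g., a button) when
    both tactile and acoustic responses exceed simple assumed thresholds."""
    tactile_ok = any(t > tactile_thresh for t in point.tactile)
    acoustic_ok = any(a > acoustic_thresh for a in point.acoustic)
    return tactile_ok and acoustic_ok

# A point that clicked audibly and gave tactile resistance qualifies;
# a vision-only point, lacking the extra channels, cannot.
button = MultiModalPoint(xyz=(0.1, 0.0, 0.3), tactile=[0.9], acoustic=[0.8])
surface = MultiModalPoint(xyz=(0.2, 0.0, 0.3))
```

Under this sketch, `is_functional_candidate(button)` succeeds while `is_functional_candidate(surface)` fails, showing why a purely visual representation cannot distinguish a functional part from ordinary surface geometry.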
Recently, some work has focused on enabling robots to detect tactile, proprioceptive (e.g., the set of joint torques of a robot arm), and acoustic object properties (see Literature Reference No. 3). The drawbacks of those methods, however, are that they fail to take the object's geometry into account and can only handle simple objects with no degrees of freedom (e.g., a cup or a box), but not, for instance, a stapler or a drill with a button.
The work described in Literature Reference No. 4 attempted to expand the aforementioned approaches. In those experiments, the robot was able to estimate the location of a button which, if pressed, produced a sound. The main limitation of that work, however, was that the robot's perception of the object (e.g., a doorbell button mounted on a flat surface) was only in 2D, with the assumption that there was only one fixed frame of reference (i.e., the button had to be in a hard-coded initial location). In addition, to register a successful button press, the robot had to have a direct line of sight to its finger, an assumption that does not hold in practice.
Each of the prior methods described above exhibits limitations that make it incomplete. Thus, a continuing need exists for a process that produces a multi-modal 3D object representation that not only allows the detection of functional object features, but also enables robots to classify objects based on multi-modal sensory feedback as opposed to relying on visual input alone.