Many robots are programmed to utilize one or more end effectors to grasp one or more objects. For example, a robot may utilize a grasping end effector such as an “impactive” gripper or “ingressive” gripper (e.g., physically penetrating an object using pins, needles, etc.) to pick up an object from a first location, move the object to a second location, and drop off the object at the second location. Some additional examples of robot end effectors that may grasp objects include “astrictive” end effectors (e.g., using suction or vacuum to pick up an object) and “contigutive” end effectors (e.g., using surface tension, freezing, or adhesive to pick up an object), to name just a few.
Various machine learning based approaches to robotic grasping have been proposed. Some of those approaches train a machine learning model (e.g., a deep neural network) to generate one or more predictions that are utilized in robotic grasping, and train the machine learning model using training examples that are based only on data from real-world physical robots attempting robotic grasps of various objects. For example, the machine learning model can be trained to predict a likelihood of successful grasp at each of a plurality of iterations, based on a corresponding image for the iteration and a candidate motion vector for the iteration. The corresponding image can be a most recent image captured by a camera of a robot and the candidate motion vector can be a motion vector being considered for implementation by the robot. Based on the likelihood of successful grasp at each iteration, it can be determined whether to attempt a grasp or to instead implement the candidate motion vector and perform another iteration of predicting a likelihood of successful grasp.
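As a non-limiting illustration, the iterative procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular approach: the function and parameter names (servo_to_grasp, capture_image, propose_motion, predict_success, apply_motion), the fixed grasp_threshold decision rule, and the toy one-dimensional "robot" are all hypothetical and are introduced only for illustration. In practice, predict_success would be a trained machine learning model rather than the hand-written stand-in shown here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]                    # stand-in for camera pixels
MotionVector = Tuple[float, float, float]  # stand-in for an end effector motion


@dataclass
class StepRecord:
    iteration: int
    likelihood: float
    action: str  # "move" or "grasp"


def servo_to_grasp(
    capture_image: Callable[[], Image],
    propose_motion: Callable[[int], MotionVector],
    predict_success: Callable[[Image, MotionVector], float],
    apply_motion: Callable[[MotionVector], None],
    grasp_threshold: float = 0.9,  # assumed decision rule, not from the text
    max_iterations: int = 20,
) -> List[StepRecord]:
    """At each iteration, predict grasp success, then either grasp or move."""
    history: List[StepRecord] = []
    for i in range(max_iterations):
        image = capture_image()        # most recent camera image
        motion = propose_motion(i)     # candidate motion vector under consideration
        likelihood = predict_success(image, motion)
        if likelihood >= grasp_threshold:
            # Likelihood is high enough: attempt the grasp and stop iterating.
            history.append(StepRecord(i, likelihood, "grasp"))
            break
        # Otherwise implement the candidate motion and perform another iteration.
        apply_motion(motion)
        history.append(StepRecord(i, likelihood, "move"))
    return history


# Toy setup (hypothetical): a gripper starts 1.0 unit above an object and each
# candidate motion lowers it by 0.25.
state = {"height": 1.0}


def capture_image() -> Image:
    return [state["height"]]  # the "image" encodes only the current height


def propose_motion(iteration: int) -> MotionVector:
    return (0.0, 0.0, -0.25)


def predict_success(image: Image, motion: MotionVector) -> float:
    # Hand-written stand-in for a trained model: success is more likely the
    # closer the gripper would be to the object after the candidate motion.
    predicted_height = image[0] + motion[2]
    return min(1.0, max(0.0, 1.0 - predicted_height))


def apply_motion(motion: MotionVector) -> None:
    state["height"] += motion[2]


history = servo_to_grasp(capture_image, propose_motion, predict_success, apply_motion)
for record in history:
    print(record.iteration, record.likelihood, record.action)
```

In this toy run the predicted likelihood rises as the gripper descends, so the loop implements the candidate motion for several iterations before the assumed threshold is met and a grasp is attempted. Other decision rules (e.g., comparing the likelihood of grasping without further motion against the likelihood for the candidate motion) would fit the same loop structure.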
However, these and/or other approaches can have one or more drawbacks. For example, generating training examples based on data from real-world physical robots requires heavy usage of one or more physical robots in attempting robotic grasps. This can be time-consuming (e.g., actually attempting a large quantity of grasps requires a large quantity of time), can consume a large amount of resources (e.g., power required to operate the robots), can cause wear and tear to the robots being utilized, and/or can require a great deal of human intervention (e.g., to place objects to be grasped, to remedy error conditions).