Robotic systems, such as a robotic manipulator containing a gripping component, may be used for applications involving picking up or moving objects. For instance, a robotic device may be used to fill a container with objects, create a stack of objects, or unload objects from a truck bed. In some cases, all of the objects may be of the same type. In other cases, a container or truck may contain a mix of different types of objects, such as boxed items, cans, tires, or other stackable objects. Such robotic systems may direct a robotic manipulator to pick up objects based on predetermined knowledge of where objects are in the environment.
In some examples, a robotic system may use computer vision techniques to determine a representation of three-dimensional (3D) scene geometry. By way of example, a robotic system may triangulate information observed from a scene to determine a depth to one or more surfaces in the scene. One approach to depth sensing is the use of stereo image processing. According to this approach, two optical sensors with a known physical relationship to one another are used to capture two images of a scene. By finding corresponding regions of pixels within the two images and calculating how far apart these common areas lie in pixel space (the disparity), a computing device can determine a depth map or depth image using triangulation. The depth map or depth image may contain information relating to the distances of surfaces of objects in the scene.
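As an illustration of the triangulation step, the following minimal sketch converts pixel disparities into depths. It assumes a rectified stereo pair with parallel optical axes, for which depth equals focal length times baseline divided by disparity. The function names and the parameters `focal_length_px` and `baseline_m` are hypothetical and chosen only for this example.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in meters for one pixel of a rectified stereo pair.

    Assumes parallel optical axes, so depth = f * B / d.
    Returns None when the disparity is not positive (no valid match).
    """
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


def depth_map_from_disparities(disparities, focal_length_px, baseline_m):
    """Apply depth_from_disparity to every entry of a 2D list of disparities."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparities
    ]
```

Under these assumptions, a larger disparity corresponds to a surface closer to the sensors, and a disparity of zero corresponds to a point too far away to triangulate.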
Another approach to depth sensing is structured-light processing. The main idea of structured-light processing is to project a known illumination pattern onto a scene and capture an image of the scene that includes the projected pattern. For example, as shown in FIG. 1, a projector 102 may project a known texture pattern onto an object 104, and an optical sensor 106 (e.g., a camera) may capture an image 108 of the object 104. A computing device may then determine a correspondence between a region in the image and a particular part of the projected pattern. Given a position of the projector 102, a position of the optical sensor 106, and the location of the region corresponding to the particular part of the pattern within the image 108, the computing device may then use triangulation to estimate a depth to a surface of the object 104.
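The triangulation described above can be sketched in a single plane as the intersection of two rays: one leaving the projector through a pattern column, and one entering the optical sensor at an image column. The sketch below is a simplified illustration and not a description of any particular system. It assumes pinhole models for both devices, a projector at the origin, a sensor displaced by `baseline_m` along the x-axis, and both devices facing along the z-axis. All names and parameters are hypothetical.

```python
def triangulate_structured_light(u_proj, u_cam, baseline_m,
                                 f_proj_px, cx_proj_px,
                                 f_cam_px, cx_cam_px):
    """Intersect a projector ray and a sensor ray in the x-z plane.

    The projector sits at x = 0 and the sensor at x = baseline_m.
    Returns (x, z) in meters, or None if the rays do not meet in
    front of the devices.
    """
    # Slope x/z of the ray leaving the projector through column u_proj.
    slope_proj = (u_proj - cx_proj_px) / f_proj_px
    # Slope (x - baseline)/z of the ray reaching the sensor at column u_cam.
    slope_cam = (u_cam - cx_cam_px) / f_cam_px

    # Projector ray: x = z * slope_proj
    # Sensor ray:    x = baseline_m + z * slope_cam
    denom = slope_proj - slope_cam
    if abs(denom) < 1e-12:
        return None  # rays are parallel, no finite intersection
    z = baseline_m / denom
    if z <= 0:
        return None  # intersection lies behind the devices
    return z * slope_proj, z
```

For instance, with both focal lengths set to 500 pixels, both principal points at column 320, and a baseline of 0.1 m, a pattern column of 345 observed at image column 295 yields a depth of approximately 1 m.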
Typically, the projector 102 and the optical sensor 106 are displaced horizontally along a baseline, and both are calibrated. The calibration process may map a pixel in the optical sensor 106 to a one-dimensional curve of pixels in the projector 102. If the sensor image and the projector image are rectified, then this curve may take the form of a horizontal line. In this case, the search for matches to the projected texture pattern can be restricted to this line rather than covering the entire pattern, making the process more efficient.
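To illustrate why rectification helps, the sketch below searches for a match only along a single row of the projected pattern. It assumes rectified images, so that a pixel in a sensor row can only correspond to a pixel in the same row of the pattern, and it uses a sum of absolute differences as the matching cost. The cost function, the window size, and all names are assumptions made for this example; other matching costs could be substituted.

```python
def match_along_row(sensor_row, center, pattern_row, half_window=2):
    """Find the pattern column that best matches a sensor patch.

    sensor_row and pattern_row are lists of intensities taken from the
    same row of the rectified sensor image and projector pattern.
    Returns (best_column, best_cost) using a sum of absolute differences.
    """
    lo, hi = center - half_window, center + half_window + 1
    if lo < 0 or hi > len(sensor_row):
        raise ValueError("patch extends outside the sensor row")
    patch = sensor_row[lo:hi]

    best_col, best_cost = None, float("inf")
    # Slide the window along the one row only, not over the whole pattern.
    for col in range(half_window, len(pattern_row) - half_window):
        candidate = pattern_row[col - half_window:col + half_window + 1]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_col, best_cost = col, cost
    return best_col, best_cost
```

The difference between `center` and the returned column gives the horizontal offset that a triangulation step would then convert to depth. Because the search covers one row instead of every row, the number of candidate positions scales with the pattern width rather than with its area.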