Field of the Invention
This disclosure presents a system and method through which a mobile device, such as a robot, can detect obstacles while navigating in an environment using a laser pattern to augment the visible field of a visual sensor and measure the distance to objects.
Description of the Related Art
To create a robust collision detection system, an algorithmic approach using only a video feed is unlikely to detect all near-field objects with a reliability preferred for robotic navigation.
Automatic obstacle detection is a useful functionality for a mobile device that navigates in an environment. For fully autonomous devices, automatic obstacle detection enables the device to move within an environment without damaging objects and without getting caught, pinned, jammed, or trapped. For a semi-autonomous device, such as a tele-operated vehicle, automatic obstacle detection can facilitate the tele-operated control of the vehicle by either alerting a human operator to the presence of obstacles or by adapting the operator's commands in order to avoid collisions with detected obstacles.
There are several known methods for automatic obstacle detection. One conventional method for obstacle detection uses a laser range finder. In a laser range finder, a laser beam is pulsed, and the time required to receive the reflection of the pulse is proportional to the distance to an obstacle. The laser and detector are typically mounted on a rotating platform so that a scan of the environment can be performed. Typical scan rates are 10 scans per second, with 360 or more points scanned per rotation. The accuracy is also relatively high, with the estimated distance to an obstacle typically detected with accuracy better than 1 cm and with a range of up to 80 meters. Despite the relatively high quality of the data, the device is relatively expensive, precluding its use in many products. Furthermore, the scan is done only in one plane (typically a horizontal plane if the laser range finder is horizontally mounted); thus, the sensor is unable to detect objects that do not lie in the scan plane (such as a tabletop).
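By way of illustration only, the time-of-flight relationship and the conversion of one rotation of range readings into points in the scan plane can be sketched as follows. The function names and the assumption of evenly spaced readings are hypothetical and are not taken from any particular range finder.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by SI definition


def tof_distance_m(round_trip_s):
    """One-way distance for a pulse whose reflection returns after round_trip_s seconds."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the obstacle and back, hence the division by two.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def scan_to_points(ranges_m):
    """Convert one rotation of evenly spaced range readings (assumed) to (x, y) points.

    All points lie in the scan plane, which is why objects outside
    that plane never appear in the result.
    """
    n = len(ranges_m)
    return [
        (r * math.cos(2.0 * math.pi * i / n), r * math.sin(2.0 * math.pi * i / n))
        for i, r in enumerate(ranges_m)
    ]
```

Under these assumptions, an obstacle at the 80 meter range limit corresponds to a round trip of roughly 0.53 microseconds, which suggests why the timing electronics of such a device tend to be costly.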
Another conventional method for obstacle detection uses infrared (IR) sensors. An IR sensor typically comprises an IR light-emitting diode (LED) illumination source with a lens so as to project the light in a thin, pencil-like beam, and a photosensitive detector that is aligned with the illuminator. The detector is typically a one-dimensional (1-D) array, and given the position along the 1-D array where the illumination is detected, the distance to an obstacle can be calculated. IR sensors typically have a range on the order of 1 meter. Because IR sensors only provide detection along the pencil-like beam, it is, disadvantageously, relatively difficult to obtain complete coverage of the environment, even when a relatively large number of IR sensors, such as dozens of IR sensors, are used.
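The calculation of distance from the position of the detected spot along the 1-D array can be sketched as below. This is a minimal sketch assuming a simple pinhole model in which the beam is parallel to the optical axis of the detector lens; the parameter names (element pitch, focal length, baseline between emitter and detector lens) are illustrative assumptions, not values from any specific sensor.

```python
def ir_distance_m(element_index, center_index, element_pitch_m,
                  focal_length_m, baseline_m):
    """Distance along the beam from the array element on which the spot falls.

    Assumes (hypothetically) that a spot at center_index corresponds to an
    obstacle infinitely far away, and that the spot moves away from the
    center as the obstacle gets closer.
    """
    offset_m = (element_index - center_index) * element_pitch_m
    if offset_m <= 0:
        raise ValueError("spot at or beyond the infinity position; no finite range")
    # Similar triangles: distance / baseline = focal_length / offset.
    return focal_length_m * baseline_m / offset_m
```

Because the distance varies inversely with the offset, one array element spans a much larger range interval for far obstacles than for near ones, which is consistent with the short useful range of such sensors.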
Another conventional method for obstacle detection uses ultrasound sensors. In an ultrasound sensor, an emitter of ultrasonic sound generates an ultrasonic sound pulse. The time it takes to receive the reflected echo is proportional to the distance to the obstacle. Ultrasound typically has a range of up to 8 meters, and a single detector can typically cover a cone of approximately 30 degrees. However, using ultrasound sensors, it is relatively difficult to localize an object precisely, to disambiguate between various objects at different distances, and to distinguish spurious echoes from true objects.
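The localization difficulty can be made concrete with a small calculation. The sketch below assumes a speed of sound of 343 m/s (dry air near 20 degrees C) and treats the coverage cone as ideal; both are simplifying assumptions, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air near 20 degrees C


def echo_distance_m(round_trip_s, speed_m_per_s=SPEED_OF_SOUND_M_PER_S):
    """One-way distance for an echo received round_trip_s seconds after the pulse."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return speed_m_per_s * round_trip_s / 2.0


def lateral_spread_m(distance_m, cone_angle_deg=30.0):
    """Width of the coverage cone at the given distance.

    An echo only reveals the distance; the reflecting object may lie
    anywhere across this width.
    """
    return 2.0 * distance_m * math.tan(math.radians(cone_angle_deg / 2.0))
```

Under these assumptions, an echo from 8 meters could originate anywhere across a width of roughly 4.3 meters, so a single reading constrains the bearing of the obstacle only loosely.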
There are several conventional approaches to obstacle detection in which the sensor used is a 2-dimensional visual sensor such as that of a CCD camera, and the techniques used are those of computer vision or machine vision, wherein the image produced is analyzed in one of several ways. One vision-based approach uses stereo vision. In the stereo vision approach, images of the environment are taken by two or more sensors among which the relative positioning is known. By finding pixel correspondence among the multiple views of features in the environment, the pixel coordinates can be triangulated to determine the 3-D location of the detected features. A challenging part of this process is determining the correct correspondences of features seen in one image with what is seen in other images. This is a relatively computationally intensive process, which can often produce inaccurate results, particularly when the environment being imaged has relatively little texture (i.e., features, or points that can easily be identified across images).
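The correspondence search and the subsequent triangulation can be illustrated with a deliberately simplified sketch. It assumes rectified images (so that matching features lie on the same row), uses a sum-of-absolute-differences window match along a single row, and assumes the caller keeps the window inside the row. These choices are illustrative assumptions and not a description of any particular stereo system.

```python
def best_disparity(left_row, right_row, x, half_window, max_disparity):
    """Disparity (in pixels) whose window in right_row best matches the window at x in left_row."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # candidate window would fall off the left edge
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:  # ties keep the first (smallest) disparity
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a feature from its disparity, for parallel cameras a known baseline apart."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px
```

When a row has uniform intensity, every candidate disparity yields the same matching cost, so the returned value is arbitrary. This is the failure mode that arises in environments with little texture, and the nested search over candidates and window pixels indicates where the computational cost comes from.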
Another conventional vision-based approach uses a technique known as structure from motion. This method is a variant of the stereo vision method, in which instead of using images from different imagers that are spatially co-located, images from the same imager taken at different instants in time as the mobile device moves through the environment are utilized. The method of detecting corresponding image points and the method of triangulation are the same as in stereo vision, but the problem is made more difficult because the spatial co-location of the imager when the images were acquired is, a priori, not known. Even if the motion of the mobile device is somehow measured, it is typically not known with sufficient accuracy to enable accurate triangulation. This problem is typically known in the computer vision community as the “Structure from Motion” problem, and several standard techniques have been developed to solve it. The overall method is comparable in computational cost and complexity to the stereo vision method, and the method suffers from the same drawback, in that it typically does not provide reliable and complete estimates when there is insufficient texture in the environment.
Another variety of vision-based approaches uses structured light. This method overcomes the possibility that there are not enough features in the environment by using a dedicated light source to generate visible features in the environment. The light source can be a simple point source, such as a laser beam or a focused LED beam, can be a stripe of light, or can be any other fixed projection pattern. Typically, the relative position of the illumination source to the imager is known, so that once a point from the illumination pattern is detected, the 3-D position of the obstacle on which the illumination pattern point was detected can be triangulated. One challenge of this method is to reliably detect the illumination pattern. A typical difficulty is that the pattern must remain visible to the imager even in the presence of other light sources, such as the sun in the case of an outdoor environment. There are several known approaches to improve or maximize the visibility of the pattern.
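The triangulation step for a single point source can be sketched as follows. The geometry is an assumption made for illustration: a pinhole imager at the origin looking along its optical axis, and a beam emitted from a point offset sideways by a known baseline and tilted toward the optical axis by a known angle. The function and parameter names are hypothetical.

```python
import math


def structured_light_depth_m(u_px, focal_px, baseline_m, beam_angle_rad=0.0):
    """Depth along the optical axis of the surface point lit by the beam.

    u_px is the image coordinate of the detected spot, measured from the
    principal point and positive toward the side where the emitter sits.
    """
    # Imager ray: x = z * u / f.  Beam: x = baseline - z * tan(angle).
    # Setting the two equal and solving for z gives the expression below.
    denom = u_px / focal_px + math.tan(beam_angle_rad)
    if denom <= 0:
        raise ValueError("imager ray and beam do not intersect in front of the imager")
    return baseline_m / denom
```

Unlike the stereo case, no correspondence search is needed here: the beam geometry is known in advance, so a single detected image point suffices, provided that the spot can be reliably found in the image.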
In one approach for improving or maximizing the visibility of a structured light pattern, the illumination source can be made very bright, so that it is not washed out by other sources. In some applications this may not be feasible, due to power considerations or due to restrictions on illumination intensity required by eye safety standards.
In another approach for improving or maximizing the visibility of a structured light pattern, a special optical filter is placed on the imager so as to allow only the frequency of light produced by the pattern illumination source. This will block out a relatively large portion of the light produced by other sources, which makes the pattern illumination source relatively clearly visible in the image. Typically, the use of such a filter produces an image in which the only visible element is the illumination pattern. The use of such a filter, however, precludes the use of the imager for other purposes for which other frequencies of light are necessary.
In another approach for improving or maximizing the visibility of a structured light pattern, the pattern illumination source can be flashed on very intensely for a very short period of time (typically tens of milliseconds or less). Concurrently, the imager is synchronized with the illumination source so as to collect light to form an image only during the short interval during which the light source is on. This method produces an image where the illumination pattern is clearly visible (again, typically the pattern is the only thing registered in the image), because during the time the illumination is on, it is of much higher intensity than any other source. However, once again, this method precludes the use of the imager for other purposes for which other frequencies of light are necessary.
In yet another approach for improving or maximizing the visibility of a structured light pattern, the detectability of the pattern can be enhanced by taking two images in rapid succession, one with the pattern source on and another with the pattern source off, and detecting the pattern in the difference of the two images. If the two images are taken in sufficiently rapid succession, then, presumably, the environment, which includes other light sources, has not changed, and so the illumination pattern should be easily detectable as the only difference between the two images. This method enables detection of the pattern with a much weaker illumination source than the other methods. However, it works effectively only if there is neither motion of the imager nor motion of anything else in the environment. Any moving object (or apparent moving environment if the imager itself is moving) will register in the difference of images, making it relatively difficult to detect the illumination pattern. Furthermore, many indoor lighting sources connected to alternating current sources experience power flicker (in the case of fluorescent lights) or fluctuate (in the case of incandescent lights), typically at 50 Hz or 60 Hz, which also violates the assumption that nothing but the pattern illumination source varies between the two acquired images.
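The image-differencing approach and its sensitivity to scene changes can be illustrated with a minimal sketch. Frames are represented here as nested lists of intensity values, and the threshold is an arbitrary illustrative choice; neither is taken from any particular system.

```python
def detect_pattern(frame_on, frame_off, threshold):
    """Return (row, col) pixels that are brighter with the source on than with it off.

    Both frames are equal-sized nested lists of intensities. Any pixel whose
    intensity rises by more than threshold is reported, whatever the cause.
    """
    hits = []
    for r, (row_on, row_off) in enumerate(zip(frame_on, frame_off)):
        for c, (on_value, off_value) in enumerate(zip(row_on, row_off)):
            if on_value - off_value > threshold:
                hits.append((r, c))
    return hits


# Static scene: a bright ambient light at (1, 1) cancels in the difference,
# so only the projected spot at (2, 0) is reported.
frame_off = [[10, 10, 10], [10, 200, 10], [10, 10, 10]]
frame_on_static = [[10, 10, 10], [10, 200, 10], [90, 10, 10]]
static_hits = detect_pattern(frame_on_static, frame_off, 40)

# Moving scene: the ambient light appears at (1, 2) in the second frame,
# so it is reported alongside the projected spot.
frame_on_moving = [[10, 10, 10], [10, 10, 200], [90, 10, 10]]
moving_hits = detect_pattern(frame_on_moving, frame_off, 40)
```

In the static case the comparatively weak spot is isolated even though a much brighter ambient source is present. In the moving case the displaced ambient source produces a false detection, which is the limitation described above.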