1. Field of the Invention
The present invention relates to a method and apparatus for environment recognition, and particularly to a technique for recognizing an object in three dimensions using images photographed by a single camera while in motion. The present invention further relates to a technique for determining the motion and orientation of the camera itself using the photographed images. Preferably, a camera mounted on a vehicle is used to detect obstacles surrounding the vehicle and to determine the motion of the vehicle. However, the present invention is not limited to such applications.
2. Description of the Background Art
Conventionally, three types of sensors are known for use on vehicles to detect obstacles on a road surface: millimeter wave radar, laser radar, and vision systems using photographed images.
Millimeter wave radar and laser radar are generally considered to operate very reliably under unfavorable conditions, and are adopted for practical use in auto cruise control systems. However, these sensors do not easily detect small, non-metallic obstacles such as tires and wooden boxes.
As vision systems, a variety of stereo systems have been proposed including, for example, that detailed in "A High-Performance Stereo Vision System for Obstacle Detection," T. Williamson, Ph.D. Thesis, The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa., October 1998. However, a stereo system requires a plurality of cameras, which is disadvantageous considering necessary space and cost.
Further, in a stereo system, it is usually necessary to provide a baseline longer than 1 m to adequately enhance the range resolution. In addition, long focal-length lenses must be used to achieve high spatial resolution. In some systems, more than three cameras are used to better ensure reliable results. These requirements may restrict the possible camera installation positions and, as a result, reduce the range of camera field of view allowed for use.
On the other hand, use of a single camera for object recognition has also been proposed. The natural baseline between human eyes is not sufficiently long for drivers to recognize distant objects with stereopsis. Rather, drivers rely on motion stereo and/or intensity cues. By adopting such a scheme in an artificial system, it is possible to recognize obstacles using only one camera and thereby reduce system cost.
In one recognition technique using motion cues, the use of optical flow has been suggested. An obstacle can be detected based on the difference in the optical flows generated by the obstacle and the background.
Specifically, optical flow vectors generated from the images of a planar road surface conform to specific equations. Optical flow vectors are vectors that connect an identical point in a continuous series of images. When a point in an image is not on the road surface, the optical flow vector of the point does not follow the equations. An object having a different height from the road surface can be recognized accordingly.
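The road-plane test described above can be illustrated with a minimal sketch. It assumes a pinhole camera whose optical axis is parallel to a flat road, pure forward translation, and image rows measured in pixels below the horizon; the function names and the numeric values are hypothetical and are not taken from the cited references.

```python
def ground_row_after_move(row_px, focal_px, cam_height_m, travel_m):
    """Predicted row (px below horizon) of a ROAD-PLANE point after forward travel.

    Assumption: flat road, optical axis parallel to the road, pinhole camera.
    """
    depth_m = focal_px * cam_height_m / row_px          # depth implied by the road plane
    return focal_px * cam_height_m / (depth_m - travel_m)


def flow_residual(row_before_px, row_after_px, focal_px, cam_height_m, travel_m):
    """Observed vertical flow minus the flow a road-plane point would produce."""
    predicted = ground_row_after_move(row_before_px, focal_px, cam_height_m, travel_m)
    return row_after_px - predicted


# Illustrative values only (not from the source).
FOCAL_PX, CAM_H, TRAVEL = 800.0, 1.5, 1.0
ROW = 60.0                                              # both points start at this row

# A true road point at this row simply follows the plane model.
road_after = ground_row_after_move(ROW, FOCAL_PX, CAM_H, TRAVEL)
road_residual = flow_residual(ROW, road_after, FOCAL_PX, CAM_H, TRAVEL)

# A point 0.5 m above the road that projects to the same row is actually closer.
POINT_H = 0.5
raised_depth = FOCAL_PX * (CAM_H - POINT_H) / ROW
raised_after = FOCAL_PX * (CAM_H - POINT_H) / (raised_depth - TRAVEL)
raised_residual = flow_residual(ROW, raised_after, FOCAL_PX, CAM_H, TRAVEL)
```

A residual near zero is consistent with the road plane, while a clearly non-zero residual suggests a point at a different height. How large the residual must be to count as significant depends on image noise, which this sketch does not model.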
General techniques for image processing using optical flow are described in, for example, "Gazo Rikai", K. Kanatani, Morikita Publishing, Tokyo, 1990. Techniques are also disclosed in International Publication No. WO97/35161. These documents are incorporated herein by reference.
However, when attempting to detect an obstacle from camera images using only optical flows, accurate detection with respect to a small obstacle is difficult because the difference between the optical flow vectors of such an obstacle and the road surface is very small. Similarly, accurate detection is also difficult when the time difference in the optical flow calculation is small or when the camera motion is slow.
In the example of FIG. 1, the camera height is 1.5 m, and an object with a height of 15 cm is located 90 m ahead of the camera. In the camera image, the uppermost point of the object is in an identical position with a point on the road plane located 100 m ahead. The angle at which the camera looks down at the two points is 0.853 degrees.
If a second image is obtained after the vehicle traveled 1 m at 100 km/h, the camera then looks down at the uppermost point of the object at 0.868 degrees, while the viewing angle with respect to the aforementioned point on the road plane is 0.869 degrees. The difference between these angles is extremely small. Under such conditions, it is difficult to detect the obstacle by comparing the optical flow vectors.
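The geometry of this example can be explored with a short sketch. It assumes a flat road and takes the depression angle to be atan((camera height - point height) / distance); the function and variable names are illustrative only. Under that assumption the angles come out at roughly 0.86 to 0.87 degrees, and the gap between the two points after 1 m of travel is on the order of one thousandth of a degree.

```python
import math


def depression_angle_deg(camera_height_m, point_height_m, distance_m):
    """Angle below the horizontal at which the camera sees a point (flat-road assumption)."""
    return math.degrees(math.atan2(camera_height_m - point_height_m, distance_m))


CAMERA_HEIGHT_M = 1.5
OBJECT_HEIGHT_M = 0.15
OBJECT_DISTANCE_M = 90.0
ROAD_POINT_DISTANCE_M = 100.0
TRAVEL_M = 1.0

# First image: the object top and the road point lie on the same line of sight.
before_object = depression_angle_deg(CAMERA_HEIGHT_M, OBJECT_HEIGHT_M, OBJECT_DISTANCE_M)
before_road = depression_angle_deg(CAMERA_HEIGHT_M, 0.0, ROAD_POINT_DISTANCE_M)

# Second image: both points are 1 m closer.
after_object = depression_angle_deg(
    CAMERA_HEIGHT_M, OBJECT_HEIGHT_M, OBJECT_DISTANCE_M - TRAVEL_M)
after_road = depression_angle_deg(
    CAMERA_HEIGHT_M, 0.0, ROAD_POINT_DISTANCE_M - TRAVEL_M)

angle_gap_deg = after_object - after_road
print(f"before: {before_object:.3f} / {before_road:.3f} deg")
print(f"after:  {after_object:.3f} / {after_road:.3f} deg, gap {angle_gap_deg:.4f} deg")
```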
Although problems in obstacle detection were explained above using an example based on a vehicle-mounted camera, similar problems also exist in other known recognition techniques. Other techniques related to the present invention include those discussed in "A Specialized Multibaseline Stereo Technique for Obstacle Detection," T. Williamson and C. Thorpe, Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR '98), Santa Barbara, Calif., June 1998, and in "Detection of Small Obstacles at Long Range Using Multibaseline Stereo," T. Williamson and C. Thorpe, Proceedings of the 1998 IEEE International Conference on Intelligent Vehicles, Stuttgart, Germany, October 1998.
The present invention was created in light of the above problems. The primary object of the present invention is to provide a method and apparatus that can enhance recognition capability with respect to small objects.
To accomplish the above object, the present invention provides a method for recognizing, through image processing, an object from images captured by photographing a surrounding region with a camera. According to the present invention, a sequence of images is captured using a single camera in motion. The camera movement may be a displacement relative to the object. A possible object captured in an image is identified, and the identified possible object is tracked within the image sequence. Three-dimensional object information is generated based on information obtained by the tracking concerning changes in the images of the possible object.
The three-dimensional object information may concern, for example, height, location, width, or shape. The object information may include simple information such as the presence of protrusion of the object from the background. Preferably, dimensions, such as height, of the object protrusion are included in the information.
As the present invention tracks a possible object, differences between the image movement of the possible object and that of portions other than the possible object are more apparent, and object recognition ability is enhanced. Accurate object recognition is possible even for small objects, even when the time interval between captured images is short (high capture rate), and even when the camera is moving slowly.
The present invention is effective even when the object of recognition and the background have similar colors (intensity). A portion that has a similar color to the background can be provisionally identified as a possible object and tracked. Based on the data collected during the tracking, it is judged whether or not the possible object is a real object. For example, a judgement is made as to whether or not the possible object protrudes from the road plane. In this way, the present invention can similarly enhance recognition capability under unfavorable photographing conditions.
Preferably, motion of the camera during the tracking is measured, and the tracking information is processed along with data for the determined camera motion. Generally, camera motion would be equivalent to the motion of the moving structure on which the camera is mounted. By taking into account such motion, success of recognition of the location, size, and other information on the object is enhanced. More preferably, camera pose is also determined along with camera motion, and the tracking information is processed based on the determined camera motion and pose. Pose includes orientation and location. By taking into account such motion and pose, location of the object relative to the camera can be reliably recognized.
Further, motion and pose are preferably determined using the image sequence. The image sequence photographed by a single camera is used not only for the object recognition, but also for the determination of the camera motion and pose on which the object recognition is based. This eliminates the need for sensors exclusively for the detection of various parameters related to motion and pose, and provides an advantage with regard to cost.
Detection signals from a camera motion sensor can also be used in addition to the image sequence when determining motion and pose, thereby increasing reliability.
Preferably, when determining motion and pose, flow parameters in image coordinates of the image sequence are converted into physical parameters in three-dimensional coordinates, and the motion and pose are then calculated. The optical flow parameters in image coordinates are not suitable for accumulation (integral) processing. Physical parameters in three-dimensional coordinates, on the other hand, can be easily accumulated and used for the determination of motion and pose during tracking.
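The accumulation of physical parameters can be pictured with a minimal dead-reckoning sketch. It assumes the conversion from image-coordinate flow parameters to per-frame physical increments (forward travel in metres, yaw change in radians) has already been done upstream; the `PlanarPose` type and the `accumulate` function are hypothetical names introduced for illustration, and the planar two-dimensional model is a simplification.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanarPose:
    """Camera pose on a flat plane: position in metres, heading in radians."""
    x_m: float = 0.0
    y_m: float = 0.0
    heading_rad: float = 0.0


def accumulate(pose, forward_m, yaw_rad):
    """Integrate one frame's physical motion increment into the running pose."""
    heading = pose.heading_rad + yaw_rad
    return PlanarPose(
        x_m=pose.x_m + forward_m * math.cos(heading),
        y_m=pose.y_m + forward_m * math.sin(heading),
        heading_rad=heading,
    )


# Illustrative increments: four 1 m steps, each preceded by a 90 degree left turn.
increments = [(1.0, math.pi / 2)] * 4
pose = PlanarPose()
for forward_m, yaw_rad in increments:
    pose = accumulate(pose, forward_m, yaw_rad)
```

Because the increments are expressed in metres and radians, summing them yields a pose with direct physical meaning, which is the property that raw image-coordinate flow parameters lack.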
Considering a case in which optical flow parameters are accumulated, a difference may be obtained in the accumulation results at a level that allows distinguishing between the object and the background. However, physical values, i.e., dimensions such as height, size, distance, and width, cannot be determined from the accumulation of flow parameters. The present invention, in contrast, accumulates physical parameters in three-dimensional coordinates, allowing determination of how the camera moved during the tracking. Based on the accumulated information and the movement of the possible object in the images, the three-dimensional shape of the object can be physically identified, allowing calculation of any desired physical values such as dimensions. In this way, the present invention enables precise recognition of object information, this being one major advantage of the present invention.
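As a hedged illustration of how accumulated camera motion and the tracked image position together yield physical dimensions, the sketch below triangulates a static point from two depression angles and the distance travelled between them. It assumes a flat road, straight forward motion, and noise-free angles; `triangulate` and its parameters are hypothetical names, and this closed-form calculation is a simplification rather than the filtering approach described in this specification.

```python
import math


def triangulate(angle_before_rad, angle_after_rad, travel_m, cam_height_m):
    """Recover (distance at first view, height above road) of a static point.

    Uses tan(a1) = drop / Z and tan(a2) = drop / (Z - travel), where
    drop = camera height - point height.
    """
    t1 = math.tan(angle_before_rad)
    t2 = math.tan(angle_after_rad)
    if abs(t2 - t1) < 1e-12:
        raise ValueError("no parallax: the angles are indistinguishable")
    distance_m = travel_m * t2 / (t2 - t1)
    height_m = cam_height_m - distance_m * t1
    return distance_m, height_m


# Accumulated travel from per-frame physical increments (illustrative values).
frame_travel_m = [2.5, 2.5, 2.5, 2.5]
total_travel_m = sum(frame_travel_m)

# Synthetic observation: a 0.15 m object, 90 m ahead, camera 1.5 m high.
a1 = math.atan2(1.5 - 0.15, 90.0)
a2 = math.atan2(1.5 - 0.15, 90.0 - total_travel_m)
distance_m, height_m = triangulate(a1, a2, total_travel_m, 1.5)
```

With real images the two angles carry noise, so a longer accumulated travel generally gives a better-conditioned estimate than a single short step.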
A Kalman Filter capable of non-linear processing may preferably be used for motion and pose determination to successfully process non-linear data and to reduce influences of noise in the images.
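One common filter of this kind is the extended Kalman filter, which linearizes a non-linear measurement model around the current estimate. The following is a minimal sketch under assumptions not stated in this specification: the state is (distance ahead, height) of a static point, the measurement is its depression angle, the camera height and per-frame travel are known, and all names and noise values are hypothetical.

```python
import math


def ekf_predict(state, cov, travel_m, process_var):
    """Camera moved forward: a static point gets closer, its height is unchanged."""
    dist, height = state
    # The transition Jacobian is the identity, so P' = P + Q.
    new_cov = [[cov[0][0] + process_var, cov[0][1]],
               [cov[1][0], cov[1][1] + process_var]]
    return (dist - travel_m, height), new_cov


def ekf_update(state, cov, measured_angle_rad, cam_height_m, meas_var):
    """Correct the state with one depression-angle measurement."""
    dist, height = state
    drop = cam_height_m - height
    predicted_angle = math.atan2(drop, dist)
    r2 = dist * dist + drop * drop
    # Jacobian of atan(drop / dist) with respect to (dist, height).
    jac = (-drop / r2, -dist / r2)
    # P H^T (a 2-vector because the measurement is scalar).
    ph = (cov[0][0] * jac[0] + cov[0][1] * jac[1],
          cov[1][0] * jac[0] + cov[1][1] * jac[1])
    innovation_var = jac[0] * ph[0] + jac[1] * ph[1] + meas_var
    gain = (ph[0] / innovation_var, ph[1] / innovation_var)
    innovation = measured_angle_rad - predicted_angle
    new_state = (dist + gain[0] * innovation, height + gain[1] * innovation)
    # P' = P - K (H P); H P equals (P H^T)^T because P is symmetric.
    new_cov = [[cov[i][j] - gain[i] * ph[j] for j in range(2)] for i in range(2)]
    return new_state, new_cov


# Illustrative run: rough initial guess, then predict and update once per frame.
state, cov = (25.0, 0.0), [[25.0, 0.0], [0.0, 1.0]]
true_dist, true_height, cam_h = 30.0, 0.5, 1.5
for _ in range(10):
    true_dist -= 1.0
    state, cov = ekf_predict(state, cov, travel_m=1.0, process_var=1e-4)
    z = math.atan2(cam_h - true_height, true_dist)   # noise-free synthetic angle
    state, cov = ekf_update(state, cov, z, cam_h, meas_var=1e-6)
```

The sketch makes no claim about convergence speed; in practice the initial covariance and the noise variances would need tuning against real data.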
Further, gradient of the surface on which the camera moves may be determined. Recognition processing is performed while relating camera motion and gradient to the tracking information to thereby recognize objects more accurately.
Preferably, in determining the gradient, gradient information is obtained based on the difference between the pitch angle estimated from the image sequence as part of the camera motion and the pitch angle detected using a sensor. It may similarly be preferable in this case to use a Kalman Filter capable of non-linear processing.
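The pitch-difference idea can be sketched as follows. The sketch assumes that the sensor reports pitch relative to the horizontal, that the image-based estimate reports pitch relative to the road surface, and that the gradient is the sensor value minus the image value; this sign convention, the function names, and the simple exponential smoothing used in place of a Kalman Filter are all assumptions made for illustration.

```python
def road_gradient_deg(detected_pitch_deg, estimated_pitch_deg):
    """Raw gradient sample: sensor pitch minus image-based pitch (assumed convention)."""
    return detected_pitch_deg - estimated_pitch_deg


def smooth(previous, sample, alpha=0.2):
    """Exponential smoothing, a simple stand-in for the filtering step."""
    return previous + alpha * (sample - previous)


# Illustrative sequence of (sensor pitch, image pitch) pairs in degrees.
samples = [(3.0, 1.0)] * 50
gradient_deg = 0.0
for detected, estimated in samples:
    gradient_deg = smooth(gradient_deg, road_gradient_deg(detected, estimated))
```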
It may also be preferred to use a Kalman Filter capable of non-linear processing in the recognition step. Preferably, when a new possible object is detected, a Kalman Filter is assigned to the new possible object. A plurality of Kalman Filters are used to perform recognition processing for a plurality of possible objects. Accordingly, multiple objects successively appearing in the images can be favorably detected.
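The one-filter-per-candidate arrangement can be sketched as a dictionary of filters keyed by candidate identifier. For brevity the sketch uses a scalar linear Kalman update on a single height value rather than a non-linear filter, and it assumes that an upstream tracker supplies stable identifiers; `ScalarFilter`, `process_frame`, and the noise values are hypothetical.

```python
class ScalarFilter:
    """Tiny scalar Kalman filter tracking one quantity (here: candidate height in m)."""

    def __init__(self, first_value, variance=1.0):
        self.estimate = first_value
        self.variance = variance
        self.updates = 1

    def update(self, measurement, meas_var=0.25):
        gain = self.variance / (self.variance + meas_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= (1.0 - gain)
        self.updates += 1


def process_frame(bank, detections):
    """Update existing filters and assign a new filter to each new candidate."""
    for candidate_id, value in detections.items():
        if candidate_id in bank:
            bank[candidate_id].update(value)
        else:
            bank[candidate_id] = ScalarFilter(value)
    return bank


# Illustrative frames: candidate 3 appears only in the second frame.
filters = {}
process_frame(filters, {1: 0.10, 2: 0.30})
process_frame(filters, {2: 0.34, 3: 0.05})
```

A practical system would also need a rule for retiring filters whose candidates leave the field of view, which this sketch omits.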
Preferably, each of the images is divided into a plurality of sections. Based on the results of the recognition processing with respect to each of the divided sections, information on unevenness between the sections is obtained. In this case, irregularities in the surface on which the camera moves are determined by treating each divided section as a possible object during the recognition processing.
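A minimal sketch of the section-based approach is given below. It assumes a regular rectangular grid and that the recognition step has already produced one height estimate per section; the function names, the grid layout, and the sample heights are hypothetical.

```python
def split_into_sections(width, height, cols, rows):
    """Return (left, top, right, bottom) pixel boxes that tile the image."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((c * width // cols, r * height // rows,
                          (c + 1) * width // cols, (r + 1) * height // rows))
    return boxes


def unevenness(heights):
    """Height steps between horizontally and vertically adjacent sections.

    `heights` is a row-major grid of per-section height estimates in metres.
    Each result is (section_a, section_b, height_b - height_a).
    """
    steps = []
    for r, row in enumerate(heights):
        for c, h in enumerate(row):
            if c + 1 < len(row):
                steps.append(((r, c), (r, c + 1), row[c + 1] - h))
            if r + 1 < len(heights):
                steps.append(((r, c), (r + 1, c), heights[r + 1][c] - h))
    return steps


# Illustrative 2 x 2 grid: one section is 0.1 m higher than its neighbours.
section_boxes = split_into_sections(640, 480, cols=2, rows=2)
section_steps = unevenness([[0.0, 0.0],
                            [0.0, 0.1]])
```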
One aspect of the present invention relates to a method or apparatus for recognizing, through image processing, an object captured in images. According to the present invention, a sequence of images is obtained by photographing with a single camera in motion. The photographed object captured in images is tracked within the image sequence. Based on information obtained by the tracking concerning positional changes of the images of the photographed object, three-dimensional information on the photographed object is generated.
Although the recognition technique of the present invention is suitable for application in obstacle detection in moving vehicles, the present invention is not limited to such use. For example, the present invention may be used for controlling any desired vehicle or for creating three-dimensional maps. Further, the present invention may be implemented in structures other than vehicles, such as surveillance cameras.
While the above aspects of the environment recognition method and apparatus were described in connection with one object of the present invention, namely the provision of an improved environment recognition technique, the present invention is not limited to this aspect.
An additional object of the present invention is to provide a method and apparatus for successful motion detection. According to one aspect of the present invention, motion is determined using images captured by a single camera in motion. During determination processing, optical flows are suitably converted into physical parameters in three-dimensional coordinates. A Kalman Filter is favorably used for the determination processing. In this way, the motion of the camera itself or the moving structure on which the camera is mounted is determined. Pose can be determined together with motion, or pose alone may be determined. Use of the results is not limited to environment recognition. For example, by mounting the camera on a vehicle, the determined results can be used for vehicle control including control of various actuators (engine, brake, transmission, or steering devices).
In another aspect, the present invention takes the form of a gradient determining method and apparatus using photographed images to determine gradient of a surface along which a camera moves. The gradient is determined based on an estimated pitch angle obtained by image processing and a detected pitch angle obtained through a sensor. The determined gradient can be used for object recognition and other purposes, such as vehicle control.