Robot vision combines hardware and software algorithms that allow a robot to perceive its environment by gathering and processing signals originating from, or interacting with, that environment. Several such systems are based on the collection and analysis of light, such as laser rangefinders, structured light systems, visual odometry systems, and others. Many sensors and techniques for underwater robot vision are currently under development.
Laser-based sensors project laser light and calculate ranges from time-of-flight measurements while making some assumptions about the scene geometry. See e.g. Cain et al., “Laser based rangefinder for underwater applications,” Proceedings of the American Control Conference (2012). One particular method uses two line lasers and a camera to provide two-dimensional and three-dimensional representations of the environment. See e.g. Cain et al., “Laser based rangefinder for underwater applications,” Proceedings of the American Control Conference (2012); see also Hanson et al., “Short-range sensor for underwater robot navigation using line-lasers and vision,” IFAC-PapersOnLine 48-16 (2015). Other approaches have also been developed. See Karras et al., “Localization of an underwater vehicle using an IMU and a laser-based vision system,” IEEE Proceedings 15th Mediterranean Conference on Control & Automation (2007); see also Jaffe, “Development of a laser line scan LiDAR imaging system for AUV use,” Scripps Institution of Oceanography, La Jolla, Calif., Final Report (2010).
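For illustration only, the basic time-of-flight range relation can be sketched as follows. A pulse travels to the target and back, so the one-way range is half the round-trip distance, and under water light travels at the vacuum speed divided by the refractive index. The refractive index of 1.34 used here is an assumed approximate value for seawater, and the function name is hypothetical; real sensors also handle timing resolution, noise, and partial returns, none of which is modeled.

```python
SPEED_OF_LIGHT_VACUUM = 299_792_458.0  # m/s, exact by SI definition
N_SEAWATER = 1.34  # ASSUMED approximate refractive index of seawater


def range_from_round_trip(t_seconds: float, n: float = N_SEAWATER) -> float:
    """Hypothetical helper: one-way range d = (c / n) * t / 2.

    t_seconds is the measured round-trip time of the light pulse.
    """
    if t_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    speed_in_medium = SPEED_OF_LIGHT_VACUUM / n
    return speed_in_medium * t_seconds / 2.0
```

With these assumptions, a target 10 m away returns the pulse after roughly 89 ns, which shows why underwater time-of-flight ranging needs sub-nanosecond timing to resolve centimeter-scale detail.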
Structured light is another technique receiving attention. Structured light works much like a laser scanner: light is projected onto the scene and the reflected light is viewed by a camera set at an angle. The main difference is that the projected light has a specific pattern rather than being simply a point or beam. Comparing the expected pattern (assuming no object in the path of the light) with the actual return reveals the shape of the object that caused the distortion. The projected light may be black and white, colored, or even outside the visible spectrum, such as infrared or ultraviolet, and may be projected in a wide variety of patterns. See e.g. Campos et al., “Evaluation of a laser based structured light system for 3D reconstruction of underwater environments,” 5th MARTECH International Workshop on Marine Technology (2013); see also Payeur et al., “Dense stereo range sensing with marching pseudorandom patterns,” Fourth Canadian Conference on Computer and Robot Vision (2007); see also Fernandez et al., “Absolute phase mapping for one-shot dense pattern projection,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2010); and see Sarafraz et al., “A structured light method for underwater surface reconstruction,” ISPRS J. Photogramm. Remote Sens. (2016). Other variations may include two or more cameras at various angles to improve accuracy or to compensate for the directionality of the pattern. See e.g. Ishii, “High-speed 3D image acquisition using coded structured light projection,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (2007); see also Huang et al., “Fast three-step phase-shifting algorithm,” Appl. Opt. 45 (2006); see also Bruno et al., “Experimentation of structured light and stereo vision for underwater 3D reconstruction,” ISPRS J. Photogramm. Remote Sens. 66(4) (2011). Different patterns may be projected sequentially and then stitched together to form a point cloud.
The resolution of the resultant point cloud is limited by the resolution and complexity of the projected pattern.
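As one concrete example of decoding a projected pattern, many structured-light systems project a sinusoidal fringe that is shifted in phase between exposures. The sketch below shows the standard three-step phase-shifting formula, assuming three images with phase shifts of -2π/3, 0, and +2π/3 and the intensity model I_k = A + B·cos(φ + δ_k). It is not necessarily the faster algorithm of Huang et al. cited above, which is not reproduced here. The function names are hypothetical, and the result is a wrapped phase only; phase unwrapping and triangulation against a calibrated projector would still be needed to obtain depth.

```python
import math


def fringe_intensities(phi: float, a: float, b: float) -> tuple[float, float, float]:
    """Synthesize one pixel of three fringe images (ASSUMED model).

    I_k = a + b * cos(phi + delta_k), with delta = (-2pi/3, 0, +2pi/3).
    """
    shifts = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)
    i1, i2, i3 = (a + b * math.cos(phi + d) for d in shifts)
    return i1, i2, i3


def three_step_phase(i1: float, i2: float, i3: float) -> float:
    """Wrapped phase in [-pi, pi] from three phase-shifted samples.

    sqrt(3) * (i1 - i3) = 3b*sin(phi) and 2*i2 - i1 - i3 = 3b*cos(phi),
    so atan2 of the two recovers phi independently of a and b.
    """
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

Because the background level A and fringe contrast B cancel out of the ratio, this decoding is insensitive to uniform changes in surface reflectance, which is one reason phase-shifting patterns are popular.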
Another method of robot vision is based on a technique called visual odometry, which determines the position and orientation of a robot by analyzing the images captured by its cameras. Images are acquired using a single camera, multiple cameras working in stereo, or omnidirectional cameras. Visual odometry can generally be performed at a fraction of the cost and computing power of other robot vision methods and has been studied extensively. See e.g. Campbell et al., “A robust visual odometry and precipice detection system using consumer-grade monocular vision,” IEEE International Conference on Robotics and Automation (2005); see also Irani et al., “Recovery of ego-motion using image stabilization,” 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1994); see also Burger et al., “Estimating 3-D egomotion from perspective image sequences,” IEEE Trans. Pattern Anal. Mach. Intell. 12(11) (1990); see also Jaegle et al., “Fast, robust, continuous monocular egomotion computation,” IEEE International Conference on Robotics and Automation (2016); see also Botelho et al., “Visual odometry and mapping for underwater autonomous vehicles,” 6th Latin American Robotics Symposium (2009); and see Shakernia et al., “Omnidirectional egomotion estimation from back-projection flow,” IEEE Conference on Computer Vision and Pattern Recognition (2003).
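Visual odometry estimates the robot's motion between consecutive frames from the images, then chains those relative motions to track the robot's pose. The sketch below shows only that chaining step, simplified to a robot moving in a plane. The `Pose2D` type and `integrate` function are hypothetical names. The per-frame motions are assumed to come from some image-based estimator (for example feature matching between frames) that is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose2D:
    """Planar pose: position (x, y) in meters and heading theta in radians."""
    x: float
    y: float
    theta: float

    def compose(self, delta: "Pose2D") -> "Pose2D":
        """Apply a relative motion expressed in this pose's own (body) frame."""
        c, s = math.cos(self.theta), math.sin(self.theta)
        heading = self.theta + delta.theta
        return Pose2D(
            self.x + c * delta.x - s * delta.y,
            self.y + s * delta.x + c * delta.y,
            math.atan2(math.sin(heading), math.cos(heading)),  # wrap to [-pi, pi]
        )


def integrate(deltas: list[Pose2D], start: Pose2D = Pose2D(0.0, 0.0, 0.0)) -> list[Pose2D]:
    """Chain frame-to-frame motion estimates into a trajectory of poses."""
    trajectory = [start]
    for delta in deltas:
        trajectory.append(trajectory[-1].compose(delta))
    return trajectory
```

For example, four steps of "move 1 m forward, then turn 90° left" trace a unit square and bring the robot back to its starting pose. Because each pose builds on the previous one, errors in the per-frame estimates accumulate as drift over long trajectories.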
As is well understood, light interacts with the ocean in distinctive ways that have been studied for decades. Light changes as it enters the water and continues to change as it travels to depth. This has a marked effect on color, because light attenuates through absorption and scattering. Absorption is of particular interest because light at different wavelengths is absorbed to different degrees over the same distance. For example, red light is absorbed over a short distance and may travel only up to 10 m through clear salt water, whereas green light may travel up to 25 times as far before it is absorbed. As a result, underwater photography and videography frequently require additional light sources or filters to compensate for the absorption and restore the lost wavelengths of visible light. The absorption of light in water is generally described by the Beer-Lambert law:
$$ I_d = I_O e^{-\alpha d} $$
where $I_d$ is the intensity of the light at a given distance $d$, $I_O$ is the intensity of the light at the source, and $\alpha$ is the absorption coefficient. The intensity therefore decays exponentially with distance, at a rate set by the absorption coefficient for a given wavelength. The absorption coefficient can be corrected for temperature and salinity as:
$$ \Phi = \alpha + \Psi_T (T - 273.15) + \Psi_s C_s, $$
where $\Psi_T$ is a wavelength-dependent temperature coefficient, $\Psi_s$ is a salinity coefficient, $T$ is the water temperature in kelvins (so that $T - 273.15$ is the temperature in degrees Celsius), and $C_s$ is the salinity. For example, for a wavelength of 620 nm in salt water, the temperature coefficient $\Psi_T$ is about 0.000539 m⁻¹ °C⁻¹ and the salinity coefficient $\Psi_s$ is 0.0000838 m⁻¹ g⁻¹ L. Using the temperature- and salinity-corrected absorption coefficient $\Phi$, the Beer-Lambert expression can be rearranged to express the distance $d$ as:
$$ d = -\frac{1}{\Phi} \ln\left(\frac{I_d}{I_O}\right) $$
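The corrected absorption coefficient and the rearranged Beer-Lambert relation can be combined into a short distance estimator, sketched below. The temperature and salinity coefficients are the 620 nm values quoted in the text. The base absorption coefficient alpha must be supplied by the caller, and the 0.3 m⁻¹ used in the usage note is a hypothetical illustrative value, not a measured one. The function names are hypothetical, and the sketch assumes both intensities are linear (not gamma-encoded) quantities in the same units.

```python
import math

# Coefficients quoted in the text for 620 nm light in salt water.
PSI_T_620 = 0.000539   # temperature coefficient, m^-1 per degree C
PSI_S_620 = 0.0000838  # salinity coefficient, m^-1 per (g/L)


def corrected_absorption(alpha: float, temp_k: float, salinity_g_per_l: float,
                         psi_t: float = PSI_T_620, psi_s: float = PSI_S_620) -> float:
    """Phi = alpha + psi_T * (T - 273.15) + psi_s * C_s, with T in kelvins."""
    return alpha + psi_t * (temp_k - 273.15) + psi_s * salinity_g_per_l


def distance_from_intensity(i_d: float, i_o: float, phi: float) -> float:
    """Invert Beer-Lambert: d = -(1 / Phi) * ln(I_d / I_O)."""
    if phi <= 0:
        raise ValueError("absorption coefficient must be positive")
    if not 0 < i_d <= i_o:
        raise ValueError("expected 0 < I_d <= I_O for an attenuating medium")
    return -math.log(i_d / i_o) / phi
```

With the hypothetical alpha of 0.3 m⁻¹, water at 288.15 K (15 °C), and a salinity of 35 g/L, the corrected coefficient is about 0.311 m⁻¹, and a measured ratio I_d/I_O of 0.5 corresponds to a distance of about 2.23 m.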
Additionally, it is understood that color may be expressed as RGB values. The RGB values are related to three standard primaries, called X, Y, and Z, defined by the International Commission on Illumination, or Commission Internationale de l'Éclairage (CIE). The XYZ color space is an international standard used to define colors independently of any particular device. The primaries are correlated to specific wavelengths of light. This correspondence links physical pure colors to physiologically perceived colors and defines both the XYZ color space and the RGB color space. The RGB color space varies between devices, since each device implements its own interpretation of the XYZ color space standard. Typically, some portion of the color space forms a color triangle, and (x, y) chromaticity values correlate with corresponding RGB values between zero and one. Each red, green, and blue value that makes up a color is typically stored as an 8-bit byte on most devices, although higher resolution is available on some devices. A value of one corresponds to 255, and the corners of the triangle are represented as (255, 0, 0) “red,” (0, 255, 0) “green,” and (0, 0, 255) “blue.” For every fraction of each of these values there is a corresponding wavelength of color. For example, a wavelength of 620 nm corresponds to an RGB value of (255, 0, 0) or (1, 0, 0), the brightest red. Here “brightest” refers to the shade of red rather than to brightness in the usual sense. The combination of RGB values generally indicates the color at a pixel in a digital imaging device.
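To illustrate how one particular device-oriented RGB space is derived from XYZ and then stored as 8-bit values, the sketch below uses the widely published sRGB (IEC 61966-2-1) conversion for a D65 white point. sRGB is chosen here only as a common example; a given camera may use a different RGB space and matrix. The function names are hypothetical. Values that fall outside the sRGB triangle are simply clipped to the 0 to 1 range before encoding.

```python
# Linear XYZ -> linear sRGB matrix for a D65 white point (sRGB standard).
XYZ_TO_SRGB = (
    (3.2406, -1.5372, -0.4986),
    (-0.9689, 1.8758, 0.0415),
    (0.0557, -0.2040, 1.0570),
)


def _encode(c: float) -> float:
    """Clip a linear value to [0, 1], then apply the sRGB transfer curve."""
    c = min(max(c, 0.0), 1.0)
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055


def xyz_to_srgb8(x: float, y: float, z: float) -> tuple[int, int, int]:
    """Convert CIE XYZ (Y = 1 for reference white) to 8-bit sRGB values."""
    linear = [row[0] * x + row[1] * y + row[2] * z for row in XYZ_TO_SRGB]
    r, g, b = (round(255 * _encode(c)) for c in linear)
    return r, g, b
```

For example, the D65 reference white (X, Y, Z) ≈ (0.95047, 1.0, 1.08883) maps to (255, 255, 255), matching the convention that a normalized value of one is stored as 255.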
It would be advantageous to provide a robot vision system using digital imaging devices to distinguish relative distances between objects in a captured image. It would be particularly advantageous if such a system could provide the relative distances passively, without requiring an emission such as laser light or a sound ping, by simply collecting images and estimating the attenuation of reflected light. It would also be advantageous if such a system could estimate the attenuation of reflected light using well-understood camera filtering techniques combined with an existing standard such as an RGB color triangle. Such relative distances, passively sensed from captured images of a surrounding environment, could be used to provide a 3D point cloud of the surrounding environment, greatly enhancing the ability of a robotic vision system to ascertain its surroundings for the purpose of navigation.
These and other objects, aspects, and advantages of the present disclosure will become better understood with reference to the accompanying description and claims.