Definitions:
The following non-limiting definitions are used to aid the reader with this patent application.
Autostereoscopic: A self-contained stereoscopic display device that supplies different views to the two eyes of a viewer without requiring additional external optical or electronic display devices.
Capture board: A printed circuit board that houses the imaging sub-system and optical components within the camera head.
Catadioptric: Pertaining to, produced by, or involving, both the reflection and refraction of light; with regard to lenses, this is a long lens that uses mirrors within its construction, allowing an extremely long focal length lens to fit within a relatively short barrel. Also known as reflex or mirror lens.
Extrapupillary: Outside of the pupils of two eyes. When used with “separation distances”, this is a distance greater than the distance between the centers of the pupils for a given individual. An average interpupillary distance is around 2.5 inches, so extrapupillary separation distances would exceed that.
Fail-soft: The operational characteristic of a system in which failures of one or more components cause a degradation in performance but not total inoperability of all functions.
Far-field: Visual areas far away from a camera; subjectively determined.
Gen-lock: The process of synchronizing more than one video frame to start at the same time.
HMD: Head-mounted display; a display device worn on the head and used to present separate or identical electronic images to each eye.
Hyper-stereo/hyper-stereoscopic: Still or video images that are produced using cameras separated by a distance larger than the normal separation distance of human eyes, about 2.5 inches.
Imager: An electronic light-gathering device, such as a charge-coupled device (CCD), charge-injection device (CID), or CMOS visual sensor device used for capturing images; may be in the form of a strip or an array of light-gathering picture elements (pixels). The “pixel density” of an imager is the number of light-gathering elements in a given unit area.
Immersive imaging: The capture, display and/or manipulation of images that give a sense to the viewer of being surrounded by the images (“immersed” in them).
Interocular: Between two optical components. When used with “separation distances”, this is the distance between two optical systems used to create stereoscopic images.
Interpolation: The determination of intermediate values between known existing values. Interpolation refers to the determination of image data values at pixel locations between other pixel locations using known data from existing pixel locations. An “interpolator” or “interpolation processor” is a device, set of processes, circuits, or other means for calculating interpolation values.
Interpupillary: Between the pupils of two eyes. When used with “separation distances”, this is the distance between the centers of the pupils for a given individual. Normally for humans, this distance averages around 2.5 inches.
Monoscopic: Representations providing only a single visual perspective for each point in the field-of-view, as might be seen using just a single eye or captured using just a single imaging sub-system.
Near-field: Visual areas within a few feet of a camera; subjectively determined.
Normalize: A process or procedure by which non-standard values are brought back to standard value ranges; in the context of interpupillary and extrapupillary separation distances, normalizing implies correcting images captured at extrapupillary separation distances to make them appear as if captured at average interpupillary separation distances.
Optical plane: The plane in and about which light rays may be captured and viewed; relates to the plane in which a plurality of optical components and image sensing devices may be physically located; images derived from components not on the same optical plane would require manipulation of the images to correct for changing tilt angles.
Panoramic: A wide field-of-view that encompasses 360° in the horizontal plane (the horizon in all directions) and a limited number of degrees less than 180° in the vertical plane; does not usually include image data from directly over and under the image capture device.
Panospheric/Spherical: A wide field-of-view that encompasses all of the visual field around an image capture device; in most practical devices, this excludes the area under the support structure of that device.
Stereopsis: The perception of stereoscopic viewing effects induced by the presentation of slightly different views to a viewer's two eyes. The “effective stereo viewing range” is that distance from a two-eyed viewer or stereoscopic camera in which stereopsis is readily perceived by individuals with normal viewing capabilities.
Telepresence: The presentation of remotely-acquired image and audio information in such a way as to give the person experiencing the information a sense of presence at the remote location.
Triangulation: A method for determining the location of objects in two- or three-dimensional space through trigonometric calculations using information from a plurality of sensors whose locations in relation to each other are known.
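As an illustration of the triangulation definition above, the following minimal Python sketch locates a target in a plane from two sensors separated by a known baseline. The function name, the coordinate frame (left sensor at the origin, right sensor on the x-axis), and the use of interior bearing angles are assumptions made for this example only; they are not taken from this application.

```python
import math


def triangulate_2d(baseline, angle_left, angle_right):
    """Locate a target from two sensors a known distance apart.

    Hypothetical setup: the left sensor sits at (0, 0) and the right
    sensor at (baseline, 0). Each angle is the interior angle, in
    radians, between the baseline and that sensor's line of sight.
    Returns the (x, y) position of the target.
    """
    if angle_left <= 0 or angle_right <= 0:
        raise ValueError("bearing angles must be positive")
    # The three interior angles of the sensor-sensor-target triangle sum to pi.
    angle_target = math.pi - angle_left - angle_right
    if angle_target <= 0:
        raise ValueError("lines of sight do not converge")
    # Law of sines: the side opposite the right-hand angle is the
    # distance from the left sensor to the target.
    range_left = baseline * math.sin(angle_right) / math.sin(angle_target)
    return (range_left * math.cos(angle_left),
            range_left * math.sin(angle_left))


if __name__ == "__main__":
    # Two sensors 2 units apart, each sighting the target at 45 degrees.
    print(triangulate_2d(2.0, math.pi / 4, math.pi / 4))
```

The same law-of-sines step extends to three dimensions by adding an elevation angle, which is omitted here for brevity.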
Immersive imaging is defined as still or video images that provide a realistic or pseudo-realistic visual representation of an environment in such a way that the person viewing the images feels “immersed” in them (fully surrounded, as when immersed in water). With improvements in digital imaging components and processing speeds, there is increasing interest in immersive imaging because electronic and computer technologies can now practically support it. These immersive images are sometimes called wide-angle, panoramic, spherical or panospheric, based on the extent of viewing field that is shown.
There are a number of useful purposes for products based on this technology. Such a camera system is useful for security purposes for monitoring large public areas, such as common areas around a university campus, and it is equally valuable for protecting dormitories or military barracks either domestically or in foreign countries. Asset protection is another valuable use, in which voluminous warehouses and their external traffic areas, supply depots or stockyards can be viewed from a central camera location, yielding reduced deployment cost, or multiple locations, yielding more extensive coverage as well as redundancy. Some models of these devices are practical for surveillance, since the camera head is small, and this function is appropriate for airports, other transportation hubs, in public buildings, and in military surveillance applications. A similar use includes deployment of the system in battlefield settings to track movements and provide comprehensive first-hand views of combat situations, and it is equally effective in urban police and fire-rescue actions and when coupled with a robotic vehicle in hazardous surroundings.
Another military use is that of surveillance of vehicles in the area surrounding warships. In this application, it is important to track and identify objects in the area of a ship whether it is moored in a harbor or tied up to a dock. In many ports, active sensing systems such as radar and sonar are considered intrusive by the foreign host country or municipality. Accordingly, a need exists for imaging systems that calculate distances and can track objects through passive imaging means over widely varying distance ranges.
Additional uses may be found in the commercial world. A full 360° stereoscopic video transmission would be valuable in promoting retail locations, sporting/entertainment events, real estate properties for sale, and news events to remote audiences, providing an audio and video experience similar to being on-location but without the incumbent travel or time expenditures. Similarly, videoconferencing technologies save significant amounts of money for businesses by reducing travel for face-to-face meetings, but the experience can be significantly enhanced by providing images of meeting counterparts in 3 dimensions. Immersive travel and educational video is supported by this design, and the commercial potential for such uses is staggering. Some of the attraction for the technology is related to the wide FOV, but the major attraction is in coupling this characteristic with true stereoscopic capture and viewing.
The scientific research is voluminous on the specific benefits inherent in stereoscopic image acquisition and viewing. Stereoscopy presents a more natural experience that supports smoother conscious performance of tasks (For general information refer to Rosenberg, L. B. “The Effect of Interocular Distance upon Depth Perception when Using Stereoscopic Displays to Perform Work within Virtual and Telepresent Environments,” Interim Report for Project Number 7231, January–June 1992, carried out at Stanford University Center for Design Research.) partly due to the fact that stereoscopic views stimulate pre-attentive processing in the primary visual cortex for faster subconscious recognition (For general information refer to <http://gatsby.ucl.ac.uk/~zhaoping/preattentivevision.html>; <http://www.ccom-jhc.unh.edu/vislab/VisCourse/PreAttentive.html>).
Superior object positioning during remote manipulation has been demonstrated through a number of studies (For general information refer to Merritt, J. O., Cole, R. E., Ikehara, C.: “Interaction Between Binocular and Monocular Depth Cues in Teleoperator Task Performance,” Society for Information Display International Symposium Digest of Technical Papers, Playa del Rey, Calif.: Society for Information Display, May 1992; Merritt, J. O., Cole, R. E., Ikehara, C.: “A Rapid-Sequential-Positioning Task for Evaluating Motion Parallax and Stereoscopic 3D Cues in Teleoperator Displays,” IEEE Conference Proceedings on Systems, Man, and Cybernetics, 91CH3067-6, October 1991, pp. 1041–1047; Smith, D. C., Cole, R. E., Merritt, J. O., Pepper, R. L.: “Remote Operator Performance Comparing Mono and Stereo TV Displays: The Effects of Visibility, Learning, and Task Factors,” Kailua, Hi.: Naval Ocean Systems Center, Hawaii Laboratory, Technical Report 380, February 1979; Spain, E. H., Holzhausen, K. P.: “Stereoscopic versus orthogonal view displays for performance of a remote manipulator task,” Stereoscopic Displays and Applications II: Proceedings of the SPIE, Vol. 1457. Bellingham, Wash.: Society of Photo-Optical Instrumentation Engineers, February 1991; Touris, T. C., Eichenlaub, J. B., Merritt, J. O.: “Autostereoscopic Display Technology in Teleoperation Applications,” Proceedings of the SPIE, Vol. 1833. Bellingham, Wash.: Society of Photo-Optical Instrumentation Engineers, February 1993.), and stereoscopic viewing is particularly useful in remote explosives handling (For general information refer to Drascic, D., Grodski, J. J.: “Using Stereoscopic Video for Defense Teleoperation,” Stereoscopic Displays and Applications IV: Proceedings of the SPIE, Vol. 1915. Bellingham, Wash.: Society of Photo-Optical Instrumentation Engineers, February 1993).
These advantages are present using either autostereoscopic or stereoscopic head-mounted displays (HMD), in which distance estimation errors for short distances were significantly decreased through use of a stereoscopic display device (For general information refer to Singer, M. J., Ehrlich, J., Cinq-Mars, S., Papin, J.: “Task Performance in Virtual Environments: Stereoscopic vs. Monoscopic Displays and Head-Coupling”, U.S. Army Research Institute for the Behavioral and Social Sciences, Technical Report 1034, December 1995). Stereoscopic imaging also provides more accurate distance recognition than ordinary estimation techniques, which when added to directional information yields effective real-time targeting methods that are easily automated for visually tracking moving people or objects without intrusive sensing techniques.
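The distance-recognition benefit described above can be illustrated with the standard parallel-axis pinhole stereo relation Z = f·B/d, where f is the focal length in pixels, B is the camera separation, and d is the disparity in pixels. The following Python sketch is offered only as an illustration under that idealized model; the function names and the example focal length of 800 pixels are assumptions, not parameters of this application.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Range to a point under an idealized parallel-axis stereo model."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite range")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_err_px=1.0):
    """First-order range uncertainty caused by a disparity error.

    Differentiating Z = f*B/d gives |dZ| ~= Z**2 / (f*B) * |dd|,
    so the error grows with the square of range and shrinks as the
    camera separation grows.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


if __name__ == "__main__":
    focal_px = 800.0          # assumed focal length, in pixels
    interpupillary_m = 0.0635  # about 2.5 inches
    hyper_stereo_m = 0.635     # ten times wider, for comparison

    z = depth_from_disparity(focal_px, interpupillary_m, 10.0)
    print("range at 10 px disparity:", z)
    print("error, 2.5 in baseline:", depth_error(focal_px, interpupillary_m, z))
    print("error, 25 in baseline: ", depth_error(focal_px, hyper_stereo_m, z))
```

Under these assumptions, widening the separation by a factor of ten reduces the range uncertainty at a given distance by the same factor, which is the usual motivation for hyper-stereoscopic (extrapupillary) camera spacing when ranging distant objects.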
Current technology has not heretofore provided all of the features and quality necessary for truly immersive experiences, so this invention contributes by advancing the state-of-the-art in video acquisition and signal management. The nature of conventional video cameras is that they supply only limited depth and width of field, and the same is true of most still image cameras not employing ultra-wide-angle lenses. Several approaches to wide FOV imaging are available that use single sensors for the capture apparatus or employ various optical or motorized mechanisms, but these have limitations. There are also several techniques for capturing stereoscopic images, but these are usually incompatible with full 360° views at all times. The following technologies are examples of devices used for stereoscopic or wide FOV coverage of visual environments.
Stereoscopic images were originally taken early in the 1900s by exposing two negatives in side-by-side cameras, producing a pair of images that could be viewed with a companion photograph holder designed to position the combined pair of images at the right distance from each other and from the eyes. These initial camera systems were both bulky and limited to still-life compositions, but they provided a higher sense of reality through the depth effect of stereo viewing. Accordingly, a need exists for practical portable video systems that can produce the stereoscopic viewing effect with moving images.
Related Art
More recently, video cameras have been paired up to capture stereoscopic images and video streams. Coupled with mechanisms for moving the camera pair around, pan-tilt-zoom (PTZ) mechanisms support a wide scope of visual field through such movement and have the potential for optical zooms that can magnify scenes, improving resolution by trading it for reduced FOV. Lipton (For general information refer to U.S. Pat. No. 4,418,993) and Miura et al. (For general information refer to U.S. Pat. No. 4,879,596) disclose paired camera systems that can be moved about to cover a wider collective FOV than stationary camera systems.
However, there are drawbacks to that mechanism. The motorized mechanisms of PTZ cameras have lag times associated with physical movement, high power consumption, limited rotational extents, FOV obstructions by motors and supports, maintenance and mechanical wear of moving components, and the risk of loss of calibration over time. There is also a concern over response time, size, weight, and reliability. Most critically, PTZ cameras can only capture the visual field where they are directly pointed at any given instant in time, limiting the usefulness of such a system for telepresent operations or viewing. Nonetheless, there are many commercial suppliers for monoscopic versions of PTZ cameras, including Pelco (For more information, refer to Pelco Spectra II: <http://www.pelco.com/catalog/camerasite/camerasystems/spectra/21487.htm> which is hereby incorporated hereinto in its entirety), Panasonic (For more information refer to Panasonic WV-CS854 Unitized Dome Camera: <http://cctv.panasonic.com/specsheets/WV-Cs854A.pdf> which is hereby incorporated hereinto in its entirety), Sensormatic (For more information refer to Sensormatic SpeedDomes: <http://www.sensormatic.com/vsd/SEC/domes.htm> which is hereby incorporated hereinto in its entirety) and Everest VIT (For more information refer to Everest VIT: <http://www.everstvit.com/ptz/index.html> which is hereby incorporated hereinto in its entirety). There is, therefore, a need for a stereoscopic image and video acquisition system that can capture the full panoramic visual fields around the camera system without the drawbacks of mechanical movements.
As an alternative to physically moving entire camera systems, a flat mirror can be pivoted around above an upward-directed camera, allowing different areas around the camera to be captured, as is shown by Morgan (For general information refer to U.S. Pat. No. 4,499,490). Similar to PTZ systems, this design can only view or acquire a portion of a surrounding environment at any given instant in time.
Catadioptric approaches can provide a wide visual field at video rates, covering an entire 360° FOV (horizontally) seamlessly by reflecting the entire surrounding scene onto a single image sensor. Representative patents cover curved reflectors, some of which also include refractive optics (For general information refer to: U.S. Pat. Nos. 5,845,713; 5,790,181; 5,790,182; 6,003,998; 5,473,474; 5,920,376 and 4,566,763). Commercial suppliers include BeHere (For more information refer to BeHere: <http://www.behere.com> and “New Movies Take on a Different Perspective”, pg. 6G, Sun-Sentinel, South Florida, Jun. 11, 2000 which is hereby incorporated hereinto in its entirety) and RemoteReality (For more information refer to RemoteReality: <http://www.remotereality.com/products/paramax.html> which is hereby incorporated hereinto in its entirety), and many universities have explored this approach, including Carnegie Mellon, MIT, Columbia, Lehigh, Kyoto, and UC-San Diego (For more information refer to University listing page for omnidirectional designs: <http://www.cis.upenn.edu/~kostas/omni.html> which is hereby incorporated hereinto in its entirety). A number of patents have been awarded that employ various reflective and refractive optics for wide FOV acquisition (For more information refer to U.S. Pat. No. 5,854,713 (Kuroda+, 1998); U.S. Pat. No. 5,790,181 (Chahl+, 1998); U.S. Pat. Nos. 6,003,998 and 5,790,182 (St. Hillaire, 1999 and 1998); U.S. Pat. No. 5,990,934 (Nalwa, 1999); U.S. Pat. No. 5,920,337 (Glassman+, 1999) which are hereby incorporated hereinto in their entirety), and these are used extensively in consumer-grade telescopes. These types of cameras have shortcomings inherent in their design approach, though. Their resolution is limited to that of the individual image sensors that are used to pick up the light, so high resolution requires high-priced sensors.
The nature of a catadioptric optical system is that distortions yield a non-uniform pixel distribution across the surface of the individual imager which has an impact on the quality and subsequent extent of digital zooming and magnification of prints. Distance measuring (ranging) is limited to estimations, which are potentially computation-intensive and prone to error. Alternatively, accessory sensing modes and devices must be added to perform this function. Because the images are distorted by the wide-angle lens, de-warping software should be designed for real-time use in video applications. Accordingly, there is a need for a camera system that can achieve a more uniform and higher resolution distribution of the light from objects onto the surface of the imager components.
Rather than use a curved reflective surface such as the one employed by the BeHere system, a related approach involves a downward-pointed reflective pyramid coupled with multiple image sensors beneath the pyramid. This approach is explained (For general information refer to U.S. Pat. No. 5,990,934 and U.S. Pat. No. 5,745,305) by V. Nalwa of Lucent Technologies. With this approach, the sensors point up at the pyramid, which reflects the scene from 360° around the camera horizontally. By using one image sensor for each reflective face of the pyramid, this method improves overall resolution over single-imager/reflector systems. While the use of multiple sensors increases the number of image-sensing elements involved in scene capture, this method suffers from an inefficient use of image sensors, since it is not possible to fully inscribe a triangle within a square with complete coverage. There is also a requirement for seaming across four overlapping edges. Accordingly, there exists a need for an efficient image capturing method that provides high resolution and high uniformity of pixel distribution with its wide FOV.
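The sensor-inefficiency point above can be made concrete with a short calculation. The Python sketch below assumes, purely for illustration, that each pyramid face images as a triangle onto a square sensor in the most favorable way (base along one sensor edge, apex on the opposite edge); the actual image footprint in the cited patents may differ. The helper names are hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def sensor_utilization(image_outline, sensor_width, sensor_height):
    """Fraction of a rectangular sensor covered by the image outline."""
    return polygon_area(image_outline) / (sensor_width * sensor_height)


if __name__ == "__main__":
    # Largest triangle that fits in a unit-square sensor: base on the
    # bottom edge, apex on the top edge.
    best_case_triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
    print(sensor_utilization(best_case_triangle, 1.0, 1.0))
```

Under this assumption, even the best-fitting triangle covers only half of a square sensor, so at least half of the light-gathering elements would go unused.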
A complete spherical view can be achieved with multiple cameras pointing in all directions, providing excellent resolution in portions of the spherical images that are viewed in normal aspect ratio on conventional devices, such as TVs or monitors. One such system is a dodecahedral camera design from iMove Inc. (For more information refer to iMove Inc.: <http://www.imoveinc.com> which is hereby incorporated hereinto in its entirety), which captures a complete spherical view of the entire environment and does so at video rates. (For general information refer to U.S. Pat. Nos. 5,023,725 and 6,141,034 by McCutcheon). The design emphasis is on complete spherical coverage by a collection of cameras, with as little overlap between cameras as possible to maximize efficiency in capture. Based on its design, however, there is no practical stereoscopic video acquisition, as well as no true distance measuring.
In addition, current implementations have a large physical size and significant communication requirements. Systems are currently only available for rent, adding to the limitations for their use in many applications. The University of Maryland has developed a similar structure with multiple imagers arrayed on a spherical surface (For more information refer to University of Maryland: <http://www.cfar.umd.edu/~larson/EyesFromEyes.html> which is hereby incorporated hereinto in its entirety) that is useful for many different types of analysis, but, at this time, it is available for research-oriented purposes only. There is therefore a need for a very wide FOV image capture system that can produce true stereoscopic images and video, while doing so in practical packages.
Another approach for acquiring large FOV scenes is through using very-wide-angle optics such as fisheye lenses, as demonstrated by McCall et al. (For general information refer to U.S. Pat. No. 6,002,430). Combining two such lenses back-to-back allows capture of almost a full spherical view but introduces image construction problems unique to the design. Dual hemispheric lens systems like this one from Internet Pictures Corp. (For more information refer to Internet Pictures Corp.: <http://www.ipix.com> which is hereby incorporated hereinto in its entirety), for example, have edge-seaming requirements in difficult-to-seam areas (the low resolution portions of the image), fisheye effects (changing apparent depth based on direction of view), non-uniform pixel distributions over the face of the imagers, and limited resolution. The system cannot acquire stereoscopic video in real-time. Nor can it measure distances.
One of the simpler ways to use multiple imagers is to place all of them on the same plane radially directed outward from a central point. The benefit of such a design is the ability to produce images that can be viewed remotely as if the viewing person were at that central point of the camera system. Such a design has been disclosed by Henley (For general information refer to U.S. Pat. No. 5,657,073), and it is capable of producing either panoramic or panospheric output images that can be panned, tilted, rotated or digitally zoomed through an accessory controller. This is a useful approach for monoscopic views of an environment, but it is incapable of capturing full stereoscopic views or distance measurements. Accordingly, a wide FOV stereoscopic video capture system is needed to support more realistic representations of visual environments.
Another way to put multiple imagers together in a planar fashion radiating out from a central point is disclosed by Rogina et al. (For general information refer to U.S. Pat. No. 5,703,961). Rogina's method supports high resolution and real-time stereoscopic acquisition, but the radial planar configuration of multiple imagers limits viewing of near-field subjects, requires a large number of imagers taking up a large physical space, and consumes significant communication bandwidth. While this provides a middle ground between individual imager systems and full spherical arrays, it is expensive to produce and is not designed to independently measure distances without additional external sensors.
Judd et al. (For general information refer to U.S. Pat. No. 5,612,533) and Glassman et al. (For general information refer to U.S. Pat. No. 5,920,337) disclose improving the size requirements by mounting light sensors on a horizontal planar substrate and using a reflective mirror or prism next to each sensor and a lens at the edge of the planar substrate. Processing of the image data from the collection of imagers then forms a panoramic digital data strip that can be viewed as a continuous panoramic image. There is no provision, however, for stereoscopic capture or distance measurement in this radial arrangement of components.
The RingCam omnidirectional camera (For more information refer to Microsoft: <http://research.microsoft.com/~rcutler/ringcam/ringcam.html> which is hereby incorporated hereinto in its entirety) from Microsoft is a planar layout of low-cost imagers that achieves good resolution. Rather than direct all imagers outward radially from a central point, however, the RingCam rotationally offsets the camera sub-systems so that the full 360° panorama can be covered including much of the near-field without being occluded by the other camera sub-systems in the layout. This method does not support stereoscopy or ranging (i.e., distance measurements).
Yamamura et al. describe a different planar arrangement of three cameras to capture a wide FOV (For general information refer to U.S. Pat. No. 5,880,815) which is directed to preventing overlap or lack of image content between cameras. The use of mirrors allows the inventors to keep all of the three cameras on the same plane and line up the edges of the three images for seamless presentation. This method does not encompass full 360° panoramic acquisition, nor is it capable of stereoscopy.
Hankawa et al. (For general information refer to U.S. Pat. No. 5,727,239) describe a photography apparatus that can change the arrangement of optical paths through reflective members to capture either wide FOV or stereoscopic images on a single image sensor or film. This system is significantly limited in resolution and the extent of its FOV, not acquiring even 180° of visual content. Katayama et al. (For general information refer to U.S. Pat. No. 5,668,595) describe a different apparatus that can alternate between “panoramic” (though nowhere near 360°) and stereoscopic modes through the use of 2-imager sub-systems, but this is likewise limited in the extent of its FOV. Katayama et al. also disclose a method for achieving high image resolution by integrating the outputs of multiple imager sub-systems (For general information refer to U.S. Pat. No. 5,668,595). This system targets detection and correction of misalignments and keystone distortions that might occur between two camera sub-systems. The method does not teach the coincident alignment of multiple imaging sub-systems through manufacturing processes to ensure inter-pixel spacing for high resolution.
Due to the aforementioned limitations of other techniques and to support immersive imaging and telepresence, there exists a need for a very-wide-FOV stereoscopic imaging system that can capture stereoscopic images and measure distances to objects with few to no moving parts.