Ever since early humans drew images of their world on cave walls, mankind has endeavored to create images of the environment in which we live. Over the subsequent millennia, humans continued to develop their image-recording techniques. With the exception of sculpture and other carvings, most images were recorded on two-dimensional surfaces, such as paintings or tapestries. Over time, artisans developed perspective and chiaroscuro techniques to add a greater sense of depth to their two-dimensional works.
In the early part of the nineteenth century, artistic skills were augmented by mechanical and chemical advancements, as well as by a better understanding of the mechanics of human vision. Charles Wheatstone realized that each human eye views an object from a slightly different horizontal point of view. Armed with this knowledge, he invented the stereoscope in 1832. His invention marked the birth of stereoscopic imaging.
The ensuing study of the physiological phenomenon of persistence of vision led to the invention of parlor devices such as William Horner's zoetrope (devised in 1834), which allowed images to be viewed with the illusion of movement.
The invention of daguerreotype photography by Louis Daguerre in 1839, and the subsequent development by William Henry Fox Talbot of a system of negative recording and positive reproduction, allowed real scenes to be documented accurately on a two-dimensional surface. In 1849, the Scottish physicist David Brewster developed the lenticular stereoscope, a convenient device for viewing stereoscopic photographs.
The latter part of the 1800s saw the development of flexible photographic film by George Eastman and a workable motion picture camera/projection system by Thomas Edison's New Jersey laboratories. On Dec. 28, 1895, the Lumière brothers held the first public screening of Cinématographe films at the Grand Café, Boulevard des Capucines, Paris, and the movies were born. Shortly thereafter, British film pioneer William Friese-Greene filed a patent for a stereoscopic movie process consisting of two films projected side by side on a screen and viewed through a stereoscope to converge the two images.
A. A. Campbell Swinton, a Scottish electrical engineer, outlined in 1908 a method that laid the foundation for modern television. By 1932, the Radio Corporation of America (RCA) had demonstrated a 120-line, all-electronic television system.
In the hundred years from 1832 to 1932, the world saw the development and successful marketing of the fundamental systems for two-dimensional and three-dimensional (stereoscopic) motion pictures and television. This period also established an understanding of the human perceptual mechanisms on which all modern image capture and display technologies are built.
The recent development of cost-effective portable computers, high-speed Internet access, digital imaging, high-speed, high-capacity digital storage, and multi-format flat-screen displays has made motion imagery ubiquitous. Technology now allows everyone to carry a television, phone, computer, music player, and more in their pocket.
Despite continued advances in digital imaging and display technologies, the basic human psychophysical visual mechanisms that are exploited to create the perception of three dimensions have remained unchanged for more than one hundred fifteen years.
Human Visual Mechanisms
Visual perception is the brain's interpretation of what the eyes see. The human brain has certain innate visual mechanisms that assist in the process of perception. These mechanisms include a propensity to make assumptions about the images being seen based on limited information. Examples include mechanisms related to object recognition and occlusion.
Humans perceive images on display devices such as television and computer monitors because these devices present information in a manner that exploits visual mechanisms related to motion and color perception. Images displayed on television, in motion pictures, and on computers do not move continuously. Instead, these devices present a rapid series of still images, each slightly displaced from the previous one, in a manner that the brain perceives as fluid movement.
Color displays work in a comparable manner. Humans may perceive millions of colors on a computer monitor, but the monitor itself produces only three colors: red, green, and blue. The illusion of additional colors is produced by presenting these three colors in particular proportions relative to one another, which exploits color perception mechanisms and thereby creates the illusion of a full spectrum of colors.
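The "millions of colors" figure follows directly from counting combinations of the three primaries. The sketch below is illustrative only: it assumes each primary is driven with a fixed number of discrete intensity levels (8 bits, or 256 levels, per channel is a common but not universal configuration), and the mixture table lists well-known additive results rather than anything specified in this disclosure.

```python
def displayable_colors(bits_per_channel: int = 8) -> int:
    """Number of distinct colors from three independently driven primaries."""
    levels = 2 ** bits_per_channel  # intensity steps available to each primary
    return levels ** 3              # every red/green/blue combination


# A few familiar additive mixtures of full-intensity primaries (8-bit levels).
ADDITIVE_MIXTURES = {
    (255, 255, 0): "yellow",   # red + green
    (0, 255, 255): "cyan",     # green + blue
    (255, 0, 255): "magenta",  # red + blue
    (255, 255, 255): "white",  # red + green + blue
}

if __name__ == "__main__":
    print(displayable_colors())    # 16777216 (8 bits per channel)
    print(displayable_colors(10))  # 1073741824 (10 bits per channel)
```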
It is known that visual perception is a cognitive exercise and not merely a stimulus response; perception is a learned ability that we develop from infancy. Humans and certain animals rely primarily on binocular vision to capture parallax information. However, organisms whose eyes lack significantly overlapping fields of view have developed other mechanisms for determining spatial relationships.
Certain insects and animals determine the relative depth of objects in a scene simply by moving an eye from side to side or up and down. A pigeon bobbing its head back and forth as it walks is a good example of this action. The oscillating eye movement presents motion parallax depth information over time, which allows depth order to be determined from the relative movement of objects in the scene. Humans also possess the ability to process visual parallax information presented over time.
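The depth-ordering principle can be illustrated with a simple pinhole model. This is a minimal sketch under stated assumptions: the eye is treated as a pinhole camera with a focal length of about 17 mm (a common reduced-eye approximation), it translates sideways without rotating, and the object names and distances are hypothetical. Under these assumptions a stationary object at depth Z shifts in the image by f * t / Z for an eye translation t, so nearer objects shift more and depth order can be recovered from relative motion alone.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    depth_m: float  # distance from the eye; used here only to simulate what the eye observes


def image_shift_mm(obj: SceneObject, eye_shift_m: float, focal_mm: float = 17.0) -> float:
    """Image displacement of a stationary object when a pinhole eye translates
    sideways by eye_shift_m without rotating: shift = f * t / Z."""
    return focal_mm * eye_shift_m / obj.depth_m


def depth_order(objects: list[SceneObject], eye_shift_m: float) -> list[str]:
    """Rank objects nearest-first from their image shifts alone:
    a larger shift for the same eye movement means a nearer object."""
    shifts = {o.name: abs(image_shift_mm(o, eye_shift_m)) for o in objects}
    return sorted(shifts, key=shifts.get, reverse=True)


if __name__ == "__main__":
    scene = [SceneObject("tree", 10.0), SceneObject("seed", 0.5), SceneObject("bush", 2.0)]
    print(depth_order(scene, eye_shift_m=0.03))  # ['seed', 'bush', 'tree']
```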
Human sight is fundamentally based on two forward-facing eyes with overlapping visual fields. The eyes focus on an object through a process called accommodation, which is performed simultaneously with convergence of the eyes. Each eye records a two-dimensional image of the object onto its retina from a slightly different point of view (or “parallax position”). The two two-dimensional images are transmitted along the optic nerves to the brain's visual cortex, where they are fused over time into a three-dimensional perception of the object through a process called stereopsis. The object's three-dimensionality exists only in the brain, not in the eyes.
Humans are able to perceive depth in two-dimensional photographs, graphics, television, and motion pictures because we have all learned to read three-dimensionality into a two-dimensional image using monocular cues such as linear perspective, overlap, motion, relative size, and light and shadow. However, monocular cues provide only a limited amount of dimensional and spatial information. True three-dimensionality requires the addition of parallax depth information.
Methods and apparatus for producing three-dimensional illusions have to some extent paralleled the increased understanding of the physiology of human depth perception as well as developments in image manipulation through analog/digital signal processing and computer imaging software.
Perception of three-dimensional space depends on various kinds of information in the scene being viewed, including monocular cues and binocular cues. Monocular cues include elements such as relative size, linear perspective, interposition, highlights, and shadows. Binocular cues include retinal disparity and convergence; accommodation and learned cues, such as familiarity with the subject matter, also contribute. While all these factors may contribute to creating a perception of three-dimensional space in a scene, retinal disparity may provide one of the most important sources of information for creating a three-dimensional perception. In particular, retinal disparity supplies the brain with parallax information (i.e., an apparent change in the position, direction of motion, or other visual characteristics of an object caused by different observational positions). Because each eye has a different observational position, each eye provides a slightly different view of the same scene. The differences between the views represent parallax information that the brain can use to perceive the three-dimensional aspects of a scene. In addition to parallax, several visual system sub-processes also contribute to the mechanics of perception.
A distinction exists between monocular depth cues and parallax cues in the visual information received. Both eyes provide essentially the same monocular depth cues, but each eye provides different parallax depth information, a difference that is essential for producing a true three-dimensional perception. Depth information may be perceived, to a certain extent, in a two-dimensional image. For example, monocular depth may be perceived when viewing a still photograph, a painting, standard television and movies, or when looking at a scene with one eye closed. Monocular depth is perceived without the benefit of binocular parallax depth information. Such depth relations are interpreted by the brain from monocular depth cues such as relative size, overlapping, perspective, and shading. To interpret monocular depth information from a two-dimensional image (i.e., using monocular cues to indicate a three-dimensional space on a two-dimensional plane), the viewer is actually reading depth information into the image through a process learned in childhood.
As previously stated, three-dimensional visual perception is a series of cognitive exercises built on fragmentary information. In his 1995 book, Foundations of Vision, hereby incorporated by reference, Brian Wandell states,
Perception is an interpretation of the retinal image, not a description. Information in the retinal image may be interpreted in many different ways. Because we begin with ambiguous information, we cannot make deductions from the retinal image, only inferences . . . we have learned that the visual system succeeds in interpreting images because of statistical regularities present in the visual environment and hence in the retinal image. These regularities permit the visual system to use fragmentary information present in the retinal image to draw accurate inferences about the physical cause of the image. For example, when we make inferences from the retinal image, the knowledge that we live in a three-dimensional world is essential to the correct interpretation of the image. Often, we are made aware of the existence of these powerful interpretations and their assumptions when they are in error, that is, when we discover a visual illusion.
In addition, the following publications regarding three-dimensional perception are also herein incorporated by reference:
1. Rock, I. The Logic of Perception. Cambridge, Mass.: MIT Press, 1985.
2. Churchland, P. et al. The Computational Brain. Cambridge, Mass.: MIT Press, 1992.
3. Tomlin, P. “Maintaining the Three-dimensional Illusion.” Information Display, Dec. 1987: 11-14.
4. Ogle, K. N. “Some Aspects of Stereoscopic Depth Perception.” Journal of the Optical Society of America 57, no. 9 (1967): 1073-1081.
5. Marr, D. Vision. San Francisco: W. H. Freeman, 1982.
6. Jones, E. et al. “Visual Image Depth Enhancement by Parallax Induction.” Advances in Display Technology IV, SPIE Proceedings. Society of Photo-Optical Instrumentation Engineers, 1984: 16.
7. McLaurin, A. P. et al. “Visual Image Depth Enhancement Process: An Approach to Three-Dimensional Imaging.” Displays 7, no. 3 (1986): 112.
8. Mayhew, C. A. “Texture and Depth Enhancement for Motion Pictures and Television.” SMPTE Journal 9, no. 10 (1990): 809-814.
9. Mayhew, C. A. “True 3-Dimensional Broadcast Television without Glasses.” NAB Engineering Proceedings. Atlanta, 1990: 478 (revised version).
10. Mayhew, C. A. “Vision III Single-Camera Autostereoscopic Methods.” SMPTE Journal 100 (1991): 411-416.
11. Mayhew, C. A. “A 35 mm Autostereoscopic System for Live-Action Imaging Using a Single Camera and Lens.” SMPTE Journal 102 (1993): 505-511.
12. Mayhew, C. A. “Parallax Scanning Using a Single Lens.” SPIE Stereoscopic Displays and Virtual Reality Systems III Proceedings. San Jose, 1996: 154-160.
13. Proffitt, D. et al. “Perceived depth is enhanced with parallax scanning.” University of Virginia, Cognitive Science Department, March 1999.
14. Subramanian, A. et al. “Segmentation and Range Sensing Using a Moving-Aperture Lens.” Proceedings of the 8th IEEE International Conference on Computer Vision (ICCV 2001). Vancouver, 2001: 500-507.
15. Mayhew, C. A. et al. “Three-dimensional visualization of geographical terrain data using temporal parallax difference induction.” Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7240, 72401H. San Jose, Calif., 2009.
16. Serrano-Pedraza, P. et al. “Stereo vision requires an explicit encoding of vertical disparity.” Journal of Vision 9, no. 4 (Apr. 3, 2009): 3, 1-13.
17. Farell, B. “Orientation-Specific Computation in Stereoscopic Vision.” The Journal of Neuroscience 26, no. 36 (Sep. 6, 2006): 9090-9106.
18. Teichert, T. et al. “Depth perception during saccades.” Journal of Vision 8, no. 14 (Dec. 23, 2008): 27, 1-13.
19. Pylyshyn, Z. W. Seeing and Visualizing. Massachusetts Institute of Technology, 2003.
A visual sensation becomes a perception through an unconscious association and interpretation of ideas held in memory. The visual order of perception reflects a learned knowledge of the environment that is based on subjective experience. This provides the ability to view the world with an understanding made possible by processing sensory experience into meaningful representations. Intangible connections between stimulus and sensation are organized into meaningful signs that correspond to reality in a manner thought to be similar to the way words do in speech. This is because humans use all kinds of visual data provided by the two eyes, through a series of sub-processes, to form a perception. Bits of visual data are assigned meaning and used to create a unified three-dimensional perception of the surrounding world. As humans encounter different forms of visual data through day-to-day experience, new meanings and signs develop to accommodate an ongoing perception.
The human eyes are dynamic by nature. Their gaze is never fixed or completely steady; the eyes constantly scan a scene to maintain and refresh visual memory. This is due in part to the fact that the eyes are relatively low-resolution imagers. In simple terms, the eye functions as follows. The retina is an area at the rear of the eye onto which the eye's lens focuses an image. The retina is lined with light-sensitive neurons. The central region of the retina is called the fovea centralis, or fovea. The fovea has the highest density of neurons and therefore the highest resolution. It is surrounded by several belts of neurons of diminishing density and therefore diminishing resolution. The neurons that make up the retina feed information to the optic nerve, which in turn connects to the visual cortex, where image perception takes place. Nearly 50% of the nerve fibers in the optic nerve carry information from the fovea, while the remaining 50% carry information from the neurons in the rest of the retina. The fovea comprises less than 1% of the retinal area, but the information it captures requires as much as 50% of the brain's visual cortex to process. Humans maintain the perception of a sharp, full field of view by constantly scanning the eyes, and thereby the fovea, across the scene being viewed.
The human eye scans continuously, although these movements are generally imperceptible. This rapid scanning movement is called a saccade. Saccades serve in part to refresh the image cast onto the fovea and the surrounding retina at the back of the eye.
Current psychophysical and physiological evidence suggests that vertical disparities influence the perception of three-dimensional depth, but little is known about the perceptual mechanisms that support this process. Perhaps these perceptual effects are reconciled by a specific encoding of non-horizontal parallax. Whatever the specific mechanisms are, it is clear that the motion and gaze direction of the eyes contribute significantly to the process of three-dimensional sight.
Conventional thought holds that because humans have two eyes separated horizontally by an average distance of 65 mm (the interocular distance), two cameras capturing images in the same manner would work equally well. However, in practice, lens distortions and camera misalignments can introduce vertical parallax, which results from a vertical misalignment of the two cameras' points of view and can cause eyestrain. Conventional stereoscopic image capture therefore goes to great lengths to avoid or eliminate vertical parallax differences in the images. Stereoscopic productions also increasingly capture images with disparities of 50% or less of the 65 mm human interocular distance (IO), a trend fueled in part by a desire to keep the images within a comfortable range for the general viewing public. However, less disparity produces less horizontal parallax and therefore less 3D effect; it also flattens background scene elements. Adding parallax scan information to the left and right image capture improves the overall perception of three-dimensionality in the final stereoscopic production, because viewers have the benefit of additional sub-process information with which to generate a more unified three-dimensional perception.
Under certain circumstances, conventional stereoscopic imagery can be misinterpreted. Because the eyes' gaze and saccadic movements contribute additional information to the overall left/right binocular parallax perception, the absence of this information can cause the brain to “see” things as “odd” and unrealistic. Stereo perception created from two static, horizontally separated left and right views can give objects at various depth planes a “cutout” 2D appearance: the subject volume looks three-dimensional, but the objects themselves appear flat. This is especially true when the images are captured using small IO disparities. A realistic visual scene, as captured by the eyes, contains many different disparities; imagery that contains only horizontal parallax creates a pseudo-stereoscopic perception.
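The effect of reducing the camera separation can be quantified with the textbook disparity relation for two parallel pinhole cameras, d = f * b / Z, where b is the separation of the two points of view, f the focal length, and Z the object distance. The sketch below assumes ideal, parallel, distortion-free cameras; the 35 mm focal length and the 2 m and 20 m object distances are illustrative values only. Because the disparity difference between a near and a far object scales linearly with b, halving the separation halves the depth signal between them, which is consistent with the flattening described above.

```python
HUMAN_IO_MM = 65.0  # average human interocular distance cited above


def disparity_mm(baseline_mm: float, focal_mm: float, depth_mm: float) -> float:
    """Sensor disparity of a point for two parallel pinhole cameras: d = f * b / Z."""
    return focal_mm * baseline_mm / depth_mm


def relative_disparity_mm(baseline_mm: float, focal_mm: float,
                          near_mm: float, far_mm: float) -> float:
    """Disparity difference between a near and a far object. This difference,
    rather than either absolute value, encodes the depth between them."""
    return (disparity_mm(baseline_mm, focal_mm, near_mm)
            - disparity_mm(baseline_mm, focal_mm, far_mm))


if __name__ == "__main__":
    focal, near, far = 35.0, 2_000.0, 20_000.0  # illustrative values only
    full = relative_disparity_mm(HUMAN_IO_MM, focal, near, far)
    half = relative_disparity_mm(HUMAN_IO_MM / 2, focal, near, far)
    print(f"full IO: {full:.3f} mm, half IO: {half:.3f} mm, ratio {half / full:.2f}")
```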
The simple mechanics of conventional stereoscopic imaging provide the following variables for placing a scene object in depth (with regard to the plane of the screen):
1. Disparity between the two points of view (also known as interocular distance, or IO)
2. Point of convergence of the two optical axes
An object's spatial position relative to the plane of the screen is determined by the amount of disparity and the point of convergence. When the point of convergence is set behind an object in the foreground, the distance by which the point of convergence is set behind that object and the amount of disparity between the two points of view determine how far in front of the screen surface the object will be projected. The focal length of the lens and the format of the capture medium affect the aforementioned stereoscopic variables, but only in the amounts required to achieve the same result.
Three-Dimensional Imaging
Several mechanical and/or electronic systems and methods exist for creating and/or displaying true three-dimensional images. These methods have traditionally been divided into two main categories: stereoscopic display methods and autostereoscopic display methods. Stereoscopic techniques, including stereoscopes and polarization, anaglyphic, Pulfrich, and shuttering technologies, require the viewer to wear or look through a special viewing apparatus, such as glasses. Autostereoscopic techniques, such as holography, lenticular screens, and parallax barriers, produce images with a three-dimensional illusion without special glasses, but these methods generally require a special screen.
The present disclosure is directed to an alternative approach to three-dimensional imaging. The approach described herein is centered on the concept of presenting parallax three-dimensional information over time in a manner that exploits human short-term visual memory, depth mapping, and other visual perception sub-processes. Parallax scanning and square-wave switching methods have been developed to exploit parallax over time in a manner that is compatible with conventional media systems.
The process for conventional stereoscopic image capture and display is well known. Books such as Lenny Lipton's 1982 Foundations of the Stereoscopic Cinema and Bernard Mendiburu's 2009 3D Movie Making: Stereoscopic Digital Cinema detail the current approach to three-dimensional image capture and display; both publications are hereby incorporated by reference. Recent advances in digital imagery have improved the process of stereoscopic imaging, but the basic perceptual fundamentals remain the same throughout the various processes.
Other systems and methods have been developed that use square-wave switching and parallax scanning information to create autostereoscopic displays that allow a viewer to perceive an image as three-dimensional, even when it is viewed on a conventional display. For example, U.S. Pat. No. 5,991,551 discloses, inter alia, a method for a single camera to record images while undergoing a parallax scanning motion. The optical axis of a single camera is made to move in a repetitive pattern that offsets the camera lens optical axis from a nominal stationary axis. This offset produces parallax information. The motion of the lens optical axis is referred to as parallax scanning. As the motion repeats over the pattern, it becomes oscillatory. At any particular instant, the motion may be described in terms of a parallax scan angle.
Over the years, the present inventors and their associates have developed a body of work based on methods (optical and synthetic) and apparatus that capture and display parallax information over time. U.S. Pat. Nos. 5,014,126, 4,815,819, 4,966,436, 5,157,484, 5,325,193, 5,444,479, 5,699,112, 5,933,664, 5,510,831, 5,678,089, 5,991,551, 6,324,347, 6,734,900, 7,162,083, 7,340,094, and 7,463,257 relate to this body of work and are hereby incorporated by reference. In addition, U.S. patent application Ser. Nos. 10/536,005, 11/547,714 and PCT Patent Application No. PCT/US2010/021627 are also related to this body of work and are hereby incorporated by reference.
Parallax scanning methods rely on discrete parallax differences between depth planes in a scene. The differences are caused by a parallax scan. When properly balanced (tuned) and displayed, the discrete parallax differences are perceived by the brain as depth.
A parallax scan records a pattern of sequential parallax views on a single strip of film or digital media. The lens's optical axis sweeps in the plane of the nominal X and Y axes around the nominal optical Z axis, pivoting on the optical convergence point (out along the Z axis), so that it passes through positions having parallax in relation to the optical convergence point. The circular scanning of the lens's optical axis traces out a coaxial cone pattern with the convergence point as its apex.
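The geometry of a circular parallax scan can be sketched as follows. This is a simplified illustration, not the mechanism of the cited patents: the aperture center is assumed to move on a circle of radius r in the X-Y plane, and the scanned optical axis is taken to be the line from that aperture position through the convergence point at distance Zc on the nominal Z axis. Over one cycle the axis sweeps a cone with its apex at the convergence point, and the cone half-angle atan(r / Zc) is used here as one possible definition of the parallax scan angle; the function and parameter names are hypothetical.

```python
import math


def scan_axis(radius_mm: float, convergence_mm: float, phase_rad: float):
    """Scanned optical axis at one phase of a hypothetical circular parallax scan.

    Returns (origin, direction, scan_angle):
      origin: aperture center, offset from the nominal Z axis in the X-Y plane
      direction: unit vector from the origin toward the convergence point (0, 0, Zc)
      scan_angle: cone half-angle atan(r / Zc), constant over the cycle
    """
    ox = radius_mm * math.cos(phase_rad)
    oy = radius_mm * math.sin(phase_rad)
    dx, dy, dz = -ox, -oy, convergence_mm  # vector to the convergence point
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    direction = (dx / length, dy / length, dz / length)
    scan_angle = math.atan2(radius_mm, convergence_mm)
    return (ox, oy, 0.0), direction, scan_angle


if __name__ == "__main__":
    # Eight evenly spaced positions around one cycle (illustrative values).
    for k in range(8):
        origin, direction, angle = scan_axis(5.0, 3000.0, 2 * math.pi * k / 8)
        print(f"phase {k}/8: aperture=({origin[0]:+.2f}, {origin[1]:+.2f}) mm, "
              f"scan angle={math.degrees(angle):.4f} deg")
```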
Perceptual tests revealed that the brain will translate parallax scanned information into depth information at scanning frequencies of between 3 and 6 Hz, and that the ideal frequency is 4.31 Hz. The scan pattern may be repeated with each cycle, or may change.
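Because the scan is recorded one frame at a time, the scan frequency and the frame rate together determine which scan positions are captured. The sketch below is a hypothetical helper that assumes a constant-rate circular scan; the 3 to 6 Hz band and the 4.31 Hz value come from the text, while the 24 frames per second capture rate is an illustrative assumption.

```python
import math

PERCEPTUAL_BAND_HZ = (3.0, 6.0)  # range reported in the text
PREFERRED_SCAN_HZ = 4.31         # frequency reported in the text


def frames_per_cycle(scan_hz: float, frame_rate: float) -> float:
    """Number of captured frames spanning one full scan cycle."""
    return frame_rate / scan_hz


def scan_phases(scan_hz: float, frame_rate: float, n_frames: int) -> list[float]:
    """Scan phase in radians, wrapped to [0, 2*pi), sampled at each frame."""
    step = 2.0 * math.pi * scan_hz / frame_rate
    return [(k * step) % (2.0 * math.pi) for k in range(n_frames)]


def in_perceptual_band(scan_hz: float) -> bool:
    """True if the scan frequency lies within the reported 3 to 6 Hz band."""
    low, high = PERCEPTUAL_BAND_HZ
    return low <= scan_hz <= high


if __name__ == "__main__":
    print(frames_per_cycle(PREFERRED_SCAN_HZ, 24.0))
    print([round(math.degrees(p), 1) for p in scan_phases(PREFERRED_SCAN_HZ, 24.0, 8)])
```

At 24 frames per second, a 4.31 Hz scan spans about 5.57 frames per cycle, so successive cycles sample slightly different scan phases rather than repeating the same set of positions.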
The digital parallax scanner (DPS) iris scanning mechanism is disclosed in U.S. patent application Ser. No. 11/547,714. Depending on the application, the assembly can be built from many different parts. One embodiment of the DPS employs two custom linear actuators and a central pivoting armature that holds the iris. The two parallel linear actuators move in coordination to produce both x and y motions of the iris. For illustration, consider the way a tank moves.
If both tank treads move forward or backward together (normal motion), the “gun tip” moves forward or backward. If one tread moves opposite to the other (differential motion, as when turning), the “gun tip” moves left or right. This type of differential motion allows the iris to be positioned anywhere within the lens relative to the optical axis, within a very compact space.
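The tank analogy corresponds to a common-mode/differential decomposition of the two actuator strokes. The sketch below is a hypothetical, small-angle linear model, not the disclosed DPS design: the spacing between the actuators and the lever distance from the actuators to the iris are assumed parameters, common-mode stroke moves the iris along y, and differential stroke tilts the armature and swings the iris along x. The inverse mapping is what a controller would need in order to steer the iris along a scan path.

```python
from dataclasses import dataclass


@dataclass
class DifferentialStage:
    """Hypothetical small-angle model of two parallel linear actuators driving a
    pivoting armature that carries the iris (not the disclosed DPS geometry)."""
    spacing_mm: float  # assumed distance between the two actuator lines
    lever_mm: float    # assumed distance from the actuators to the iris

    def forward(self, a1_mm: float, a2_mm: float) -> tuple[float, float]:
        """Actuator strokes -> iris position (x, y).

        Common mode: both strokes together move the iris along y, like both
        tank treads driving forward. Differential: opposing strokes tilt the
        armature and swing the iris along x, like treads turning the tank.
        """
        y = (a1_mm + a2_mm) / 2.0
        x = self.lever_mm * (a1_mm - a2_mm) / self.spacing_mm
        return x, y

    def inverse(self, x_mm: float, y_mm: float) -> tuple[float, float]:
        """Desired iris position (x, y) -> actuator strokes (a1, a2)."""
        half_diff = x_mm * self.spacing_mm / (2.0 * self.lever_mm)
        return y_mm + half_diff, y_mm - half_diff


if __name__ == "__main__":
    stage = DifferentialStage(spacing_mm=10.0, lever_mm=20.0)
    print(stage.forward(1.0, 1.0))   # (0.0, 1.0): treads together, pure y motion
    print(stage.forward(1.0, -1.0))  # (4.0, 0.0): treads opposed, pure x motion
```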
In the above design example, the linear actuators consist of a moving coil and fixed magnetic yoke assembly, very similar to the typical actuator that controls the read/write heads in a computer hard drive. By incorporating miniature, high-resolution optical encoders, PWM voice coil drivers, and a microcontroller, the entire scanner mechanism control system is completely digital.
Parallax information may also be incorporated into computer generated images, as described in the aforementioned U.S. Pat. No. 6,324,347 (“the '347 patent”). The '347 patent discloses, inter alia, a method for computer generating parallax images using a virtual camera having a virtual lens. The parallax images may be generated by simulating a desired parallax scanning pattern of the lens aperture and employing, for example, a ray tracing algorithm to produce the images. The images may be stored in computer memory on a frame-by-frame basis. The images may be retrieved from memory for display on a computer monitor, recorded on video tape for display on a TV screen, and/or recorded on film for projection on a screen.
Thus, in the method of the '347 patent, the point of view of a camera (e.g., the lens aperture) is moved to produce the parallax scanning information. The ray tracing method of image generation, as may be used by one embodiment of the method of the '347 patent, may be used to generate high-quality computer images, such as those used in animated movies or special effects. Using this ray tracing method to simulate optical effects such as depth of field variations, however, may require large amounts of computation and can place a heavy burden on processing resources. Therefore, such a ray tracing method may be impractical for certain applications, such as 3D computer games, animation, and other graphics applications, which require quick response.
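One common way to avoid per-frame ray tracing when simulating a displaced point of view is to render with rasterization and an asymmetric (off-axis) viewing frustum. This is a swapped-in technique offered for illustration, not the method of the '347 patent: the virtual camera is translated by the current scan offset, its view direction is kept parallel to the nominal Z axis, and the frustum is sheared so that a window at the convergence distance stays fixed in the image. The resulting six values are the kind of parameters accepted by legacy OpenGL's glFrustum(left, right, bottom, top, near, far); all names below are hypothetical.

```python
from typing import NamedTuple


class Frustum(NamedTuple):
    left: float
    right: float
    bottom: float
    top: float
    near: float
    far: float


def offset_frustum(eye_x: float, eye_y: float, convergence: float,
                   window_w: float, window_h: float,
                   near: float, far: float) -> Frustum:
    """Asymmetric frustum for a virtual camera displaced by (eye_x, eye_y)
    parallel to the image plane. The camera itself must also be translated by
    (eye_x, eye_y) in the view transform. The window_w x window_h window,
    centred on the nominal axis at distance `convergence`, maps to the same
    image region for every offset, so points on that plane show zero parallax."""
    scale = near / convergence  # project the window edges onto the near plane
    return Frustum(
        left=(-window_w / 2 - eye_x) * scale,
        right=(window_w / 2 - eye_x) * scale,
        bottom=(-window_h / 2 - eye_y) * scale,
        top=(window_h / 2 - eye_y) * scale,
        near=near,
        far=far,
    )


if __name__ == "__main__":
    # One frustum per scan position; offsets could come from a circular scan.
    for ex in (-0.3, 0.0, 0.3):
        print(offset_frustum(ex, 0.0, 10.0, 4.0, 3.0, 0.1, 100.0))
```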
Another previously mentioned U.S. Pat. No. 7,463,257 (“the '257 patent”) discloses, inter alia, a method for parallax scanning through scene object position manipulation. Unlike the moving point of view methods taught in the '347 patent, the '257 patent teaches a fixed point of view, and scene objects are moved individually in a coordinated pattern to simulate a parallax scan. Even though the final images created using the '347 patent and the '257 patent may appear similar, the methods of generating these images are very different.
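A simple pinhole model suggests why the two approaches can yield similar images. Under the assumptions of the sketch below (a sheared pinhole camera whose zero-parallax plane stays at the convergence distance Zc, lateral motion only, hypothetical function names), displacing the camera by t produces the same projection as holding the camera fixed and shifting each scene point at depth Z by t * (Z / Zc - 1). Points on the convergence plane do not move, nearer points move opposite to the camera, and farther points move with it. This illustrates the equivalence only; it is not presented as the procedure of the '257 patent.

```python
def equivalent_object_shift(scan_offset: float, depth: float, convergence: float) -> float:
    """Lateral shift for a scene point at `depth`, with the camera held fixed,
    that reproduces the view from a camera displaced by `scan_offset` whose
    zero-parallax plane stays at `convergence` (sheared pinhole model)."""
    return scan_offset * (depth / convergence - 1.0)


def project_moving_camera(x: float, depth: float, cam_x: float,
                          convergence: float, focal: float = 1.0) -> float:
    """Image x seen from a camera at cam_x, sheared so the point (0, convergence)
    stays at the image centre."""
    return focal * (x - cam_x) / depth + focal * cam_x / convergence


def project_fixed_camera(x: float, depth: float, focal: float = 1.0) -> float:
    """Image x seen from the fixed camera at the origin."""
    return focal * x / depth


if __name__ == "__main__":
    for x, z in [(0.5, 2.0), (-1.0, 5.0), (2.0, 10.0)]:
        moved = project_moving_camera(x, z, cam_x=0.2, convergence=5.0)
        fixed = project_fixed_camera(x + equivalent_object_shift(0.2, z, 5.0), z)
        print(f"depth {z}: moving camera {moved:.6f}, moved object {fixed:.6f}")
```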
U.S. Patent Application Publication No. 2006/0203335 teaches, inter alia, methods for critically aligning images with parallax differences for autostereoscopic display. The process requires two or more images of a subject volume that have parallax differences and whose visual fields overlap in some portions of each of the images. A first image with an area of interest is critically aligned to a second image with the same area of interest but with a parallax difference. The images are aligned by means of a software viewer whereby the areas of interest are critically aligned along their translational and rotational axes to converge at some point. This is accomplished by alternating the views of each image at 2 to 60 Hz and adjusting the axial alignment of each image relative to the other until critical alignment convergence is achieved, at a sub-pixel level, at a point in the area of interest. Autostereoscopic viewing is achieved by alternately displaying (a.k.a. square-wave switching) a repetitive pattern of critically aligned parallax images at 3 to 6 Hz.
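The alternation itself reduces to a frame schedule. The sketch below assumes a fixed display refresh rate and interprets the switching frequency as the rate of complete A/B cycles; the text does not specify whether its 3 to 6 Hz range refers to full cycles or to individual switches, so that interpretation is an assumption, as is the 60 Hz display used in the example. The function names are hypothetical.

```python
def view_for_frame(frame: int, refresh_hz: float, cycle_hz: float) -> int:
    """Return 0 or 1: which of two critically aligned parallax images to show on
    a given display frame when they alternate as a square wave. One complete
    cycle (image 0, then image 1) lasts 1 / cycle_hz seconds."""
    t = frame / refresh_hz               # time of this frame in seconds
    half_cycles = int(t * 2 * cycle_hz)  # half-cycles elapsed so far
    return half_cycles % 2


def in_viewing_band(cycle_hz: float, low: float = 3.0, high: float = 6.0) -> bool:
    """Check a cycle rate against the 3 to 6 Hz range given in the text."""
    return low <= cycle_hz <= high


if __name__ == "__main__":
    schedule = [view_for_frame(n, 60.0, 4.0) for n in range(30)]
    print("".join(str(v) for v in schedule))
```

At 60 Hz with a 4 Hz cycle, each view is held for 7.5 display frames on average, so the schedule alternates between runs of 8 and 7 frames.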
The historical and contemporary stereoscopic prior art teaches capturing images from left and right points of view that are fixed along the horizontal X axis. Although disparity and convergence may change, there is no provision for capturing sub-process visual information. Further, much of the parallax scanning, square-wave switching, and other parallax visualization prior art deals with capturing, simulating, and/or presenting three-dimensional scenes in which objects and the environment are generally captured by a single camera lens (optical and/or virtual).
The present invention is directed to overcoming one or more of the problems associated with two-lens stereoscopic imaging methods. For example, the presently disclosed embodiments may include the capability to capture non-horizontal parallax and other sub-process three-dimensional visual information in a manner that triggers a perceptual response that is not fatiguing to the viewer. In addition, stereoscopic parallax scanning can be used to simulate the information captured by the eye's natural gaze and saccadic motions. This allows the combined stereoscopic (left and right views) display to present a variety of three-dimensional information to the viewer in a manner that creates a unified visual perception.