Realistic and highly accurate 3D video is useful in entertainment, business, industry, and research. It is of special importance in minimally invasive surgery (e.g., endoscopic and laparoscopic surgery), since surgeons performing these procedures are guided entirely by the images they view on a video monitor. In industry, research, and medicine, accuracy is required in order to carry out complex manipulations, such as surgical dissection and suturing, and to navigate safely within and among tissue and organ structures. Equally important, the 3D video imagery must be comfortable to view for long periods (8 hours in business, industry, and research, and 3 to 4 hours or even longer, under great stress, for some surgical procedures) without the viewing system imposing stress and eye strain on the viewer. Further, it is especially desirable to be able to view 3D images on one or several color monitors, so that they can be seen by several people, or at several locations in (or remote from) the office, factory floor, laboratory, or operating theater. It is also advantageous to be able to transmit the 3D signal for distant viewing, as would be required for teleconferencing, plant supervision, research collaboration, remote expert medical consultation, or live viewing by medical students.
Traditional stereoscopy has commonly employed a binocular system, e.g., two lenses or two cameras, to produce two channels of visual information; the critical factor that produces depth perception in these systems is the spatial parallax brought about by the spatial offset of the two input channels. "Parallax" refers to the difference in spatial orientation and perspective encountered when the same object or scene is viewed through two lenses (e.g., our eyes) that are spatially offset from one another. Many different stereoscopic systems have been developed, including twin-screen displays viewed through "passive" polarized or differently colored lenses in glasses worn by the viewer; field- or frame-multiplexed systems that use a single display screen; head-mounted displays, such as those commonly used in "virtual reality" systems, in which dual liquid-crystal screens or dual CRTs are built into an assembly worn on the viewer's head; projection systems; and autostereoscopic systems that do not require viewing glasses.
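To make the parallax relationship concrete, the sketch below assumes an idealized rig of two pinhole cameras with parallel optical axes, a focal length `focal_px` (in pixels), and a baseline `baseline_m` (the spatial offset between the two lenses). Under those assumptions the horizontal disparity of a point at depth Z follows the standard relation d = f·B/Z, so nearer objects show larger parallax. The function name and the rig values are hypothetical and are not taken from any particular system.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Horizontal disparity, in pixels, of a point at depth_m seen by two
    parallel pinhole cameras separated by baseline_m (idealized model)."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    # Disparity is inversely proportional to depth: d = f * B / Z
    return focal_px * baseline_m / depth_m


if __name__ == "__main__":
    # Hypothetical rig: 800-pixel focal length, 65 mm baseline
    # (roughly a typical human interocular distance).
    for depth in (0.5, 1.0, 2.0, 4.0):
        print(f"depth {depth:.1f} m -> disparity {disparity_px(800, 0.065, depth):.1f} px")
```

Doubling the distance to an object halves its disparity, which is the sense in which spatial offset between the two channels encodes depth.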
Attempts have also been made to develop systems which convert an input two-dimensional (2D) video signal into a form suitable for stereoscopic display. These have utilized various mechanical, electrical, and electro-optical devices and procedures which act to split the input image into two separate channels of visual information.
To date, the prior-art methods and systems developed to produce stereoscopic three-dimensional (3D) video have not proved acceptable for entertainment, for many business, manufacturing, and research uses, or in the biomedical area. This contrasts with the stereoscopic display of computer-generated graphics, which has found commercial success, e.g., in biochemistry, where stereoscopic visualization of computer graphics images of complex molecular structures has become routine, typically using software running on advanced workstation computers.
The reasons for the aforementioned lack of acceptance are manifold and include system complexity, expense, and physiological difficulties experienced by some viewers of these systems.
The key technical factor in producing high-quality stereoscopic video, in systems that employ two lenses for input, is maintaining proper alignment of the two channels of image data. The external lenses or cameras of the known systems must be properly aligned, and the signals must preserve that precise alignment as they are processed by the system electronics or optics. Twin-screen viewing systems are known to be particularly prone to misalignment problems, and they also tend to be bulky and cumbersome. Single-screen solutions, such as the field/frame-multiplexed method, minimize the problems associated with dual display monitors, yet still rely on accurate alignment of the input cameras.
Examples of multiplexed single-screen stereo video in the entertainment field are the stereo video game systems recently marketed by SEGA Corp. of Japan. These video game systems are based on a 60-Hertz display on conventional analog television monitors. Such systems are prone to serious flicker, since each eye receives only 15 video frames per second (the 60 fields per second of an interlaced display amount to 30 full frames per second, shared between the two eyes). The resulting flicker and jerky motion cause stress and eye strain and make such systems unsuitable for use in, for example, business, industry, research, and the surgical theater.
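The 15-frames-per-second figure can be checked with simple arithmetic. The sketch below assumes an interlaced display in which two fields make one full frame and the frames are divided evenly between the left and right views; it does not claim to describe how the SEGA systems multiplex their images internally, and the function name is hypothetical.

```python
def per_eye_frame_rate(field_rate_hz: float, fields_per_frame: int = 2, views: int = 2) -> float:
    """Full frames per second reaching each eye on a time-multiplexed stereo display."""
    frames_per_second = field_rate_hz / fields_per_frame  # interlaced: 2 fields = 1 frame
    return frames_per_second / views  # the left and right views share the frames


if __name__ == "__main__":
    rate = per_eye_frame_rate(60)
    print(f"{rate:.0f} frames/s per eye, refreshed every {1000 / rate:.1f} ms")
```

For a hypothetical 120-Hz progressive display (`fields_per_frame=1`), the same function gives 60 frames per second per eye, which illustrates why higher display rates reduce multiplexing flicker.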
Other variables pertinent to the production of high-quality stereo video include picture resolution, brightness, and color reproduction; the presence of display or processing artifacts; and the width and depth of the viewing field. Autostereoscopic methods, for example, have not yet overcome the problems of limited resolution and of providing a satisfactory viewing zone for multiple viewers, as is often required in business and medicine.
One factor limiting the commercial success of traditional stereoscopy has been adverse physical reactions, including eyestrain, headaches, and nausea, experienced by a significant number of viewers of these systems. The 3D movies that were briefly popular in the 1950s and 1960s illustrate this. While a limited number of 3D movies continue to be produced today and are popular in theme parks and similar venues, they are typically less than about 30 minutes long, because average viewer tolerance for these media is limited. Viewer-tolerance problems are intrinsic to the methodology of traditional stereoscopy and result from the inability of these systems to emulate realistically the operation of the human visual system. Such systems are also limited by their failure to account for the central role of the human brain, and the neural cooperation within it, in effective visual processing. The relevance of this point to the present invention is elaborated upon hereinafter.
The efficient conversion system of the present invention can produce highly realistic, accurate, and visually comfortable 3D video imagery, in effectively real time, from a single camera source. This is advantageous for at least two reasons. First, the present invention produces a "synthesized" stereoscopic video presentation that is not subject to the limitations of traditional stereoscopy noted above. Second, systems based on this synthetic stereo are automatically compatible with virtually all existing single-camera video systems used in business, industry, and research, and especially with existing biomedical video systems (e.g., endoscopy, microscopy, and other systems), since they require as input the same 2D video signal that drives the normal 2D display monitor.
Another method of synthesizing a three-dimensional image from a two-dimensional source is the "DeepVision" system from Delta Systems Design, Ltd. and AVS, a division of Avesco, London, England. This system is believed to employ three mechanisms for producing a three-dimensional view from a two-dimensional video source: spatial parallax arising from the spatial offset of sequential video frames; "temporal parallax," arising from the translation of motion-displaced objects in adjacent frames into spatial parallax; and "short-term visual memory," arising from an imposed time delay between successive video frames.
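The DeepVision mechanisms are described above only as what the system is believed to do, so the sketch below is not its actual processing. It is a minimal, generic illustration of the time-delay idea, assuming that one eye is shown the current frame and the other a frame delayed by a fixed number of frames. The function names, the left/right assignment, and the constant-speed motion model are all hypothetical. For an object moving horizontally at constant speed, the delay turns frame-to-frame motion into a horizontal offset between the two views, which is the sense in which motion can be translated into spatial parallax.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def delayed_pairs(frames: Sequence[Frame], delay: int = 1) -> List[Tuple[Frame, Frame]]:
    """Pair each frame with the frame `delay` steps earlier.

    Hypothetical convention: the first element of each pair goes to the
    left eye, the delayed frame to the right eye.
    """
    if delay < 1:
        raise ValueError("delay must be at least one frame")
    return [(frames[t], frames[t - delay]) for t in range(delay, len(frames))]


def induced_offset_px(speed_px_per_frame: float, delay: int) -> float:
    """Horizontal offset between the two views for an object moving at constant speed."""
    return speed_px_per_frame * delay


if __name__ == "__main__":
    frames = ["f0", "f1", "f2", "f3", "f4"]  # stand-ins for decoded video frames
    print(delayed_pairs(frames, delay=2))  # [('f2', 'f0'), ('f3', 'f1'), ('f4', 'f2')]
    print(induced_offset_px(3.0, delay=2))  # 6.0
```

In this simplified model a stationary object produces zero offset, so any depth it conveys must come from cues other than motion.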
The processing of video imagery employed by the earlier DeepVision system contrasts with that of traditional video stereoscopy, which, as noted above, relies on the spatial parallax produced by presenting two spatially offset input channels, e.g., from two lenses or two cameras. While binocular parallax is evidently a sufficient condition for producing depth-enhanced imagery, the DeepVision approach demonstrates that it is not a necessary one. DeepVision produces depth-enhanced imagery from a single, monocular source by manipulating hitherto unappreciated "depth cues," in particular motion and visual persistence, or "memory." The early DeepVision method thus points to the apparent existence of neural mechanisms in the human eye-brain system, in addition to those that process binocular parallax information, that are active in depth perception.
The quality of video imagery produced by the early DeepVision system has been observed to differ in some respects from that of binocular stereo video. While many observers have been unable to distinguish a 3D DeepVision image derived from a monocular source from a traditional 3D image derived from a binocular source, others have reported less apparent depth in certain scenes, or have noted that the DeepVision images appear on or behind the plane of the display screen but never in front of it, as is possible with binocular stereo. These differences are again attributable to the different methods used to produce 3D imagery in the two modalities.
The strength of binocular stereo lies in its ability to produce consistent depth enhancement within the "zone of convergence," i.e., the region defined by the overlap of the viewing zones of the two viewing elements (lenses or cameras). However, unlike human vision, which rapidly and continuously adjusts the convergence of its two visual axes for different viewing depths, binocular stereovision systems have fixed camera axes and no such adjustment capability. Thus, objects viewed outside the convergence zone may appear distorted and can produce eyestrain in the viewer.
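The effect of a fixed convergence distance can be sketched numerically. The example below assumes an idealized rig of parallel cameras whose images are horizontally shifted so that points at `convergence_m` have zero screen parallax; under that assumption the screen parallax is p = f·B·(1/Z_c − 1/Z). Positive parallax places a point behind the screen, negative parallax in front of it, and zero parallax on the screen plane. The function names and rig values are hypothetical, and physically toed-in cameras introduce additional vertical disparities that this model does not capture.

```python
def screen_parallax_px(focal_px: float, baseline_m: float,
                       convergence_m: float, depth_m: float) -> float:
    """Screen parallax (right-image x minus left-image x), in pixels, for an
    idealized parallel-camera rig whose images are shifted to converge at
    convergence_m. Positive: behind the screen; negative: in front of it."""
    if convergence_m <= 0 or depth_m <= 0:
        raise ValueError("distances must be positive")
    return focal_px * baseline_m * (1.0 / convergence_m - 1.0 / depth_m)


def apparent_position(parallax_px: float, tol: float = 1e-9) -> str:
    """Classify where a point appears relative to the display plane."""
    if parallax_px > tol:
        return "behind screen"
    if parallax_px < -tol:
        return "in front of screen"
    return "at screen plane"


if __name__ == "__main__":
    # Hypothetical rig: 800-px focal length, 65 mm baseline, converged at 1 m.
    for depth in (0.5, 1.0, 4.0, 50.0):
        p = screen_parallax_px(800, 0.065, 1.0, depth)
        print(f"{depth:5.1f} m: {p:+6.1f} px, {apparent_position(p)}")
```

Because the convergence distance is fixed by the rig, the magnitude of the parallax grows as an object moves away from that distance in either direction, so only objects near the convergence plane are presented with near-zero parallax.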
Monocular DeepVision video, by contrast, has no fixed zone of convergence. While this allows the viewer's attention to range freely within a scene without eyestrain, the perceived sense of depth may not always be consistent, particularly in scenes with rapid shifts in the field of view or rapid motion.
"Motion artifacts," seen as an unnaturally jerky or discontinuous representation of movement in the viewed image, are occasionally observable when rapid movement occurs between successive video frame images in the earlier DeepVision system. Differences in three-dimensional effect may also vary for images of the same scene when viewed in systems implementing the PAL specification, as compared with systems implementing the NTSC system. This may be due to a longer interframe delay in the two systems.
A further limitation of existing video stereo systems is their lack of integration with modern computer technology. These systems have been created as "enhanced" television systems rather than as fully digital, computer-based systems with 3D capability.