Three-dimensional imaging is often used to analyze objects, particularly in the fields of medicine and science. Various methods with which television pictures in particular can be produced in three dimensions have also been developed for general consumer applications. Three-dimensional images also are used for entertainment, such as television programs, movies, videos, video games, etc.
Among the methods of 3D imaging, there is a basic distinction between sequential image transmission, in which the images for the right eye and the left eye are transmitted alternately one after the other or saved to a storage medium, and parallel transmission, in which the images are transmitted on two separate channels.
One particular disadvantage of sequential image transmission in connection with conventional television systems is the fact that the refresh rate is reduced to twenty-five images per second for each eye. This creates an unpleasant flickering for the viewer. Of course, this limitation does not occur when the image sequences are each transmitted on their own channel (left or right). However, problems may still arise with synchronizing both channels, as well as with the requirements placed on the receiver, which must be able to receive and process two channels simultaneously. This is not possible for most systems generally available on the market.
Signal transmission and processing likely will be entirely digital in future television systems. In such systems, every image is broken down into individual pixels which are transmitted in digitized format. In order to reduce the bandwidth required for this process, the appropriate compression methods are used; however, these create problems for stereo transmission.
For example, using block coding methods with a reasonable rate of compression, generally it is not possible to reconstruct every individual line of an image precisely. In addition, interframe coding techniques, such as MPEG-2, do not allow transmitting or saving stereo images in a sequential image format, because image information from one image still is contained in another image, creating the so-called “crosstalk effect”, which makes clear separation of the right image from the left image difficult or impossible.
Other methods for generating a 3D image sequence from a 2D image sequence are disclosed in DE 35 30 610 and EP 0 665 697. An autostereoscopic system with an interpolation of images is disclosed in EP 0 520 179, whereas problems of the recognition of motion areas in image sequences are discussed in “Huang: Image Sequence Analysis” (published by Springer Verlag).
Therefore, one problem addressed by the invention is the creation of a method and a device with which it is possible to generate images that provide a very natural 3D image impression even if using the above-mentioned transmission and/or compression methods.
Several examples are mentioned in which techniques have been used to convert 2D images to 3D images. One conversion technique according to an aspect of the invention is described in detail below. Two-dimensional to 3D conversion is advantageous when a given television program, movie, video, video game, or other sequence of images (hereinbelow collectively referred to as “video” for brevity) has been prepared in a 2D format. Therefore, if a viewer (person) were to obtain a 2D video, e.g., on a DVD (digital video disk), compact disk (CD), video tape, or some other storage medium (hereinbelow collectively referred to as “DVD” for brevity), the viewer may be able to view the video in 3D format using a 2D to 3D conversion system.
Sometimes a video may be prepared in both 2D and 3D formats and stored on a DVD for subsequent display in whichever format the viewer selects. From time to time a person viewing a 3D format movie may decide to watch the same in 2D format. This would be possible by switching from the 3D format stored on the DVD to the 2D format stored on the DVD, but the switching may take time because usually it is not possible to switch from one scene in 3D format to the same scene in 2D format; rather it would be necessary to restart the 2D format video and to search for the scene where the 3D format of the video was stopped. Meanwhile, the interest, emotion, suspense or intrigue created by watching the video may be lost while taking the time to make the format switch. Some videos are created and stored only in a 3D format. If such were the case, a viewer would not be able to switch between the 3D format and a 2D format. Accordingly, there is a need to provide the ability to switch from a 3D format movie to a 2D format. There also is a need to improve the versatility of such display systems to allow the receiving of 2D format or 3D format input image data/images and to provide outputs for display of the images represented by the image data in either 2D or 3D format.
The term “hard cut” is frequently used in the video field to indicate that an image of one scene very quickly switches to another scene. For example, one scene may be of a person sitting at a table while eating, and the next scene may show a view outside the window of the room in which that person is sitting. The two scenes are entirely different. It is possible that a portion of one scene may be included in the other scene, but still there is a substantial difference overall between the two scenes.
The techniques used for converting 2D format images, e.g., movies, etc., to 3D format images for display may involve determining a relationship between two images and making the appropriate adjustments (compensation) to left or right images, for example, so that the viewer perceives a 3D visual effect. However, upon occurrence of a hard cut, the relationship between two sequential scenes may be almost non-existent. Absent such a relationship, it may be difficult, or impossible, to obtain a conversion to 3D format or at least to a sequence of images that may be comfortably viewed by a viewer. Thus, there is a need to provide hard cut compensation when converting from a 2D format to a 3D format.
The occurrence of a hard cut may disrupt smooth conversion of images from 2D format to 3D format. Such disruption may lead to a less than optimum 3D viewing experience for the viewer. Therefore, there is a need to detect occurrence of hard cuts and to compensate or adjust for the same in a 2D to 3D conversion system.
Vertical motion effect in images, e.g., movies, etc., that are converted from 2D format to 3D format can be somewhat disadvantageous. For example, most depth perception (3D or stereo effect in viewing a scene, for example) is obtained in a generally horizontal plane rather than in a vertical plane because the viewer's eyes are spaced apart usually in a generally horizontal plane and respective views are seen according to the stereo base of the distance between the viewer's two eyes. Vertical motion or disparity between a pair or more than a pair of sequential images of a movie may tend to be construed by 2D to 3D conversion systems as motion indicative of depth, which, of course, may be incorrect.
Viewing images with vertical motion in 3D format therefore may present a less than optimal 3D viewing experience for the viewer and also may inaccurately represent the actual depth information. Accordingly, there is a need to detect the vertical motion effect and to reduce or eliminate it in systems and methods for converting 2D format image data to 3D format image data and associated images provided therefrom.
Some types of movies, television shows, videos, etc., are more active than others, e.g., greater action, movement, scene changes, etc. For example, a war movie usually would have much more action than a television talk show. If the same 2D to 3D conversion technique, variables, values, etc., were used to convert an action movie as would be used to convert a television talk show from 2D format to 3D format, substantial effect of the 3D images may be lost during at least part of the program. For example, in an action movie there is a great deal of motion and usually a great deal of depth information that can be detected as a result of that motion; therefore, if the conversion from 2D format to 3D format relied on the motion, then substantial depth would be encountered. However, if the same approach were used to convert a talk show program from 2D format to 3D format, the depth or 3D effect may be much less than in the action movie or, in any event, it may take much longer for the depth or 3D effect to become evident to a viewer from the 3D format images of the talk show program. Accordingly, there is a need to accommodate the different characteristics of the type of program, e.g., action movie, television talk show program, wildlife video, etc., when converting 2D format representations thereof to 3D format.
It is noted here that reference may be made to television, video, movie, video game, etc., all of which are obtained by a sequence of images. The sequence of images provides information indicating the characteristics of a given scene at one moment in time and the characteristics of that scene at another moment in time. Reference herein to television, video, movie, video game, etc., may be construed to include all of the same and other formats of information conveyance, e.g., by a sequence of images that are presented for viewing. The viewing may be via a television, a computer monitor, a flat panel display, a projector, glasses or goggles, such as, for example, those used in connection with virtual reality type displays, game displays, etc., which provide images directly to the eyes of a viewer, and so forth. Furthermore, reference herein to images may also include image data, i.e., signals, data, etc., that may be stored, may be transmitted, or may be created, synthesized, interpolated, or otherwise obtained to represent the images. Thus, in the description below, reference to an image may be construed to mean the image itself or the data representing the image, whereby, if that data is appropriately provided to a display device, the image can be viewed by a viewer (reference to viewer typically would mean a person who would view the image). 2D means two-dimensional images, sometimes referred to as mono, monoscopic, etc. type of images. 3D (sometimes written as 3-D) means three-dimensional images or images that appear to have not only height and width characteristics, but also depth characteristics. Sometimes 3D is referred to as stereo, stereoscopic, etc. type of images. These definitions (and others presented herein) are not intended to be limiting; standard or conventional definitions of the terms used herein also apply.
Another aspect relates to a method for the generation of 3D images represented by 3D image data that includes at least two channels of data for respective left and right eye views from a sequence of 2D images represented by 2D image data, comprising the steps of:
i) assigning the 2D image data to a first channel;
ii) determining a measure of similarity (δₖ) between sets of 2D image data representative of sequential 2D images;
iii) comparing the measure of similarity (δₖ) to predetermined threshold values (δ₀ < δ₁ < δ₂), where
(1) if δ₀ < δₖ < δ₂ and δₖ − δₖ₋₁ ≦ −δ₁ and an approximation variable (α) is less than or equal to Ks, the approximation variable is set to α := α + s,
(2) if δ₀ < δₖ < δ₂ and δₖ − δₖ₋₁ ≧ δ₁ and α ≧ s, the approximation variable (α) is set to α := α − s, and
(3) if δ₀ ≧ δₖ, the approximation variable (α) is set to α := 0,
where the approximation variable (α) determines the width of a stereo base, and s denotes a step width;
iv) calculating 3D image data for a second channel from the 2D image data using the approximation variable (α); and
v) assigning the first channel and the second channel to respective left and right eye views for display.
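The comparison in step iii) can be read as a small state update applied once per image. The following Python sketch is only an illustration of that reading, not the claimed implementation. The function name and parameter names are hypothetical, and `alpha_max` is an assumed stand-in for the upper bound that the text writes as "Ks", whose exact definition is not given in this section.

```python
def update_alpha(alpha, delta_k, delta_prev, d0, d1, d2, s, alpha_max):
    """Return the approximation variable for the next image.

    delta_k / delta_prev: measures for the current and previous image.
    d0 < d1 < d2: the predetermined thresholds.
    s: step width.
    alpha_max: hypothetical stand-in for the bound written "Ks" in the text.
    """
    if not (d0 < d1 < d2):
        raise ValueError("thresholds must satisfy d0 < d1 < d2")

    change = delta_k - delta_prev

    if delta_k <= d0:
        return 0                      # case (3): reset the stereo base
    if d0 < delta_k < d2:
        if change <= -d1 and alpha <= alpha_max:
            return alpha + s          # case (1): widen by one step
        if change >= d1 and alpha >= s:
            return alpha - s          # case (2): narrow by one step
    return alpha                      # no case applies: keep the value
```

When none of the three cases applies (for example, when the measure lies at or above the largest threshold), this sketch leaves the variable unchanged; that fallback is an assumption, since the text only states the three cases.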
Another aspect relates to an apparatus for generating 3D images represented by 3D image data that includes at least two channels of data for respective left and right eye views from a sequence of 2D images represented by 2D image data, comprising:
i) means for assigning the 2D image data to a first channel;
ii) means for determining a measure of similarity (δₖ) between sets of 2D image data representative of sequential 2D images;
iii) means for comparing the measure of similarity (δₖ) to predetermined threshold values (δ₀ < δ₁ < δ₂), where
(1) if δ₀ < δₖ < δ₂ and δₖ − δₖ₋₁ ≦ −δ₁ and an approximation variable (α) is less than or equal to Ks, the approximation variable is set to α := α + s,
(2) if δ₀ < δₖ < δ₂ and δₖ − δₖ₋₁ ≧ δ₁ and α ≧ s, the approximation variable (α) is set to α := α − s, and
(3) if δ₀ ≧ δₖ, the approximation variable (α) is set to α := 0,
where the approximation variable (α) determines the width of a stereo base, and s denotes a step width;
iv) means for calculating 3D image data for a second channel from the 2D image data using the approximation variable (α); and
v) means for assigning the first channel and the second channel to respective left and right eye views for display.
Another aspect relates to a system for the generation of 3D images represented by 3D image data that includes at least two channels of data for respective left and right eye views from a sequence of 2D images represented by 2D image data, comprising:
i) means for determining a measure of similarity (δₖ) between sets of 2D image data representative of sequential 2D images;
ii) means for comparing the measure of similarity (δₖ) to predetermined threshold values (δ₀ < δ₁ < δ₂), where
(1) if δ₀ < δₖ < δ₂ and δₖ − δₖ₋₁ ≦ −δ₁ and an approximation variable (α) is less than or equal to Ks, the approximation variable is set to α := α + s,
(2) if δ₀ < δₖ < δ₂ and δₖ − δₖ₋₁ ≧ δ₁ and α ≧ s, the approximation variable (α) is set to α := α − s, and
(3) if δ₀ ≧ δₖ, the approximation variable (α) is set to α := 0,
where the approximation variable (α) determines the width of a stereo base, and s denotes a step width;
iii) an image generator for calculating 3D image data for a second channel from the 2D image data using the approximation variable (α).
Another aspect relates to a method for generating 3D images, comprising providing a first sequence of images for display, computing a second sequence of images for display from one or more images of the first sequence using an approximation variable representative of a relationship between at least two images of the first sequence, allowing the approximation variable to change in response to a relationship between at least two images of the first sequence, checking for occurrence of a hard cut between images of the first sequence, and in response to occurrence of a hard cut, setting the approximation variable to a value that provides a low degree of parallax between respective images of the first and second sequences of images or the appearance of a 2D image.
Another aspect relates to an apparatus for generating 3D images, comprising means for providing a first sequence of images for display, means for computing a second sequence of images for display from one or more images of the first sequence using an approximation variable representative of a relationship between at least two images of the first sequence and allowing the approximation variable to change in response to a relationship between at least two images of the first sequence, means for checking for occurrence of a hard cut between images of the first sequence, and means operative in response to occurrence of a hard cut for setting the approximation variable to a value that provides a low degree of parallax between respective images of the first and second sequences of images or the appearance of a 2D image.
Another aspect relates to a method of compensating 3D image data in response to hard cut detection, comprising preparing 3D image data for display from a first sequence of images and a second sequence of images computed from at least two respective images of the first sequence by using an approximation variable that changes based on a relationship between at least two images of the first sequence, and, in response to occurrence of a hard cut in images of the first sequence, setting the approximation variable to a value that provides a low degree of parallax between respective images of the first and second sequences of images or the appearance of a 2D image.
Another aspect relates to an apparatus for compensating 3D image data in response to hard cut detection, comprising means for preparing 3D image data for display from a first sequence of images and a second sequence of images computed from at least two respective images of the first sequence by using an approximation variable that changes based on a relationship between at least two images of the first sequence, and means responsive to occurrence of a hard cut in images of the first sequence for setting the approximation variable to a value that provides a low degree of parallax between respective images of the first and second sequences of images or the appearance of a 2D image.
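The hard cut aspects above can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this section: a hard cut is detected here by a mean absolute pixel difference exceeding a hypothetical `cut_threshold`, images are plain lists of rows of grayscale values, and the "low parallax" value of the approximation variable is taken to be zero. All function names are illustrative only.

```python
def mean_abs_difference(frame_a, frame_b):
    """Average absolute pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def is_hard_cut(prev_frame, curr_frame, cut_threshold):
    """Assumed detector: a large overall change is treated as a hard cut."""
    return mean_abs_difference(prev_frame, curr_frame) > cut_threshold


def alpha_after_frame(alpha, prev_frame, curr_frame, cut_threshold, flat_alpha=0):
    """Force the approximation variable to a low-parallax value on a hard cut."""
    if is_hard_cut(prev_frame, curr_frame, cut_threshold):
        return flat_alpha
    return alpha
```

After the reset, the variable would be free to change again in response to the relationship between later images, as the aspects describe.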
Another aspect relates to a method for generating 3D images, comprising preparing 3D image data from a first sequence of images and a second sequence of images computed from at least two images of the first sequence using an approximation variable that changes based on a relationship between at least two images of the first sequence, and reducing the value of the approximation variable upon the occurrence of vertical disparity between at least two images of the first sequence.
Another aspect relates to an apparatus for generating 3D images, comprising means for preparing 3D image data from a first sequence of images and a second sequence of images computed from at least two images of the first sequence using an approximation variable that changes based on a relationship between at least two images of the first sequence, and means for reducing the value of the approximation variable upon the occurrence of vertical disparity between at least two images of the first sequence.
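As a rough illustration of the vertical disparity aspects, the Python sketch below estimates a vertical shift between two images and reduces the approximation variable when a shift is found. The estimator (matching per-row mean brightness over a small search range), the reduction by one step `s`, and all names are assumptions made for this example; this section does not specify how vertical disparity is measured or by how much the variable is reduced.

```python
def row_profile(frame):
    """Mean brightness of each row of a frame given as a list of rows."""
    return [sum(row) / len(row) for row in frame]


def estimate_vertical_shift(prev_frame, curr_frame, max_shift=3):
    """Return the row shift that best aligns curr_frame with prev_frame.

    A positive result means the content moved down between the frames.
    """
    prev_rows = row_profile(prev_frame)
    curr_rows = row_profile(curr_frame)
    best_shift, best_cost = 0, float("inf")
    for shift in range(-max_shift, max_shift + 1):
        diffs = [
            abs(prev_rows[i] - curr_rows[i + shift])
            for i in range(len(prev_rows))
            if 0 <= i + shift < len(curr_rows)
        ]
        if not diffs:
            continue
        cost = sum(diffs) / len(diffs)
        # Prefer the smaller shift when two candidates fit equally well.
        if cost < best_cost or (cost == best_cost and abs(shift) < abs(best_shift)):
            best_shift, best_cost = shift, cost
    return best_shift


def damp_alpha(alpha, vertical_shift, s, tolerance=0):
    """Reduce alpha by one step (never below zero) when vertical motion is seen."""
    if abs(vertical_shift) > tolerance:
        return max(0, alpha - s)
    return alpha
```

A caller would run `estimate_vertical_shift` on two sequential images and pass the result to `damp_alpha` before computing the second image sequence.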
An aspect of the invention relates to a method of converting 3D image data for a 3D image that includes image data representative of respective left and right eye views to 2D image data for a 2D image, comprising
i) using the 3D image data representative of one of the left and right eye views to provide first image data representative of a 2D image, and
ii) computing computed image data representative of a subsequent 2D image from the first image data.
Another aspect relates to a device for converting 3D image data, which includes image data representative of respective left and right eye views of a 3D image, to 2D image data, comprising
i) an output providing the image data that is representative of one of the respective left and right eye views of the 3D image data as 2D output image data for a 2D image, and
ii) a computation device to compute from said image data representative of said one of said respective left and right eye views computed image data that is representative of a subsequent 2D image.
Another aspect relates to a method of providing images for display, comprising
i) receiving image data in either of 2D format or 3D format, and
ii) selectively outputting in either 2D format or 3D format output image data for display, and wherein selectively outputting 2D format output image data from 3D image data includes outputting at least some output image data computed from one eye view of the 3D format image data.
Another aspect relates to a system for use in displaying images, comprising
i) an input for receiving image data in either of 2D or 3D format,
ii) a converter to convert 2D image data to 3D image data and to convert 3D image data to 2D image data, and
iii) a selector operable to select the format of output image data, including
(1) 2D output image data in response to 2D input image data,
(2) 3D output image data in response to 3D input image data,
(3) 2D output image data converted by the converter from 3D input image data, and
(4) 3D output image data converted by the converter from 2D input image data.
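The four input/output combinations handled by the selector can be sketched as a simple routing function. The Python below is a minimal illustration under stated assumptions: the `Frame` type, the choice of the left eye view for 3D to 2D conversion, and the placeholder `to_3d` (which merely duplicates the view and so yields zero parallax) are all hypothetical and stand in for the converter described elsewhere in the specification.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Frame:
    left: Any                       # the only view of a 2D frame
    right: Optional[Any] = None     # None marks a 2D frame

    @property
    def is_3d(self):
        return self.right is not None


def to_2d(frame):
    """3D -> 2D: keep one eye view (the left view is an arbitrary choice)."""
    return Frame(left=frame.left)


def to_3d(frame):
    """2D -> 3D placeholder: a real converter would compute the second view."""
    return Frame(left=frame.left, right=frame.left)


def select_output(frame, want_3d):
    """Pass the frame through or convert it, depending on the selected format."""
    if frame.is_3d == want_3d:
        return frame                # 2D in -> 2D out, or 3D in -> 3D out
    return to_3d(frame) if want_3d else to_2d(frame)
```

For example, `select_output(Frame("L0", "R0"), want_3d=False)` returns a 2D frame holding only the left view, so the format can change at the current image without restarting the video.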
Another aspect relates to a method for generating 3D images, comprising preparing 3D image data from a first sequence of images and a second sequence of images computed from at least two images of the first sequence using an approximation variable that changes based on a relationship between at least two images of the first sequence, and presetting the initial value of the approximation variable according to the anticipated amount of motion across at least two images of the first sequence.
To the accomplishment of the foregoing and related ends, the invention, then, comprises the features hereinafter fully described in the specification and particularly pointed out in the claims, the following description and the annexed drawings setting forth in detail certain illustrative embodiments of the invention, these being indicative, however, of but several of the various ways in which the principles of the invention may be suitably employed.
Other systems, methods, features, and advantages of the present invention will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
Although the invention is shown and described with respect to one or more embodiments, it is to be understood that equivalents and modifications will occur to others skilled in the art upon the reading and understanding of the specification. The present invention includes all such equivalents and modifications, and is limited only by the scope of the claims.