Depth or three-dimensional (3D) effects have been added to movies for many decades. Audience members wear special 3D glasses with a red filter for one eye and a cyan filter for the other eye. The movie is simultaneously captured by two cameras, or altered to have two separate images. The two images are combined into a single image using color filters to create a different image for each eye. No special display equipment is needed, but each eye will experience color loss in the image.
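The color-filter combination described above can be illustrated with a minimal sketch. It assumes the common convention that the red filter covers the left eye and the cyan filter covers the right eye, which the text above does not specify. The function name make_anaglyph and the representation of an image as rows of (R, G, B) tuples are hypothetical choices made only for this illustration.

```python
def make_anaglyph(left, right):
    """Combine two same-sized RGB images into one red/cyan image.

    Assumption: the red channel comes from the left view and the
    green and blue (cyan) channels come from the right view.
    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    if len(left) != len(right) or any(
        len(row_l) != len(row_r) for row_l, row_r in zip(left, right)
    ):
        raise ValueError("left and right images must have the same size")

    combined = []
    for row_l, row_r in zip(left, right):
        # Keep red from the left pixel, green and blue from the right pixel.
        combined.append(
            [(pl[0], pr[1], pr[2]) for pl, pr in zip(row_l, row_r)]
        )
    return combined


# Tiny 1x2 example images (hypothetical pixel values).
left_view = [[(200, 10, 20), (90, 80, 70)]]
right_view = [[(15, 110, 120), (60, 50, 40)]]
anaglyph = make_anaglyph(left_view, right_view)
```

Because each eye receives only part of the color spectrum, the green and blue of the left view and the red of the right view are discarded, which is the color loss noted above.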
More recently, the two images have been separated using different polarizations rather than different colors. Polarized glasses are worn rather than red/cyan glasses.
Special liquid crystal display (LCD) glasses that alternately black out one eye and then the other eye may be used with special 3D TVs or displays. The LCD glasses are synchronized to the display, which alternates between two views. Side-by-side and top/bottom formats may use active shutters or polarization.
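As a rough illustration of the side-by-side and top/bottom formats, the sketch below unpacks one packed frame into its two views. It assumes the left-eye view occupies the left half (or the top half) of the packed frame, which is a common convention but is not stated above. The function names are hypothetical, and a frame is represented as a list of pixel rows.

```python
def split_side_by_side(frame):
    """Split a packed frame into (left_view, right_view) by columns.

    Assumption: the left-eye view is stored in the left half.
    An odd trailing column, if any, is ignored.
    """
    if not frame:
        return [], []
    half = len(frame[0]) // 2
    left_view = [row[:half] for row in frame]
    right_view = [row[half:half * 2] for row in frame]
    return left_view, right_view


def split_top_bottom(frame):
    """Split a packed frame into (left_view, right_view) by rows.

    Assumption: the left-eye view is stored in the top half.
    An odd trailing row, if any, is ignored.
    """
    half = len(frame) // 2
    return frame[:half], frame[half:half * 2]


# A 2x4 packed frame with single-value pixels (hypothetical data).
packed = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
sbs_left, sbs_right = split_side_by_side(packed)
tb_left, tb_right = split_top_bottom(packed)
```

Each unpacked view has half the resolution of the packed frame in one dimension, so a display would typically rescale the views before presenting them through active shutters or polarization.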
Autostereoscopic techniques do not use special 3D glasses. Instead, the display device is altered to project different images to the user's left and right eyes. Viewers in different physical locations may have different perceptions.
Multi-view systems may display more than two views, such as 8 or 16 views per frame. For example, an array of sixteen cameras may simultaneously capture 16 video streams for display on a 16-view multi-view system.
Video captured by older stereoscopic cameras has only two view images per frame. Multi-view displays may have many more views, such as 8, 16, or 28. It is desired to generate or synthesize these multiple views from the two views of a stereoscopic video, or from a single-view image with a depth map of the image.
Older video does not have depth information or stereoscopic views. It is also desired to synthesize multiple views from these single-view legacy videos.
FIG. 1 shows a multi-view frame of 8 views being synthesized from a stereo-view frame of 2 views. A stereo view having 2 images per frame is the source or input. A depth map may be created from this stereo view. The depth map shows closer parts of the image as white, such as the man's hand, and objects in the background as black, such as the hallway behind the man's head. A multi-view image of 8 views is desired to be created from the generated depth map and the input stereo image.
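One generic way to create several views from an image and its depth map is to shift each pixel horizontally by an amount that grows with its closeness to the viewer. The sketch below illustrates that general idea on a single scanline. It is not the synthesizer described in this document. The function name synthesize_views, the max_shift parameter, and the linear depth-to-shift mapping are all assumptions made for illustration. Depth values follow the convention above, with 255 (white) for near objects and 0 (black) for the background.

```python
def synthesize_views(row, depth, num_views=8, max_shift=2):
    """Generate num_views shifted copies of one scanline.

    row:   list of pixel values for one scanline
    depth: list of depth values, 0 (far) to 255 (near), same length
    Returns a list of scanlines; None marks a hole where no source
    pixel landed (a region hidden in the original view).
    """
    width = len(row)
    views = []
    for v in range(num_views):
        # Hypothetical view position in [-0.5, 0.5] across the views.
        alpha = v / (num_views - 1) - 0.5 if num_views > 1 else 0.0
        out = [None] * width
        nearest = [-1] * width  # depth of the pixel currently at each target
        for x in range(width):
            # Near pixels move more than far pixels.
            shift = round(alpha * max_shift * depth[x] / 255)
            tx = x + shift
            # When two pixels land on one target, keep the nearer one.
            if 0 <= tx < width and depth[x] > nearest[tx]:
                out[tx] = row[x]
                nearest[tx] = depth[x]
        views.append(out)
    return views


# One near pixel (value 3) in front of a far background.
scanline = [1, 2, 3, 4, 5]
scanline_depth = [0, 0, 255, 0, 0]
three_views = synthesize_views(scanline, scanline_depth, num_views=3)
```

In this example the near pixel moves one position left in the first view and one position right in the last view, leaving a hole where it used to be. Filling such holes and keeping the result stable from frame to frame are among the difficulties that make view synthesis non-trivial.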
Differences from one frame to the next frame in the video stream may cause the multi-view images to be unstable, resulting in visible flickering. Such flickering is distracting and undesirable.
Camera mismatch among the multiple cameras may yield poor matching results during video compression or other image processing. Camera calibration before video capture may not be precise. Texture within the image, or a lack of texture, may cause more mismatching. These problems may cause stray blocks or other artifacts to be visible to the viewers. Such image errors are undesirable.
Some applications require that the multi-view images be synthesized in real time, or with only a one-frame delay. Memory requirements should be reduced, especially for viewing on small portable devices. Thus, storing only one or two frames is desirable.
Some newer multi-view displays may use masks to define each of the multiple views. These masks may be rotated. Such rotated masks may require too much processing power and memory for real-time applications.
What is desired is a multi-view synthesizer that creates multi-view images using only a one-frame delay. A multi-view generator with reduced visual artifacts in low-texture regions, and with reduced flickering, is also desired. A system that can handle rotated masks is also desired. It is desired to discard poor matching results to reduce artifacts and flickering.