Current mainstream 3D displays use polarization technology to separate the views seen by each eye and thus require glasses for 3D perception. Manufacturers and users of 3D display technology would prefer to eliminate the need for glasses. The glasses are inconvenient and sometimes uncomfortable, which tends to limit acceptance of 3D displays by many consumers. Also, the need for special-purpose glasses generally limits viewing opportunities to special locations, typically movie theatres and home theatre systems. Displays in other environments would also benefit from 3D capability. The inconveniences and limitations associated with displays requiring special-purpose glasses have led to the development of autostereoscopic displays. Autostereoscopic displays obviate the need for glasses and likely represent the next generation of mainstream 3D displays.
Autostereoscopic displays typically include a regular LCD screen with a “lenticular sheet”, which distributes the light emitted from different pixels in different directions. A practical display should work well even when a viewer is not in a single particular location relative to the display. Practical uses should support multiple viewers having different relative locations and angular positions with respect to the display screen.
Accordingly, these displays should accommodate multiple points of view, typically 5, 8, 9 or more. This presents significant challenges in making autostereoscopic displays. A principal challenge involves generating or obtaining enough display data content to support display of multiple viewpoints of the same scene at the same time.
One type of solution involves complex data acquisition, specifically multiple cameras capturing each separate viewpoint to be supported. However, the video streams must be synchronized and this can involve the complex use of a synchronized camera rig with one camera for each viewpoint. Supporting 5 or more viewpoints quickly becomes expensive, difficult and unreliable. Widespread adoption of this approach is unlikely as it is costly to build and maintain, impractical to calibrate, synchronize and operate, and usually immobile.
For this reason, a more common approach is to capture the content with significantly fewer cameras (typically 2 or 3) and generate the other views as if they were shot with virtual cameras. Existing methods usually generate each desired virtual view independently of the others, and this can result in inconsistencies between the generated views. There are also methods that address this issue; however, they enforce consistency explicitly, which leads to long execution times.
Existing methods range from very fast but simplistic to very advanced but slow. A class of prior methods uses a circular camera setup where almost all of the occluded areas in one of the captured views appear in another. This type of camera setup allows a layered approach where a background layer can be extracted from the given stereo images and used to fill in missing regions. See C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-Quality Video View Interpolation Using a Layered Representation,” Proc. ACM SIGGRAPH (August 2004); K. Muller, A. Smolic, and K. Dix, “View Synthesis for Advanced 3D Video Systems,” EURASIP Journal on Image and Video Processing (2008). This approach is especially useful when the virtual cameras are to be positioned arbitrarily in 3D space, as in the case of free viewpoint TV (FTV). In the case of autostereoscopic screens, however, the virtual cameras need only be positioned along a line or a wide arc. As a result, the circular camera class of methods introduces unnecessary complexity to both depth estimation and view synthesis.
For rectified camera setups, there are view synthesis methods that process the missing regions pixel by pixel and also methods that use patch-based in-painting. The pixel-based methods tend to use interpolation or simple in-painting methods, but they usually suffer from blur in the synthesized region. See, e.g., S. Zinger and L. Do, “Free-Viewpoint Depth Image Based Rendering,” Journal Visual Communication and Image Representation, vol. 21, pp. 533-541 (2010); Y. Mori, N. Fukushima, T. Yendo, T. Fujii, and M. Tanimoto, “View Generation with 3D Warping Using Depth Information for FTV,” IEEE Journal Signal Proc.: Image Communication, pp. 65-72 (2009). Some of the pixel-based methods fill in the missing regions using more advanced methods and optimization techniques. See A. Jain, L. Tran, R. Khoshabeh, and T. Nguyen, “Efficient Stereo-to-Multiview Synthesis,” Proceedings of International Conference on Acoustics, Speech, and Signal Processing (2011); L. Tran, C. J. Pal, and T. Q. Nguyen, “View Synthesis based on Conditional Random Fields and Graph Cuts,” Proceedings of the International Conference on Image Processing (2010). The more advanced methods yield better results, but patch-based methods are still usually superior. For examples of patch-based methods, see L. Tran, R. Khoshabeh, A. Jain, C. Pal, and T. Nguyen, “Spatially Consistent View Synthesis with Coordinate Alignment,” Proceedings of International Conference on Acoustics, Speech and Signal Processing (2011); P. Ndjiki-Nya, M. Koppel, D. Doshkov, H. Lakshman, P. Merkle, K. Muller, and T. Wiegand, “Depth Image Based Rendering with Advanced Texture Synthesis,” IEEE International Conference on Multimedia and Expo, pp. 424-429 (2010). Tran et al. seek to achieve inter-view consistency in a patch-based method, and other patch-based methods seek to achieve temporal consistency. See C. M. Cheng, X. A. Hsu, and S. H. Lai, “Spatio-Temporally Consistent Multi-view Video Synthesis for Autostereoscopic Displays,” Pacific Rim Conference on Multimedia (PCM), pp. 532-542 (2010).
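By way of illustration only, the following is a minimal sketch of how missing regions can arise when a view from a rectified camera is warped to a virtual camera position. It is not the method of any reference cited above. The function name `warp_row`, the parameter `alpha` (virtual camera position as a fraction of the baseline), and the sign convention for the disparity shift are assumptions made for this example.

```python
def warp_row(colors, disparities, alpha):
    """Forward-warp one scanline of a rectified view to a virtual camera.

    colors      -- pixel values of the reference scanline
    disparities -- per-pixel disparity in pixels (larger means closer)
    alpha       -- hypothetical virtual camera position, as a fraction of
                   the baseline (0 reproduces the reference view)

    Returns (warped, holes). Unfilled target pixels are None in `warped`
    and True in `holes`.
    """
    width = len(colors)
    warped = [None] * width
    # Depth buffer: when two source pixels land on the same target,
    # keep the one with the larger disparity (the closer surface).
    best_disp = [float("-inf")] * width

    for x in range(width):
        d = disparities[x]
        # Assumed convention: pixels shift left in proportion to disparity.
        target = x - int(round(alpha * d))
        if 0 <= target < width and d > best_disp[target]:
            warped[target] = colors[x]
            best_disp[target] = d

    holes = [value is None for value in warped]
    return warped, holes


# A two-pixel foreground object (disparity 4) in front of a flat background.
row = ["b0", "b1", "b2", "F3", "F4", "b5", "b6", "b7"]
disp = [0, 0, 0, 4, 4, 0, 0, 0]
warped, holes = warp_row(row, disp, alpha=0.5)
```

In this example the foreground pixels shift by two positions and cover part of the background, while the two positions they vacate receive no source pixel. Those unfilled positions are the kind of missing region that the interpolation, in-painting, and patch-based methods discussed above attempt to fill.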
There are also methods that seek to leverage the processing power of modern GPUs to speed up the parallelizable processes. See S. Rogmans, J. Lu, P. Bekaert, and G. Lafruit, “Real-time Stereo-based View Synthesis Methods: A Unified Framework and Evaluation on Commodity GPUs,” Signal Processing: Image Communication, vol. 24, pp. 49-64 (2009). Although these methods can achieve very fast view synthesis, they are effective primarily with very simple approaches.