Mobile computing on smartphones, tablets, and other smart mobile devices had become an increasingly important part of people's daily lives. Smart mobile devices were becoming more and more portable and powerful. People wanted to integrate as many functions as they needed into their smart devices so that they could carry only one device: for example, a high-resolution screen, email, calendar, Internet access, messaging, GPS-based map services, position sensors, voice recording, near field communication, a fingerprint reader, and an advanced camera for image and video capture. As smart devices became multi-functional, people gradually moved away from many previously dedicated devices, such as point-and-shoot cameras, camcorders, radios, music players, TVs, and even laptop computers.
Almost all smartphones were equipped with cameras, and mobile cameras were becoming more and more powerful. Not only did the number of image pixels increase greatly, but advanced features such as auto-focus, larger CMOS sensors, high dynamic range, stronger flash lights, and better camera lenses were also added. Most high-end smartphones had two cameras, a front camera and a back camera. There were many apps that strove to improve mobile image or video capture. They tried to improve the capture process, for example by making it easier and quicker to use, adding functions during capture, and improving the captured image or video quality. One example of improving content quality was providing a set of digital filters to post-process the captured images or videos. Among the most successful such companies were Instagram and Socialcam.
Instagram provided a simple way to capture and share mostly 2D still images. It also provided many built-in filters and visual effects software to customize user photos. Socialcam provided an easy way to capture, share, and view 2D videos. It also provided user video filters, cloud storage, and other video sharing services.
Besides 2D images and videos, a few smartphone cameras could be used to capture 3D and higher-dimensional light field images or videos. Stereoscopic 3D images were actually two regular 2D images taken with a binocular offset that corresponded to the distance between a human's two eyes. Stereoscopic 3D images could create or enhance the illusion of depth by means of stereopsis.
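The geometric effect of the binocular offset can be sketched briefly. In this hypothetical (not part of the disclosure) pinhole model, a camera pair separated by a baseline b, the binocular offset, images a point at depth z with a horizontal disparity of d = f * b / z pixels between the left and right views; that disparity is what stereopsis converts back into perceived depth. The 63 mm baseline below is a typical interpupillary distance, assumed for illustration.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Horizontal pixel disparity of a scene point at depth_m metres,
    for a stereo pair with the given focal length (pixels) and baseline
    (metres). Illustrative pinhole-camera sketch only."""
    return focal_px * baseline_m / depth_m

# Example: 1000 px focal length, 63 mm baseline (roughly the human
# eye spacing), object 2 m away.
d = disparity_px(1000.0, 0.063, 2.0)   # 31.5 px of disparity
```

Nearer objects produce larger disparities, which is why capturing the two views at different times (as the software solutions below did) is so sensitive to scene motion.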
One typical solution, named “3D Camera”, let the user manually control the binocular shift to take two separate images sequentially with the existing single 2D mobile camera, producing the image pair needed for stereoscopic 3D. It was purely a software solution delivered as a mobile app. Here, a mobile app, short for mobile application, or simply app, was application software designed to run on smartphones, tablet computers, and other smart mobile devices to realize those smart functions. Apps had become so popular that newer smartphones were nicknamed “app phones” to distinguish them from earlier, less sophisticated smartphones. Studies showed that more mobile subscribers used apps than browsed the web on their devices, because apps were normally more relevant to the user, location, and time, and more efficient for their specific functions.
The problems of the above software 3D camera were that (1) the two images were taken at different times, so whenever there was relative movement between the scene objects and the camera, artifacts were inevitable; and (2) it was slow and difficult to use due to the manual shifting and registration control. Another solution was called “Poppy 3D”. It provided a bulky plastic housing for the phone for both 3D capture and viewing, and used a set of fixed mirrors to capture two stereoscopic images with the smartphone's single camera. A third example was the “3D cone”. It used a single-mirror stereopsis method with a bulky, cone-shaped plastic mirror to divide the smartphone back camera image into left and right images. Neither “Poppy 3D” nor the “3D cone” was considered portable, and the image resolution was less than half of the mobile camera's original resolution and viewing area; however, because the two images were captured at the same time, the quality was better than that of the software solution. A fourth solution was an add-on system like “sthreem 3D”. It required a separate camera device attached to the mobile device's interface port that worked together with the built-in mobile camera to provide 3D capture. The problems were that it was expensive, power consuming, and inconvenient to use. A fifth solution was called “3D scanner”. It was a pure software solution that required the mobile camera to scan a scene object continuously for a period of time, so that it could acquire image information at different angles and in finer detail and create a 3D model shape. The big problem was that it was very slow, so during the scanning time any object movement would cause severe artifacts in the final result, and the quality was fairly low.
The last solution was to embed stereoscopic camera hardware directly inside the smartphone. Examples were the HTC EVO 3D phone and the LG Optimus 3D P920 phone. Each had a pair of matching mobile camera sensors dedicated to stereoscopic image and video capture. However, due to the same problems of added cost, higher power consumption, low sensor quality, and inconvenience of use, the solution never became popular enough to encourage manufacturers to continue the product lines.
Virtual reality was the next big thing after 3D, 4K, and mobile computing. 2016 was a pivotal year for mass adoption of virtual reality technology, including virtual reality content generation and display. Virtual reality (VR) referred to computer technologies that used software to generate realistic images, sounds, and other sensations that replicated a real environment (or created an imaginary setting) and simulated a user's physical presence in this environment, by enabling the user to interact with this space and any objects depicted therein using specialized display screens or projectors and other devices. VR had been defined as “ . . . a realistic and immersive simulation of a three-dimensional environment, created using interactive software and hardware, and experienced or controlled by movement of the body” or as an “immersive, interactive experience generated by a computer”. Not only did virtual reality let people see a 3D scene, but it also enabled them to look around and see a complete 360 degrees horizontally (360H) and 180 degrees vertically (180V) of the scene from every viewing location.
There were two ways to generate virtual reality content that contained complete 360H+180V degree 3D information for every viewing location. One was to use a game engine to create a complete set of 3D models of all scene objects; the computer then generated a 2D projected image frame for each viewing position in real time. This was called CG content; all VR games and 3D animated movie data were examples. The other way was to use a special camera to capture and store the light field information of a real scene. This was called live action content. The light field was a vector function that described the amount of light flowing in every direction through every point in space. The direction of each ray was given by the 7D plenoptic function, and the magnitude of each ray was given by the radiance. A camera that could capture 7D light field information was called a light field camera.
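The 7D plenoptic function referred to above can be made concrete with a small sketch. In one common parameterization, assumed here for illustration, a sample of the light field is L(x, y, z, θ, φ, λ, t): a position in space, a ray direction in spherical angles, a wavelength, and a time, with the radiance as the value carried along that ray. A light field camera stores many such samples; the record type and field names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlenopticSample:
    """One discrete sample of a 7D plenoptic function (illustrative)."""
    x: float            # position of the ray in space
    y: float
    z: float
    theta: float        # ray direction, polar angle (radians)
    phi: float          # ray direction, azimuthal angle (radians)
    wavelength_nm: float  # colour component
    t: float            # capture time
    radiance: float     # magnitude of light along this ray

# One green (550 nm) ray through the origin at time t = 0.
sample = PlenopticSample(0.0, 0.0, 0.0, 1.2, 0.4, 550.0, 0.0, radiance=3.7)
```

A conventional 2D camera integrates away the directional and depth coordinates, keeping only an (x, y) image; the point of a light field camera is to retain them.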
There was not yet a smartphone equipped with a light field camera. This was because of the optical and electronic limitations of the smartphone form factor and the limited computation power for light field data processing in current mobile devices. Some early companies were working on a type of mobile light field camera based on integral photography (IP). Integral imaging was an autostereoscopic and multiscopic three-dimensional imaging technique that captured and reproduced a light field by placing a two-dimensional array of microlenses, sometimes called a fly-eye lens, in front of a camera CMOS sensor, normally without the aid of a larger overall objective or viewing lens. These technologies required a special light field camera module to be built into the smartphone, so an existing smartphone user who wanted to capture light field content had to buy a new smartphone.
Once virtual reality content was generated, people needed a way to view it. A typical virtual reality viewer comprised a pair of convex lenses placed in front of a 5- to 6-inch digital display screen so that each eye looked at one half of the display through its own lens. The head position and viewing direction were tracked, and the display was updated in real time accordingly.
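The split-screen layout just described can be sketched as follows. This is an illustrative assumption about the frame format, not any particular product's implementation: the left-eye and right-eye renderings are placed side by side in a single frame, so that each lens of the viewer magnifies its own half of the display.

```python
def side_by_side(left, right):
    """Concatenate equal-height left/right eye images row by row into
    one side-by-side stereo frame. Images are 2D lists of pixel values
    (a deliberately minimal stand-in for real image buffers)."""
    assert len(left) == len(right), "eye images must have the same height"
    return [l_row + r_row for l_row, r_row in zip(left, right)]

# Toy 2x2 "images": the left eye sees all 1s, the right eye all 2s.
L_img = [[1, 1], [1, 1]]
R_img = [[2, 2], [2, 2]]
frame = side_by_side(L_img, R_img)   # one 2x4 frame, left half then right half
```

Real viewers additionally pre-distort each half to cancel the lens distortion, a step omitted here for brevity.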
Currently there were three ways to make this happen. The first was a head-mounted lens box with built-in digital display screens and position sensors, tethered to a personal computer. The position sensors normally referred to GPS, gyroscope, and accelerometer sensors. The personal computer received the position sensor data from the head-mounted box, handled all the computation, and rendered each frame of the virtual world. The rendered frames were sent to the head-mounted box for display. Since the computation and rendering happened on the relatively powerful personal computer, the display quality of the virtual reality content was the best.
The second way was to use a smartphone for both rendering computation and display. There was again a head-mounted lens box, with or without position sensors, into which the smartphone was placed. If the box had position sensors, which were normally more accurate than those in the smartphone, their data was transmitted to the smartphone; otherwise the smartphone's internal position sensor data was read out. The smartphone handled all the computation, rendered each frame of the virtual world, and displayed it on its own screen. Since the computation and rendering happened on the phone, the quality was normally the worst.
The third way was a compromise between the above two methods. A dedicated embedded computing device, a set of position sensors, and a pair of display screens were built into the head-mounted box. These components normally performed better than their counterparts in a smartphone. The sensors detected the user's head position and sent the measured data to the dedicated VR computing device, which handled all the computation, rendered each frame of the virtual world, and sent it to the display screens. Since the computation and rendering happened on the dedicated, more capable computing device, the quality was normally higher than the smartphone solution but lower than the PC solution.
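All three architectures above share the same per-frame loop: track the head pose, render a view for each eye, and present the result; they differ only in where each step runs (PC, phone, or dedicated box). A minimal sketch of that loop follows; the function names and the fixed sensor reading are hypothetical stand-ins, since real viewers fuse live gyroscope and accelerometer data.

```python
def read_orientation():
    """Stand-in for sensor fusion: a real viewer would combine gyroscope
    and accelerometer readings here. Returns a fixed pose for the sketch."""
    return {"yaw": 0.0, "pitch": 0.0, "roll": 0.0}

def render_eye_views(pose):
    """Render one frame of the virtual world per eye for this head pose.
    Returns placeholder strings instead of actual image buffers."""
    return ("left_frame@yaw=%.1f" % pose["yaw"],
            "right_frame@yaw=%.1f" % pose["yaw"])

def run_frames(n):
    """The shared viewer loop: track pose, render both eyes, collect the
    frames that would be sent to the display screens."""
    frames = []
    for _ in range(n):
        pose = read_orientation()              # 1. track head position
        frames.append(render_eye_views(pose))  # 2. render left/right views
    return frames                              # 3. present on the displays

out = run_frames(2)   # two stereo frames, each a (left, right) pair
```

Where this loop executes determines the quality trade-off described above: the more capable the computing device running `render_eye_views`, the better the result.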
All the above wearable solutions for viewing virtual reality content required a fairly large, heavy, and inconvenient head-mounted box worn on the user's head. Not only was wearing it uncomfortable (for example, many head-mounted VR viewers could not accommodate prescription glasses well, leaving glasses wearers uncomfortable while viewing virtual reality content, with lenses that sometimes steamed up and images that blurred), but no matter what material the head-mounted box was made of, it was normally too big and heavy to carry conveniently in a pocket or bag. If the virtual reality box was a separate item to carry besides the smartphone, people would find it inconvenient and become reluctant to carry it, or would forget to bring it with them. In the end, people would not be able to enjoy virtual reality content anywhere and at any time, even if they chose a mobile VR solution. The same problems applied to mobile virtual reality cameras. There was not yet a universal light field camera solution for all smart devices.
The present disclosure provides a method or apparatus to add new imaging functions to an existing smart device: (1) to turn any smartphone with a gyroscope sensor into an extremely portable and highly available wearable virtual reality content viewer by overcoming the above-mentioned limitations; and (2) to achieve stereoscopic 3D, wide-field virtual reality, and light field 4D virtual reality image and video capture using existing hardware, without requiring a new smart device. The two methods can be implemented in one apparatus so that people can achieve both purposes with one device.