Virtual Reality (VR) Head Mounted Displays (HMDs) allow users to experience a full 360-degree view of a virtual environment. With computer-generated graphics, such as those in computer games, the user can interact with the scene and move freely in the VR environment (i.e., six degrees of freedom, 6DoF).
An increasing number of people record videos with 360-degree cameras, which allow viewers to look around, but only from the camera location. Typically, only a regular 2D video is recorded, although 3D stereoscopic capturing systems also exist. However, 360 video today offers the audience only three degrees of freedom (3DoF), since the audience must follow the movement of the camera.
It is possible to capture the 360 video from different positions (free viewpoint content) as described in “DVB, Virtual Reality—Prospects For DVB Delivery,” Report of the DVB CM Study Mission on Virtual Reality, Draft 012—June 2016.
FIGS. 1A-1E illustrate several conventional alternatives for recording 360 video with an omnidirectional camera. FIG. 1A illustrates an alternative that captures the video from a fixed viewpoint position in a single viewing direction, resulting in a 180-degree panorama. FIG. 1B illustrates an alternative that captures the video from a fixed viewpoint position while viewing in the left or right directions, resulting in a 360-degree panorama, which may not capture the poles. FIG. 1C illustrates an alternative that captures the video from a fixed viewpoint position while viewing from any angle, providing fixed-position spherical content. Videos optimized for horizontal stereoscopic viewing require correction at the poles to avoid distortion due to parallax. FIG. 1D illustrates an alternative that captures the video from a movable viewpoint where the content is captured from different viewpoints, enabling the user to change his/her head position. This typically means recording different versions of the content from different positions. FIG. 1E illustrates an alternative that captures the video from a free viewpoint using 3D modeling or computer-generated imagery (CGI), or using camera arrays that capture light fields. This alternative is known as 6DoF. While this is quite mature for CGI content, light fields are typically used only in laboratory environments.
360 video and VR are currently being explored in many different environments and in many different combinations. One use case is to create a virtual 3D representation of a certain event or place (for example, a museum) so that the viewer can take a virtual tour of the museum on an HMD.
Approaches for real-time navigation in indoor and outdoor environments are under consideration today. In particular, for indoor environments where a GPS signal is not used, approaches such as Simultaneous Localization and Mapping (SLAM) are deployed. This process is described in further detail in J. Biswas and M. Veloso, “Depth camera based indoor mobile robot localization and navigation,” 2012 IEEE International Conference on Robotics and Automation, Saint Paul, Minn., 2012, pp. 1697-1702. In essence, SLAM builds a map of the environment by scanning it, and localizes an object within the environment by comparing depth camera images against the scanned environment. 3D representations such as point clouds are commonly used for scanning the environment.
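The localization step described above can be illustrated with a toy sketch. It assumes a map that has already been scanned into a 2D point cloud, a sensor that does not rotate, and a brute-force search over candidate positions. The function names and the sample room are hypothetical, and this is not the method of the cited paper.

```python
def nearest_sq_dist(point, cloud):
    """Squared distance from `point` to its closest neighbour in `cloud`."""
    px, py = point
    return min((px - qx) ** 2 + (py - qy) ** 2 for qx, qy in cloud)


def match_cost(candidate, scan, map_cloud):
    """How badly the scan fits the map if the sensor stood at `candidate`.

    `scan` holds points relative to the sensor; they are shifted into map
    coordinates and compared against the scanned environment.
    """
    cx, cy = candidate
    return sum(nearest_sq_dist((cx + dx, cy + dy), map_cloud) for dx, dy in scan)


def localize(scan, map_cloud, candidates):
    """Return the candidate position whose shifted scan best matches the map."""
    return min(candidates, key=lambda c: match_cost(c, scan, map_cloud))


# Hypothetical scanned room: points along two walls plus one isolated feature.
MAP_CLOUD = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (4, 2), (0, 3)]

# Simulated depth observation taken from an (unknown to the solver) position.
TRUE_POSITION = (2, 1)
SCAN = [(x - TRUE_POSITION[0], y - TRUE_POSITION[1]) for x, y in MAP_CLOUD]

# Candidate positions on a coarse grid covering the room.
CANDIDATES = [(x, y) for x in range(5) for y in range(4)]

estimated_position = localize(SCAN, MAP_CLOUD, CANDIDATES)
```

A practical system would also estimate orientation, tolerate sensor noise, and update the map while localizing; the sketch only shows the comparison of an observation against a previously scanned point cloud.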
A 360 video gives a viewer the ability to move his/her head during the playback of the video and to explore the surroundings to a limited extent. The use of HMDs with 360 videos brings an immersive experience to viewers and distinguishes it from conventional video recording, in which the cameraman controls the field of view. In 360 video, the viewer has the freedom to look wherever desired.
However, current 360 video recording does not give the viewer a transparent experience (i.e., interaction with the offering) of moving around the recorded scene and exploring the surroundings. This is because the video is constrained by the directions the cameraman took during the recording. Even for free viewpoint recording, interactivity, in the sense of providing different videos depending on which events the viewer encounters, is not considered. Point clouds can be used to create position maps. However, an event map, meaning a record of which events are available to the user at the current position, is not really considered.
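To make the term concrete, an event map can be pictured as a lookup from a viewer position to the events offered there. The sketch below is only an illustration under assumed names: the rectangular regions, their coordinates, and the event labels are hypothetical and are not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle on a floor plan (hypothetical coordinates)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so that adjacent regions do not overlap.
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max


# Hypothetical event map: each region lists the events available inside it.
EVENT_MAP = [
    (Region(0, 0, 10, 10), ["enter Hall-1", "continue along corridor"]),
    (Region(10, 0, 20, 10), ["enter Hall-2", "continue along corridor"]),
]


def events_at(x, y, event_map=EVENT_MAP):
    """Return every event available at position (x, y); empty if none."""
    events = []
    for region, region_events in event_map:
        if region.contains(x, y):
            events.extend(region_events)
    return events
```

A position map answers where the viewer is; a structure of this kind would additionally answer what the viewer can do from there.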
FIG. 2 illustrates a conventional video recording of a virtual tour of a building 10 that includes two intersecting corridors and four halls. The cameraman records the tour by moving in sequence to locations 1-7. This recording sequence restricts the tour to that sequence. This means the viewer must follow the video recording timeline, going through locations 1-7 to explore the tour in the same sequence in which the video was captured. Thus, the viewer will sequentially go through Hall-1, Hall-2, Hall-3, and Hall-4. If the viewer desires to visit the halls in a different order, for example to visit Hall-4 at the beginning of the tour, the viewer has to search forward to near the end of the video until he/she reaches the point on the timeline where Hall-4 is viewed. If the viewer desires to visit Hall-1 after Hall-4, he/she has to search backward again to the beginning of the video. This type of scenario greatly interrupts the experience of the viewer. To avoid back-and-forth searching, the only option the viewer has is to follow the video capturing sequence and waste time looking at views that are less interesting to him/her.
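The cost of leaving the recorded order can be made concrete with a small model of a single linear timeline. The segment boundaries below are hypothetical (the recording is simplified to four back-to-back hall segments of 60 seconds each, ignoring the corridor footage), and the names `Segment` and `plan_seeks` are introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of the recording, in seconds from the start of the video."""
    name: str
    start_s: float
    end_s: float


# Hypothetical timeline: the halls appear in the order they were recorded.
TIMELINE = [
    Segment("Hall-1", 0, 60),
    Segment("Hall-2", 60, 120),
    Segment("Hall-3", 120, 180),
    Segment("Hall-4", 180, 240),
]


def plan_seeks(timeline, visit_order):
    """List the manual seeks needed to watch the halls in `visit_order`.

    Each entry is (target hall, direction, seconds skipped). Playback that
    simply continues from where the previous segment ended needs no seek.
    """
    by_name = {segment.name: segment for segment in timeline}
    playhead = 0.0
    seeks = []
    for name in visit_order:
        segment = by_name[name]
        jump = segment.start_s - playhead
        if jump != 0:
            direction = "forward" if jump > 0 else "backward"
            seeks.append((name, direction, abs(jump)))
        playhead = float(segment.end_s)  # the viewer watches the whole segment
    return seeks


recorded_order = plan_seeks(TIMELINE, ["Hall-1", "Hall-2", "Hall-3", "Hall-4"])
custom_order = plan_seeks(TIMELINE, ["Hall-4", "Hall-1"])
```

Under these assumptions, following the recorded order requires no seeks, whereas visiting Hall-4 first and Hall-1 second requires one forward search across most of the video followed by one backward search across all of it.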