This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
A video coding system may comprise an encoder that transforms an input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form. The encoder may discard some information in the original video sequence in order to represent the video in a more compact form, for example, to enable the storage/transmission of the video information at a lower bitrate than otherwise might be needed.
Volumetric video data represents a three-dimensional scene or object and can be used as input for virtual reality (VR), augmented reality (AR) and mixed reality (MR) applications. Such data describes the geometry, e.g. shape, size, position in three-dimensional (3D) space, and respective attributes, e.g. color, opacity, reflectance and any possible temporal changes of the geometry and attributes at given time instances, comparable to frames in two-dimensional (2D) video. Volumetric video is either generated from 3D models through computer-generated imagery (CGI), or captured from real-world scenes using a variety of capture solutions, e.g. multi-camera, laser scan, combination of video and dedicated depth sensors, and more. Also, a combination of CGI and real-world data is possible.
Typical representation formats for such volumetric data are triangle meshes, point clouds (PCs), or voxel arrays. Temporal information about the scene can be included in the form of individual capture instances, i.e. “frames” in 2D video, or other means, e.g. position of an object as a function of time.
Delivering and rendering detailed volumetric content on mobile devices can be challenging for augmented reality and virtual reality applications. Especially in augmented reality applications, processing power of a central processing unit (CPU) and/or a graphics processing unit (GPU) of a rendering device is spent on real-time image processing operations in order to determine viewing direction and position. For this reason, there may be less processing power available for actual rendering of the augmented reality content.
Typical augmented reality content is a virtual character displayed on a display, overlaid on the camera viewfinder image. Low-resolution or low-complexity content may not be appealing to the end user, and high-quality AR content is desirable. Due to the limited CPU/GPU budget, it may be difficult to create such AR content.
A problem with compressed color+depth video encoding is the quality of the encoded content. Depth encoding may play an important role, as it may produce highly visible, unwanted visual artefacts during the playback phase. As an example, if a depth video is encoded so that a maximum depth value is used to determine whether a depth sample should be rendered or not, then if any maximum depth value changes during encoding/decoding, visible artefacts will appear during rendering. An example of this is illustrated in FIG. 5, where the black points 502 illustrate some examples of samples whose value after video encoding is not the correct maximum depth value but some other value less than the maximum depth value. If 8 bits are used in encoding the maximum depth value, the value of the point 502 should be 255, but due to the erroneous behavior the value is less than 255. As a result, this point is rendered as a visible depth sample.
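The effect described above may be illustrated with a minimal sketch. It assumes, purely for illustration, an 8-bit depth frame in which the value 255 marks samples that should not be rendered, and a rendering rule that draws every sample below that value. The names `MAX_DEPTH` and `visible_samples`, the 3x3 frame, and the drifted value 251 are hypothetical and are not taken from any particular codec or renderer.

```python
MAX_DEPTH = 255  # assumed 8-bit value meaning "do not render this sample"


def visible_samples(depth_rows):
    """Return (row, col) positions whose depth is below MAX_DEPTH."""
    return [
        (r, c)
        for r, row in enumerate(depth_rows)
        for c, d in enumerate(row)
        if d < MAX_DEPTH
    ]


# Hypothetical depth frame before encoding: one real sample in the centre,
# every other sample carries the maximum depth value.
original = [
    [255, 255, 255],
    [255, 100, 255],
    [255, 255, 255],
]

# Hypothetical frame after lossy encoding/decoding: one sample that should
# hold the maximum depth value has drifted from 255 to 251.
decoded = [row[:] for row in original]
decoded[0][2] = 251

print(visible_samples(original))  # [(1, 1)]
print(visible_samples(decoded))   # [(0, 2), (1, 1)]
```

In this sketch, the sample at position (0, 2) becomes visible after decoding although it was not intended to be rendered, which corresponds to the kind of point denoted 502 in FIG. 5.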