Point clouds have been considered as a candidate format for transmission of 3D data, either captured by 3D scanners, LIDAR sensors, or used in popular applications such as Virtual Reality/Augmented Reality (VR/AR). Point Clouds are a set of points in 3D space. Besides the spatial position (X,Y,Z), each point usually has associated attributes, such as color (R,G,B) or even reflectance and temporal timestamps (e.g., in LIDAR images). In order to obtain a high fidelity representation of the target 3D objects, devices capture point clouds in the order of thousands or even millions of points. Moreover, for dynamic 3D scenes used in VR/AR application, every single frame often has a unique dense point cloud, which results in the transmission of several millions of point clouds per second. For a viable transmission of such a large amount of data, compression is often applied.
In 2017, MPEG issued a call for proposal (CfP) for compression of point clouds. After evaluation of several proposals, MPEG is considering two different technologies for point cloud compression: 3D native coding technology (based on octree and similar coding methods), or 3D to 2D projection, followed by traditional video coding. In the case of dynamic 3D scenes, MPEG is using a test model software (TMC2) based on patch surface modeling, projection of patches from 3D to 2D image, and coding the 2D image with video encoders such as HEVC. The method has proven to be more efficient than native 3D coding and is able to achieve competitive bitrates at acceptable quality.
When coding point clouds, TMC2 encodes auxiliary information related to the patch projection, such as patch position in the 2D canvas image and bounding box size. For temporal coding of auxiliary information, patch matching between patches from current point cloud and patches from the immediately decoded point cloud is used for prediction. The procedure is limited to the immediate neighbor and includes performing delta coding for all the frames in the sequence.
The state-of-the-art in point cloud compression using video encoders represents point clouds as 3D patches and encodes a 2D image formed by the projection of geometry and attributes into a 2D canvas. The packing of projected 3D patches into a 2D image is also known as 2D mapping of 3D point cloud data. Currently, the process allows for patch placement such that there might be some ambiguity in the ownership of the block. The ownership of the blocks is sent to the decoder as auxiliary information. However, the information is not being coded using video encoders, but directly using the entropy coding proposed in the standards.