Engineers use compression (also called source coding or source encoding) to reduce the bit rate of digital media content. Compression decreases the cost of storing and transmitting media information by converting the information into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original information from the compressed form. A “codec” is an encoder/decoder system.
Often, compression is applied to digital media content such as speech or other audio, images, or video. Recently, compression has also been applied to point cloud data. A point cloud represents one or more objects in three-dimensional (“3D”) space. A point in the point cloud is associated with a position in the 3D space. If the point is occupied, the point has one or more attributes, such as sample values for a color. An object in the 3D space can be represented as a set of points that cover the surface of the object.
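The point cloud representation described above can be illustrated with a minimal sketch. It assumes integer voxel coordinates and a single RGB color attribute per occupied point; the names `PointCloud`, `is_occupied`, and `color_at` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assumption: positions are integer (x, y, z) coordinates in 3D space.
Position = Tuple[int, int, int]
# Assumption: the only attribute is an (R, G, B) color sample.
Color = Tuple[int, int, int]


@dataclass
class PointCloud:
    """Hypothetical container mapping each occupied position to its color."""
    points: Dict[Position, Color] = field(default_factory=dict)

    def is_occupied(self, position: Position) -> bool:
        # A point is occupied if an attribute is stored at its position.
        return position in self.points

    def color_at(self, position: Position) -> Optional[Color]:
        # Unoccupied points have no attributes, so None is returned.
        return self.points.get(position)


# Two occupied points on the surface of some object.
cloud = PointCloud(points={(0, 0, 0): (255, 0, 0), (1, 0, 0): (0, 255, 0)})
```

In this sketch, only occupied points are stored, which reflects the fact that an object is represented by points covering its surface rather than by every position in the 3D space.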
Point cloud data can be captured in various ways. In some configurations, for example, point cloud data is captured using special cameras that measure the depth of objects in a room, in addition to measuring attributes such as colors. After capture and compression, compressed point cloud data can be conveyed to a remote location. This enables decompression and viewing of the reconstructed point cloud data from an arbitrary, free viewpoint at the remote location. One or more views of the reconstructed point cloud data can be rendered using special glasses or another viewing apparatus, to show the subject within a real scene (e.g., for so-called augmented reality) or within a synthetic scene (e.g., for so-called virtual reality). Processing point cloud data can consume a huge amount of computational resources. One point cloud can include millions of occupied points, and a new point cloud can be captured 30 or more times per second for a real-time application.
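To give a rough sense of scale, the uncompressed bit rate for such a capture can be estimated as follows. The per-point bit depths (10 bits per coordinate, 8 bits per color component) and the function name `raw_bit_rate` are assumptions made only for this illustration; the point count and capture rate follow the figures mentioned above.

```python
def raw_bit_rate(points_per_frame: int, bits_per_point: int,
                 frames_per_second: int) -> int:
    """Return the uncompressed bit rate in bits per second."""
    return points_per_frame * bits_per_point * frames_per_second


# Assumption: 3 coordinates at 10 bits each plus 3 color components at 8 bits each.
BITS_PER_POINT = 3 * 10 + 3 * 8  # 54 bits

# One million occupied points per point cloud, captured 30 times per second.
rate = raw_bit_rate(1_000_000, BITS_PER_POINT, 30)
print(f"{rate / 1e9:.2f} Gbit/s")  # prints "1.62 Gbit/s"
```

Under these assumptions, even a single-million-point cloud at 30 captures per second exceeds 1.6 gigabits per second before compression.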
Some prior approaches to compression of point cloud data provide effective compression in terms of rate-distortion performance (that is, high quality for a given number of bits used, or a low number of bits used for a given level of quality). For example, one such approach uses a graph transform and arithmetic coding of coefficients. Such approaches are not computationally efficient, however, which makes them infeasible for real-time processing, even when powerful computer hardware is used (e.g., graphics processing units). Other prior approaches to compression of point cloud data are simpler to perform, but deficient in terms of rate-distortion performance in some scenarios.
Prior approaches to compression and decompression of point cloud data also fail to provide a bitstream of encoded data that can readily be decoded with devices having diverse computational capabilities or quality requirements. For scalable compression and decompression of images and video, a bitstream of encoded data can be separated into partitions representing different regions of an image, different spatial resolutions within an image or region, different levels of quality (often called signal-to-noise ratio (“SNR”) scalability), or different temporal resolutions (i.e., frame rates). Attempts to provide such scalability for point cloud compression and decompression have not been successful, however, in large part due to the complexity of integrating scalability features with underlying compression/decompression operations. In particular, prior approaches to compression and decompression of point cloud data do not support spatial location (random access) scalability or signal-to-noise ratio scalability, and they are impractical for temporal scalability and spatial resolution scalability in many scenarios.