In recent years, with the increase of visual information in digital format, there is a growing need for more immersive applications, demanding better representations of light in space. A full description of the light rays present in space is provided by the Plenoptic Function, a theoretical vector function with seven dimensions (7D) that describes the light intensity passing through every viewpoint, in each direction, for every wavelength, and at every time instant. By restricting the spectral components, assuming no variation in time, and considering the intensity of each light ray to be constant along its path, the 7D function can be simplified to a four-dimensional (4D) representation, which is called the light field.
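The 4D parameterization above can be illustrated with a minimal sketch: a light field stored as a 4D array, where two indices select the viewpoint and two select the pixel within a view. The dimension names and sizes below are illustrative assumptions, not values prescribed by any standard.

```python
import numpy as np

# Illustrative two-plane parameterization L(t, s, v, u): (t, s) indexes the
# viewpoint and (v, u) the pixel within that view. Sizes are hypothetical.
T, S, V, U = 13, 13, 434, 626
light_field = np.zeros((T, S, V, U))   # one color channel, for simplicity

def ray_intensity(lf, t, s, v, u):
    """Intensity of the ray through viewpoint (t, s) and pixel (v, u)."""
    return lf[t, s, v, u]

view = light_field[6, 6]               # a single 2D sub-aperture view
print(view.shape)                      # (434, 626)
```

Fixing (t, s) yields one conventional 2D image, which is why the light field can be seen as an array of views.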
The light field is among the most efficient ways of representing the three-dimensional (3D) nature of visible reality. It has become widely used in many imaging applications, including high-resolution microscopy, computer vision, velocimetry, health, and more. For instance, Google has been investing in light field technologies for applications with an extremely high-quality sense of presence, producing motion parallax and extremely realistic textures and lighting. Moreover, a recent market research report entitled “Light field market by technology (imaging solution, display), vertical (healthcare and medical, defense and security, media and entertainment, architecture and engineering, industrial), and geography-global forecast to 2023” announced that the light field market was valued at USD 924.7 Million in 2018 and is expected to reach USD 1,822.3 Million by 2023, at a compound annual growth rate (CAGR) of 14.5% between 2018 and 2023. These remarkable figures are driven by the AR/VR industries, game developers, 3D animation vendors, 3D robotics, Industry 4.0, and the movie industry.
Considering the demand of industry for light field technologies, a huge growth of light field content is expected, with a consequent increase in the amount of generated light field data. Moreover, because light fields capture the intensity of objects and record information of light rays, a massive amount of data is generated during light field imaging, which implies large storage consumption. Therefore, anticipating both the high demand for light field content and the high volume of light field data that will be produced, the Joint Photographic Experts Group (JPEG) standardization committee has issued a call for proposals (CfP) on light field coding technologies, called JPEG Pleno.
JPEG Pleno is a standardization activity launched in 2014. Its goal is to create a standard framework for efficient storage and transmission of plenoptic imaging (light field, point-cloud, and holographic contents). In particular, JPEG Pleno aims to find an efficient way to represent plenoptic content. A call for proposals for compressing light fields obtained from both lenslet and high-density cameras, aiming at the definition of a standard for compression of plenoptic content, was issued during the 73rd JPEG Meeting, ISO/IEC JTC 1/SC29/WG1 JPEG, “JPEG Pleno call for proposals on light field coding” (Doc. N73013, Chengdu, China, October 2016). Among the proposals submitted to the committee, the following three provided the best performance:    1) Zhao et al., “Light field image coding via linear approximation prior” (in IEEE International Conference on Image Processing 2017-Light Field Coding Grand Challenge, Beijing, China, September 2017);    2) Tabus et al., “Lossy compression of lenslet images from plenoptic cameras combining sparse predictive coding and JPEG 2000” (in IEEE International Conference on Image Processing 2017-Light Field Coding Grand Challenge, Beijing, China, September 2017);    3) Graziozi et al., patent application US 2015/0201176 A1, entitled “Methods for Full Parallax Compressed Light Field 3D Imaging Systems”.
The proposal of Zhao et al. divides the light field view images into two complementary sets. The views in the first set are converted to a pseudo video sequence to be lossy compressed by a video compressor, such as HEVC. The decoded views are then used as references to encode the second set of views. For each view in the second set, a predicted view is created as a linear combination of the reference views from the first set. The difference between the original views and the respective predicted ones is computed, resulting in a set of residue views. These residue views are then encoded using the JPEG standard. This method can be employed to attain both lossy and lossless compression.
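The linear-combination prediction step can be sketched as follows. This is a hedged illustration of the general idea only: the least-squares fit used to obtain the combination weights is our assumption, not necessarily the coefficient-estimation method of Zhao et al.

```python
import numpy as np

# Sketch: a view in the second set is predicted as a linear combination of
# decoded reference views; the residue (original minus prediction) is what
# would then be encoded. Weights are found here by ordinary least squares.
rng = np.random.default_rng(0)
refs = [rng.random((8, 8)) for _ in range(3)]            # decoded references
target = 0.5 * refs[0] + 0.3 * refs[1] + 0.2 * refs[2]   # view to predict

A = np.stack([r.ravel() for r in refs], axis=1)          # columns = references
w, *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)   # best-fit weights
predicted = (A @ w).reshape(target.shape)
residue = target - predicted                             # near zero here
```

Because the toy target is an exact linear combination of the references, the fitted weights recover (0.5, 0.3, 0.2) and the residue is essentially zero; for real views the residue carries the prediction error.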
The proposal of Tabus et al. presents a lenslet image compression method that is scalable from low bitrates to fully lossless. The lenslet dataset is also partitioned into two sets: the reference sub-aperture images (views), which are encoded using the JPEG 2000 standard, and a set of dependent views that are reconstructed from the reference views. Their reconstruction is performed by employing flexible interpolators implemented by sparse predictors. These are based both on the scene geometry extracted from the depth maps and on the geometry of the microlens array. In addition to the reference views, the depth map is encoded along with the displacement vectors and the coefficients of the sparse predictors from each region.
The proposal of Graziozi et al. attempted to find an optimal subset of light field samples to be encoded, while the remaining samples are generated using multi-reference depth-image based rendering.
Unlike the above proposals, this invention brings a new, competitive way of encoding light fields to the JPEG Pleno standardization activities. The method of the present invention interprets the whole light field data in its native four-dimensional form, while the others employ scanning procedures to reduce the four-dimensional light field to a sequence of two-dimensional views. In those methods, a sequence of views can be directly encoded by a video codec, or some views are chosen as references while others are synthesized as linear combinations of possibly warped versions of the reference images. The methods that rely on warping have the disadvantage of depending on depth or disparity maps, which are not always available. Moreover, depth-map-dependent methods may not be robust and may require high computational cost. Further, the quality of the maps has an enormous influence on the performance of the compression method. The present invention, on the other hand, uses four-dimensional transforms to exploit the inter-view redundancy and achieves very competitive results.
The following solutions and technologies are found in the prior art:
The paper entitled “A Study on the 4D Sparsity of JPEG Pleno Light Fields Using the Discrete Cosine Transform”, by G. Alves, M. P. Pereira, M. B. Carvalho, F. Pereira, C. L. Pagliari, V. Testoni, and A. da Silva, in the 25th IEEE International Conference on Image Processing (ICIP), pp. 1148-1152, 2018, presents an exploratory analysis of the 4D sparsity of light fields in the 4D-DCT space. The paper investigates the suitability of the 4D-DCT for compressing both lenslet-based and High-Density Two-dimensional Camera Array (HDCA) JPEG Pleno datasets. Its results disclose that the lenslet datasets exhibit high 4D redundancy, with a larger inter-view sparsity than the intra-view one. For the HDCA datasets, there is also 4D redundancy worth exploiting, yet to a smaller degree; unlike the lenslet case, the intra-view redundancy is much larger than the inter-view one. The paper was a first investigation concerning the suitability of 4D transforms for light field coding. However, differently from the present invention, the paper does not disclose a complete codec.
The paper entitled “The 4D DCT-Based Lenslet Light-Field Codec”, by M. B. Carvalho, M. P. Pereira, G. Alves, E. A. da Silva, C. L. Pagliari, F. Pereira, and V. Testoni, in the 25th IEEE International Conference on Image Processing (ICIP), pp. 435-439, 2018, proposes a preliminary light field codec that fully exploits the 4D redundancy of the light field data by using the 4D discrete cosine transform (DCT) and encoding the coefficients using bit-planes and hexadeca-tree-guided partitioning. However, this paper does not disclose all the features of the present invention. The paper partitions the four-dimensional light field using fixed-size blocks and encodes each of them with four-dimensional DCTs. The present invention uses a four-dimensional variable block-size partitioning structure, whereby a 4D hyper-rectangular region is either transform coded as it is, or is partitioned into four hyper-rectangular sub-regions in the spatial dimensions, or is partitioned into four hyper-rectangular regions in the view dimensions. Also, in the paper, the hexadeca-tree partition is signaled by a binary flag that indicates whether a four-dimensional block is partitioned into 16 fixed hyper-rectangles, and the partition is always determined only by the magnitude of the coefficients. The present invention, in contrast, signals the optimized hexadeca-tree partition using a ternary flag, and the encoding decisions are made by Lagrangian optimization based on a rate-distortion (R-D) criterion.
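The core operation shared by the paper and the present invention, a 4D DCT of a light-field block, can be sketched with a separable orthonormal DCT-II applied along each of the four axes. This is a minimal numpy illustration under assumed block dimensions, not the paper's or the invention's actual implementation.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    mat[0] /= np.sqrt(2.0)
    return mat

def dct4d(block, inverse=False):
    """Separable 4D DCT: apply the 1D transform along every axis in turn."""
    out = block
    for ax in range(4):
        c = dct_matrix(block.shape[ax])
        if inverse:
            c = c.T                      # orthonormal, so inverse = transpose
        out = np.moveaxis(np.tensordot(c, out, axes=([1], [ax])), 0, ax)
    return out

rng = np.random.default_rng(1)
block = rng.random((4, 4, 8, 8))         # hypothetical (t, s, v, u) block
coeffs = dct4d(block)
# Orthonormality preserves energy, so sparsity of `coeffs` directly
# reflects the 4D redundancy discussed in the text.
```

Because the transform is orthonormal, `dct4d(coeffs, inverse=True)` reconstructs the block exactly, and the coefficient energy equals the pixel energy.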
The paper “Lossy Compression of Lenslet Images from Plenoptic Cameras Combining Sparse Predictive Coding and JPEG 2000”, by I. Tabus, P. Helin, and P. Astola, in the 24th International Conference on Image Processing (ICIP), pp. 4567-4571, 2018, describes a method for compressing light field data by selecting some reference views and making use of disparity maps and view synthesis refined by four-dimensional sparse predictors. Differently, the invention proposed in this document compresses light field data using a four-dimensional block transform that relies on neither depth maps nor view synthesis.
Patent document EP 0855838 A2, entitled “A method for digital image compression using Discrete Wavelet Transform DWT”, filed on Jul. 29, 1998, by CANON INFORMATION SYST RESEARCH AUSTRALIA PTY LTD, proposes an image encoding algorithm that encodes the positions of the non-zero transform coefficients of a (2D) image using the discrete wavelet transform and quadtrees, that is, the recursive division of a rectangular (2D) image region into 4 rectangular (2D) image regions. The present invention encodes the positions of the non-zero coefficients of a four-dimensional (4D) transform of a (4D) light field using hexadeca-trees, that is, the recursive division of four-dimensional regions (4D hyperrectangles) into 16 four-dimensional hyperrectangles. On Claim 1, patent document EP0855838A2 states that the method is to represent a digital image (a two-dimensional array of pixels), but the present invention is to represent a light field (a four-dimensional tensor). On Claim 2, patent document EP0855838A2 states the use of a two-dimensional discrete wavelet transform, but this invention uses a four-dimensional transform. On Claim 11, patent document EP0855838A2 states that each bit-plane of a two-dimensional region is scanned recursively, but this invention may either scan the bit-planes of a 4D region or mark the entire 4D region as discarded (all coefficients set to zero) if a rate-distortion criterion is met, which is equivalent to encoding the positions of the non-zero coefficients in a lossy manner according to a rate-distortion criterion.
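One hexadeca-tree division step, the 4D counterpart of the quadtree split above, can be sketched as halving each of the four dimensions to obtain 16 sub-hyperrectangles. The function below is an illustration under an assumed even-sized block; it is not the invention's actual partitioning code.

```python
import numpy as np

def split_16(block):
    """Split a 4D block into 16 sub-blocks by halving every dimension."""
    t, s, v, u = (d // 2 for d in block.shape)
    return [block[i*t:(i+1)*t, j*s:(j+1)*s, k*v:(k+1)*v, l*u:(l+1)*u]
            for i in range(2) for j in range(2)
            for k in range(2) for l in range(2)]

children = split_16(np.zeros((4, 4, 8, 8)))
print(len(children), children[0].shape)   # 16 (2, 2, 4, 4)
```

Applying this split recursively yields the hexadeca-tree, just as recursive quadrant splitting of a 2D image yields a quadtree.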
Patent U.S. Pat. No. 6,263,110 B1, entitled “Method for compression data”, filed on Sep. 29, 1998, by Canon Kabushiki Kaisha, proposes an image coding algorithm that encodes the positions of the non-zero wavelet transform coefficients of a (2D) image using quadtrees, which is the recursive division of a rectangular image region into 4 rectangular (2D) image regions. The present invention encodes the positions of the non-zero coefficients of a four-dimensional transform of a light field using hexadeca-trees, which is the recursive division of four-dimensional regions (hyper-rectangles) into 16 four-dimensional hyper-rectangles. It is worth emphasizing that patent U.S. Pat. No. 6,263,110 B1 discloses the use of a two-dimensional (2D) discrete wavelet transform, but the present invention uses a four-dimensional (4D) block transform. Patent U.S. Pat. No. 6,263,110 B1 describes a method for compressing digital 2D images, but the present invention is conceived to compress 4D light field data. The patent document U.S. Pat. No. 6,263,110 B1 sets out, in Claims 3 and 4, a method to round the coefficients of a region at a minimum bit-plane, but the present invention uses the same minimum bit-plane for the whole light field and, in addition, may either scan the bit-planes of a region or mark the entire region as discarded (all coefficients set to zero) if a rate-distortion criterion is met, which is equivalent to encoding the positions of the non-zero coefficients in a lossy manner according to a rate-distortion criterion. The patent document U.S. Pat. No. 6,263,110 B1 defines in its Claim 5 the use of a two-dimensional discrete wavelet transform, but the present invention uses a four-dimensional (4D) block transform. On Claim 6, patent document U.S. Pat. No. 6,263,110 B1 defines that the method is to represent a digital image (a two-dimensional array of pixels), but the method of the present invention is to represent a light field (a four-dimensional array of pixels).
The patent document U.S. Pat. No. 6,266,414 B1, entitled “Method for digital data compression”, filed on Sep. 29, 1998, by Canon Kabushiki Kaisha, proposes an image encoding algorithm that encodes the positions of the non-zero transform coefficients of a (2D) image using quadtrees, which is equivalent to the recursive division of a rectangular image region into 4 rectangular (2D) image regions. The present invention proposes encoding the positions of the non-zero coefficients of a four-dimensional (4D) transform of a light field using hexadeca-trees, which represent the recursive division of four-dimensional regions (hyperrectangles) into 16 four-dimensional hyperrectangles. The patent document U.S. Pat. No. 6,266,414 B1 defines in its Claim 1 the use of wavelet decomposition, but the present invention uses a four-dimensional (4D) block transform. On Claim 21, patent document U.S. Pat. No. 6,266,414 B1 defines that the method is to represent a digital image (a two-dimensional array of pixels), but the present invention is to represent a light field (a four-dimensional (4D) array of pixels).
Patent document U.S. Pat. No. 6,389,074 B1, entitled “Method and apparatus for digital data compression”, filed on Sep. 28, 1998, by Canon Kabushiki Kaisha, proposes an image encoding algorithm that encodes the positions of the non-zero transform coefficients of a (2D) image using quadtrees, and also proposes the use of Lagrangian optimization to find the optimum quadtree partition that encodes the positions of non-zero transform coefficients of image, video, or frame-difference data in a rate-distortion sense. The present invention, instead, proposes the use of Lagrangian optimization to find the optimum hexadeca-tree partition in order to locate the non-zero transformed coefficients in the 4D light field data. On Claim 1(a), patent document U.S. Pat. No. 6,389,074 B1 defines the use of the discrete wavelet transform, but the present invention uses a four-dimensional (4D) block transform. On Claim 1(b), patent document U.S. Pat. No. 6,389,074 B1 defines the use of variable quantization with a quantization factor, but the present invention uses the same number of bit-planes (equivalent to the quantization factor) for the whole light field. In addition, the invention proposed herein may either scan the bit-planes of a region or mark the entire region as discarded (all coefficients set to zero) if a rate-distortion criterion is met, which is equivalent to encoding the positions of the non-zero coefficients in a lossy manner according to a rate-distortion criterion. On Claims 6, 7 and 8, patent document U.S. Pat. No. 6,389,074 B1 defines that the input data can be two-dimensional image data, two-dimensional video data, or two-dimensional video frame-difference data, but the present invention is for light field data, which consists of four-dimensional data.
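The Lagrangian rate-distortion decision mentioned above can be sketched as follows: each candidate choice is scored by J = D + lambda * R, and the cheapest choice wins. The rate and distortion models below are placeholder assumptions for illustration, not the invention's actual models.

```python
# Hedged sketch: decide between transform-coding a region and discarding it
# (all coefficients set to zero) by comparing Lagrangian costs J = D + lam*R.
def lagrangian_cost(distortion, rate_bits, lam):
    return distortion + lam * rate_bits

def choose(coeffs, rate_if_coded, lam):
    d_coded, r_coded = 0.0, rate_if_coded        # coding: no distortion here
    d_zeroed = sum(c * c for c in coeffs)        # zeroing: energy is lost
    r_zeroed = 1.0                               # roughly one flag bit
    if lagrangian_cost(d_coded, r_coded, lam) <= lagrangian_cost(d_zeroed, r_zeroed, lam):
        return "code"
    return "discard"

print(choose([0.1, -0.2], rate_if_coded=64, lam=0.001))  # low-energy region
print(choose([5.0, -5.0], rate_if_coded=64, lam=0.001))  # high-energy region
```

A low-energy region is cheaper to discard than to code, while a high-energy region justifies its bit cost, which is exactly the trade-off the R-D criterion encodes.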
The patent documents U.S. Pat. No. 5,315,670 A, entitled “Digital data compression system including zerotree coefficient coding”, U.S. Pat. No. 5,321,776 A, entitled “Data compression system including successive approximation quantizer”, U.S. Pat. No. 5,412,741 A, entitled “Apparatus and method for compressing information”, GB 2303030 A, entitled “Data compression using reversible wavelet transforms and an embedded codestream”, U.S. Pat. No. 5,867,602 A, entitled “Reversible wavelet transform and embedded codestream manipulation”, and U.S. Pat. No. 5,966,465 A, entitled “Compression/decompression using reversible embedded wavelets”, propose two-dimensional image encoding algorithms that use zero-trees to encode the positions of the non-zero coefficients of regions within an image. The present invention proposes encoding the positions of the non-zero coefficients of a four-dimensional (4D) transform of a light field using hexadeca-trees, which is equivalent to the recursive division of four-dimensional regions (hyperrectangles) into 16 four-dimensional hyperrectangles. On Claim 1 from patent document U.S. Pat. No. 5,315,670 A, Claim 1 from patent document U.S. Pat. No. 5,321,776 A, Claim 1 from patent document U.S. Pat. No. 5,412,741 A, the “Overview of The System of Present Invention”, FIG. 1, and “Applications” sections from patent document GB 2303030 A, the “Overview of the Present Invention” from U.S. Pat. No. 5,867,602 A, and Claim 8 from patent document U.S. Pat. No. 5,966,465 A, it is stated that they target the representation of two-dimensional image data (a two-dimensional array of pixels), but the present invention is targeted at representing four-dimensional light field data. On Claim 1 from patent document U.S. Pat. No. 5,315,670 A, Claim 1 from patent document U.S. Pat. No. 5,321,776 A, Claim 1 from patent document U.S. Pat. No. 5,412,741 A, the “Coefficient Trees” section from patent document GB 2303030 A, the “Coefficient Trees” section from U.S. Pat. No. 5,867,602 A, and the “Overview of The Present Invention” from patent document U.S. Pat. No. 5,966,465 A, the zero-tree is described as a structure that links a zero wavelet transform coefficient at a coarse level of information, as a root, to zero wavelet transform coefficients at the corresponding positions at all the finer levels of the wavelet transform coefficients (subbands), but the present invention uses blocks of transform coefficients arranged in a hierarchical four-dimensional space-view structure called a hexadeca-tree. In the “Detailed Description” from patent document U.S. Pat. No. 5,315,670 A, the “Detailed Description” from patent document U.S. Pat. No. 5,321,776 A, the “Detailed Description” from patent document U.S. Pat. No. 5,412,741 A, “The Encoding and Decoding Process of Present Invention” section from patent GB 2303030 A, the “Parser” section from patent document U.S. Pat. No. 5,867,602 A, and the “Detailed Description” from patent document U.S. Pat. No. 5,966,465 A, the patents describe the coding of coefficients for all wavelet transform levels according to a bit-plane scanning order until the available bit budget is exhausted or the entire image is coded, but this invention scans the coefficients down to a minimum bit-plane, determined using a rate-distortion (R-D) criterion valid for the whole light field, and in addition may either scan the bit-planes of a 4D region or mark the entire 4D region as discarded (all coefficients set to zero) according to the same rate-distortion criterion, which is equivalent to encoding the positions of the non-zero coefficients in a lossy manner according to this rate-distortion criterion.
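The truncation at a minimum bit-plane discussed above can be sketched on a single coefficient: only the magnitude bits from the top bit-plane down to the minimum bit-plane are emitted, and lower bits are dropped. The bit extraction below is an illustrative assumption, not the invention's actual bitstream format.

```python
# Sketch of scanning a coefficient's magnitude bits down to a minimum
# bit-plane; dropping the bits below min_bitplane is the lossy truncation.
def significant_bits(coeff, top_bitplane, min_bitplane):
    """Bits of |coeff| from top_bitplane down to min_bitplane, inclusive."""
    mag = abs(coeff)
    return [(mag >> b) & 1 for b in range(top_bitplane, min_bitplane - 1, -1)]

# 13 is binary 1101; scanning from bit-plane 3 down to bit-plane 1 drops
# the least significant bit:
print(significant_bits(13, 3, 1))   # [1, 1, 0]
print(significant_bits(13, 3, 0))   # [1, 1, 0, 1]  (lossless, all planes)
```

In the invention the same `min_bitplane` applies to the whole light field, while whole 4D regions may additionally be flagged as discarded under the R-D criterion.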
Patent document US 20040114807 A1, entitled “Statistical representation and coding of light field data”, filed on Jun. 17, 2004, by Lelescu et al., proposes the use of a two-dimensional statistical analysis transformation in each view to represent and compress a light field. This is essentially a two-dimensional transformation of each view alone, whose basis functions are computed using Principal Component Analysis (PCA) based upon the estimation of the autocorrelation function of the stochastic process consisting of the views of the light field. This two-dimensional transformation is used to reduce the dimensionality of each view prior to encoding, but the present invention computes a four-dimensional block transform of the whole light field and encodes the positions of the non-zero coefficients of this four-dimensional block transform using hexadeca-trees, which are equivalent to the recursive division of four-dimensional regions (hyperrectangles) of light field coefficients into 16 four-dimensional hyperrectangles. On Claim 3, patent document US20040114807A1 defines the use of Principal Component Analysis (PCA), but the present invention uses a four-dimensional block transform.
Patent document US 20140232822 A1, entitled “Systems And Methods For Generating Compressed Light Field Representation Data Using Captured Light Fields, Array Geometry, And Parallax Information”, filed on Aug. 21, 2014, by Pelican Imaging Corporation, proposes the compression of a light field using a view prediction scheme employing reference images and depth map information. In the present invention there is no view prediction step, and a four-dimensional transform is applied directly to 4D blocks of the 4D light field. In patent document US 20140232822 A1 the views are reconstructed using pixel interpolation and the residual information generated by the prediction process, but in the present invention there is no need for prediction, pixel interpolation, or use of depth maps. On Claim 1, patent document US 20140232822 A1 defines the use of depth maps to guide the interpolation of the intermediate views, but the present invention encodes the whole light field data using a four-dimensional block transform, and thus does not rely on depth maps.
Patent document US 20150201176 A1 entitled “Methods for Full Parallax Compressed Light Field 3D Imaging Systems”, filed on Jul. 16, 2015, by OSTENDO TECHNOLOGIES INC, proposes a method for compressing light field data using depth-image based rendering (DIBR), enabled by a selected set of reference views, depth maps and view synthesis through warping schemes, but the present invention compresses light field data using a four-dimensional block transform of the light field, and does not rely on either depth maps or view synthesis. On Claim 10, the patent document US 20150201176 A1 defines that it uses selected views as references but the present invention encodes the whole light field data using a four-dimensional block transform. On Claim 11, the patent document US 20150201176 A1 defines the use of depth maps to guide the interpolation of the intermediate views, but the present invention encodes the whole light field data using a four-dimensional block transform, and thus does not rely on depth maps. On Claims 12 and 17, the patent document US 20150201176 A1 defines the use of depth-image based rendering to interpolate intermediate views based on warping, but the present invention encodes the whole light field data using a four-dimensional block transform, and thus does not need to render intermediate views.
The patent documents WO 2016090568 A1, entitled “Binary tree block partitioning structure”, filed on Jun. 16, 2016, by MEDIATEK SINGAPORE PTE LTD, and WO 2016091161 A1, entitled “Method of video coding using binary tree block partitioning”, propose a two-dimensional block partition structure for coding of two-dimensional images and two-dimensional videos called QuadTree plus Binary Tree (QTBT), but the present invention uses a four-dimensional block partitioning structure for light field coding, whereby a four-dimensional hyperrectangular region is either transform coded as it is, or is partitioned into 4 hyperrectangular sub-regions in the spatial dimensions, or is partitioned into 4 hyperrectangular regions in the view dimensions. This partition is encoded as a quadtree structure using a ternary flag signaling transformation without segmentation, spatial-dimension segmentation, or view-dimension segmentation, optimized based on a rate-distortion criterion computed using Lagrangian optimization. On Claim 1 of patent document WO2016090568A1 and on Claim 1 of patent document WO2016091161A1, it is stated that the method is for two-dimensional image or video coding, but the present invention is for four-dimensional light field data.
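The ternary-flag decision described above can be sketched as follows: a region either stays whole, splits into four sub-regions in the spatial pair of dimensions, or splits into four in the view pair. Flag values, names, and the halving rule are illustrative assumptions.

```python
# Illustrative ternary partition signaling for a 4D region (t, s, v, u):
# one of three choices per region (flag values are our assumption).
NO_SPLIT, SPATIAL_SPLIT, VIEW_SPLIT = 0, 1, 2

def split_spatial(shape):
    """Four sub-regions: halve the spatial dimensions (v, u) only."""
    t, s, v, u = shape
    return [(t, s, v // 2, u // 2)] * 4

def split_views(shape):
    """Four sub-regions: halve the view dimensions (t, s) only."""
    t, s, v, u = shape
    return [(t // 2, s // 2, v, u)] * 4

print(split_spatial((4, 4, 8, 8)))   # four (4, 4, 4, 4) regions
print(split_views((4, 4, 8, 8)))     # four (2, 2, 8, 8) regions
```

The encoder would pick `NO_SPLIT`, `SPATIAL_SPLIT`, or `VIEW_SPLIT` for each region by comparing Lagrangian R-D costs, then recurse into the chosen sub-regions.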