This section is intended to introduce the reader to various aspects of the art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present invention. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Conventional image capture devices render a three-dimensional scene onto a two-dimensional sensor. During operation, a conventional capture device captures a two-dimensional (2-D) image representing an amount of light that reaches a photosensor (or photodetector) within the device. However, this 2-D image contains no information about the directional distribution of the light rays that reach the photosensor (which may be referred to as the light field). Depth, for example, is lost during the acquisition. Thus, a conventional capture device does not store most of the information about the light distribution from the scene.
Light field capture devices (also referred to as “light field data acquisition devices”) have been designed to measure a four-dimensional (4D) light field of the scene by capturing the light from different viewpoints of that scene. Thus, by measuring the amount of light traveling along each beam of light that intersects the photosensor, these devices can capture additional optical information (information about the directional distribution of the bundle of light rays) for providing new imaging applications by post-processing. The information acquired/obtained by a light field capture device is referred to as the light field data. Light field capture devices are defined herein as any devices that are capable of capturing light field data. There are several types of light field capture devices, among which: plenoptic devices, which use a microlens array placed between the image sensor and the main lens, as described in document US 2013/0222633; and camera arrays, where each camera images onto its own image sensor.
The light field data may also be simulated with Computer Generated Imagery (CGI), or from a series of 2-D images of a scene, each taken from a different viewpoint with a conventional handheld camera (such images are called views when they represent the same scene captured from different viewpoints).
Light field data processing notably comprises, but is not limited to, generating refocused images of a scene, generating perspective views of a scene, generating depth maps of a scene, generating extended depth of field (EDOF) images, generating stereoscopic images, and/or any combination of these.
The present disclosure focuses more precisely on light field based images captured by a plenoptic device as illustrated by FIG. 1, disclosed by R. Ng, et al. in “Light field photography with a hand-held plenoptic camera”, Stanford University Computer Science Technical Report CSTR 2005-02, no. 11 (April 2005).
Such a plenoptic device is composed of a main lens (11), a micro-lens array (12) and a photo-sensor (13). More precisely, the main lens focuses the subject onto (or near) the micro-lens array. The micro-lens array (12) separates the converging rays into an image on the photo-sensor (13) behind it.
Contrary to the plenoptic device, camera array devices, such as the Pelican Imaging® camera, directly deliver matrices of views (i.e. without de-mosaicing).
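By way of illustration only, the relationship between a plenoptic sensor image and a matrix of views may be sketched as follows. The sketch assumes an idealized, monochrome sensor on which every micro-lens covers exactly p x p pixels aligned with the pixel grid; real devices require calibration and de-mosaicing, which are not shown. The function name extract_views and the parameter p are hypothetical and are not taken from the present text.

```python
import numpy as np

def extract_views(raw, p):
    """Split an idealized lenslet image into a p x p matrix of views.

    Assumption (hypothetical): each micro-lens covers exactly p x p pixels,
    aligned with the pixel grid, on a monochrome sensor.
    """
    h, w = raw.shape
    if h % p or w % p:
        raise ValueError("image size must be a multiple of p")
    # Axes: (lens row, pixel row under lens, lens column, pixel column under lens).
    blocks = raw.reshape(h // p, p, w // p, p)
    # Reorder to (u, v, lens row, lens column): views[u, v] is one view.
    return blocks.transpose(1, 3, 0, 2)

# Toy 6 x 6 sensor image with 3 x 3 pixels under each of 2 x 2 micro-lenses.
raw = np.arange(36, dtype=float).reshape(6, 6)
views = extract_views(raw, p=3)
```

Under these assumptions, views[u, v] gathers the pixel located at the same position (u, v) under every micro-lens, which is equivalent to the strided slice raw[u::p, v::p].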
Generally, the four-dimensional (4D) light field is processed by using a focal stack, which comprises a collection of images, each focused at a different focalization distance. Such a focal stack allows a user to change a focal point of the images by post-processing.
The dataset of the light field image or video (whether acquired by a plenoptic camera, a camera array or simulated with Computer Generated Imagery (CGI)) is reorganized to form a light data volume in the vicinity of the focal plane of a front lens, similar to the light field generated by a lens in the vicinity of its focal plane. Such a focal stack 100 is schematically illustrated in FIG. 2.
Conventional focusing with a camera is simulated by selecting one of the images 101, 102, 103 within the focal stack 100, which corresponds to moving the focalization plane perpendicularly to the main optical axis z of the camera.
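As a minimal sketch, a focal stack may be computed from a matrix of views by the widely used shift-and-add approach, which is assumed here for illustration and is not stated in the present text. Each view is shifted in proportion to its offset from the central view and the shifted views are averaged; refocusing then amounts to selecting one slice of the stack. The function name focal_stack, the integer-pixel shifts and the wrap-around borders are simplifying assumptions.

```python
import numpy as np

def focal_stack(views, slopes):
    """Shift-and-add refocusing (illustrative, integer shifts, wrap-around).

    views  : array of shape (U, V, H, W), a matrix of monochrome views.
    slopes : disparities, in pixels per view step, one per focal-stack image.
    """
    n_u, n_v, h, w = views.shape
    uc, vc = (n_u - 1) / 2.0, (n_v - 1) / 2.0  # central view coordinates
    stack = np.zeros((len(slopes), h, w))
    for k, s in enumerate(slopes):
        for u in range(n_u):
            for v in range(n_v):
                dy = int(round(s * (u - uc)))
                dx = int(round(s * (v - vc)))
                stack[k] += np.roll(views[u, v], (dy, dx), axis=(0, 1))
        stack[k] /= n_u * n_v
    return stack

# Synthetic 3 x 3 matrix of views of a single bright point whose position
# moves by one pixel per view step (disparity 1).
base = np.zeros((9, 9))
base[4, 4] = 1.0
views = np.stack([
    np.stack([np.roll(base, (-(u - 1), -(v - 1)), axis=(0, 1)) for v in range(3)])
    for u in range(3)
])

stack = focal_stack(views, slopes=[0, 1])
refocused = stack[1]  # selecting one image of the stack simulates focusing
```

In this toy example, the point is sharp in the slice whose slope matches its disparity and is spread over several pixels in the other slice.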
Among the many new light-field imaging functionalities provided by these richer sources of data is the ability to manipulate the content after it has been captured; these manipulations may have different purposes, notably artistic, task-based and forensic. For instance, it would be possible for users to change, in real time, focus, depth of field and stereo baseline, as well as the viewer perspective. Such media interactions and experiences are not available with conventional imaging formats that would be obtained by using the conventional standard image or video codecs to encode/decode light field based images.
Moreover, an AIF (All In Focus) image may be generated by focus fusion: the in-focus region is detected in each focal stack image, then all these in-focus regions are fused to form an AIF image.
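The focus fusion described above may be sketched as follows. The present text does not specify how the in-focus region is detected; the sketch assumes, for illustration only, that the absolute response of a discrete Laplacian is used as a per-pixel sharpness measure and that each output pixel is copied from the focal stack image where this measure is highest. The function names sharpness and all_in_focus are hypothetical.

```python
import numpy as np

def sharpness(img):
    """Absolute 4-neighbour Laplacian response (wrap-around borders)."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)
    return np.abs(lap)

def all_in_focus(stack):
    """Fuse a focal stack of shape (K, H, W) into a single AIF image.

    Returns the fused image and the per-pixel index of the selected slice.
    """
    focus = np.stack([sharpness(s) for s in stack])
    best = focus.argmax(axis=0)  # (H, W): sharpest slice for each pixel
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols], best

# Toy focal stack: a checkerboard texture that is sharp on the left half in
# slice 0 and on the right half in slice 1; the other half is a flat grey.
checker = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
slice0 = np.full((8, 8), 0.5)
slice0[:, :4] = checker[:, :4]
slice1 = np.full((8, 8), 0.5)
slice1[:, 4:] = checker[:, 4:]
stack = np.stack([slice0, slice1])

aif, best = all_in_focus(stack)
```

A practical implementation would typically smooth the sharpness maps before selection to avoid isolated mis-selected pixels; that refinement is omitted here.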
State-of-the-art methods for encoding such light field based images consist in using standard image or video codecs (such as JPEG, JPEG-2000, MPEG4 Part 10 AVC, HEVC). However, such standard codecs are not able to take into account the specificities of light field imaging (aka plenoptic data), which records the amount of light (the “radiance”) at every point in space, in every direction.
Indeed, applying the conventional standard image or video codecs (such as JPEG, JPEG-2000, MPEG4 Part 10 AVC, HEVC) delivers conventional imaging formats.
In particular, using traditional inter-frame encoding schemes results in a plenoptic view being encoded using information from its past, future, or temporally neighbouring image (from the same point of view) without taking into account the knowledge provided by other views (taken from other points of view).
As an alternative, multiview encoding methods, such as MPEG MVC, obtain a prediction from one view to another, but are not suitable for encoding the depth provided by the 4D light field.
As a consequence, after the decoding of 4D light field data encoded with traditional standard image or video codecs, the reconstruction of plenoptic images may be inaccurate. Obtaining the AIF image from such data may thus be impossible.
It would hence be desirable to provide a technique for encoding/decoding light field based images that would avoid at least one drawback of the prior art.