Perspective Distortion
Photography provides a 2D representation of the 3D world. This 3D-to-2D transformation is achieved by projecting the 3D scene onto a 2D sensor using a lens.
At a large distance, and if the camera's optical axis is perpendicular to the extension of an object, the perspective projection provides a pleasant picture “in line with human expectation”. However, if there is a significant ratio between the distances to the closest and the furthest parts of the object, parts at close distance appear at a different magnification than parts at larger distance, and this magnification variation causes a perspective distortion in the photo of the object.
There are a number of commonly known perspective distortion effects. When the extension of a large object is not orthogonal to the optical axis, parallel lines in the object are not parallel in the photo, since the closest end is magnified more than the distant end. This is, for example, the case when a skyscraper is photographed from a low angle. Another often encountered effect arises when the distance between the camera and the object is of the same order of magnitude as the depth of the topology of the object; close parts then appear out of proportion with distant parts. This is, for example, the situation in selfies (self-portraits taken by cameras held in the hand of the subject), where the arm's-length distance (30 to 50 cm) between the camera and the head of the subject is of about the same order of magnitude as the distance between the nose and the ears, so that the nose appears unnaturally large.
Thus, perspective distortion can affect any photo in which objects or scenes involve a large magnification variation.
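The magnification variation in the selfie example can be made concrete with a minimal sketch. It assumes an ideal pinhole projection, in which magnification is inversely proportional to the distance from the camera. The function name and the 10 cm nose-to-ear depth are hypothetical values chosen for illustration only.

```python
def relative_magnification(near_m, far_m):
    """Magnification of the near part divided by that of the far part.

    Assumes an ideal pinhole model, where magnification is proportional
    to 1 / distance, so the ratio reduces to far_m / near_m.
    """
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return far_m / near_m


# Hypothetical depth between the tip of the nose and the ears.
NOSE_TO_EAR_DEPTH_M = 0.10

# Compare an arm's-length selfie with a portrait taken from 2 m.
for camera_to_nose_m in (0.4, 2.0):
    ratio = relative_magnification(
        camera_to_nose_m, camera_to_nose_m + NOSE_TO_EAR_DEPTH_M
    )
    print(f"camera at {camera_to_nose_m} m: nose/ear magnification = {ratio:.2f}")
```

Under these assumptions the nose is magnified about 25% more than the ears at 40 cm, but only about 5% more at 2 m, which is why the effect is pronounced in selfies.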
Perspective Correction
The perspective correction problem has been partially solved in some specific use cases, most of the time with tools requiring user interaction. Current tools allow correction of low-angle and high-angle shots, but these corrections are global corrections based either on knowledge of the camera position and orientation, or on an assumed geometry of the scene (e.g. parallel lines in a building).
Most of the currently available correction solutions only provide correction of optical distortion introduced by the camera optics, such as fish-eye, barrel, or pincushion correction. In these cases, the optical distortion is modelled and a global correction is applied.
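As an illustration of what such a modelled, global optical correction can look like, the following sketch uses a one-parameter polynomial radial model, a common textbook simplification. It is not taken from any of the tools discussed here. The function names, the coefficient `k1`, and the use of normalized image coordinates centered on the optical axis are assumptions made for the example.

```python
import numpy as np


def distort(points, k1):
    """Apply a one-parameter radial distortion to normalized (x, y) points."""
    points = np.asarray(points, dtype=float)
    r2 = np.sum(points**2, axis=1, keepdims=True)  # squared radius per point
    return points * (1.0 + k1 * r2)


def undistort(points, k1, iterations=20):
    """Invert the radial model by fixed-point iteration.

    The same single parameter k1 is used for every point, regardless of
    what the scene contains; this is what makes the correction global.
    Convergence is only expected for mild distortion (small |k1| * r^2).
    """
    distorted = np.asarray(points, dtype=float)
    undistorted = distorted.copy()
    for _ in range(iterations):
        r2 = np.sum(undistorted**2, axis=1, keepdims=True)
        undistorted = distorted / (1.0 + k1 * r2)
    return undistorted


# A negative k1 pulls points toward the center (barrel-like distortion).
ideal = np.array([[0.5, 0.5], [-0.8, 0.2], [0.0, 0.0]])
observed = distort(ideal, k1=-0.1)
recovered = undistort(observed, k1=-0.1)
print(np.max(np.abs(recovered - ideal)))
```

The correction depends only on the position of a point in the image and on the lens parameter, not on the distance of the photographed object.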
DxO ViewPoint 2 (http://www.dxo.com/intl/photography/dxo-viewpoint/wide-angle-lens-software) represents the current state of the art within perspective distortion correction. The DxO ViewPoint application allows correction of perspective distortion introduced by the camera optics when a wide-angle lens is used. It also allows correction of vanishing lines by performing a different projection, but the correction is independent of the distance between the camera and the object(s) or scene, and cannot simultaneously correct close and large distance deformations. The applied correction is global and is applied independently of the topology of the scene or object in the photo.
Global corrections are based on smooth canonical functions defined by a small number of parameters, on camera extrinsic and intrinsic parameters, and/or on “a priori” or user-defined data. As an example, FIGS. 1A-D illustrate image corrections made by DxO ViewPoint, with the same correction applied to a photo and to a checkerboard pattern. FIG. 1A shows the original photo and pattern; in FIG. 1B a lens distortion correction is applied; in FIG. 1C perspective correction (natural mode) is applied; and in FIG. 1D perspective correction (complete mode) is applied. The problem with global corrections such as those in DxO ViewPoint is that they are applied over the entirety of the image, independently of the subject. The correction can be tuned through user interaction to a given object in a given plane at a given location, but if the scene contains several objects at different distances from the camera, the correction cannot be good for all of them.
There are a number of smartphone applications, such as SkinneePix and Facetune, that allow the user to improve the appearance of selfies. SkinneePix performs a simple geometric warping of the photo and depends on the photo containing only one face, centered in the photo, i.e. it implies a known geometry. Facetune allows local changes in shape, but it is essentially a simplified Photoshop tool specialized for photos of faces, in which the user controls a local warping. It is not a perspective correction tool and does not rely on photo depth information.
Software exists for creating 3D models from multiple cameras (e.g. RGB+TOF camera, stereo camera, or projected light systems such as Microsoft Kinect). As a further example, the paper “Kinect-Variety Fusion: A Novel Hybrid Approach for Artifacts-free 3DTV Content Generation” by Sharma Mansi et al. describes a way to extract depth information by fusing multiple sensors and to improve depth map extraction using the Microsoft Kinect camera and projected light structure technology. The second part of the paper relates to 3DTV content generation, where new views of a scene are generated using multiple images of the scene captured by cameras as they move in relation to the scene (p. 2277, Section B). These stereo vision techniques do not provide a way to automatically correct perspective distortion in a single photo. Neither do 3D GFX libraries such as OpenGL provide a way to perform perspective correction using depth information.
As of today, the available tools for perspective correction are either global corrections implying a certain geometry of the object (straight lines, centered face) or a certain recording situation (low angle, close-up), or local corrections requiring the user's interaction and his or her knowledge of the natural or a desired appearance of the scene or object (typically referred to as photoshopping).
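The limitation of a global correction can be illustrated with a minimal sketch, assuming the correction is a single planar homography, which is one common form of global perspective correction. This is not a description of how DxO ViewPoint works internally. The matrix `H`, its keystone parameter, and the point coordinates are hypothetical values chosen for the example.

```python
import numpy as np


def apply_homography(H, points):
    """Map (x, y) image points through one 3x3 homography.

    The mapping takes only image coordinates as input; the distance of
    the underlying scene point from the camera is never consulted.
    """
    points = np.asarray(points, dtype=float)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    mapped = homogeneous @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical keystone correction: one parameter (-0.5) for the whole image.
H = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, -0.5, 1.0],
])

# Image of a building shot from a low angle: the two vertical edges
# converge from x = +/-1 at the bottom (y = 0) to x = +/-0.5 at the top (y = 1).
edges = np.array([[-1.0, 0.0], [-0.5, 1.0], [1.0, 0.0], [0.5, 1.0]])
corrected = apply_homography(H, edges)
print(corrected)  # both edges now have constant |x|, i.e. they are parallel
```

The parameter is tuned so that the edges of this one building become parallel. Any other object imaged at the same pixel positions would receive exactly the same displacement, whatever its distance from the camera.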