Three-dimensional (3D) or depth sensing cameras, namely, structured light 3D cameras, time-of-flight 3D cameras, and stereo-vision 3D cameras, are imaging devices that acquire depth images. A depth image represents distances from the scene to the 3D camera. 3D camera devices, and the depth images they provide, are used to analyse static and dynamic 3D elements within a captured scene, such as objects and users.
Analysis of a captured scene may include detection, localisation, and identification of objects and/or users and their respective analysis. One common problem which occurs during such analysis is the unknown orientation of the camera. For example, a vertically-oriented object in the scene may appear to be horizontal in the depth image if the camera is rotated by 90 degrees clock- or counter-clockwise around its optical axis. It is therefore advantageous to know the parameters relating to the camera so that better results can be obtained when analysing a captured scene.
Camera calibration is the process in which the true parameters of the camera are determined. These true camera parameters are usually used as correction parameters and may, for the most part, be represented by a linear transformation, namely, a camera calibration matrix which can be used, for example, to denote a projective mapping from the real world coordinate system to a camera coordinate system for that particular camera.
Camera parameters include intrinsic and extrinsic parameters, and these are widely addressed in the literature, for example, in “A Four-step Camera Calibration Procedure with Implicit Image Correction” by Janne Heikkila, or in “Calibration Method for an Augmented Reality System” by S. Malek et al.
Intrinsic parameters encompass imaging device optical specifications, such as, image format, principal point and focal length. They may be modelled and integrated in a transformation matrix applied to data related to the camera coordinate system in order to correct some potential distortions during scene capture. Lens distortion may also be taken into account as a non-intrinsic parameter, but will not be directly incorporated in the transformation matrix as it is a non-linear transformation.
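By way of illustration only, the role of the intrinsic parameters in a pin-hole model can be sketched as follows. The function names and the numerical values of the focal lengths (fx, fy) and principal point (cx, cy) are hypothetical, lens distortion is ignored, and the depth value is assumed to be measured along the optical axis.

```python
def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point given in camera coordinates onto the image plane.

    Pin-hole model without lens distortion: the focal lengths (fx, fy) and
    the principal point (cx, cy) are expressed in pixels.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("the point must lie in front of the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def back_project(u, v, depth, fx, fy, cx, cy):
    """Recover the camera-space 3D point seen at pixel (u, v).

    Assumes 'depth' is the distance along the optical axis, which is the
    inverse of the projection above for that pixel.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


if __name__ == "__main__":
    # Hypothetical intrinsic parameters for a 640 x 480 sensor.
    fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
    pixel = project((0.5, -0.25, 2.0), fx, fy, cx, cy)
    print(pixel)
    print(back_project(pixel[0], pixel[1], 2.0, fx, fy, cx, cy))
```

Back-projection of this kind is what allows each pixel of a depth image to be expressed as a 3D point in the camera coordinate system before any extrinsic correction is applied.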
Extrinsic parameters encompass 3D position and 3D orientation of the camera relative to a world coordinate system. A camera coordinate system is associated with the camera and a transformation matrix is defined in order to provide projection of data measurements from the camera coordinate system to the world coordinate system.
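As a minimal sketch, and not as a definitive implementation, such a transformation may be written as a 4x4 homogeneous matrix combining a 3x3 rotation (the 3D orientation) with a translation (the 3D position). The helper names and the example values below are hypothetical.

```python
def make_extrinsic(rotation, translation):
    """Assemble a 4x4 homogeneous camera-to-world matrix.

    'rotation' is a 3x3 rotation matrix (camera orientation) and
    'translation' is the camera position in world coordinates.
    """
    matrix = [list(rotation[i]) + [translation[i]] for i in range(3)]
    matrix.append([0.0, 0.0, 0.0, 1.0])
    return matrix


def apply_transform(matrix, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    return tuple(
        sum(matrix[i][j] * homogeneous[j] for j in range(4)) for i in range(3)
    )


def invert_rigid(matrix):
    """Invert a rigid transform: the inverse rotation is the transpose,
    and the inverse translation is minus the transposed rotation times t."""
    rot_t = [[matrix[j][i] for j in range(3)] for i in range(3)]
    t = [matrix[i][3] for i in range(3)]
    new_t = [-sum(rot_t[i][j] * t[j] for j in range(3)) for i in range(3)]
    return make_extrinsic(rot_t, new_t)


if __name__ == "__main__":
    # Hypothetical pose: a quarter turn about the third axis, with an
    # offset of 1.5 units along that same axis.
    rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    cam_to_world = make_extrinsic(rotation, (0.0, 0.0, 1.5))
    world_point = apply_transform(cam_to_world, (1.0, 0.0, 2.0))
    print(world_point)
    print(apply_transform(invert_rigid(cam_to_world), world_point))
```

The inverse matrix gives the mapping in the opposite direction, from the world coordinate system to the camera coordinate system.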
By considering a 3D camera as a simple pin-hole, extrinsic parameters may be the only relevant parameters that need to be determined and applied to provide a convenient correction and/or transformation.
In order to find extrinsic parameters, the camera vertical, lateral and longitudinal axes, that is, the yaw, pitch and roll axes respectively, have to be considered as they define the camera coordinate system. More precisely, the yaw axis is an axis drawn from top to bottom of the camera, and perpendicular to the other two axes. The pitch axis is an axis running from the camera left to right, and parallel to a Y-axis of the camera sensor. The roll axis is an axis drawn in the normal direction of the camera body from back to front along its optical axis. Basically, the camera coordinate system origin is located on the sensor chip, for example, at the top left corner of the sensor chip or the centre of the sensor chip. This is described in more detail below with reference to FIG. 6.
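For illustration, an orientation expressed as yaw, pitch and roll angles can be turned into a rotation matrix as sketched below. Several conventions are assumed here and are not taken from the text: coordinates are ordered as (pitch axis, yaw axis, roll axis), positive angles follow the right-hand rule, and the rotations are composed as yaw, then pitch, then roll. Other conventions are equally possible.

```python
import math


def rot_about_pitch_axis(angle):
    """Rotation about the first (lateral, pitch) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_about_yaw_axis(angle):
    """Rotation about the second (vertical, yaw) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_about_roll_axis(angle):
    """Rotation about the third (longitudinal, roll) axis, i.e. the optical axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def rotation_from_angles(yaw, pitch, roll):
    """Compose the three elementary rotations (assumed order: yaw * pitch * roll)."""
    return matmul(
        rot_about_yaw_axis(yaw),
        matmul(rot_about_pitch_axis(pitch), rot_about_roll_axis(roll)),
    )


def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix to a 3D vector."""
    return tuple(sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3))


if __name__ == "__main__":
    # A quarter turn about the optical axis maps the lateral direction
    # onto the vertical direction.
    quarter_roll = rotation_from_angles(0.0, 0.0, math.pi / 2)
    print(rotate(quarter_roll, (1.0, 0.0, 0.0)))
```

With a roll of 90 degrees, a direction along the lateral axis is mapped onto the vertical axis, which is the effect of an unknown rotation about the optical axis on the apparent orientation of objects in a depth image.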
In addition, the camera position within the scene needs to be considered. This position needs to be estimated by finding or defining a reference point in the scene, the reference point being set as the origin of the real world coordinate system.
Several methods for calibrating cameras are known. Most of these methods concern two-dimensional (2D) cameras, and a few concern three-dimensional (3D) cameras. Furthermore, calibration is most often performed off-line, and not in real-time, on a static camera. Markers located in the scene may also be used to help the calibration process. Such a calibration process often includes several steps and requires user interaction.
In addition, these methods tend not to correct for all possible orientations of the 3D camera, and they are often limited to use with cameras in specific orientations, for example, generally downwardly- or generally horizontally-oriented cameras.
US-A-2010/0208057 discloses a method of determining the pose of a camera with respect to at least one object in a real environment. The method includes analysing a captured image of the object to determine distance data relating to the location of the camera with respect to the object and orientation data, the distance data and orientation data being used to provide pose information relating to the camera.
In an article entitled “Vision and Inertial Sensor Cooperation Using Gravity as a Vertical Reference” by Jorge Lobo and Jorge Dias, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 12, December 2003, a method is described for using inertial sensor data in sensor systems. Vanishing points and vanishing lines are used with the inertial sensor data to determine a ground plane from which a mapping between ground plane points and image points can be derived.
Vieville et al., in “Computation of ego-motion and structure from Visual and Inertial Sensors Using the Vertical Cue”, describe a method for recovering three-dimensional data structure and motion of a scene using visual and odometric sensors by building a three-dimensional depth and kinematic map of the environment. A vertical in the image is used for alignment with the true three-dimensional vertical.
US-A-2010/0103275 describes a still 2D digital camera with integrated accelerometers in which roll and pitch variations are measured and used as an input so that the displayed image is corrected and aligned with conventional horizontal and vertical display device directions. The method described only applies to variations of the horizontal and/or vertical camera axis so as to allow switching between landscape and portrait modes.
WO-A-2010/082933 describes a system where markers or objects in an image of a scene are aligned with corresponding markers or objects in the original scene in order to perform a geometric camera calibration. Camera parameters are thus determined by a method which analyses the mismatch between a target model and the target itself.
US-2011/0128388 discloses a camera calibration system including a coordinate data generation device and a coordinate data recognition device. The coordinate data generation device generates a plurality of map coordinate data corresponding to a plurality of real positions in a real scene. The coordinate data recognition device receives an image of the real scene and the map coordinate data from the coordinate data generation device. It determines image positions corresponding to real positions and then calculates image coordinate data corresponding to those image positions. From the image coordinate data and the map coordinate data, a coordinate transform matrix is determined.
EP-A-2154650 describes a method of real-time or near real-time calculation of scene coordinates from image data acquired by a 3D camera using a transformation matrix from the camera coordinate system to the world coordinate system. The method relies on detecting one or more planar surfaces within an acquired 3D image and selecting one of these planar surfaces as a reference plane, for example, the ground. The position, roll and pitch orientation parameters of the 3D camera are then determined in relation to the selected reference plane. Such calibration is carried out by executing a few steps with a limited amount of human intervention once the 3D camera is installed in its proper position, that is, the floor has to be in the frustum of the camera and seen by the camera so that a random sample consensus (RANSAC) based plane detection can be used to detect it. Once the calibration matrix is set, it is then used until the camera setup changes. At that time, a new calibration process has to be launched manually.
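The general idea of RANSAC-based plane detection can be sketched as follows. This is a generic illustration under stated assumptions and is not the procedure of EP-A-2154650: the function names, the inlier threshold, the number of iterations and the choice of the second coordinate as the camera vertical axis are all hypothetical.

```python
import math
import random


def plane_from_points(p, q, r):
    """Return (unit normal n, offset d) with n . x + d = 0, or None if the
    three points are collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.02, iterations=200, seed=0):
    """Fit a plane by random sample consensus.

    Repeatedly fits a plane to three random points and keeps the plane
    supported by the largest number of points within 'threshold' of it.
    """
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [
            p for p in points
            if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) < threshold
        ]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def height_and_tilt(plane):
    """Camera height above the plane and tilt of the plane normal relative
    to the assumed vertical camera axis (second coordinate), in radians."""
    n, d = plane
    height = abs(d)  # distance from the camera origin to the plane
    tilt = math.acos(min(1.0, abs(n[1])))
    return height, tilt


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic floor 1.2 units from the camera, plus scattered clutter.
    floor = [(0.2 * i - 1.0, 1.2, 1.0 + 0.2 * j) for i in range(10) for j in range(10)]
    clutter = [
        (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 0.5), rng.uniform(1.0, 3.0))
        for _ in range(20)
    ]
    plane, inliers = ransac_plane(floor + clutter)
    print(len(inliers), height_and_tilt(plane))
```

The plane offset gives the distance of the camera from the reference plane, and the direction of the plane normal in camera coordinates constrains the camera orientation relative to that plane.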