Camera calibration is a large research area in computer vision. Calibrated images are important in many fields, including photogrammetry, computer vision, robotics, and consumer applications. In photogrammetry, a calibrated image enables the measurement of radiance at a particular point, which is used, for example, for modeling appearance and geometry. In vision, calibrated images enable 3D reconstruction and texturing; in robotics, calibration is used for robot localization and obstacle avoidance. For consumer applications, calibrated images are useful for geo-spatially organizing captured photos and for providing spatial context.
Calibrating (also referred to as geo-positioning) an image involves computing the parameters of a pinhole camera model that best describe how the image, captured by a still or video camera, was formed from the 3D world. In other words, the pinhole camera model describes how a 3D point in the world projects to a 2D pixel in the image. The pinhole camera model represents most imaging devices. The basic pinhole camera model has two parts: intrinsic parameters (intrinsics) and extrinsic parameters (extrinsics). The intrinsics are the focal length, principal point, and pixel skew. These parameters describe properties internal to the camera. The extrinsics are the 3D position and orientation of the camera. These six parameters (three for position and three for orientation) describe how the camera is posed in the world. Calibrating an image is equivalent to finding the intrinsics and extrinsics of the associated pinhole camera model.
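By way of illustration only, the projection described above can be sketched in Python with NumPy. The function names (intrinsic_matrix, project_point), the single shared focal length, the world-to-camera rotation convention, and the numeric values are assumptions chosen for the example and are not taken from any particular system.

```python
import numpy as np

def intrinsic_matrix(focal_length, principal_point, skew=0.0):
    """Build the 3x3 intrinsic matrix from focal length, principal point, and pixel skew."""
    cx, cy = principal_point
    return np.array([
        [focal_length, skew, cx],
        [0.0, focal_length, cy],
        [0.0, 0.0, 1.0],
    ])

def project_point(K, R, camera_position, world_point):
    """Project a 3D world point to a 2D pixel with the pinhole model.

    Assumed convention: R rotates world axes into camera axes, and
    camera_position is the camera center expressed in world coordinates.
    """
    world_point = np.asarray(world_point, dtype=float)
    camera_position = np.asarray(camera_position, dtype=float)
    # Extrinsics: express the point in the camera's coordinate frame.
    point_in_camera = R @ (world_point - camera_position)
    # Intrinsics: map camera coordinates to homogeneous pixel coordinates.
    homogeneous_pixel = K @ point_in_camera
    # Perspective division yields the 2D pixel.
    return homogeneous_pixel[:2] / homogeneous_pixel[2]

# Hypothetical camera: 800-pixel focal length, principal point at (320, 240),
# zero skew, located at the world origin and looking along the +Z axis.
K = intrinsic_matrix(focal_length=800.0, principal_point=(320.0, 240.0))
R = np.eye(3)
camera_position = np.zeros(3)

pixel_on_axis = project_point(K, R, camera_position, [0.0, 0.0, 5.0])
pixel_off_axis = project_point(K, R, camera_position, [1.0, 0.0, 5.0])
```

In this sketch, a point on the optical axis lands on the principal point, and a point displaced one unit sideways at a depth of five units lands 800 / 5 = 160 pixels to the side of it.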
In practice, calibrating an image (i.e., computing the intrinsic and extrinsic parameters) is a tedious process. In most common approaches, the user needs to supply a set of 3D-to-2D correspondences of points and/or lines so that the system can estimate the parameters that best fit these measurements. Usually, for uncalibrated cameras (i.e., cameras with unknown intrinsic parameters), at least six point correspondences are needed, and in general, a larger set is supplied to minimize the error. For video cameras, more correspondences are needed to find the changing position of the camera over time. Typically, for video cameras, the pinhole camera model has fixed intrinsics and time-varying extrinsics (i.e., position and orientation that change over time).
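As a hedged sketch of how such a fit is commonly posed, the following Python example uses the direct linear transform (DLT), a standard textbook technique and not necessarily the method of any particular system, to estimate a 3x4 projection matrix from point correspondences. The function names, the synthetic camera, and the sample points are hypothetical; a practical implementation would typically also normalize the coordinates and refine the result with a nonlinear optimization.

```python
import numpy as np

def estimate_projection_matrix(world_points, image_points):
    """Estimate a 3x4 projection matrix (up to scale) with the DLT."""
    world_points = np.asarray(world_points, dtype=float)
    image_points = np.asarray(image_points, dtype=float)
    if len(world_points) != len(image_points) or len(world_points) < 6:
        raise ValueError("need at least six matching 3D-to-2D correspondences")
    rows = []
    for (X, Y, Z), (u, v) in zip(world_points, image_points):
        Xh = [X, Y, Z, 1.0]
        # Each correspondence contributes two linear equations in the 12 unknowns.
        rows.append(Xh + [0.0] * 4 + [-u * c for c in Xh])
        rows.append([0.0] * 4 + Xh + [-v * c for c in Xh])
    A = np.array(rows)
    # The least-squares solution is the right singular vector associated
    # with the smallest singular value of A.
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)

def reprojection_error(P, world_points, image_points):
    """Root-mean-square distance, in pixels, between projected and measured points."""
    world_points = np.asarray(world_points, dtype=float)
    image_points = np.asarray(image_points, dtype=float)
    world_h = np.hstack([world_points, np.ones((len(world_points), 1))])
    projected = (P @ world_h.T).T
    projected = projected[:, :2] / projected[:, 2:3]
    return np.sqrt(np.mean(np.sum((projected - image_points) ** 2, axis=1)))

# Synthetic ground truth (all values hypothetical): a camera rotated slightly
# about the Y axis and displaced from the world origin.
angle = 0.1
R_true = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
C_true = np.array([0.2, -0.1, -1.0])
K_true = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_true = K_true @ np.hstack([R_true, (-R_true @ C_true).reshape(3, 1)])

# Eight non-coplanar points standing in for ground control points.
world_points = np.array([
    [0.0, 0.0, 5.0], [1.0, 0.0, 6.0], [0.0, 1.0, 7.0], [1.0, 1.0, 5.0],
    [-1.0, 0.5, 8.0], [0.5, -1.0, 6.0], [-1.0, -1.0, 9.0], [2.0, 1.0, 10.0],
])
world_h = np.hstack([world_points, np.ones((len(world_points), 1))])
projected_h = (P_true @ world_h.T).T
image_points = projected_h[:, :2] / projected_h[:, 2:3]

P_estimated = estimate_projection_matrix(world_points, image_points)
rms_error = reprojection_error(P_estimated, world_points, image_points)
```

The projection matrix has 11 degrees of freedom and each point correspondence supplies two equations, which is why six points are the minimum for this formulation. With noisy measurements, additional well-spread correspondences reduce the error of the least-squares fit.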
Conventional approaches introduce problems because the user must enter a large number of correspondences, wait for the system to solve for the camera parameters, and only then assess the result of the calibration. Moreover, there is no direct feedback; thus, the process requires experience both in choosing a set of points that is well spread over the image and in identifying bad matches. Additionally, the user may have only a limited number of correspondences: one, two, or three, for example. It is desired that, for every input from the user, the best possible position be obtained, even if only partial information is provided. Still further, the above process needs a set of points of known 3D positions, sometimes referred to as “ground control points”. Collecting such accurate ground control points is not an easy task for the layperson.