The image formation model provides a mathematical characterization of the process by which a point of the scene is projected onto an image.
Depending on the required accuracy and the application, different types of image formation models are used, such as the “pinhole” camera model, the thin lens model or the thick lens model.
The “pinhole” camera model (FIG. 3) is the simplest and the most used in the area of computer vision.
Among the main model parameters are the optical center, or center of projection, and the focal length, which is the distance between the image plane and the optical center. The optical center is generally used as the origin of the coordinate system relative to a camera. In this model all points of FIG. 3 contained in any one of the dotted lines are projected onto the same point on the image plane. Therefore, each point in the image represents a line in space that contains all the points projected onto it. In stereo-vision this important property is used to obtain the 3D coordinates of a point by triangulation (one straight line for each camera).
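The projection and triangulation properties described above can be illustrated with a minimal sketch. It assumes an ideal pinhole camera with the optical center at the origin and the image plane at z = f; the function names `project` and `triangulate` are hypothetical, and the midpoint method shown is only one of several possible triangulation schemes.

```python
# Minimal pinhole-model sketch (hypothetical names; optical center at the
# origin, optical axis along +z, image plane at z = f).

def project(point, f):
    """Project a 3D point (x, y, z) with z > 0 onto the image plane z = f."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z, f * y / z)


def triangulate(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two lines c + s * d.

    c1, c2 are the optical centers of the two cameras and d1, d2 the
    directions of the lines associated with the matched image points.
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("lines are parallel; depth is undetermined")
    s = (b * e - c * d) / denom   # parameter of the closest point on line 1
    t = (a * e - b * d) / denom   # parameter of the closest point on line 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2 for u, v in zip(p1, p2))
```

Two points on the same line through the optical center, such as (1, 2, 4) and (2, 4, 8), yield the same image point, which is why a second camera is needed to recover depth.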
The “pinhole” camera model (“Multiple View Geometry in Computer Vision”, Hartley, R. I. and Zisserman, A.) may be too simple when the lens used in the camera produces aberrations in the projected image. Some of the most common aberrations include: spherical aberration, astigmatism, radial/tangential distortion and field curvature.
The set of all the parameters which allow modelling the projective process and the distortions caused by the lenses is known as the intrinsic parameters of the camera.
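As an illustration of how a lens distortion can enter the intrinsic parameters, the following sketch applies a polynomial radial distortion to normalized image coordinates. The polynomial form and the coefficient names `k1` and `k2` are assumptions chosen for the example (a commonly used model), not parameters taken from this document.

```python
# Sketch of a polynomial radial distortion model (an assumed, commonly used
# form; the coefficient names k1 and k2 are hypothetical).

def distort_radial(x, y, k1, k2=0.0):
    """Apply radial distortion to normalized image coordinates (x, y).

    The coordinates are measured from the principal point in units of the
    focal length, so the distortion center is (0, 0).
    """
    r2 = x * x + y * y                      # squared distance to the center
    factor = 1.0 + k1 * r2 + k2 * r2 * r2   # 1 + k1*r^2 + k2*r^4
    return (x * factor, y * factor)
```

With all coefficients set to zero the function reduces to the undistorted pinhole model, and the distortion center itself is never displaced.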
Depending on the camera type, the information projected at each point of the image may be of a different nature. For example, in an RGB image, each pixel provides information about the interaction zone of the incident light that is reflected to the plane of the camera (FIG. 5). In X-ray images, each pixel corresponds to the intensity attenuated by absorption and diffraction phenomena between two surfaces limiting a volume.
Depth cameras are sensors that make it possible to create two-dimensional images in which each pixel contains the distance between the represented point of the scene and the plane of the camera. Sometimes, after a calibration process, it is possible to obtain the three spatial coordinates of the scene points that have depth information.
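The step from a depth pixel to three spatial coordinates can be sketched as follows, assuming a calibrated pinhole camera without distortion. The parameter names `fx`, `fy` (focal lengths in pixels) and `cx`, `cy` (principal point) are hypothetical and stand for values that a calibration process would provide.

```python
# Sketch: convert one depth pixel into a 3D point in the camera frame.
# Assumes an ideal pinhole camera; fx, fy, cx, cy are hypothetical
# calibration values (focal lengths and principal point, in pixels).

def depth_pixel_to_point(u, v, depth, fx, fy, cx, cy):
    """Return (x, y, z) for pixel (u, v) whose stored value is `depth`.

    `depth` is the distance from the scene point to the plane of the
    camera, so it is used directly as the z coordinate.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```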
There are multiple techniques for obtaining the depth of a scene. A first classification distinguishes between passive and active techniques. In passive techniques, depth is in most cases obtained by triangulation: the images obtained by two or more RGB cameras are used and the correspondence problem is solved. The fundamental advantage of passive methods is that they require no special lighting conditions and are suitable for working outdoors in daylight. Their disadvantage is that the correspondence problem is difficult to solve in zones that are homogeneous in both intensity and color.
In the case of active techniques, the scene is illuminated artificially with a light pattern that, by means of adequate processing, makes it possible to determine the depth.
One of the pioneering techniques in this field illuminates the scene with a linear light beam. The deformation of the projection of the beam where it strikes objects in the scene can be related to the depth by triangulation if the positions of the light source and of the camera that captures the image are known. By moving the light beam relative to the objects to be measured, it is possible to obtain a set of profiles that form the depth image. The drawback of this technique is that the acquisition time of a depth image is long, because only one intensity profile is obtained at each instant.
An alternative way to obtain the depth image with a single image is to use structured light. In this case a known pattern of light, such as a set of horizontal or vertical lines, is usually projected. Again, the analysis of the pattern deformations makes it possible to determine the depth along many profiles.
Time of flight cameras employ an alternative technique similar to the one used in radar systems. In this case a specific sensor is used to measure the time of flight of a light pulse. The advantage over radar systems is that it is possible to obtain the depth of all the image points simultaneously and a sweeping of a spot beam is not necessary.
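The relation between the measured time of flight and depth can be sketched as below. The sketch assumes that the sensor measures the round-trip time of the pulse (emitter to scene and back) and that the light travels at the vacuum speed of light; the function name is hypothetical.

```python
# Sketch: depth from the round-trip time of a light pulse.
# Assumption: the measured time covers the path to the scene and back.

SPEED_OF_LIGHT = 299_792_458.0  # metres per second, in vacuum


def depth_from_round_trip(time_s):
    """Return the distance in metres for a round-trip time in seconds."""
    if time_s < 0:
        raise ValueError("time of flight cannot be negative")
    return SPEED_OF_LIGHT * time_s / 2.0
```

A round trip of 10 ns corresponds to roughly 1.5 m, which gives an idea of the timing resolution such a sensor needs.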
Recently a new type of low-cost depth camera appeared on the market. These cameras use a different type of structured light known as coded light. Although they were initially designed for leisure-related applications, their low cost has enabled a large number of new applications in many different areas (U.S. 2010/0199228 A1, Kinect patent). Such cameras are also known as RGB-D cameras, because each point provides information about both its color and its depth. This is possible because the coded light pattern is in the near infrared.
3D cameras have numerous applications in fields such as industrial design and medicine. In these cases the cameras are used either for registration or for object modelling. In other fields of application, such as video surveillance or assisted driving, the depth information is very useful to overcome ambiguities that are very difficult to resolve using only the information of a conventional RGB image.
The set of systems for the reconstruction of the surface of an object is what is called “depth camera”. If this measuring system includes the texture of the object it is called “texture and depth camera”.
X-ray (and gamma-ray) techniques have been commonly used in non-destructive analysis since the early twentieth century (“Industrial Radiology: Theory and Practice”, R. Halmshaw; “Niet-destructief onderzoek” (Dutch: Non-destructive analysis), W. J. P. Vink, ISBN 90-407-1147-X; “Application of machine vision to food and agriculture: a review”, E. R. Davies), both in clinical diagnosis and in the inspection of objects, and have led to major technological advances in the development of detectors and production methods. Radiographic images are obtained by placing a natural or artificial source of gamma or X-rays so that the radiation passes through part or all of the examined object, with a generally flat or linear detector on the other side. Absorption differences due to the nature and thickness of the material generate an image of intensities on the detector.
The fundamental difference between X-rays and gamma rays is that the former are derived from a source which generates a continuous spectrum of photons, while gamma rays, which arise from the natural de-excitation of the atomic nucleus or of the deep electron layers of the atom, have known energies; that is, they are monochromatic sources.
The use of radiographic X-ray sources has the advantage that the intensity can be modulated and the emission can be cut off automatically, whereas gamma sources cannot cut off their emission, because it is a natural process whose intensity varies over time as $I = I_0(t_0)\,e^{-wt}$, where “$w$” is the decay constant of the isotope (inversely proportional to its half-life), “$I_0(t_0)$” is the source intensity measured at time “$t_0$” and “$t$” is the time, measured from “$t_0$”, at which the current measurement is performed.
This feature enables applications such as measuring the diffusion of tracers in living systems, as well as SPECT (cameras that detect the projection of the isotope in a plane) and PET (geometrically paired cameras that detect coincidences of photons from positrons produced by the decay of the nucleus).
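The time dependence of a gamma source can be illustrated with a short sketch. It takes “t” as the time elapsed since “t0” and assumes, as the standard convention for exponential decay, that the constant “w” equals ln 2 divided by the half-life; the function names are hypothetical.

```python
import math

# Sketch: intensity of a gamma source after a given elapsed time.
# Assumption: w = ln(2) / half_life (standard exponential decay convention).


def decay_constant(half_life):
    """Return the decay constant w for a half-life in the same time unit."""
    if half_life <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2.0) / half_life


def source_intensity(i0, half_life, elapsed):
    """Return I = I0 * exp(-w * elapsed), with elapsed measured from t0."""
    return i0 * math.exp(-decay_constant(half_life) * elapsed)
```

After one half-life the function returns half of the reference intensity, and after two half-lives a quarter of it.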
There are two commonly used ways to obtain the intensity of the source radiation that passes through the scene: either the total intensity is measured without discriminating the individual energy of each photon, or each photon is counted and its individual energy is measured by using gamma cameras. The latter type of technique is applied in the aforementioned PET and SPECT applications. When X-rays interact with matter they are partly absorbed and partly transmitted. The probability of interaction with the material depends on the electron density, which is a function that depends on the incident photon energy and on the elemental composition (Z or atomic number) of the material.
Thus the absorption of X-rays depends on the distance they travel through the material and on the characteristics of the material. The transmitted intensity is given by the following expression: $I = I_0 \int e^{-k(r)\,x}\,dx$, where “$I_0$” is the incident intensity, “$k(r)$” is a coefficient that depends on the electron density of the material and “$x$” is the distance crossed. Because the sources are usually point-like, the radiation intensity “$I$” must be multiplied by a geometric factor that depends on the inverse square of the distance “$r$” to the source. The absorption coefficient is additive, so in a material composed of different elements $k = \sum_i \omega_i k_i$, where “$\omega_i$” is the fraction of each component and “$k_i$” is the coefficient “$k$” characteristic of that component. The distance “$x$” cannot be determined from planar X-ray images (“The X-ray Inspection”, Dr. Ing. M. Purschke, Castell-Verlag GmbH) unless the geometry is known.
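The additive rule for the absorption coefficient, and the reason why the distance cannot be separated from the material in a single planar image, can be illustrated with a small sketch. As a simplifying assumption it uses the standard Beer-Lambert law for a homogeneous slab, I = I0 · exp(−k·x), and omits the geometric inverse-square factor; all function names are hypothetical.

```python
import math

# Sketch under a simplifying assumption: homogeneous slab, monoenergetic
# beam, Beer-Lambert attenuation I = I0 * exp(-k * x). The geometric
# inverse-square factor is left out.


def mixture_coefficient(fractions, coefficients):
    """Return k = sum(w_i * k_i) for a material made of several components."""
    if len(fractions) != len(coefficients):
        raise ValueError("one fraction is needed per coefficient")
    return sum(w * k for w, k in zip(fractions, coefficients))


def transmitted_intensity(i0, k, x):
    """Return the intensity transmitted through a thickness x."""
    return i0 * math.exp(-k * x)


def coefficient_from_thickness(i0, i, x):
    """Recover k from one measurement when the crossed distance x is known."""
    return math.log(i0 / i) / x
```

In this simplified model a single intensity measurement only fixes the product k·x: a thin dense object and a thick light one can give the same pixel value, so one of the two quantities has to be known independently.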
From the point of view of the end user, it is necessary to establish a protocol for optimizing both the maximum energy of the beam and its intensity, in order to prevent saturation of the image through an excess of intensity, or a lack of contrast through a deficit of it. The optimal working parameters depend on the expected density or density variations.
In clinical diagnostic environments, for example, the calibrations are performed by means of mannequins with known densities. Once the equipment is calibrated, the defined parameters are used to highlight different injuries and to analyze the state of bones and tissues, or to locate foreign bodies.
In some cases, to calibrate the detector response, easily detectable elements of different thicknesses are included in the scene, either manually or automatically, to establish a link between the intensity measured in each pixel of the detector and the material thickness.
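Such a calibration can be sketched as a straight-line fit, assuming that the detector response follows an exponential attenuation law I = I0 · exp(−k·x), so that ln(I) is linear in the thickness x. The function names and the least-squares approach are illustrative assumptions, not the procedure of any particular device.

```python
import math

# Sketch: link pixel intensity to material thickness using reference
# elements of known thickness. Assumes I = I0 * exp(-k * x), so that
# ln(I) = ln(I0) - k * x can be fitted with ordinary least squares.


def fit_step_wedge(thicknesses, intensities):
    """Return (i0, k) fitted from known thicknesses and measured intensities."""
    n = len(thicknesses)
    if n < 2 or n != len(intensities):
        raise ValueError("need at least two (thickness, intensity) pairs")
    logs = [math.log(i) for i in intensities]
    mean_x = sum(thicknesses) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in thicknesses)
    if sxx == 0:
        raise ValueError("need at least two distinct thicknesses")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(thicknesses, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def thickness_from_pixel(intensity, i0, k):
    """Invert the fitted law to estimate the thickness seen by one pixel."""
    return math.log(i0 / intensity) / k
```

The fitted pair is only valid for the material of the reference elements and for the beam settings used during the calibration.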
Another technique is the use of X-ray sources that emit at two different energies (multi-voltage). These techniques rely on the fact that X-ray absorption in the medium varies with the electron density of the material and with the energy of the X-ray beam. By comparing images of the same scene acquired at different voltages, information on the material composition can be obtained, which allows a densitometric study to be performed. This method, which evaluates by comparison (FIG. 1) the distance travelled by the radiation in the material, is inaccurate because the measurements obtained are relative. It also requires extra steps, and either time for the two measurements or two radiological devices.
In general, the image of X-rays or gamma rays obtained in a detector is a mixture of different frequencies (colors) from the emitting source. It is a multispectral image in which multiple wavelengths overlap. For applications where it is necessary to select a small continuous band of wavelengths, diffraction techniques or radioactive sources of known energy are applied. The former have the disadvantage of drastically reducing the intensity of the beam, and the latter have the disadvantages associated with natural sources of gamma rays.
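The relative nature of the two-energy comparison can be seen in a small sketch. It assumes an idealized case with two monoenergetic beams and an exponential attenuation law I = I0 · exp(−k·x) at each energy; the function names are hypothetical.

```python
import math

# Sketch: compare two acquisitions of the same scene at a low and a high
# beam energy. Idealized assumption: I = I0 * exp(-k * x) at each energy,
# with a different coefficient k for each energy.


def log_attenuation(i0, i):
    """Return ln(I0 / I), which equals k * x in the assumed model."""
    return math.log(i0 / i)


def dual_energy_ratio(i0_low, i_low, i0_high, i_high):
    """Return the ratio of log attenuations, equal to k_low / k_high."""
    return log_attenuation(i0_low, i_low) / log_attenuation(i0_high, i_high)
```

In this idealized model the thickness cancels in the ratio, so the comparison characterizes the material but does not by itself give an absolute path length.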
The resulting image has a camera equation characterized by intrinsic parameters, i.e. the distance from source to detector and radial corrections where image intensifiers are used, and by extrinsic parameters which locate the radiological system with respect to a coordinate system.
When two radiographic images registered from different locations are combined, the triangulation methods used with visible light can be applied. The manual or automatic identification of scene elements in the two projections allows the spatial position of each element to be reconstructed (“Trajectory triangulation: 3D reconstruction of moving points from a monocular image sequence”, Avidan and Shashua). Thus, by means of algebraic methods, the pixels obtained in the image cuts generate a stereoscopic image with three-dimensional information (FIG. 2).
The 3D image reconstruction can be performed from 2D images acquired from different angular positions (“tomography systems”). Devices of this kind whose pixel size is in the micrometer range are termed “micro-tomography systems”. The image acquisition can be based on “gantry systems”, in which one or more detectors and one or more X-ray sources rotate about an axis, or on systems in which the scene rotates about its own axis.
Sensors which provide planar images are detectors sensitive to the radiation intensity, with detecting elements arranged either in a plane or linearly; linear detectors rely on the relative movement of the object and the detector, with the displacement speed synchronized with the reading speed. Although the calibration treatment of these images differs slightly in order to obtain the registration, since in the first case there is a different focal length for each vertical-horizontal orientation of the image, the mathematical differences in their treatment are not relevant.
The assembly between the image sensor and the source of X-rays or gamma rays is called ‘radiological system’, and provides ‘radiological images’. If these images contain information related to a ‘reference frame’ they are called ‘registered radiological images’.
As already mentioned, image registration techniques are well known in the state of the art. Depth camera systems require the calibration parameters of the camera to be known in order to reconstruct the image. There are techniques that use symbols which are identifiable and recognizable by image processing to obtain the calibration matrix. These methods allow a line in space to be allocated to each pixel, on the basis of the distance between the focus and the center of the camera, as well as the rotation and translation matrix and the distortion coefficients. These systems are called “reference frame for depth cameras” in this document.
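The allocation of a line in space to each pixel can be sketched as follows. The sketch assumes a pinhole camera whose distortion has already been corrected and the convention X_cam = R · X_world + t for the rotation and translation; the parameter names (`fx`, `fy`, `cx`, `cy`, `rotation`, `translation`) are hypothetical.

```python
# Sketch: line in space associated with one pixel of a calibrated camera.
# Assumptions: distortion already corrected, and the extrinsic parameters
# follow the convention X_cam = R * X_world + t.


def pixel_ray(u, v, fx, fy, cx, cy, rotation, translation):
    """Return (origin, direction) of the world-space line seen by pixel (u, v).

    `rotation` is a 3x3 matrix given as nested sequences and `translation`
    a sequence of three numbers.
    """
    # Direction of the line in the camera frame (image plane at z = 1).
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)

    def rt_times(vec):
        # Multiply the transposed rotation matrix by a vector.
        return tuple(
            sum(rotation[i][j] * vec[i] for i in range(3)) for j in range(3)
        )

    origin = tuple(-c for c in rt_times(translation))  # camera center in world
    direction = rt_times(d_cam)                        # direction in world
    return origin, direction
```

Every point origin + s · direction with s > 0 projects onto the same pixel, which is the property that the registration between two devices relies on.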
For the same purpose, X-ray calibration is performed using fiducial or marker systems that identify coordinates which are known and visible in different imaging modalities, and which are adapted to the various setting modes. Where the acquisition is volumetric, a technique used in medicine is the use of calibrated frames adapted to the patient (FIG. 4). These systems are called “reference frame for radiological image”, hereinafter “reference frame”.
The surface information obtained with different wavelengths in the visible region of the spectrum may also be used for texture analysis, i.e. to study the periodic spatial distribution patterns that locally form the surface topography of the object.
Throughout this document, two systems or devices whose images provide spatial information in the same coordinate system are referred to as “registered systems or devices”, and the images captured by such systems are referred to as “registered images”.
However, devices that integrate radiological systems and depth cameras working together to provide densitometric images of objects, scenes or individuals are not found in the state of the art.