An image is made up of visual elements, wherein a visual element is defined as a region in an image sample. The image sample may be a complete image frame captured by a camera or any portion of such an image frame. In one arrangement, a visual element is an 8 by 8 block of Discrete Cosine Transform (DCT) coefficients, as acquired by decoding a motion-JPEG frame. In other arrangements, a visual element may be implemented as, for example: a pixel, such as a Red-Green-Blue (RGB) pixel; a group of pixels; or a block of transform coefficients, such as Discrete Wavelet Transformation (DWT) coefficients as used in the JPEG-2000 standard. Global alignment is the process of determining the correspondence between visual elements in a pair of images that have common subject matter. The alignment is also referred to as shift. The terms ‘shift’ and ‘alignment’ are used interchangeably throughout this specification to describe a translation between two images.
Global alignment involves determining the parameters of a translation from one image to another image. Global alignment is an important task for many imaging applications, such as image quality measurement, video stabilisation, and moving object detection. For applications executed on embedded devices, the alignment needs to be both accurate and fast. Given the alignment between consecutive frames from a panning camera, a panoramic image can be constructed during image capture. Overlapping images are stitched along a seam that is selected both to avoid cutting through moving objects and to minimise the intensity mismatch between the images on either side of the seam.
A correlation-based global alignment approach has good robustness against difficult imaging conditions, such as low light, camera motion blur, or motion in the scene. However, the computational expense of the correlation-based global alignment approach is high.
A Fast Fourier Transform (FFT) based two dimensional (2D) correlation approach applies an FFT to each image and computes the 2D phase correlation. This approach requires O(N² log N²) computations for N×N pixel images. The computational complexity can be reduced to O(N log N) if the correlation is performed on one dimensional (1D) image projections only. The projection-based approach is suitable for images with strong gradient structures along the projection axes. Most indoor and natural landscape scenes contain enough horizontal and vertical details for this purpose.
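By way of illustration only, the following minimal sketch estimates a translation from 1D image projections. It is not the FFT-based method described above: for the sake of a self-contained example, the correlation peak is found by a direct search over candidate shifts using normalised cross-correlation, which costs O(N × max_shift) per axis rather than O(N log N). The function names (project, best_shift_1d, estimate_shift) and the max_shift parameter are hypothetical and are introduced only for this example.

```python
from math import sqrt


def project(image):
    """Return (column_sums, row_sums) for an image given as a list of rows."""
    row_sums = [sum(row) for row in image]
    column_sums = [sum(col) for col in zip(*image)]
    return column_sums, row_sums


def best_shift_1d(a, b, max_shift):
    """Return the shift s in [-max_shift, max_shift] that maximises the
    mean-removed, normalised correlation between a[i] and b[i + s].

    Assumption: direct search stands in for FFT-based correlation.
    """
    best, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        # Only samples that overlap under the candidate shift contribute.
        pairs = [(a[i], b[i + s]) for i in range(len(a)) if 0 <= i + s < len(b)]
        if len(pairs) < 3:
            continue  # too little overlap for a meaningful score
        mean_a = sum(p for p, _ in pairs) / len(pairs)
        mean_b = sum(q for _, q in pairs) / len(pairs)
        num = sum((p - mean_a) * (q - mean_b) for p, q in pairs)
        den = sqrt(sum((p - mean_a) ** 2 for p, _ in pairs)
                   * sum((q - mean_b) ** 2 for _, q in pairs))
        score = num / den if den > 0 else float("-inf")
        if score > best_score:
            best, best_score = s, score
    return best


def estimate_shift(image1, image2, max_shift):
    """Return (dx, dy) such that content at (x, y) in image1 appears at
    (x + dx, y + dy) in image2, estimated from the two 1D projections."""
    cols1, rows1 = project(image1)
    cols2, rows2 = project(image2)
    return (best_shift_1d(cols1, cols2, max_shift),
            best_shift_1d(rows1, rows2, max_shift))
```

Each axis is handled independently, which is why the approach depends on the scene having sufficient structure along both projection axes. Restricting max_shift keeps the overlap between the two projections large enough for the correlation score to be reliable.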
A projection-based correlation approach uses projections of the gradient energy along four directions: 0°, 45°, 90°, and 135°. Gradient energy is the sum of the squares of the horizontal and vertical components of the image gradient. The projection of the gradient energy along a given angle is the sum of the gradient energy over lines oriented at that angle. The use of gradient energy rather than intensity improves the alignment robustness under local lighting changes. This approach is used for viewfinder alignment, in which motion is restricted to a small translation, such as less than 10% of the frame, and a small rotation, such as less than 1°. The approach is not suitable in the case of larger translations (or occlusions) and rotations.
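The gradient energy projections described above can be sketched as follows. This is an illustrative example under stated assumptions, not a definitive implementation: the gradient is approximated by unnormalised central differences with replicated borders, the 45° and 135° projections are taken as sums over lines of constant x + y and constant x − y respectively (the mapping of angles to diagonals depends on the coordinate convention), and the function names are hypothetical.

```python
def gradient_energy(image):
    """Return per-pixel gradient energy gx^2 + gy^2 for a list-of-rows image.

    Assumption: unnormalised central differences, borders replicated.
    """
    h, w = len(image), len(image[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            energy[y][x] = gx * gx + gy * gy
    return energy


def project_four_directions(energy):
    """Sum the energy along lines at 0, 45, 90 and 135 degrees.

    Returns a dict keyed by angle. The 0-degree projection has one bin per
    row, the 90-degree projection one bin per column, and each diagonal
    projection has h + w - 1 bins.
    """
    h, w = len(energy), len(energy[0])
    p0 = [0.0] * h              # sums along horizontal lines
    p90 = [0.0] * w             # sums along vertical lines
    p45 = [0.0] * (h + w - 1)   # lines of constant x + y (assumed 45 degrees)
    p135 = [0.0] * (h + w - 1)  # lines of constant x - y (assumed 135 degrees)
    for y in range(h):
        for x in range(w):
            e = energy[y][x]
            p0[y] += e
            p90[x] += e
            p45[x + y] += e
            p135[x - y + (h - 1)] += e
    return {0: p0, 45: p45, 90: p90, 135: p135}
```

Because the energy depends only on intensity differences, adding a constant offset to the image leaves every projection unchanged, which is one way to see why gradient energy is less sensitive to lighting changes than raw intensity.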
For panoramic image construction, one approach is to use camera calibration, pairwise 2D projective alignment, bundle adjustment, deghosting, feathering blend, and cylindrical coordinate mapping. However, this approach is typically too complex and computationally too expensive for embedded devices or for cloud computing applications where a large number of images need to be processed simultaneously.
Other approaches use low-cost sweep panorama functionality, but produce low-quality panoramic images due to artefacts such as ghosting and truncation of moving objects.
Despite having a speed advantage, previous projection-based alignment algorithms have a number of limitations. First, the image pair must have a substantial overlap (more than 90% of the frame area) for the alignment to work. This is because the image data from non-overlapping areas adds perturbation to the projections, eventually breaking their correlation. Second, previous gradient projection methods are not robust to low lighting conditions. The low energy but dense gradient of dark current noise often overpowers the stronger but sparse gradient of the scene structures when integrated over a whole image row or column. For a similar reason, gradient projection methods are also not robust against a highly textured scene like carpet or foliage. Finally, heavy JPEG compression creates strong blocking artefacts that bias the shift estimation towards the DCT (Discrete Cosine Transform) grid points.
Thus, a need exists to provide an improved method and system for determining a shift between a first image and a second image.