High dynamic range (HDR) images have found wide application in image processing, computer graphics, and virtual reality, where they are used to simulate the real world. The most popular approach to HDR generation is to synthesize an HDR image from several low dynamic range (LDR) images taken with different exposures. Because the camera may move between exposures, the LDR images must be aligned before synthesis; otherwise the resulting HDR image is blurred.
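To make the synthesis step concrete, the sketch below merges already-aligned exposures into a relative radiance map by weighted averaging. It assumes, for illustration only, a linear camera response and known exposure times. The function name `merge_ldr_linear` and the hat-shaped weight are hypothetical choices, not the method of this paper. Each output pixel averages the same pixel position across frames, so any camera movement mixes different scene points, which is why alignment must come first.

```python
import numpy as np

def merge_ldr_linear(images, exposure_times):
    """Merge aligned 8-bit LDR frames into a relative radiance map.

    Hypothetical sketch: assumes a LINEAR camera response, so that
    pixel/255 is proportional to radiance * exposure time.
    """
    numerator = None
    denominator = None
    for img, t in zip(images, exposure_times):
        z = img.astype(np.float64) / 255.0      # normalized pixel value in [0, 1]
        w = 1.0 - np.abs(2.0 * z - 1.0)         # hat weight: 0 at black/saturated, 1 at mid-gray
        radiance = z / t                        # this frame's radiance estimate
        if numerator is None:
            numerator = np.zeros_like(z)
            denominator = np.zeros_like(z)
        numerator += w * radiance
        denominator += w
    # Where every frame is black or saturated, all weights are zero; avoid 0/0.
    return numerator / np.maximum(denominator, 1e-8)
```

Pixels that are saturated (255) or black (0) in a frame receive zero weight, so the remaining exposures determine the radiance at those positions.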
Image alignment, or registration, is a fundamental problem in image processing and computer vision. Although a large number of techniques have been proposed, registration methods can generally be classified into two categories: pixel-based (intensity-based or area-based) algorithms and feature-based algorithms. Intensity-based methods use pixel-to-pixel matching to find a parametric transformation between two images. Feature-based methods first extract distinctive features from each image, then match them, estimate a parametric transformation from the resulting correspondences, and warp the images accordingly. Because feature-based methods do not work directly with image intensities, they are better suited to situations in which illumination (intensity) changes are expected.
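As an illustration of the pixel-based category, the following minimal sketch searches exhaustively for the integer translation that minimizes the mean squared intensity difference over the overlap of two images. The function name `ssd_translation` and the `max_shift` search window are hypothetical; practical intensity-based methods usually estimate richer transformations with iterative optimization.

```python
import numpy as np

def ssd_translation(reference, moving, max_shift=5):
    """Exhaustive intensity-based search for an integer translation (dy, dx).

    Returns the shift that, applied to `moving`, minimizes the mean squared
    difference with `reference` over their overlapping region.
    """
    ref = reference.astype(np.float64)
    mov = moving.astype(np.float64)
    h, w = ref.shape
    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlap: reference[y, x] is compared with moving[y - dy, x - dx].
            ref_patch = ref[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
            mov_patch = mov[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            cost = np.mean((ref_patch - mov_patch) ** 2)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift
```

Because the cost compares raw intensities, it implicitly assumes both images have similar brightness, which is exactly the assumption that differently exposed images violate.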
Unlike the conventional case of varying illumination, in which each image is well exposed, the images used for HDR synthesis are deliberately exposed differently so that together they cover the whole dynamic range of a real scene, and their intensities therefore vary greatly. Specifically, an image sequence for HDR generation contains severely under-exposed and over-exposed images. Such large intensity variations make intensity-based alignment difficult. Saturation or near-saturation also poses a great challenge for feature-based methods, because a feature detected in one image may not appear in another. The situation is even more challenging for both intensity-based and feature-based methods when dynamic scenes are considered, because changes in scene content make it difficult to detect consistent features.
Several techniques have been adopted to align a set of differently exposed images. In one approach, the SIFT (scale-invariant feature transform) method was employed to detect feature points (key points) in the LDR images, and the RANSAC (RANdom SAmple Consensus) method was then used to select the best pairs of key points and derive the transform parameters. An improved SIFT method has also been proposed that detects corners as feature points. To alleviate the effect of intensity variation on feature-point extraction, both SIFT methods operate in the contrast domain. Intensity-based methods have also been employed for the extraction of feature points. To cope with intensity differences, researchers proposed converting all LDR images to an identical exposure via the camera response function (CRF). This implies that, in both approaches, the CRF has to be known before registration, which is not usually the case in HDR composition. In another approach, the LDR images were related by the empirical “preferred” comparametric model, and spatial and tonal registration were performed simultaneously by optimization, for example, with the Levenberg-Marquardt algorithm. Note that this method has a large number of parameters to estimate simultaneously: 9(q−1), where q is the number of LDR images. In general, optimization in such a high-dimensional space does not guarantee a global solution, and the search is very slow. To reduce the computational burden, an improved solution using a piecewise linear comparametric model was proposed.
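The RANSAC step can be sketched as follows for the simplest case of a pure 2-D translation, assuming the putative key-point matches (for example, from SIFT descriptor matching) are already available. The function name `ransac_translation`, the iteration count, and the inlier tolerance are illustrative assumptions. The transforms estimated in the work described above may be more general, in which case each random sample must contain more than one match.

```python
import numpy as np

def ransac_translation(src_pts, dst_pts, n_iters=200, inlier_tol=1.0, seed=0):
    """Estimate a 2-D translation from putative key-point matches with RANSAC.

    src_pts, dst_pts: (N, 2) arrays where row i is a tentative match.
    A translation needs only one match as a minimal sample.
    """
    rng = np.random.default_rng(seed)
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        i = rng.integers(len(src))                 # minimal sample: one match
        t = dst[i] - src[i]                        # candidate translation
        residuals = np.linalg.norm(src + t - dst, axis=1)
        inliers = residuals < inlier_tol
        if inliers.sum() > best_inliers.sum():     # keep the largest consensus set
            best_inliers = inliers
    # Refit on the consensus set: the least-squares translation is the mean offset.
    t_best = np.mean(dst[best_inliers] - src[best_inliers], axis=0)
    return t_best, best_inliers
```

Outlier matches, such as features that appear in one exposure but not in another, are rejected because they disagree with the translation supported by the majority.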
Unlike the aforementioned registration methods, a scheme called the median threshold bitmap (MTB) was proposed, which converts each LDR image into a binary image and performs alignment on these bitmaps. This algorithm is popular for aligning differently exposed images because it is fast and effective for translational movement, and it was later extended to rotational alignment. However, the MTB has the following drawbacks. First, much of the information in the original images is lost by the simple median threshold. Second, the conversion is very sensitive to noise, especially for pixels whose values are near the median. Third, the conversion is inaccurate for over-exposed and under-exposed images whose median value is itself saturated.
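A minimal sketch of translational MTB alignment follows. It assumes 8-bit grayscale inputs and a pure integer translation, builds the pyramid by plain decimation rather than filtered downsampling, and adds an exclusion mask around the median (a common refinement aimed at the noise sensitivity noted above). The function names and the `tolerance` and `levels` values are illustrative assumptions.

```python
import numpy as np

def mtb_and_mask(gray, tolerance=4):
    """Median threshold bitmap plus a mask that excludes near-median pixels."""
    median = np.median(gray)
    bitmap = gray > median
    # Pixels within `tolerance` of the median flip easily under noise,
    # so they are left out of the mismatch count.
    mask = np.abs(gray.astype(np.float64) - median) > tolerance
    return bitmap, mask

def shift_image(img, dy, dx):
    """Return img shifted by (dy, dx); uncovered pixels become zero / False."""
    h, w = img.shape
    out = np.zeros_like(img)
    if abs(dy) >= h or abs(dx) >= w:
        return out                                 # shifted completely out of frame
    dst = (slice(max(dy, 0), h + min(dy, 0)), slice(max(dx, 0), w + min(dx, 0)))
    src = (slice(max(-dy, 0), h + min(-dy, 0)), slice(max(-dx, 0), w + min(-dx, 0)))
    out[dst] = img[src]
    return out

def mtb_align(reference, moving, levels=4, tolerance=4):
    """Coarse-to-fine search for the integer (dy, dx) aligning `moving` to `reference`."""
    if levels > 0:
        # Estimate at half resolution first (plain decimation, no prefiltering).
        dy, dx = mtb_align(reference[::2, ::2], moving[::2, ::2], levels - 1, tolerance)
        dy, dx = 2 * dy, 2 * dx
    else:
        dy, dx = 0, 0
    ref_bits, ref_mask = mtb_and_mask(reference, tolerance)
    mov_bits, mov_mask = mtb_and_mask(moving, tolerance)
    best_shift, best_err = (dy, dx), None
    # Refine the upsampled estimate by at most one pixel in each direction.
    for ddy in (-1, 0, 1):
        for ddx in (-1, 0, 1):
            sy, sx = dy + ddy, dx + ddx
            diff = ref_bits ^ shift_image(mov_bits, sy, sx)    # XOR of bitmaps
            diff &= ref_mask & shift_image(mov_mask, sy, sx)   # ignore excluded pixels
            err = int(diff.sum())
            if best_err is None or err < best_err:
                best_shift, best_err = (sy, sx), err
    return best_shift
```

With the default `levels=4`, the search covers shifts of up to 31 pixels in each direction (1 + 2 + 4 + 8 + 16). The third drawback is also visible in this sketch: if more than half of the pixels are saturated at 255, the median is 255, `gray > median` is false everywhere, and the bitmap carries no information.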