Generating entirely well-focused images in the automatic imaging of a large three-dimensional scene means automatically acquiring images of the scene and producing an all-focused, high-resolution image of the whole scene from many snapshots of its portions. For example, in unattended automatic optical imaging of a large three-dimensional scene, a series of consecutive sections of the scene is imaged, and automatic focusing is needed to generate entirely well-focused images. There are two typical scenarios:
1) In light microscopy: automatic imaging of the whole or a large portion of a specimen slide using a light microscope, i.e., so-called high-throughput scanning (HTS) or whole-slide imaging;
2) In classical photography: unattended continuous scene capturing and tracking using normal (still/video) cameras, such as in video surveillance, close-up photography and vision-guided robotics, where focus is adjusted automatically, repeatedly and dynamically.
In optical imaging, two important concepts are field of view (FOV) and depth of field (DOF). Consider an imaging system described in an X-Y-Z three-dimensional coordinate system, with Z being the optical axis and the imaging plane being the X-Y plane. FOV is the two-dimensional area in the X-Y plane that the imaging system can see in a single shot. DOF is the range of distances along the optical axis over which the imaging system maintains acceptable focus at a given focus depth.
A singly well-focused image contains only a single focus, whereas a focused image may have multiple focuses. In comparison, an entirely well-focused image (also called an all-focused image (AFI)) is an image every portion of which is well-focused. Therefore, the optimal focus of each portion, possibly at a different depth for each, has to be found. Finding the optimal focus requires evaluating the degree of focus over a stack of images, called Z-stack images because they are acquired at different positions along the Z-axis. These positions are called nodes herein, and they are determined by a sampling rate along this axis. Such techniques are also referred to as extended focal imaging (EFI) or focus enhancement. The computation cost increases dramatically when the number of portions is high; in typical cases this number is on the order of a few hundred or even a few thousand. FIG. 1 illustrates conceptually an image plane 100 partitioned into 6×4 non-overlapping grids, e.g. 102, each of the same size Ng×Ng. Each grid 102 is a portion on which the focus value is evaluated, called an evaluation square, and the center of the square is called the evaluation point (EP).
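The per-portion focus search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this document: it assumes the Z-stack is already registered and held as a NumPy array of shape (K, H, W), it uses a squared-gradient focus measure chosen only for illustration (the text does not prescribe a particular FV), and the names `focus_value` and `all_focused_image` are hypothetical.

```python
import numpy as np


def focus_value(patch: np.ndarray) -> float:
    """Squared-gradient focus measure (an illustrative choice of FV)."""
    p = patch.astype(float)
    gx = np.diff(p, axis=1)  # horizontal intensity differences
    gy = np.diff(p, axis=0)  # vertical intensity differences
    return float((gx ** 2).sum() + (gy ** 2).sum())


def all_focused_image(zstack: np.ndarray, ng: int):
    """Compose an AFI from a Z-stack of shape (K, H, W).

    The image plane is split into non-overlapping ng x ng evaluation squares.
    For each square, the slice with the largest FV is copied into the output.
    Returns (afi, best) where best[r, c] is the chosen slice index.
    """
    k, h, w = zstack.shape
    if h % ng or w % ng:
        raise ValueError("image height and width must be multiples of ng")
    rows, cols = h // ng, w // ng
    best = np.zeros((rows, cols), dtype=int)
    afi = np.empty((h, w), dtype=zstack.dtype)
    for r in range(rows):
        for c in range(cols):
            ys = slice(r * ng, (r + 1) * ng)
            xs = slice(c * ng, (c + 1) * ng)
            # One FV per slice for this evaluation square: its focus curve.
            fvs = [focus_value(zstack[i, ys, xs]) for i in range(k)]
            best[r, c] = int(np.argmax(fvs))
            afi[ys, xs] = zstack[best[r, c], ys, xs]
    return afi, best
```

Because each evaluation square is scored independently against every slice, the cost in this sketch grows with the product of the number of squares and the number of slices K, which is why a large number of portions makes the computation expensive.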
Where the scene is large, it is impossible to capture the entire scene in a single snapshot because of the relatively limited field of view of the optical imaging system. Hundreds or even thousands of snapshots are needed, each covering a small region of the large scene, called a tile or a section. These tiles may be overlapping or non-overlapping.
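One possible way to lay out tiles over a scene larger than the FOV is sketched below. It assumes tiles are placed on a regular raster with a fixed overlap fraction between neighbours; the function names and the choice to shift the last tile back so that it ends at the scene edge are illustrative assumptions, not part of the source.

```python
import math


def tile_origins(scene_len: float, fov_len: float, overlap: float = 0.0) -> list:
    """1-D tile start positions covering [0, scene_len] with tiles of fov_len.

    overlap is the fraction (0 <= overlap < 1) shared by neighbouring tiles.
    The last tile is shifted back so that it ends exactly at scene_len.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    if fov_len >= scene_len:
        return [0.0]  # a single snapshot already covers this axis
    step = fov_len * (1.0 - overlap)
    n = math.ceil((scene_len - fov_len) / step) + 1
    origins = [i * step for i in range(n)]
    origins[-1] = float(scene_len - fov_len)
    return origins


def tile_grid(scene_w: float, scene_h: float, fov_w: float, fov_h: float,
              overlap: float = 0.0) -> list:
    """All (x, y) tile origins in raster order (row by row)."""
    xs = tile_origins(scene_w, fov_w, overlap)
    ys = tile_origins(scene_h, fov_h, overlap)
    return [(x, y) for y in ys for x in xs]
```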
Garcia Rojo et al. [Marcial Garcia Rojo, Gloria Bueno Garcia, Carlos Peces Mateos, Jesus Gonzalez Garcia and Manuel Carbajo Vicente, “Critical Comparison of 31 Commercially available Digital Slide Systems in Pathology,” International Journal of Surgical Pathology, Vol. 14, No. 4, pp. 285-305, 2006] presented a review of the existing work in this field. They critically compared 31 commercially available systems that are able to perform whole-slide digitization or to assist in complete slide review.
Autofocusing, as used to obtain entirely well-focused images, is a technique that automatically finds the best-focused depth on a curve of focus values FV(k) computed from such a Z-stack of images βk, k = 1, ..., K, acquired at different depths indexed by k.
Online direct search methods, such as hill-climbing search, Fibonacci search or golden ratio search, are widely used [Dmitry Fedorov, Baris Sumengen, and B. S. Manjunath, “Multi-focus Imaging Using Local Estimation and Mosaicking,” IEEE International Conference on Image Processing 2006 (ICIP06), Atlanta, Ga. USA, October 2006] to solve this nonlinear optimization problem numerically. They are called online search methods because the search direction is decided immediately after FVs are compared, so the positions at which images are acquired are decided during the search.
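The online character of hill-climbing can be seen in a minimal sketch in which each call to a hypothetical callback `fv(k)` stands for acquiring an image at depth index k and computing its focus value; the next position is chosen only after FVs are compared. The steepest-ascent variant with a fixed step shown here is one of several hill-climbing formulations and is an assumption, as is the name `hill_climb`.

```python
from typing import Callable, Dict, Tuple


def hill_climb(fv: Callable[[int], float], k0: int, kmin: int, kmax: int,
               step: int = 1) -> Tuple[int, int]:
    """Steepest-ascent hill-climbing over integer depth indices in [kmin, kmax].

    fv(k) stands for "acquire the image at depth k and compute its FV".
    Returns (k_found, number_of_distinct_positions_evaluated).
    """
    cache: Dict[int, float] = {}

    def f(k: int) -> float:
        # Each new k would mean one more image acquisition in a real system.
        if k not in cache:
            cache[k] = fv(k)
        return cache[k]

    k = k0
    while True:
        neighbours = [n for n in (k - step, k + step) if kmin <= n <= kmax]
        better = [n for n in neighbours if f(n) > f(k)]
        if not better:          # no neighbour improves: a (possibly local) peak
            return k, len(cache)
        k = max(better, key=f)  # move in the direction of the larger FV
```

On a focus curve with two peaks, a start near the smaller peak stops there, while a start near the larger one reaches the global maximum; the result depends on the starting point, which is the local-optimum trap that affects these methods.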
Both the Fibonacci search method and the golden ratio search method belong to the so-called bracketing search methods. They first evaluate two focus values at the two ends of an interval Γp, and then evaluate only one focus value in each subsequent iteration, within a new interval Γn that is a portion of Γp. For the golden ratio search, the ratio r between the length of Γn and that of Γp is fixed at (√5−1)/2 ≈ 0.618; for the Fibonacci search, r is a ratio of consecutive Fibonacci numbers that approaches this value. The termination criteria of all three methods, i.e. hill-climbing search, Fibonacci search and golden ratio search, are the same.
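A golden ratio (golden-section) search for the peak of a unimodal FV curve can be sketched as follows. In this common textbook formulation, which is an assumption rather than the exact procedure of the cited work, the two initial evaluations are at interior points of [a, b]; afterwards each iteration shrinks the bracket by the fixed ratio r = (√5−1)/2 and needs only one new FV evaluation. The callback `fv`, the tolerance `tol` and the function name are hypothetical.

```python
import math
from typing import Callable, Tuple

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # fixed shrink ratio r ≈ 0.618


def golden_section_max(fv: Callable[[float], float], a: float, b: float,
                       tol: float = 1e-3) -> Tuple[float, int]:
    """Golden-section search for the maximum of a unimodal fv on [a, b].

    Returns (estimated_peak, number_of_fv_calls).
    """
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = fv(c), fv(d)
    calls = 2
    while b - a > tol:
        if fc >= fd:
            # Peak lies in [a, d]; old c becomes the new right interior point.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = fv(c)
        else:
            # Peak lies in [c, b]; old d becomes the new left interior point.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = fv(d)
        calls += 1  # exactly one new FV per iteration
    return (a + b) / 2.0, calls
```

Reusing one interior point per iteration works because r² = 1 − r, so the retained point lands exactly where the next iteration needs it.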
All these methods are based on a unimodal assumption: that the actual focus values of the windowed image have a shape that peaks at exactly one point and decreases monotonically away from it. If the values have this property, the methods are efficient and converge to the true peak, typically quickly. However, FV depends on both the image content and the imaging system. The content may come from a scene with multiple layered components, and the FVs may also be noisy. Therefore, an actual FV curve may not be smooth and may have multiple peaks. As a result, these methods may become trapped in a local optimum that depends on the starting point of the search. Although variations of these approaches exist, they basically find a local optimum.
Exhaustive search is a method for finding the global optimum. Given a grid, suppose the largest possible search range Γ=[kmin, kmax] is known. This range is partitioned into intervals separated by designed nodes. Typically, equally spaced nodes at positions k1, ..., kNΓ are used, so that the interval is ΔΓ=length(Γ)/(NΓ−1). Each node corresponds to a certain optical depth, and NΓ images are captured at these NΓ nodes for the grid. The optimal focus depth is the one with the maximal FV among these images. This method is not trapped by local maxima and is more robust to noise.
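Exhaustive search over equally spaced nodes reduces to evaluating FV at every node and keeping the maximum. The sketch below is illustrative: `fv` is a hypothetical callback that acquires the image at a given depth and returns its focus value, and the node spacing follows ΔΓ = length(Γ)/(NΓ−1) with length(Γ) taken as kmax − kmin.

```python
from typing import Callable, List, Tuple


def exhaustive_search(fv: Callable[[float], float], kmin: float, kmax: float,
                      n_nodes: int) -> Tuple[float, float, float]:
    """Evaluate fv at n_nodes equally spaced nodes on [kmin, kmax].

    Node spacing is delta = (kmax - kmin) / (n_nodes - 1).
    Returns (best_node, best_fv, delta).
    """
    if n_nodes < 2:
        raise ValueError("need at least two nodes")
    delta = (kmax - kmin) / (n_nodes - 1)
    nodes: List[float] = [kmin + i * delta for i in range(n_nodes)]
    values = [fv(k) for k in nodes]  # one acquisition + FV per node
    i_best = max(range(n_nodes), key=values.__getitem__)
    return nodes[i_best], values[i_best], delta
```

On a two-peak curve of the kind that can trap hill-climbing, sampling every node returns the higher peak, at the price of NΓ acquisitions.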
The drawback of this method is that the nodes may be designed inappropriately. Theoretically, one should follow Shannon's sampling theorem, so that the sampling interval ΔΓ does not exceed a maximum value, namely half the period of the highest frequency present along the optical axis. To know this value, one must know the cut-off frequency of the three-dimensional specimen along the optical axis and the influence of the point spread function (PSF) of the optical system. However, these are hard to know before the scene (such as a specimen) has been examined. As a result, either oversampling or undersampling occurs in practice.
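As a worked illustration of the sampling condition: if the depth signal along the optical axis (specimen content combined with the PSF) had a known cut-off frequency fc in cycles per unit depth, the sampling theorem would require ΔΓ ≤ 1/(2fc), and the node count would follow from the search range. The function name and the numbers below are hypothetical; in practice fc is exactly what is hard to know in advance.

```python
import math
from typing import Tuple


def nyquist_nodes(kmin: float, kmax: float, f_cutoff: float) -> Tuple[float, int]:
    """Largest node spacing allowed by the sampling theorem, and the node count.

    f_cutoff is an assumed cut-off frequency (cycles per unit depth) of the
    signal along the optical axis. Sampling requires delta <= 1 / (2 * f_cutoff);
    the returned count covers [kmin, kmax] with spacing no larger than that.
    """
    if f_cutoff <= 0:
        raise ValueError("f_cutoff must be positive")
    delta_max = 1.0 / (2.0 * f_cutoff)
    n_nodes = math.ceil((kmax - kmin) / delta_max) + 1
    return delta_max, n_nodes
```

For instance, a 100 µm range with an assumed fc of 0.25 cycles/µm gives ΔΓ ≤ 2 µm and 51 nodes; overestimating fc inflates the node count (oversampling), while underestimating it drops detail (undersampling).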
To avoid loss of information, the highest sampling rate is used, so for a given search range Γ=[kmin, kmax] the number of nodes NΓ is the largest. Since FV has to be evaluated at all nodes, the computation cost is proportional to NΓ. This cost is large, and the whole search process is slow compared with hill-climbing search and Fibonacci or golden ratio search. The amount of memory required to store the images is also large. In practice, one may store only the current maximum FV and its depth index and discard the captured images; as a result, however, the images must be captured again if they need to be retrieved later. Another issue is ensuring that the actual range of focus depths lies within Γ=[kmin, kmax]. To ensure this, the largest depth range along the Z-axis is chosen.
The exhaustive method can find the global optimum among the examined images. However, it is time-consuming. One way to address this issue is to enlarge the sampling interval ΔΓ. However, to avoid undersampling, prior knowledge about the spectrum of the scene along the optical axis is required before image acquisition and focus search.
It has also been proposed to use an estimation of the focus surface for neighboring tiles. Reference [Ilya Ravkin and Vladimir Temov, “Automated microscopy system for detection and genetic characterization of fetal nucleated red blood cells on slides,” Proceedings of SPIE, Vol. 3260, pp. 180-191, 1998] introduces a microscopy imaging system developed by Applied Imaging for the detection and genetic characterization of fetal nucleated red blood cells on slides. Z-stack images are acquired, and the FV is evaluated over a subsampled image or over a region of interest within the whole image, to find a single focus for each image (tile). A second-order polynomial function is further used to predict the focus surface dynamically for each new stage position, and hence for each new tile, to reduce the error due to the tilt of the moving stage, wedging of the slide and cushioning of the cover slip by the mounting media in their application. In reference [Volker Hilsenstein, “Robust Autofocusing for Automated Microscopy imaging of Fluorescently Labelled Bacteria,” Proceedings of Digital Image Computing: Techniques and Applications (DICTA 2005), pp. 1-7], Hilsenstein also notes the possible tilt of the stage and proposes fitting a plane model for the slide to estimate the tilt and the variability of the focus positions, which constrains the search range to the locally optimal range for each field. Although neither of these prior-art works discusses how to generate an entirely well-focused image, their techniques are in any event limited to predicting focus surfaces by considering structural factors of the imaging system.
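Both focus-surface models mentioned above can be fitted by ordinary least squares from focus depths already found at previous tiles. In this sketch the full six-term quadratic used for `order=2` is an assumption (the exact polynomial terms of the cited system are not given here), `order=1` corresponds to a plane model, and the function names are hypothetical.

```python
import numpy as np


def _design_matrix(x: np.ndarray, y: np.ndarray, order: int) -> np.ndarray:
    # order 1: columns [1, x, y]; order 2 adds [x^2, x*y, y^2]
    cols = [np.ones_like(x), x, y]
    if order == 2:
        cols += [x * x, x * y, y * y]
    elif order != 1:
        raise ValueError("order must be 1 (plane) or 2 (quadratic)")
    return np.column_stack(cols)


def fit_focus_surface(x, y, z, order: int = 2) -> np.ndarray:
    """Least-squares coefficients of z(x, y) from focus depths z at stage positions."""
    x, y, z = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (x, y, z))
    coef, *_ = np.linalg.lstsq(_design_matrix(x, y, order), z, rcond=None)
    return coef


def predict_focus(coef: np.ndarray, x, y, order: int = 2) -> np.ndarray:
    """Predicted focus depth at new stage position(s) (x, y)."""
    x, y = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (x, y))
    return _design_matrix(x, y, order) @ coef
```

The predicted depth for a new tile could then be used to center or narrow its Z search range, in the spirit of constraining the search to a locally plausible range.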
In other words, those techniques have not considered the influence of factors coming from the scene to be imaged itself, hence offer little advantage where structural factors may not be the only or major influence, for example where the three-dimensional nature of a large scene with varying depth distribution across the scene contributes significantly to varying depths of focus across the scene, or in a calibrated imaging system where the imaging geometry has been corrected.
Even within a single AFI in EFI, there are many evaluation tiles, at each of which a local focus needs to be found. For a single AFI, reference [T. T. E. Yeo, S. H. Ong, Jayasooriah and R. Sinniah, “Autofocusing for tissue microscopy,” Image and Vision Computing, Vol. 11, No. 10, December 1993, pp. 629-639] reviews the basic techniques for finding the local focus at each EP, such as Fibonacci search and exhaustive search. For non-EPs, bilinear interpolation is suggested to determine their local focus surfaces. A more recent reference [Dmitry Fedorov, Baris Sumengen, and B. S. Manjunath, “Multi-focus Imaging Using Local Estimation and Mosaicking,” IEEE International Conference on Image Processing 2006 (ICIP06), Atlanta, Ga. USA, October 2006] discusses multi-focus imaging, a simplified version of EFI, and uses local estimation, but without predicting focus surfaces for neighboring grids/tiles. Both works above discuss forming a single AFI but do not address imaging a large scene.
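Bilinear interpolation of local focus depths from the EPs to the remaining pixels can be sketched as follows. It assumes, hypothetically, that the EP depths are stored in a 2-D array indexed by evaluation square, that each square is ng × ng pixels with its EP at the square center, and that pixels beyond the outermost EPs are clamped to the edge values; the function name is also illustrative.

```python
import numpy as np


def interpolate_depth_map(ep_depth, ng: int) -> np.ndarray:
    """Per-pixel focus depth from EP depths by bilinear interpolation.

    ep_depth[r, c] is the depth found at the EP (center) of evaluation square
    (r, c); each square is ng x ng pixels. Pixels beyond the outermost EPs are
    clamped to the edge values.
    """
    d = np.asarray(ep_depth, dtype=float)
    rows, cols = d.shape
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2 x 2 lattice of EPs")
    # Pixel centers in EP-index units: the EP of square r sits at coordinate r.
    py = np.clip((np.arange(rows * ng) + 0.5) / ng - 0.5, 0, rows - 1)
    px = np.clip((np.arange(cols * ng) + 0.5) / ng - 0.5, 0, cols - 1)
    r0 = np.minimum(np.floor(py).astype(int), rows - 2)  # top EP row of the cell
    c0 = np.minimum(np.floor(px).astype(int), cols - 2)  # left EP column of the cell
    ty = (py - r0)[:, None]  # vertical weight in [0, 1]
    tx = (px - c0)[None, :]  # horizontal weight in [0, 1]
    d00, d01 = d[r0][:, c0], d[r0][:, c0 + 1]
    d10, d11 = d[r0 + 1][:, c0], d[r0 + 1][:, c0 + 1]
    return (d00 * (1 - ty) * (1 - tx) + d01 * (1 - ty) * tx
            + d10 * ty * (1 - tx) + d11 * ty * tx)
```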
Because they share the same optical imaging principle, techniques for producing entirely well-focused images in microscopy are also applicable to photography, notably the methods for calculating focus values and finding optimal depths. More generally, autofocusing is closely related to the techniques of “shape from focus” and “depth from focus/defocus.”
A need therefore exists to provide a method and system for generating an entirely well-focused image of a large three-dimensional scene that seek to address at least one of the above-mentioned problems.