1. Field of the Invention
The present invention relates generally to computer-implemented adjustment and matching of stereoscopic images of the eye fundus.
2. Description of Background Art
Retinal photography has long been an important tool for general ophthalmology, but the necessity of using traditional still photography has made analysis and detection of pathologies both time-consuming and subject to human error.
Telemedicine systems such as the "Ophthalmic Imaging Network" developed by Ophthalmic Imaging Systems, Inc., of Sacramento, Calif., are still in the experimental phase. They offer physicians a system for evaluating a patient's retinal status. Using these systems, a single computerized retinal image is captured in the physician's office and is electronically transmitted to a reading center, where it is analyzed, and the results are returned to the physician for evaluation. However, this scheme requires experienced and certified readers, and involves a significant degree of error, with the result that images must be reread.
The automated evaluation of eye fundus topography and other features associated with severe ophthalmology diseases such as diabetic retinopathy and glaucoma could save millions of people from blindness.
Diabetic retinopathy alone is the second leading cause of blindness in the United States, after macular degeneration, causing 10% of new cases of blindness each year. Diabetes is a common disease affecting about 2% of the population. Of these cases, 10-15% are insulin-dependent (type 1) diabetics, and the remainder are non-insulin-dependent (type 2) diabetics. After living with diabetes for 20 years, nearly all patients with type 1 diabetes and over 60% of patients with type 2 diabetes show some degree of retinopathy. A diabetic is 25 times more likely to go blind than a non-diabetic. Because of this increased risk, diabetics require periodic retinal screening, which should be part of the routine care of all patients, because it can significantly reduce the risk of developing diabetic eye disease.
Another disease of the eye is glaucoma. Almost 80,000 Americans are blind as a result of glaucoma, and another one million are at risk for vision loss and may not even be aware of the risk. In fact, glaucoma is one of the leading causes of preventable blindness in the United States, and the single most common cause of blindness among African-Americans. Glaucoma is often called the "sneak thief" of sight because the most common type causes no symptoms until vision is already damaged. For that reason, the best way to prevent vision loss from glaucoma is to know its risk factors and to have medical eye examinations at appropriate intervals.
However, with the telemedicine systems of today requiring human reading, it is difficult to achieve regular, accurate, and inexpensive screening of diabetics and people at risk for glaucoma and other diseases.
The most important parameter in retinal examination is fundus topography. For this reason, ophthalmologists prefer to analyze fundus images in stereo. For example, it is impossible to see diabetic macula edema, which is the swelling of the most sensitive area of the retina, without stereo photographs. Cystoid foveal changes at the central focusing area of the retina are also difficult to detect without a stereo view. Changes to the optic nerve due to glaucoma are also hard to observe using just one picture. Using a stereo image pair, however, 3D information can be extracted from the images and used for imaging and measurements of fundus topography.
At present, automated evaluation of fundus topography is performed with scanning laser systems, which use multiple laser scans to render 3D volumes and extract depth information for the fundus features. One of the products available on the market is the TOPSS scanning laser tomography system of Laser Diagnostic Technologies, Inc., of San Diego, Calif. The system is a multiple-purpose tomograph for imaging and measurement of fundus topography. The system can image and measure topographic features and changes of the optic nerve head, macula holes, tumors, and edemas. With respect to digital images, it is able to enhance image visualization, make volumetric measurements, draw topographic profiles and compare images. However, while scanning laser systems are able to provide reliable information about retinal topography, they are very expensive and use narrow laser beams, which may be harmful to the eye. It would be desirable to extract the same information at low cost from regular stereo photographs made with digital fundus cameras. However, this requires extraction of 3D information from 2D photographs. This is made difficult or impossible by differences in illumination between images, and stereoscopic distortions leading to an inability to match points in one image to corresponding points in other images, as further described below.
Usually, stereo photographs of the human eye fundus are taken with one camera shifted by a small distance, illuminating the fundus through the pupil as illustrated in FIG. 1. The shape of the eye fundus is generally spherical, so the difference in color and brightness between stereo images depends on the position of the camera, which illuminates the fundus through the pupil at different angles. For example, FIGS. 2a and 2b show a left and a right image of an ocular nerve, respectively. In the figures, the left part of the left image in FIG. 2a is darker than the left part of the right image in FIG. 2b, and the right part of the right image is darker than the right part of the left image. In order to be able to perform a matching analysis on the images and create a topographical representation of the fundus, these illumination errors must be substantially reduced or eliminated. In addition, it is often desirable to compare two images of the same fundus taken at different times, or with different cameras. This additionally presents a situation where the illumination in each image may be different, requiring correction before appropriate analysis can take place.
It is possible to adjust the brightness, contrast and color of two images using a histogram adjustment method, as proposed by Kanagasingam Yogesan, Robert H. Eikelboom and Chris J. Barry in their paper, "Colour Matching of Serial Retinal Images," Lions Eye Institute and Centre for Ophthalmology and Visual Science, Perth, Western Australia, Australia, published in "Vision Science and its Applications," OSA Technical Digest (Optical Society of America, Washington D.C., 1999), pp. 264-267, which is incorporated by reference herein in its entirety. In their paper, the authors propose a color-matching algorithm that equalizes the mean and standard deviation of each of the three colors in the image. First, the entire image is split into the colors red, green and blue; the mean and standard deviation are calculated, and then the histograms of both images are adjusted to equalize the images. The color image is reconstituted by recombining the three channels. The problem with this method of adjustment is that the equalization adjustment is made for the whole image, so the differences in illumination within the images remain unchanged. For example, consider the points 202a and 202b in FIGS. 2a and 2b, respectively. From the figures, it can be seen that point 202a is much darker than point 202b. However, since 202a and 202b actually are the same point on the eye fundus, both points should ideally be illuminated equivalently. Because the Kanagasingam et al. method uses a histogram to adjust the brightness of the whole image, if FIG. 2a were lightened, for example, points 202a and 202b might end up being equally bright, but point 204a, which was originally lighter than point 204b, would now be even brighter, causing increased differences in illumination between the points 204a and 204b. Thus, adjusting the entire image to compensate for different illumination is not a satisfactory solution.
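As a rough illustration of the whole-image adjustment described above, the following sketch rescales each color channel of one image so that its mean and standard deviation equal those of a reference image. It is not taken from the cited paper; the function names `match_channel` and `match_colors`, the flat list-of-tuples pixel layout, and the clipping to the 0-255 range are assumptions made for illustration. Because a single gain and offset is applied per channel, any left-to-right illumination gradient inside an image is left in place, which is the limitation noted above.

```python
from statistics import mean, pstdev


def match_channel(src, ref):
    """Linearly rescale src so its mean and standard deviation match ref.

    src, ref: flat lists of intensities for one color channel.
    One gain and one offset are used for the whole channel (assumption:
    this is a simplified stand-in for a histogram adjustment).
    """
    m_src, s_src = mean(src), pstdev(src)
    m_ref, s_ref = mean(ref), pstdev(ref)
    gain = s_ref / s_src if s_src > 0 else 1.0
    # Clip to the usual 8-bit intensity range.
    return [min(255.0, max(0.0, (v - m_src) * gain + m_ref)) for v in src]


def match_colors(src_pixels, ref_pixels):
    """Apply match_channel to red, green and blue, then recombine.

    src_pixels, ref_pixels: lists of (r, g, b) tuples (hypothetical layout).
    """
    channels = [
        match_channel([p[c] for p in src_pixels], [p[c] for p in ref_pixels])
        for c in range(3)
    ]
    return list(zip(*channels))
```

A call such as `match_colors(left_pixels, right_pixels)` returns the left image's pixels after the global adjustment; the relative brightness of any two points within that image is unchanged apart from the common scaling.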
What is needed is a way of adjusting differently illuminated images of an eye fundus to compensate for different lighting conditions, so that accurate matching can be performed.
Epipolar Line Adjustment
In addition, for both real-world and computer-generated imaging applications, there is a growing need for display techniques that enable determination of relative spatial locations between objects in an image. This is particularly helpful for extracting the 3D topographical information from the stereo image pairs.
One method used to determine spatial relations between objects is binocular stereo imaging. Binocular stereo imaging is the determination of the three-dimensional shape of visible surfaces in a static scene by using two or more two-dimensional images taken of the same scene by two cameras or by one camera at two different positions. Every given point, A, in the first image has a corresponding point, B, in the second image, which is constrained to lie on a line called the epipolar line of A. As soon as the correspondence between points in the images is determined, it is possible to recover a disparity field by using the displacement of corresponding points along the epipolar lines in the two images. For example, if two cameras are parallel, the disparity is inversely proportional to the distance from the object to the base line of the cameras, and the general equation in this case is:
D = fb/Z.    (1)
Here, D is the disparity, f is the focal length of the cameras (it is the same for both cameras), b is the distance between cameras (the base), and Z is the distance from the object to the baseline. Thus, disparity approaches zero as depth approaches infinity. Once the disparity field is generated and the points in the images are matched, the spatial characteristics of the objects in the images can be calculated using Euclidean geometry.
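Equation (1) can be checked with a short numerical sketch. The function names and the choice of units (focal length and disparity in pixels, base and depth in millimeters) are assumptions for illustration only; any consistent set of units works.

```python
def disparity(f, b, z):
    """Equation (1): D = f*b/Z for two parallel cameras.

    f: focal length, b: distance between cameras (base),
    z: distance from the object to the baseline.
    """
    if z <= 0:
        raise ValueError("depth must be positive")
    return f * b / z


def depth_from_disparity(f, b, d):
    """Equation (1) solved for depth: Z = f*b/D."""
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return f * b / d
```

With these definitions, doubling the depth halves the disparity, and the disparity tends toward zero as the depth grows without bound.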
A related problem in the field of stereo imaging is object recognition and localization. Object recognition and localization includes identifying an object or a particular class of objects, such as identifying a chair, and determining the location of the object in order to maneuver or manipulate the object accordingly. One of the first steps in computer object recognition is collecting as much information as possible about the spatial structure of the object from the analysis of the image. The spatial structure of the object is also important for many other applications, such as three-dimensional object modeling, vehicle navigation and geometric inspection.
Unfortunately, it is very difficult to recover three-dimensional information from a set of 2D images, as this information was lost when the two-dimensional images were formed.
Most algorithms assume that the epipolar lines are given a priori, and thus pose the stereo matching problem as a one-dimensional search problem. In order for such an assumption to work, the two cameras must be set mechanically to have parallel optical axes such that the epipolar lines are horizontal in both images. However, even if one tries carefully to arrange the imaging geometry in such a way, there is still some degree of error, and the corresponding points are not strictly on the same horizontal lines. In the general case, calibration is necessary to recover the epipolar geometry accurately. Possible reasons that the pixels in one image do not have matching pixels lying along the same row are that the optical axes are not parallel, the base line is not horizontal, the sensors that are used to create the image do not coincide, or the cameras have different lens distortion, etc.
The matching problem could be simplified to a one-dimensional problem if the underlying epipolar geometry were known. What is further needed, then, is a system and method for determining the epipolar geometry between two or more images, as well as a system and method for aligning the images to the same epipolar line to complete the transformation.
Occlusion Detection
Another major obstacle to properly matching points in the images is caused when occluding contours coincide. Occluding contours coincide when a point that is visible in the right image is not visible in the left image, and therefore does not really have a matching point. Alternatively, occluding errors can also occur at the borders or edges of an object that are captured by a camera facing the object at different angles (called "occluding boundaries"). This is caused by the traditional correspondence procedure, which will be described in further detail below.
The most standard situation where occluding contours occur is when other objects in the scene block the point of interest. When this occurs, area-based matching algorithms often give wrong disparity estimates near the contour. When the classical stereo correlation technique is applied and the search is made in the left image, the contour usually "leaks" to the right of the object boundary as illustrated in FIG. 16. Another set of errors is shown in the top left corner of FIG. 16 and is associated with out-of-focus objects that cannot be matched correctly.
The conventional solutions used to successfully detect occlusions and avoid false correspondence require three or more cameras. In the simplest case, several cameras may be used to capture an image of the scene from equal angles along a hemisphere that surrounds the scene. Thus, if a point is not included in the second image, the first image may be matched to the third image and used to "complete" the occluded area in the second image. If not positioned properly, however, multiple camera stereos can increase the area of occlusion and may still lead to false correspondence. More specifically, depth maps generated from a polynocular stereo image often have blurred object shapes caused by the false correspondence at occluding boundaries.
Another set of solutions involves creative manipulation of a matching algorithm. Some matching algorithms may be better at avoiding false correspondence problems, but none solves the problem completely. For example, feature-based matching algorithms, which try to correspond points only at object edges, may be used to avoid occlusion to an extent. Other binocular stereo algorithms have also been adapted to try to detect "half-occluded" regions in order to improve the correspondence search. In both cases, however, the algorithms fail to measure the depth in these regions. More recently, new algorithms were developed for multiple camera devices, which may provide better results in occlusion detection.
In each conventional solution, either multiple cameras are needed to prevent occluded regions or the method is extremely time intensive and, in both cases, the resulting correspondence errors prevent creation of a complete depth map of the scene. Using multiple cameras increases the cost, burden and complexity of the imaging system, and the resulting images are still not amenable to depth analysis. It would be desirable, therefore, to have a new method for detecting and eliminating occlusions and out-of-focus errors thereby enabling the creation of an accurate depth map of the scene without requiring significant time and effort to accomplish.
Thus, what is needed is a system and method for accurately recovering the topography of an eye fundus from 2D stereo images of the fundus.
In accordance with the present invention, there is provided a system and method for automated adjustment of images of the eye fundus. First, the images are adjusted to compensate for differences in illumination. Next, an epipolar line adjustment is made to correct for vertical displacement errors. Image occlusion errors are then detected and removed, and a matching algorithm can then be run to recreate a topographical map of the fundus from the stereo image pair.
The first step requires adjusting differently illuminated images of an eye fundus (106) to reduce or eliminate illumination errors. In one embodiment, two or more images (206, 208) are obtained by an image-receiving device (502) that is coupled to a processing computer (500). In another embodiment, the images exist on film or paper, and are converted into computer-readable form by a scanning device. Pixels within each image are assigned to groups (306, 308) of a selected width. Each group forms a line through the image. The lines may be either straight or curved, although a selection of longitudinally curved lines allows for greater reduction in illumination errors. Each group (306) in the first image (302) is associated with a corresponding group (308) in the other images. Next, the intensity level for at least one color channel is determined for each pixel in each group (306, 308). From this data, the mean intensity level for each group (306, 308) is then determined. In one embodiment, the variance of each group (306, 308) is additionally determined. The mean intensity levels for each group (306, 308) are compared in each image (302, 304), and the intensity levels of pixels in one or more images are then adjusted so that the nth group in each image will have approximately equal mean intensity levels.
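The group-wise adjustment described above can be sketched as follows. This is a simplified illustration under several assumptions that are not stated in the text: the groups are straight vertical strips of a fixed pixel width, a single intensity channel is used, only the second image is modified, and the correction is an additive offset per group. The names `group_means` and `equalize_groups` are hypothetical.

```python
def group_means(img, width):
    """Mean intensity of each vertical strip of the given width.

    img: list of rows, each row a list of intensities for one channel.
    """
    n_cols = len(img[0])
    means = []
    for start in range(0, n_cols, width):
        stop = min(start + width, n_cols)
        vals = [row[x] for row in img for x in range(start, stop)]
        means.append(sum(vals) / len(vals))
    return means


def equalize_groups(img_a, img_b, width):
    """Shift the pixels of img_b so that its nth strip has the same mean
    intensity as the nth strip of img_a (additive correction, assumption)."""
    means_a = group_means(img_a, width)
    means_b = group_means(img_b, width)
    adjusted = []
    for row in img_b:
        new_row = []
        for x, value in enumerate(row):
            g = x // width  # index of the strip containing column x
            shifted = value + (means_a[g] - means_b[g])
            new_row.append(min(255.0, max(0.0, shifted)))  # clip to 8 bits
        adjusted.append(new_row)
    return adjusted
```

Unlike a whole-image adjustment, each strip receives its own correction, so a strip that is too dark in one image and a strip that is too bright in the same image can both be brought into agreement with their counterparts.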
The next step involves determining the epipolar geometry between two or more images (910), (920) taken of the same scene. First, points in the images (910), (920) are matched using an enhanced matching method (1300). This method (1300) provides highly accurate matching results in an efficient manner.
Once the points are matched, the images (910), (920) are adjusted so that the epipolar geometry of both images (910), (920) is aligned. The images (910), (920) may then be combined into a single stereo image. The present invention can then use other stereo imaging methods to provide alternate views of the same object, thereby enabling determination of object characteristics, such as size and distance.
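A much-simplified sketch of the alignment step is given below. It assumes that the misalignment between the two images is a pure vertical translation, estimates that translation as the median vertical offset between matched points, and shifts the second image accordingly. A full epipolar adjustment would generally also handle rotation and scale; the translation-only model, the median estimator, and the names `vertical_offset` and `shift_rows` are assumptions for illustration and are not the enhanced matching method (1300).

```python
from statistics import median


def vertical_offset(matches):
    """Estimate the vertical displacement between two images.

    matches: list of ((x1, y1), (x2, y2)) pairs of matched points, the
    first point in image 1 and the second in image 2.
    Returns the median of y2 - y1, rounded to a whole number of rows.
    """
    return round(median([y2 - y1 for (_, y1), (_, y2) in matches]))


def shift_rows(img, dy, fill=0):
    """Return a copy of img in which row y holds the original row y + dy.

    Rows that would come from outside the image are filled with `fill`.
    """
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        src = y + dy
        out.append(list(img[src]) if 0 <= src < height else [fill] * width)
    return out
```

After `shift_rows(image2, vertical_offset(matches))`, matched points lie on approximately the same row in both images under the stated translation-only assumption, so a subsequent correspondence search can be restricted to a single row.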
The next step is the elimination of correspondence errors associated with image occlusions. In one embodiment of the invention, the method first applies traditional correspondence methods for matching points in two images, a left image 10A and a right image 10B, taken of the same scene. Ideally, the initial search is performed by matching each point (1710) in the right image 10B with a "best match" point (1720) in the left image 10A. Once an initial set of matching points (1710, 1720) is generated, a second search is performed by using the best match point (1720) in the left image 10A as the basis for an additional correspondence search in the right image 10B. While the first search was performed without restriction, the second search is explicitly limited by the starting point (1710) used in the first search. A second "best match" point (1730) is generated. The point (1730) generated in the second search may be the same point (1710) that was used in the first search or may be a different point altogether. This results in a second set of points that represents the most accurate match between points.
As will be further described below with reference to FIG. 17, limiting the search window on the second search results from the way in which occlusions manifest themselves as errors during correspondence. More specifically, incorrectly matched points often cause leakage in a particular direction depending on the direction of image used in the first search. If the initial points used in the first search are points in the right image 10B being matched to the "left" image 10A, then the first search will generate good matches for points on the left edge of objects in the image, with a poor match on the right edge of the object. In this scenario, the second search will generate good matches for points on the right edge of any objects in the image. By placing the additional limitations on the second correspondence search, the poor match points on the left side of the object will be avoided while still picking up the correctly selected correspondence points on the right edge of the object. This limitation also speeds up the correspondence process significantly, as only a portion of the points in the row are used during the correspondence search. Thus, the best points from each of the searches are used to establish correspondence in the fastest possible fashion.
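The two-pass search can be sketched for a single image row as follows. This is an illustrative approximation rather than the procedure of FIG. 17: the sum-of-absolute-differences cost, the window half-width, the maximum disparity `max_d`, and in particular the exact form of the restriction on the second search (candidates limited to the span between the starting point and its first best match) are all assumptions, as are the function names.

```python
def sad(row_a, xa, row_b, xb, half):
    """Sum of absolute differences between two windows of half-width
    `half`, centered at xa in row_a and xb in row_b (edges are clamped)."""
    total = 0
    for k in range(-half, half + 1):
        ia = min(max(xa + k, 0), len(row_a) - 1)
        ib = min(max(xb + k, 0), len(row_b) - 1)
        total += abs(row_a[ia] - row_b[ib])
    return total


def best_match(src_row, xs, dst_row, candidates, half):
    """Candidate position in dst_row whose window best matches xs."""
    return min(candidates, key=lambda xd: sad(src_row, xs, dst_row, xd, half))


def two_pass_match(right_row, left_row, max_d, half=1):
    """For each point in the right row, return (x_right, x_left, agrees).

    First search: right point -> best match in the left row, over the
    full disparity range 0..max_d.
    Second search: that left point -> best match back in the right row,
    restricted (assumption) to positions from the starting point up to
    the first match. `agrees` is True when the second search returns
    the starting point.
    """
    n = len(right_row)
    results = []
    for xr in range(n):
        first = range(xr, min(xr + max_d, n - 1) + 1)
        xl = best_match(right_row, xr, left_row, first, half)
        second = range(xr, xl + 1)
        back = best_match(left_row, xl, right_row, second, half)
        results.append((xr, xl, back == xr))
    return results
```

Points for which `agrees` is False are candidates for occlusion or mismatch. Because the second search examines only part of the row, it is also cheaper than the first, in line with the speed-up described above.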
In another embodiment of the invention, the restrictions placed on the second search are removed and the resulting points used to accurately identify the occluded areas. These results may be used in conjunction with the results of the first embodiment to generate an error map that accurately identifies potentially problematic areas. More specifically, the results of the correspondence search in the second embodiment avoid any "false positives" and can be used to further modify the results of the first embodiment.
Steps for removing any additional errors in the final images are also provided. For example, each stereo image could be broken down into separate images for each color coordinate. The correspondence search could be run on each image separately with the results used to create a separate disparity map for each color coordinate.
After the images have been adjusted as described, a conventional matching algorithm may be used to extract topographical information from the stereo image pairs in order to evaluate the eye fundus.