This invention relates to a method and system for improving the ability to quantitate the amount of flux or material in localized collections of such flux or material, each typically called a "spot". These spots, encountered in such fields as astronomy, chemistry, biology, and the like, may be distributed in patterns, constellations or other configurations. The problems are essentially the same in all of these fields: each spot must be located, resolved from possible overlapping spots and quantitated.
In the area of biotechnology, recent advances in separation technology, in particular two-dimensional gel electrophoresis, have made it possible to separate large numbers of different components that may be present in a complex protein mixture. Typically the separation is carried out on the basis of molecular charge in one dimension and molecular weight in the other. Protein spots thus separated may be stained and viewed directly. Scientists have estimated that 30,000 to 50,000 human-protein gene products may exist. In identifying a given protein after electrophoresis-gel separation, the researcher identifies spots of interest by their known placement in the characteristic overall spot pattern. Quantities of protein are typically judged in a subjective fashion that is not quantitatively accurate. Visual analysis of high resolution gels is laborious and time-consuming owing to the large number of spots that may be present, some of them barely visible. It is easy to overlook changes in the pattern that may be important. Furthermore, accurate determination of the amount of protein present in certain spots is necessary for a number of experiments, including longitudinal studies of clinical patients, and the kinetics of blood chemistry.
In order to improve upon the efficiency and precision by which a spot is analyzed, scientists have utilized automated gel quantitation systems incorporating computers to process the spot pattern images. Several analysis methods that have been used with 2-D gels work well with isolated spots, but are unable to accurately allocate protein between two spots that show overlap. Although a number of the spots on a typical gel are relatively free of overlap with neighbors, enough spots show overlap to make resolvability a very important consideration.
A typical gel quantitation system has four principal components: scanner, software, computer and display. The scanner, typically operating in transmission or reflection, converts the image into an array of numerical gray level measurements suitable for computer manipulation. Each element of the array is usually termed a pixel.
The spots or pixels are obtained in several different ways. For example, the proteins to be separated may be radiolabeled. The radiation flux from these proteins after separation may be used to expose photographic film, thereby forming an autoradiogram of the gel. Or, the proteins may be stained by an optically absorbing material after separation. By either method, a pattern of optically absorbing spots is produced that may be measured with the aid of a scanner or scanning densitometer. The output of such a scanner is typically a series of measurements of the optical density of the stained gel or autoradiogram sample, these measurements being regularly spaced in a rectangular array that covers substantially the entire surface of the sample. The software is a collection of computer programs designed to analyze the data supplied by the scanner. The computer analyzes the pixel data by executing the instructions provided by the software. A display system demonstrates the results of this analysis, and permits the user convenient interaction with data and results if necessary.
Many improvements have been made in scanners, computers, and display systems, but the area that needs more attention is the quantitation method itself. A spot can be considered to be a three-dimensional mountain peak with peak height corresponding to maximum spot density, i.e., quantity of protein, surrounded by neighboring peaks of varying width and amplitude. In the prior art there are two general approaches used to quantitate spots. The first approach deals with simple segmentation and contour following; the second approach deals with modeling. Of the two approaches, only modeling can adequately resolve overlapping spots.
A paper by Lutin, W. A., Kyle, C. F. and Freeman, J. A., "Quantitation of Brain Proteins By Computer-Analyzed Two Dimensional Electrophoresis", in Catsimpoolas, N. (Ed.), Electrophoresis '78, (Elsevier North Holland, New York 1979), pp. 93-106, describes one such modeling approach. As taught by Lutin et al., the raw scanner data are acquired in the form of pixels or numbers representing the gray level intensity of the spots. These pixels are then processed by the computer to determine which levels of intensity represent the background values. The background pixels are fitted by least squares to a two-dimensional polynomial. The image is then corrected for background variation by subtracting the polynomial value at each pixel location. In addition, the corrected image is smoothed by convolution.
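The background correction described above may be illustrated by the following minimal sketch, written in Python with NumPy. It is not taken from Lutin et al.; the function names `fit_background` and `smooth`, the second-degree polynomial, the 3x3 averaging kernel, and the synthetic image are all assumptions chosen for illustration. The sketch fits a two-dimensional polynomial by least squares to pixels already flagged as background, subtracts the fitted surface at every pixel, and smooths the result by convolution.

```python
import numpy as np

def fit_background(image, background_mask, degree=2):
    """Least-squares fit of a 2-D polynomial to the pixels flagged as background.

    Returns the fitted polynomial evaluated at every pixel of the image.
    """
    rows, cols = np.indices(image.shape)
    # Scale coordinates to [-1, 1] so the design matrix stays well conditioned.
    y = 2.0 * rows / max(image.shape[0] - 1, 1) - 1.0
    x = 2.0 * cols / max(image.shape[1] - 1, 1) - 1.0
    # One column per monomial x**i * y**j with i + j <= degree.
    terms = [(i, j) for i in range(degree + 1) for j in range(degree + 1 - i)]
    design = np.stack([x**i * y**j for i, j in terms], axis=-1)
    coeffs, *_ = np.linalg.lstsq(
        design[background_mask], image[background_mask], rcond=None
    )
    return design @ coeffs

def smooth(image):
    """Smooth by convolution with a 3x3 averaging kernel (edges replicated)."""
    padded = np.pad(image, 1, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for dr in range(3):
        for dc in range(3):
            out += padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
    return out / 9.0

# Synthetic example (hypothetical data): a sloping background plus one spot.
rows, cols = np.indices((40, 60))
true_background = 10.0 + 0.05 * cols + 0.02 * rows
spot = 50.0 * np.exp(-((rows - 20) ** 2 + (cols - 30) ** 2) / (2.0 * 3.0 ** 2))
image = true_background + spot

# Assumed background classification: every pixel more than 15 pixels from the spot.
background_mask = np.hypot(rows - 20, cols - 30) > 15

corrected = image - fit_background(image, background_mask)
smoothed = smooth(corrected)
```

In this sketch the background pixels are identified by distance from a known spot; in practice the classification would come from the intensity analysis described in the text.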
The corrected data now are searched for a maximum value, the tacit assumption being that this value must be at the approximate center of a peak. Once the peak center has been found, inflection points of the peak are sought by scanning the data in all four raster directions away from the maximum found.
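The search for a maximum and its inflection points may be sketched as follows. This is an illustrative reading of the step, not the code of Lutin et al. It assumes that an inflection point is located where the discrete second difference along a raster direction changes from negative to non-negative, and it interpolates linearly between samples. The names `find_peak`, `inflection_distance`, and `inflection_points` are hypothetical.

```python
import numpy as np

def find_peak(image):
    """Return (row, col) of the largest pixel value."""
    r, c = np.unravel_index(int(np.argmax(image)), image.shape)
    return int(r), int(c)

def inflection_distance(profile):
    """Distance in pixels from profile[0] (the peak) to the first inflection point.

    The inflection point is taken as the zero crossing of the discrete second
    difference, from negative (concave down) to non-negative.
    """
    curvature = profile[:-2] - 2.0 * profile[1:-1] + profile[2:]
    # curvature[k] is the second difference centered on profile[k + 1].
    for k in range(1, len(curvature)):
        if curvature[k - 1] < 0.0 <= curvature[k]:
            # Linear interpolation between offsets k and k + 1.
            frac = -curvature[k - 1] / (curvature[k] - curvature[k - 1])
            return k + frac
    return None  # no inflection found before the image edge

def inflection_points(image, peak):
    """Scan away from the peak in all four raster directions."""
    r, c = peak
    profiles = {
        "up": image[r::-1, c],
        "down": image[r:, c],
        "left": image[r, c::-1],
        "right": image[r, c:],
    }
    return {
        name: inflection_distance(np.asarray(p, dtype=float))
        for name, p in profiles.items()
    }

# Synthetic example (hypothetical data): one Gaussian spot with
# standard deviation 4 pixels along rows and 6 pixels along columns.
rows, cols = np.indices((41, 41))
image = 100.0 * np.exp(-0.5 * (((rows - 20) / 4.0) ** 2 + ((cols - 20) / 6.0) ** 2))

peak = find_peak(image)
distances = inflection_points(image, peak)
```

For an ideal gaussian the inflection point lies one standard deviation from the center, so the four distances also serve as width estimates for the peak.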
The average height of the four inflection points relative to the peak height is compared to that expected for a true gaussian. If a serious discrepancy is noted, the peak is assumed to be subject to interference from a neighboring peak and a Gaussian estimate is made from the inflection point values and maximum value obtained. Inflection points not lying in a plane, or an inflection point plane that is significantly tilted, are also indications that an estimate is necessary. If, on the other hand, the inflection points are those expected for a single isolated gaussian, a weighted least-squares Gaussian fit is performed over the two-dimensional region bounded by the inflection points in order to obtain the Gaussian parameters.
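The height comparison may be made concrete. For an ideal gaussian measured above a zero background, the inflection point lies one standard deviation from the center, where the height is exp(-1/2), or about 0.607, of the peak height. The following sketch applies that ratio as an isolation test. The function name `looks_isolated` and the tolerance of 0.1 are assumptions for illustration; the source does not state what constitutes a serious discrepancy, and the planarity and tilt checks are omitted here.

```python
import math

# Height at one standard deviation relative to the peak of an ideal gaussian.
EXPECTED_RATIO = math.exp(-0.5)

def looks_isolated(peak_height, inflection_heights, tolerance=0.1):
    """Compare the mean inflection height, relative to the peak, with the
    value expected for a single isolated gaussian.

    `tolerance` is an assumed threshold, not a value from the source.
    """
    mean_height = sum(inflection_heights) / len(inflection_heights)
    ratio = mean_height / peak_height
    return abs(ratio - EXPECTED_RATIO) <= tolerance

# Hypothetical measurements: an isolated peak, and a peak whose
# inflection points on two sides are raised by a neighboring peak.
isolated = looks_isolated(100.0, [60.7, 60.6, 60.5, 60.8])
overlapped = looks_isolated(100.0, [61.0, 60.0, 85.0, 90.0])
```

A result of `False` corresponds to the case in which the parameters would be estimated rather than fitted.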
The Gaussian parameters, whether obtained by estimation or fitting, are then used to create a gaussian that is subtracted from the surface. The data are searched again for a maximum. The previously found maximum is, of course, no longer present so a new peak is located. This procedure is continued until the maximum found is below a preset threshold level. When this occurs, the first gaussian is regenerated from its parameters, which have been stored in a list. This gaussian is then added to the surface. Inflection points are tested as before, and the gaussian is either fitted or estimated, then subtracted again. This time, however, the gaussian is found to be less influenced by neighboring peaks, because the largest ones have already been removed, at least to a good approximation. The fit or estimate is thus more likely to be accurate. By this means, one is able to obtain good fits while treating only one gaussian at a time. The process is repeated for all gaussians already on the parameter list. After the last gaussian is processed, additional image maxima are determined. Additional gaussians are thus found and subtracted until a lower threshold is reached. Three such passes are made through the list. Each time, the threshold is lowered according to a predetermined sequence of values. After the third pass, a fourth and final pass is made in which estimation is not allowed and a least squares fit is forced for all gaussians.
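The first subtraction pass described above may be sketched as follows. The sketch is deliberately simplified and is not the procedure of Lutin et al.: it performs only the initial find-and-subtract loop, omits the add-back, refit, and forced least-squares passes, and substitutes a simpler width estimate (the distance at which the profile falls to exp(-1/2) of the peak height) for the inflection point test and fit. The names `gaussian`, `width_estimate`, and `subtract_peaks`, the threshold value, and the synthetic data are assumptions.

```python
import numpy as np

def gaussian(shape, amp, r0, c0, sr, sc):
    """Axis-aligned two-dimensional gaussian evaluated on a pixel grid."""
    rows, cols = np.indices(shape)
    return amp * np.exp(-0.5 * (((rows - r0) / sr) ** 2 + ((cols - c0) / sc) ** 2))

def width_estimate(profile):
    """Distance from profile[0] at which the profile falls to exp(-1/2) of
    its first value, i.e. one standard deviation for an ideal gaussian."""
    target = profile[0] * np.exp(-0.5)
    below = np.nonzero(profile <= target)[0]
    if below.size == 0:
        return float(len(profile))  # never falls far enough inside the image
    k = below[0]
    # Linear interpolation between samples k - 1 and k.
    return (k - 1) + (profile[k - 1] - target) / (profile[k - 1] - profile[k])

def subtract_peaks(image, threshold, max_peaks=100):
    """Repeatedly locate the maximum, estimate a gaussian, and subtract it
    until the maximum falls below the threshold."""
    residual = image.astype(float).copy()
    params = []  # the stored parameter list
    while len(params) < max_peaks:
        r, c = (int(v) for v in np.unravel_index(np.argmax(residual), residual.shape))
        amp = residual[r, c]
        if amp < threshold:
            break
        sr = 0.5 * (width_estimate(residual[r::-1, c]) + width_estimate(residual[r:, c]))
        sc = 0.5 * (width_estimate(residual[r, c::-1]) + width_estimate(residual[r, c:]))
        params.append((amp, r, c, sr, sc))
        residual -= gaussian(residual.shape, amp, r, c, sr, sc)
    return params, residual

# Synthetic example (hypothetical data): two well-separated spots.
shape = (64, 80)
image = gaussian(shape, 100.0, 20, 20, 3.0, 4.0) + gaussian(shape, 60.0, 45, 50, 2.5, 2.5)
params, residual = subtract_peaks(image, threshold=5.0)
```

Because the two spots in this example do not overlap, a single pass recovers both. With overlapping spots the first estimates are distorted by neighbors, which is the motivation for the repeated passes described in the text.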
Although this algorithm is an improvement over the prior art, it requires multiple passes to complete the process, each gaussian being treated several times. This excessive computation results in overall loss of efficiency. In addition, the prior art does not address the problem of negative residuals. These negative residuals are a result of subtracting a Gaussian model whose value in places exceeds the value of the data being modeled. The appearance of a negative residual is an indication that the parameters that were estimated or fitted to the actual peak represent a larger volume than is actually present. In other words, the values used to describe the spot (peak) are in error. Other problems with the prior art include numerical failures that occur when the matrix derived from the least-squares normal equations becomes ill-conditioned.
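The origin of a negative residual may be shown with a small numerical example. In the sketch below, a spot with a standard deviation of 3 pixels is modeled by a gaussian of the same height but an overestimated standard deviation of 4 pixels. The data, the parameter values, and the function name `negative_residual_volume` are hypothetical and serve only to illustrate the effect.

```python
import numpy as np

def negative_residual_volume(data, model):
    """Sum of the magnitudes of all negative values of (data - model)."""
    residual = data - model
    return -residual[residual < 0.0].sum()

rows, cols = np.indices((41, 41))
radius_sq = (rows - 20) ** 2 + (cols - 20) ** 2

# Actual spot: standard deviation 3 pixels.
data = 100.0 * np.exp(-0.5 * radius_sq / 3.0 ** 2)
# Model with an overestimated width: standard deviation 4 pixels.
model = 100.0 * np.exp(-0.5 * radius_sq / 4.0 ** 2)

residual = data - model
excess = negative_residual_volume(data, model)

# The model volume exceeds the data volume by the negative residual volume,
# because the model is at least as large as the data at every pixel.
volume_error = model.sum() - data.sum()
```

Here the residual is zero at the peak center and negative everywhere else, so the volume attributed to the spot is too large by exactly the negative residual volume.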
An improved approach to quantitating spots is described in a paper by Jansson, P. A., Grim, L. B., Elias, J. G., Bagley, E. A. and Lonberg-Holm, K. K., "Implementation and Application of a Method to Quantitate 2-D Gel Electrophoresis Patterns", Electrophoresis 1983, 4, pp. 82-91, in which the analysis achieves greater stability. Jansson et al. felt that one aspect of the prior art that needed adaptation was the cutoff criterion. This is the predetermined density at which the algorithm specifies that peak subtraction is to stop, and peak addition, refitting and re-subtraction are to begin. This process is repeated until a new and lower cutoff is reached. Jansson et al. noted that the arbitrary cutoff previously employed was at times too high, so that weak spots would be missed. At other times it was too low, so that large numbers of minute gaussians were fitted to noise. The solution to this problem was the introduction of a mathematical expression that specified the cutoff point by referring to known background parameters. This modification, although it yields an improvement in time efficiency, still does not address the problem of negative residuals, which introduce error in the quantitation of a protein spot. Also, this system is still iterative, using multiple passes to treat each gaussian, which is time-consuming. This time burden can be relieved by the use of large main-frame computers. The use of these computers is, however, undesirable due to expense and lack of convenience.