1. Field of the Invention
The invention relates to a method and apparatus for analyzing images, and more particularly to comparing related or similar images (black and white or color). The invention relates in preferred embodiments to a computer-based process, a computer-based system, a computer program, and/or a computer program product for analyzing gel electrophoresis images to identify new proteins or proteomes, and/or to compare gel images to identify quantitative or qualitative changes in proteins.
2. Related Art
Because of acidic and/or basic side chains, proteins include positively and/or negatively charged groups. The behavior of a protein in an electric field is determined by the relative numbers of these positive and/or negative charges, which in turn are affected by the acidity of the solution. At the isoelectric point (pI), the positive and negative charges are exactly balanced and the protein shows no net migration. Proteins with a pI less than about 7.5 are usually analyzed in isoelectric focusing (IEF) gels. The pH gradient in these gels is usually preformed and the sample applied at the neutral end of the gel (all acidic proteins, i.e., proteins with a pI smaller than 7.5, will be charged negatively). At the start of electrophoresis, all the negatively charged proteins will start to migrate towards the anode, i.e., through the gel. As each protein reaches its pI, the electrical forces on the protein from the cathodic and anodic sides become equal and the protein thus "focuses." Proteins with a pI greater than 7 are best resolved on non-equilibrium pH gradient electrophoresis (NEPHGE) gels. Hence the proteins are applied again at the neutral pH region and all basic proteins (with a pI greater than 7) will be charged positively. During electrophoresis, they will therefore move towards the cathode, i.e., through the gel.
Obviously, the same separation could be obtained in an immobilized pH gradient electrophoresis gel system (IPG) where the pH gradient is covalently bound to the support matrix (polyacrylamide). Different proteins have different proportions of acidic and/or basic side chains, and hence have different isoelectric points. In a solution of a particular hydrogen ion concentration (pH), some proteins move toward a cathode and others toward an anode. Depending upon the size of the charge as well as upon molecular size and/or shape, different proteins move at different speeds. This difference in behavior in an electric field is the basis of the electrophoresis method of separation and/or analysis of proteins.
Two-dimensional gel electrophoresis (2DGE) is a particularly effective tool for analyzing proteins. Cell extracts from any prokaryotic or eukaryotic cell are put onto a gel, and the individual proteins are separated first by the pI (first dimension) and then by size (second dimension). The result is a characteristic picture of as many as 1000 to 5000 spots, each usually a single protein. Resolution is improved by increasing gel composition or size, and/or by enhancing the sensitivity through the use of radiochemical methods, silver staining, and/or the reduction in thickness of the gels to 1.5 mm or less. Jungblut et al. have reported up to 5000 protein spots of mouse brain on gels of size 23×30 cm (Journal of Biotechnology, 41 (1995) 111-120).
High resolution 2DGE has been used for analyzing basic as well as acidic proteins. Isoelectric focusing (IEF) in the first dimension can be combined with sodium dodecyl sulfate (SDS) gel electrophoresis in the second dimension (IEF-SDS). Alternatively, nonequilibrium pH gradient electrophoresis (NEPHGE) in the first dimension can be combined with SDS gel electrophoresis in the second dimension (NEPHGE-SDS). Such procedures are known in the art, e.g., as described in O'Farrell, J. Biol. Chem. 250, 4007-4021 (1975) and O'Farrell et al., Cell, 12:1133-1142 (1977), the entirety of which are incorporated herein by reference. Gels cannot be used to determine absolute isoelectric points, but only observed values, because the running conditions, e.g., high urea concentration, are not ideal (in the physical-chemistry meaning of the word: the pKa values of the side chains are rather different between water and high concentrations of urea). NEPHGE gels cannot be used for the determination of absolute isoelectric points of proteins. The isoelectric point of a protein is usually determined in reference to a stable pH gradient formed during isoelectric focusing. As discussed in O'Farrell (1977), the best resolution of acidic proteins is obtained with equilibrium IEF since the region of the gels containing acidic proteins is compressed in NEPHGE. The best resolution of basic proteins is with a pH 7-10 NEPHGE gel. For the highest resolution of the entire range of proteins, two gels are preferably used: (1) an IEF gel for acidic proteins; and (2) a NEPHGE gel for basic proteins. Of course, the precise image obtained depends upon many factors including, but not limited to, chemicals such as ampholytes, physical parameters such as gel dimensions and/or temperature and/or the electrical parameters employed.
Once a 2DGE gel is run, the image can be revealed in a number of ways including: staining (e.g., with Coomassie blue, silver or fluorescent or immunological staining); direct measurement of the radioactivity (using devices that can detect the radioactivity (including so-called β-imagers, or phosphor image technologies); or photographic film sensitive to the radioactivity); fluorescent enhancement of the radiographic emissions (using various fluorophores) or combinations of the above. In addition, samples can be treated before electrophoresis (e.g., with monobromobimane and related compounds) so that the proteins are detectable after electrophoresis using some of the above methods. For example, after electrophoresis, a 2DGE gel can be fixed with methanol and acetic acid, treated with AMPLIFY(copyright) (Amersham), and dried. The gel is then placed in contact with X-ray film and exposed. The gel can be exposed for multiple time periods to compensate for the lack of dynamic range of X-ray films. Each film image comprises a multiplicity of "spots" of differing position, size, shape, and/or optical density. The spots on the image are analyzed to determine the correspondence between spots and proteins.
Manual visual inspection and analysis of gel images can be done under controlled conditions and within specific ranges (Jungblut et al., Quantitative analysis of two-dimensional electrophoretic protein patterns: Comparison of visual evaluation with computer-assisted evaluation. In: Neuhoff, V. (Ed.) Electrophoresis '84, Verlag Chemie GmbH, Weinheim, 301-303 and Andersen et al., Diabetes, Vol. 44, April 1995, 400-407). Analysis of one film can take in excess of eight hours, even for one having significant skill and experience in this art. Typically, certain methods of obtaining images of the gels are non-linear (e.g., silver staining and/or X-ray film) and it is often necessary to make corrections for this if the full range of protein expression is to be covered (this can be up to a ratio of 1:100,000 for the protein of lowest to highest expression). Further, quantification with visual analysis is limited. Typically, visual analysis only detects changes in protein amounts of a factor greater than or equal to 2.
Various computer programs and computer evaluation systems have been developed to improve quantification and assist in evaluation of gel films, e.g., PDQUEST (Protein Database Inc., New York), BioImage (Millipore, Bedford, Mass.), Phoretix (Phoretix International, Newcastle, UK), and Kepler (Large Scale Biology Corporation, Rockville, Md.). To use a computer program such as BioImage (BioImage, Ann Arbor, Mich.), the image of the gel on the film is scanned into a computer. The digitized gel image is analyzed by the computer program. Each spot is assigned an intensity value, such as an integrated optical density percentage (IOD %), and a position on the gel, such as an "X,Y" Cartesian-type coordinate. Computer programs such as BioImage require the highest qualities in resolution and reproducibility of the spot position. Because the gel medium is so elastic, gel patterns are not identical, i.e., two gels, run under essentially identical conditions, will not have each protein spot located in exactly the same position. If two gels are run under conditions that are not essentially the same, then the variations in position of corresponding protein spots will be even greater.
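By way of illustration only, the assignment of an IOD % to each spot can be sketched as follows. The sketch assumes that IOD % is the integrated optical density of a spot expressed as a percentage of the summed integrated optical density of all detected spots on the gel; that definition, the `Spot` record, and the function name `iod_percent` are assumptions made for this example and are not taken from any of the programs named above.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    number: int   # spot number in the spot list
    x: float      # X coordinate on the gel image
    y: float      # Y coordinate on the gel image
    iod: float    # integrated optical density of the spot (assumed precomputed)


def iod_percent(spots):
    """Return {spot number: IOD %}, assuming IOD % = 100 * iod / total iod."""
    total = sum(s.iod for s in spots)
    if total <= 0:
        raise ValueError("total integrated optical density must be positive")
    return {s.number: 100.0 * s.iod / total for s in spots}
```

Because the values are normalized to the total, they are comparable between gels that were loaded or exposed somewhat differently, which is one reason a percentage rather than a raw density is convenient in a spot list.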
Computer evaluation systems such as those described above have improved the quantification of spot intensities and IOD % for generation of a "spot list" for a gel image. However, computer evaluation systems such as those described above still require significant operator effort for editing. A gel image to be evaluated is input to a computer, such as by scanning. The digitized image is searched to locate spots having an intensity or optical density above a sensitivity threshold. The operator must then edit the gel image. For example, if two very big spots are close together, the computer may have identified the two spots as one elongated spot. The computer may not be able to resolve that there are actually two spots. The operator would then be required to manually edit the image to divide the spot into two spots. As another example, the computer may incorrectly identify as a protein spot a non-protein spot on the gel image, such as a high intensity streak. The operator would then be required to manually edit the image to delete the non-protein spot. It can take from six to eight hours for a skilled operator to edit a gel image evaluated using a conventional computer evaluation system.
As reported in Jungblut et al. (1995), numerous researchers have used conventional computer evaluation systems to produce 2DGE databases for various tissues or cell types. However, these systems require significant effort on the part of the operator to produce an accurate spot list for a new gel image. More importantly, conventional computer evaluation systems do not provide an analysis and/or interpretation tool that uses information from other gel images of the same cell type to allow an operator to quickly and/or efficiently analyze and/or interpret a new gel image. Conventional computer evaluation systems cannot be used to reliably detect proteins only present in small amounts. Thus, there is a need in the art for a computer-based analysis system that reduces the effort required by the operator, and increases the speed with which new gel images can be analyzed and/or interpreted. There is a further need in the art for a computer-based analysis system for analyzing and/or interpreting new gel images that uses information from other gel images of the same cell type.
Conventional computer evaluation systems also do not provide the full analysis tools for statistical comparison between groups of gel images. Thus, there is a further need in the art for a computer-based analysis system that is capable not only of analyzing and/or interpreting a new gel image, but also of executing statistical comparisons between various groups of gel images.
Citation of any document herein is not intended as an admission that such document is pertinent prior art, or considered material to the patentability of any claim of the present application. Any statement as to content or a date of any document is based on the information available to applicant at the time of filing and does not constitute an admission as to the correctness of such a statement.
The present invention is directed to methods, compounds, compositions and apparatus for analyzing images of electrophoresis gels. These images can be in black and white or in color (a colored image can have three grey scale ranges for the primary colors and can thus be analyzed in the same way as described below). For the purposes of this description only, one grey scale is considered although for one skilled in the art, there would be no difficulty to extend the description to the three primary colors, or combinations thereof.
Any type of image could be analyzed in the manner described below, although the method would be best suited for the analysis or comparison of any type of situation where two or more similar images are obtained, for example sequential satellite photographs of clouds or land surfaces to chart weather patterns or in geological surveying. Other examples would include serial sections of biopsies, or serial scanning runs from CT or CAT devices in hospitals. In biotechnology, applications can include, but not be limited to, Northern, Southern or Western blots, 1-D gels and/or 2DGE gels. The present invention is described below with respect to analyzing gel electrophoresis images to identify proteins, and to compare gel images to identify changes in proteins or nucleic acids.
In one aspect of the invention, a method for analyzing images is provided. The method comprises one or more of the following steps, such as, but not limited to (1) at least one of (3), (4), (5), (6), (7), (8), (9), (10), (11) or (12):
(1) capturing a new image, wherein the new image contains a plurality of new image spots, each new image spot having a spot number, an integrated optical density percentage (IOD %) and a position;
(2) generating a master composite image for use in analyzing the new image, wherein the master composite image contains a plurality of master composite spots, each master composite spot having a spot number, an IOD % and a position;
(3) generating a master composite, wherein the master composite comprises the spot number, the IOD %, the position, the variability of the spot (for example the standard deviation expressed as a percentage) for the position and IOD %, and a saturation value (corresponding to the value of the maximum pixel intensity found in any of the spots from the original images which were used to derive the spot in question; this value is expressed as a fraction on a scale from white (0) to black (1)) for each of the plurality of master composite spots;
(4) aligning the new image with the master composite image;
(5) selecting a set of anchor points from the master composite;
(6) detecting new image spots that have a position that is within a position tolerance of the position of corresponding anchor points and that have an IOD % that is within an IOD % tolerance of the IOD % of corresponding anchor points, and matching the detected new image spots to the corresponding anchor points to form a set of matched new image spots;
(7) calculating a set of vectors linking spots of the same number in the master composite image and in the new gel image; and determining for each vector the length and angle;
(8) calculating a vector difference for each of the set of matched new image spots, corresponding to the difference between the vector in question and the vectors originating from a number (for example about 2-500) of the nearest spots to the spot in question; this generates a vector difference for each of the matched new image spots, and in a subsequent step, those matches for which the vector differences are greater than a predetermined percentage of the best (shortest length and numerically smallest angle) vector differences are removed from the set of matched new image spots; these vector differences can be used to quality check the alignment of the images and to guide the correction of mismatches in a reiterative manner until an optimal match is obtained;
(9) selecting a set of well-defined spots from the master composite, detecting new image spots that have a position that is within a position tolerance of the position of corresponding well-defined spots, matching the detected new image spots to the corresponding well-defined spots, and adding the matched new image spots to the set of matched new image spots;
(10) selecting a set of saturated spots from the master composite, detecting new image spots that have a position that is within a position tolerance of the position of corresponding saturated spots, matching the detected new image spots to the corresponding saturated spots, and adding the matched new image spots to the set of matched new image spots;
(11) selecting a set of weak spots from the master composite, detecting new image spots that have a position that is within a position tolerance of the position of corresponding weak spots, matching the detected new image spots to the corresponding weak spots, and adding the matched new image spots to the set of matched new image spots; and
(12) searching the new image outside the set of matched new image spots to locate unidentified new image spots.
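Steps (6) through (8) above can be sketched as follows, purely as an illustration under stated assumptions. The `Spot` record and the function names are hypothetical. The IOD % tolerance is assumed to be an absolute difference in IOD %, and the vector difference is assumed to be the mean length of the difference between a spot's displacement vector and those of its `k` nearest matched neighbours. For simplicity, the removal rule of step (8), which is stated relative to the best vector differences, is replaced here by a plain absolute cutoff `max_diff`; this is a substitution made for the example, not the rule described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    number: int
    x: float
    y: float
    iod_pct: float


def match_anchors(anchors, new_spots, pos_tol, iod_tol):
    """Step (6): pair each anchor with the nearest unused new spot
    that lies within both the position and the IOD % tolerance."""
    matches, used = {}, set()
    for a in anchors:
        best, best_d = None, None
        for s in new_spots:
            if s.number in used:
                continue
            d = math.hypot(s.x - a.x, s.y - a.y)
            if d <= pos_tol and abs(s.iod_pct - a.iod_pct) <= iod_tol:
                if best is None or d < best_d:
                    best, best_d = s, d
        if best is not None:
            matches[a.number] = best
            used.add(best.number)
    return matches  # anchor number -> matched new image spot


def vector_differences(anchors_by_number, matches, k=3):
    """Steps (7)-(8): displacement vector per match, then the mean length of
    (own vector - neighbour vector) over the k nearest matched anchors."""
    vec = {}
    for n, s in matches.items():
        a = anchors_by_number[n]
        vec[n] = (s.x - a.x, s.y - a.y)  # length and angle follow from (dx, dy)
    diffs = {}
    for n in vec:
        a = anchors_by_number[n]
        neighbours = sorted(
            (m for m in vec if m != n),
            key=lambda m: math.hypot(anchors_by_number[m].x - a.x,
                                     anchors_by_number[m].y - a.y),
        )[:k]
        if not neighbours:
            diffs[n] = 0.0
            continue
        diffs[n] = sum(
            math.hypot(vec[n][0] - vec[m][0], vec[n][1] - vec[m][1])
            for m in neighbours
        ) / len(neighbours)
    return diffs


def filter_matches(matches, diffs, max_diff):
    """Drop matches whose vector difference exceeds the (assumed) absolute cutoff."""
    return {n: s for n, s in matches.items() if diffs[n] <= max_diff}
```

In this sketch a match whose displacement disagrees with that of its neighbours receives a large vector difference and is removed, while matches that move coherently with the local gel distortion are retained. The filtering could be repeated after each correction, in keeping with the reiterative manner described in step (8).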
In another aspect of the present invention, the method further includes comparing a first set of images to a second set of images.
In yet a further aspect of the present invention, the new image is aligned with the master composite image through the use of a common anchor point. Common anchor points correspond to spots present in both the new image and the master composite image. Anchor points selected from the master composite can include primary anchor points and secondary anchor points. Primary and secondary anchor points are obtained at different stages in the image processing using different selection criteria to select the master composite proteins to be used.
In still a further aspect of the present invention, well-defined spots have a saturation value S in the range of about 0.2 < S < 0.8. Saturated spots have a saturation value S ≥ 0.8. Weak spots have a saturation value S ≤ about 0.2.
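The three saturation classes can be illustrated with a short sketch. The function name `classify_spot` and the parameters `low` and `high` are hypothetical; their default values are the approximate bounds of 0.2 and 0.8 given above, and the saturation value is assumed to lie on the scale from white (0) to black (1).

```python
def classify_spot(saturation, low=0.2, high=0.8):
    """Classify a master composite spot by its saturation value S (0 = white, 1 = black)."""
    if not 0.0 <= saturation <= 1.0:
        raise ValueError("saturation must lie between 0 and 1")
    if saturation >= high:
        return "saturated"
    if saturation <= low:
        return "weak"
    return "well-defined"
```

For example, `classify_spot(0.5)` returns `"well-defined"`, whereas `classify_spot(0.8)` returns `"saturated"` and `classify_spot(0.2)` returns `"weak"`.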
In still a further aspect of the present invention, a computer program product is provided that has computer program logic recorded thereon for enabling a processor in a computer system to analyze two-dimensional electrophoresis gel images of cell proteins. Such computer program logic includes one or more of the following:
at least one algorithm, sub-routine, routine or image capturing means for enabling the processor to receive data representing a new gel image, wherein the new gel image contains a plurality of new gel image spots that correspond to proteins on the new gel image, each new gel image spot having an integrated optical density percentage (IOD %) and a position;
at least one algorithm, sub-routine, routine or master composite image generating means for enabling the processor to generate a master composite image for use in analyzing the new gel image, wherein the master composite image contains a plurality of master composite spots that correspond to proteins on the master composite image, each master composite spot having a spot number, an IOD % and a position;
at least one algorithm, sub-routine, routine or master composite generating means for enabling the processor to generate a master composite, wherein the master composite comprises a spot number, the position, the IOD %, and a saturation value (corresponding to the value of the maximum pixel intensity found in any of the spots from the original images which were used to derive the spot in question; this value is expressed as a fraction on a scale of from white (0) to black (1)) for each of the plurality of master composite spots;
at least one algorithm, sub-routine, routine or aligning means for enabling the processor to align the new gel image with the master composite image;
at least one algorithm, sub-routine, routine or selecting means for enabling the processor to select a set of anchor points from the master composite;
at least one algorithm, sub-routine, routine or anchor point detecting and matching means for enabling the processor to detect new gel image spots that have an IOD % that is within an IOD % tolerance of the IOD % of corresponding anchor points, and match the detected new gel image spots to the corresponding anchor points to form a set of matched new gel image spots;
at least one algorithm, sub-routine, routine or means for enabling the processor to calculate a set of vectors linking spots of the same number in the master composite image and in the new gel image; and determining for each vector the length and angle;
at least one algorithm, sub-routine, routine or vector differencing means for enabling the processor to calculate a vector difference for each of the set of matched new image spots corresponding to the difference between the vector in question and the vectors originating from a number (for example about 1-500) of the nearest spots to the spot in question. (This will generate a vector difference for each of the matched new image spots and, in a subsequent step, those matches for which the vector differences are greater than a predetermined percentage of the best (shortest length and/or numerically smallest angle) vector differences are removed from the set of matched new image spots. For highly saturated spots (less than about  greater than 3-40 saturated pixels, less than about  greater than 9 pixels), the position of the spot center is determined using, e.g., the spot shape characteristics. Allowances can be made for cases where the saturation area covers more than one spot and the saturation area is divided up into a number of areas corresponding to the number of spots existing in the MCI, and spot centers can be calculated based on these allowances, as further known in the art or as described herein);
at least one algorithm, sub-routine, routine or means by which these vector differences can be used to quality check the alignment of the images and to guide the correction of mis-matches in a reiterative manner until an optimal match is obtained;
at least one algorithm, sub-routine, routine or means for enabling the processor to calculate how these vector differences can be used to quality check the alignment and to guide the correction of mismatches in a reiterative manner until an optimal match is obtained;
at least one algorithm, sub-routine, routine or well-defined spot detecting and matching means for enabling the processor to select a set of well-defined spots from the master composite, to detect new gel image spots that have a position that is within a position tolerance of the position of corresponding well-defined spots, to match the detected new gel image spots to the corresponding well-defined spots, and to add the matched new gel image spots to the set of matched new gel image spots;
at least one algorithm, sub-routine, routine or saturated spot detecting and matching means for enabling the processor to select a set of saturated spots from the master composite, to detect new gel image spots that have a position that is within a position tolerance of the position of corresponding saturated spots, to match the detected new gel image spots to the corresponding saturated spots, and to add the matched new gel image spots to the set of matched new gel image spots;
at least one algorithm, sub-routine, routine or weak spot detecting and matching means for enabling the processor to select a set of weak spots from the master composite, to detect new gel image spots that have a position that is within a position tolerance of the position of corresponding weak spots, to match the detected new gel image spots to the corresponding weak spots, and to add the matched new gel image spots to the set of matched new image spots;
at least one algorithm, sub-routine, routine or unidentified spot locating means for enabling the processor to search the new gel image outside the set of matched new gel image spots to locate unidentified new gel image spots; and
at least one algorithm, sub-routine, routine or comparing means for enabling the processor to compare a first set of gel images to a second set of gel images.
In still a further aspect of the present invention, the comparing means includes the following:
at least one algorithm, sub-routine, routine or first set analyzing means for enabling the processor to compute a statistical average of IOD % for each spot in the first set;
at least one algorithm, sub-routine, routine or second set analyzing means for enabling the processor to compute a statistical average of IOD % for each spot in the second set; and
at least one algorithm, sub-routine, routine or statistical average differencing means for enabling the processor to determine if each spot in the first set is statistically different from each spot in the second set.
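The comparison of two sets of gel images can be sketched as follows, as one possible illustration only. The text above does not name a particular statistical test; Welch's t statistic is used here as an assumption, and a spot is flagged as different when the absolute value of the statistic exceeds a caller-supplied critical value `t_crit`. The function names and the dictionary layout (spot number mapped to a list of IOD % values, one per gel) are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples of IOD % values (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        # No spread in either sample: identical means give 0, otherwise +/- infinity.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def compare_groups(group1, group2, t_crit):
    """group1, group2: {spot number: [IOD % per gel]}.
    Returns {spot number: (mean1, mean2, differs)} for spots present in both sets."""
    result = {}
    for n in group1.keys() & group2.keys():
        a, b = group1[n], group2[n]
        result[n] = (mean(a), mean(b), abs(welch_t(a, b)) > t_crit)
    return result
```

In practice the critical value would be chosen from the t distribution for the desired significance level and the degrees of freedom of the two samples; it is left as a parameter here so that the sketch needs only the standard library.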
In still a further aspect of the present invention, the comparing means provides the possibility to combine different exposures of the same gel or different gels of the same sample in different amounts in such a way that the effective dynamic range of the protein detection system (selected from the list given above) can be increased.
Features and Advantages
It is a feature of the present invention that it can analyze and/or interpret new gel images, and/or also conduct statistical comparisons between groups of gel images.
It is a further feature of the present invention that it uses information from a single gel (using default tolerance values) or a master composite image to guide the analysis and interpretation of new or additional gel images.
It is yet a further feature of the present invention that it uses the integrated optical density percentage, as well as the position, in locating spots in new gel images.
It is an advantage of the present invention that new gel images can be analyzed and/or interpreted with minimal input from an operator.
It is a further advantage of the present invention that new gel images can be analyzed and/or interpreted quickly and/or efficiently.
It is a still further advantage of the present invention that it can reliably detect proteins that are present in small amounts.
It is yet a further advantage of the present invention that it is not limited to analysis and/or interpretation of two-dimensional gel electrophoresis images, and can be used to compare any two similar images, whether black and white or color or in any situation where image interpretation and/or recognition is involved. This process could include the comparison of "freshly derived images" from any image capture device with an image recovered from a computer memory device. In another embodiment, this could include the comparison of two sequential images (e.g., a video recording or real time video image) in order to identify the differences between the two.