Proteomics can generate new hypotheses about the mechanisms underlying physiological changes. The perceived advantage of proteomics over gene-based global profiling approaches is that proteins are the most common effector molecules in cells, and changes in gene expression may not be reflected by changes in protein expression. See Anderson, L. & Seilhamer, J. A comparison of selected mRNA and protein abundances in human liver. Electrophoresis 18, 533-537 (1997). See also, Gygi, S. P., Rochon, Y., Franza, B. R. & Aebersold, R. Correlation between protein and mRNA abundance in yeast. Mol. Cell Biol. 19, 1720-1730 (1999). However, the large number of amino acids and post-translational modifications makes proteomics data inherently more complex to analyze than genomics data.
Several methods have been developed for separating proteins extracted from cells for identification and analysis of differential expression. One of the most widely used is 2-dimensional gel electrophoresis (2DE). See Klose, J. Protein mapping by combined isoelectric focusing and electrophoresis of mouse tissues. A novel approach to testing for induced point mutations in mammals. Humangenetik 26, 231-243 (1975). See also, O'Farrell, P. H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007-4021 (1975). In this method, proteins are first separated in one direction by their isoelectric points, and then in a perpendicular direction by molecular weight. As 2DE-based proteomic studies have become more complex, one of the major challenges has been to develop efficient and effective methods for detecting, matching, and quantifying spots on large numbers of gel images. These steps extract the rich information contained in the gels, so they must be performed accurately if valid discoveries are to be made.
In current practice, the most commonly used spot detection and quantification approach involves three steps. First, a spot detection method is applied to each individual gel image to find all protein spots and draw their boundaries. Second, spots detected on individual gel images are matched to a master list of spots on a chosen reference gel image, requiring specification of vertical and horizontal tolerances since spots on different gel images are rarely perfectly aligned with one another. Third, “volumes” are computed for each spot on each gel image by summing all pixel values within the defined spot regions.
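The second and third steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure of any particular software package: spot positions are assumed to be (x, y) centroids, each spot region is assumed to be available as a boolean mask, and the function names `match_to_reference` and `spot_volume` are hypothetical.

```python
import numpy as np


def spot_volume(image, mask):
    """Step three: sum all pixel values inside one spot's region.

    `mask` is assumed to be a boolean array with the same shape as `image`.
    """
    return float(image[mask].sum())


def match_to_reference(spots, reference, tol_x, tol_y):
    """Step two: match detected spots to the reference gel's master list.

    `spots` and `reference` are lists of (x, y) centroids (an assumption).
    A detected spot is a candidate for a reference spot only if it lies
    within the horizontal tolerance `tol_x` and vertical tolerance `tol_y`;
    the closest candidate wins. Returns {reference index: detected index}.
    Simplification: nothing prevents one detected spot from being matched
    to two reference spots.
    """
    matches = {}
    for r_idx, (rx, ry) in enumerate(reference):
        best, best_dist = None, None
        for s_idx, (sx, sy) in enumerate(spots):
            dx, dy = abs(sx - rx), abs(sy - ry)
            if dx <= tol_x and dy <= tol_y:
                dist = dx * dx + dy * dy
                if best_dist is None or dist < best_dist:
                    best, best_dist = s_idx, dist
        if best is not None:
            matches[r_idx] = best
    return matches


if __name__ == "__main__":
    # Toy image with a single 2x2 spot of intensity 2.0.
    image = np.zeros((6, 6))
    image[1:3, 1:3] = 2.0
    print(spot_volume(image, image > 0))

    # One detected spot falls within tolerance of a reference spot;
    # the other reference spot is left unmatched on this gel.
    reference = [(10.0, 10.0), (30.0, 30.0)]
    detected = [(11.0, 9.0), (50.0, 50.0)]
    print(match_to_reference(detected, reference, tol_x=3.0, tol_y=3.0))
```

In the toy data, the second reference spot has no detected spot within tolerance, so it is absent from the returned mapping; this is how a spot ends up matched on some gel images but not others.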
Unfortunately, these methods lack robustness. Errors are frequent and especially problematic for studies involving large numbers of gels. The errors are of three main types: spot detection, spot matching, and spot boundary estimation errors. Detection errors include merging two spots into one, splitting a single spot into two, not detecting a spot, and mistaking artifacts for spots. Also, automatically detected spot boundaries can be inaccurate, increasing the variability of spot volume calculations. Matching errors occur when spots on different gel images are matched together but do not correspond to the same protein. These errors are pervasive and can obscure the identification of differential protein expression. Almeida, et al. list mismatched spots as one of the major sources of variability in 2DE, and Cutler, et al. identify the subjective nature of the editing required to correct these errors as a major problem. Almeida, J. S., Stanislaus, R., Krug, E. & Arthur, J. M. Normalization and analysis of residual variation in two-dimensional electrophoresis for quantitative differential proteomics. Proteomics 5, 1242-1249 (2005); Cutler, P., Heald, G., White, I. R. & Ruan, J. A novel approach to spot detection for two-dimensional gel electrophoresis images using pixel value collection. Proteomics 3, 392-401 (2003). Extensive hand editing is needed to correct these various errors and can be very time-consuming, taking 1 to 4 hours per gel image. Id. Taken together, these factors limit throughput and bring the objectivity and reproducibility of results into question. Also, one must decide what to do about missing values caused by spots that are matched across some, but not all, gel images. A number of ad hoc strategies have been employed, but all have weaknesses and bias the resulting quantifications.
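The effect of missing values on quantification can be illustrated with a small numerical sketch. The text above does not name the ad hoc strategies, so the two shown here (substituting zero for unmatched spots, and averaging only over the gels where the spot was matched) are assumed examples chosen for illustration, and the spot volumes are made-up numbers.

```python
import numpy as np

# Hypothetical volumes for one reference spot across five gel images.
# NaN marks gels on which no detected spot was matched to it.
volumes = np.array([120.0, 95.0, np.nan, 110.0, np.nan])


def mean_zero_filled(v):
    """Assumed strategy A: treat every unmatched spot as volume zero."""
    return float(np.where(np.isnan(v), 0.0, v).mean())


def mean_matched_only(v):
    """Assumed strategy B: average only over gels where the spot was matched."""
    return float(np.nanmean(v))


if __name__ == "__main__":
    print(mean_zero_filled(volumes))   # 65.0
    print(mean_matched_only(volumes))  # about 108.33
```

The two estimates differ substantially for the same data. Zero substitution understates abundance if the protein was present but the spot was missed or mismatched, while averaging over matched gels only overstates it if the spot was truly absent or below the detection threshold on the other gels.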