This invention relates to computer-assisted methods and apparatus for efficiently and systematically studying molecules that are present in biological samples and determining their role in health and disease. In particular, this invention relates to the emerging field of proteomics, which involves the systematic identification and characterization of proteins that are present in biological samples, including proteins that are glycosylated or that exhibit other post-translational modifications. The proteomics approach offers great advantages for identifying proteins that are useful for diagnosis, prognosis, or monitoring response to therapy and in identifying protein targets for the prevention and treatment of disease.
Recent advances in molecular genetics have revealed the benefits of high-throughput sequencing techniques and systematic strategies for studying nucleic acids expressed in a given cell or tissue. These advances have highlighted the need for operator-independent computer-mediated methods for identifying and selecting subsets or individual molecules from complex mixtures of proteins, oligosaccharides and other biomolecules and isolating such selected biomolecules for further analysis.
Strategies for target-driven drug discovery and rational drug design require identifying key cellular components, such as proteins, that are causally related to disease processes and the use of such components as targets for therapeutic intervention. However, present methods of analyzing biomolecules such as proteins are time-consuming and expensive, and suffer from inefficiencies in detection, imaging, purification and analysis.
Though the genomics approach has advanced our understanding of the genetic basis of biological processes, it has significant limitations. First, the functions of products encoded by identified genes, and especially by partial cDNA sequences, are frequently unknown. Second, information about post-translational modifications of a protein can rarely be deduced from a knowledge of its gene sequence, and it is now apparent that a large proportion of proteins undergo post-translational modifications (such as glycosylation and phosphorylation) that can profoundly influence their biochemical properties. Third, protein expression is often subject to post-translational control, so that the cellular level of an mRNA does not necessarily correlate with the expression level of its gene product.
Fourth, automated strategies for random sequencing of nucleic acids involve the analysis of large numbers of nucleic acid molecules prior to determining which, if any, show indicia of clinical or scientific significance.
For these reasons, there is a need to supplement genomic data by studying the patterns of protein and carbohydrate expression, and of post-translational modification generally, in a biological or disease process through direct analysis of proteins, oligosaccharides and other biomolecules. However, technical constraints have heretofore impeded the rapid, cost-effective, reproducible, systematic analysis of proteins and other biomolecules present in biological samples.
The present invention is directed to efficient, computer-assisted methods and apparatus for identifying, selecting and characterizing biomolecules in a biological sample. According to the invention, a two-dimensional array is generated by separating biomolecules present in a complex mixture. The invention provides a computer-generated digital profile representing the identity and relative abundance of a plurality of biomolecules detected in the two-dimensional array, thereby permitting computer-mediated comparison of profiles from multiple biological samples. This automatable technology for screening biological samples and comparing their profiles permits rapid and efficient identification of individual biomolecules whose presence, absence or altered expression is associated with a disease or condition of interest. Such biomolecules are useful as therapeutic agents, as targets for therapeutic intervention, and as markers for diagnosis, prognosis, and evaluating response to treatment. This technology also permits rapid and efficient identification of sets of biomolecules whose pattern of expression is associated with a disease or condition of interest; such sets of biomolecules provide constellations of markers for diagnosis, prognosis, and evaluating response to treatment.
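By way of illustration only, the comparison of digital profiles described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the representation of a profile as a mapping from a feature identifier to relative abundance, the function names normalize and compare_profiles, and the two-fold threshold are all hypothetical choices made for this example.

```python
def normalize(raw_volumes):
    """Convert raw detected signal per feature into relative abundance.

    `raw_volumes` maps a hypothetical feature identifier to its raw signal.
    """
    total = sum(raw_volumes.values())
    if total <= 0:
        raise ValueError("profile contains no detected signal")
    return {fid: volume / total for fid, volume in raw_volumes.items()}


def compare_profiles(reference, sample, fold_threshold=2.0):
    """Report features whose presence, absence or abundance differs.

    Both arguments are profiles of relative abundance. The fold threshold
    is an assumed, illustrative criterion for "altered expression".
    """
    differences = {}
    for fid in sorted(set(reference) | set(sample)):
        ref = reference.get(fid, 0.0)
        obs = sample.get(fid, 0.0)
        if ref == 0.0 and obs == 0.0:
            continue  # feature carries no signal in either profile
        if ref == 0.0:
            differences[fid] = "present_only_in_sample"
        elif obs == 0.0:
            differences[fid] = "absent_in_sample"
        else:
            ratio = obs / ref
            if ratio >= fold_threshold or ratio <= 1.0 / fold_threshold:
                differences[fid] = "altered"
    return differences


# Usage with made-up signal values for two samples.
normal = normalize({"A": 50.0, "B": 50.0})
disease = normalize({"A": 10.0, "B": 50.0, "C": 40.0})
report = compare_profiles(normal, disease)
```

In this sketch a feature is reported only when it is present in one profile and absent from the other, or when its relative abundance differs by at least the assumed threshold; any other comparison criterion could be substituted.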
The high throughput, automatable methods and apparatus of the present invention further permit operator-independent selection of individual separated biomolecules (or subsets of separated biomolecules) according to pre-ordained criteria, without any requirement for knowledge of sequence information or other structural characteristics of the biomolecules. This in turn provides automated, operator-independent isolation and parallel characterization of a plurality of selected biomolecules detected in a biological sample. Thus, the present invention advantageously permits automated selection of biomolecules prior to sequencing or structural characterization. In one particular embodiment, the present invention provides a gel that is suitable for electrophoresis of biomolecules (such as proteins) and is bonded to a solid support such that the gel has two-dimensional spatial stability and the support is substantially non-interfering with respect to detection of a label associated with one or more biomolecules in the gel (e.g., a fluorescent label bound to one or more proteins). In another particular embodiment, the invention provides an integrated computer program that compares digital profiles to select one or more biomolecules detected in a two-dimensional array and generates instructions that direct a robotic device to isolate such selected biomolecules from the two-dimensional array. In yet a further embodiment, the program also implements a laboratory information management system (LIMS) that tracks laboratory samples and associated data such as clinical data, operations performed on the samples, and data generated by analysis of the samples.
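By way of illustration only, the selection of biomolecules according to pre-ordained criteria and the generation of instructions for a robotic device might be sketched as follows. This is a hedged sketch, not the program of the invention: the Feature record, the fold-change criterion, the millimetre coordinate convention, the 96-well destination plate and the dictionary format of each instruction are assumptions introduced for this example, and no actual robot interface is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A detected feature in the two-dimensional array (hypothetical record)."""
    feature_id: str
    x_mm: float         # assumed position along the first separation dimension
    y_mm: float         # assumed position along the second separation dimension
    fold_change: float  # abundance in the sample relative to a reference


def select_features(features, min_fold_change=2.0):
    """Apply a pre-ordained criterion; no sequence information is consulted."""
    return [
        f for f in features
        if f.fold_change >= min_fold_change
        or f.fold_change <= 1.0 / min_fold_change
    ]


def excision_instructions(selected, plate_rows=8, plate_columns=12):
    """Map each selected feature to a position and a destination well."""
    if len(selected) > plate_rows * plate_columns:
        raise ValueError("more selected features than wells on one plate")
    instructions = []
    for index, feature in enumerate(selected):
        row, column = divmod(index, plate_columns)
        instructions.append({
            "feature_id": feature.feature_id,
            "x_mm": feature.x_mm,
            "y_mm": feature.y_mm,
            "destination_well": f"{chr(ord('A') + row)}{column + 1}",
        })
    return instructions


# Usage with made-up features.
features = [
    Feature("F1", 10.0, 20.0, 3.0),
    Feature("F2", 15.0, 25.0, 1.1),
    Feature("F3", 30.0, 40.0, 0.25),
]
selected = select_features(features)
instructions = excision_instructions(selected)
```

The selection step in this sketch relies only on position and relative abundance in the array, consistent with selection prior to sequencing or structural characterization; each instruction record could also be logged against the sample for tracking purposes.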