The disclosed invention is generally in the area of detection and profiling of proteins and peptides, and specifically in the area of microscale protein expression profiling.
The information content of the genome is carried as deoxyribonucleic acid (DNA). The size and composition of a given genomic sequence determine the form and function of the resultant organism. In general, genomic complexity is proportional to the complexity of the organism. Relatively simple organisms such as bacteria have genomes of about 1-5 megabases while mammalian genomes are approximately 3000 megabases. The genome is generally divided into distinct segments known as chromosomes. The bacterium Escherichia coli (E. coli) contains a single circular chromosome, whereas the human genome consists of 24 distinct chromosomes (22 autosomes plus the X and Y sex chromosomes).
Genomic DNA exists as a double-stranded polymer containing four DNA bases (A, G, C, and T) tethered to a sugar-phosphate backbone. The order of the bases along the DNA is the primary sequence of the DNA. The genome of an organism contains both protein coding and non-coding regions, including exons and introns, promoter and gene regulatory regions, and non-functional DNA. Genome analysis can provide a quantitative measure of gene copy number and chromosome number, as well as the presence of single base differences in the primary sequence of the DNA. Single base changes that are inherited are referred to as polymorphisms, whereas those that are acquired during the life of an organism are known as mutations. Genomic analysis at the DNA level does not provide a measure of gene expression (that is, the process by which RNA and protein copies of the coding sequences are synthesized).
All of the cells from a given organism are assumed to contain identical genomes, while genomes from different individuals of the same species are typically about 99.9% identical. The 0.1% polymorphism rate among individuals (Wang et al., Science 280: 1077 (1998)) is significant in that approximately three million polymorphisms are expected to be found upon complete sequencing of any two human genomes. If single base changes occur in protein coding segments, polymorphisms can alter the protein sequence and therefore change the biochemical activity of the protein.
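The three-million figure follows from the genome size and polymorphism rate quoted above. A minimal sketch of that arithmetic, assuming a genome of about 3000 megabases and a 0.1% difference between two individuals (the variable and function names are illustrative, not from the source):

```python
from fractions import Fraction

# Figures quoted in the text; names are illustrative assumptions.
GENOME_SIZE_BASES = 3000 * 10**6          # approximately 3000 megabases
POLYMORPHISM_RATE = Fraction(1, 1000)     # 0.1% difference between two genomes


def expected_polymorphisms(genome_size_bases, rate):
    """Expected number of single base differences between two genomes."""
    # Fraction keeps the arithmetic exact instead of relying on floats.
    return int(genome_size_bases * rate)


print(expected_polymorphisms(GENOME_SIZE_BASES, POLYMORPHISM_RATE))
```

This is an order-of-magnitude estimate only; it treats the rate as uniform across the genome.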
The DNA genome consists of discrete functional regions known as genes. Genomes of simple organisms such as bacteria contain approximately 1000 genes (Fleischmann et al., Science 269: 496 (1995)), whereas the human genome is estimated to contain about 100,000 genes (Fields et al., Nature Genet. 7: 345 (1994)). Genomic analysis at the mRNA level can be used as a measure of gene expression. Expression levels for each gene are determined by a combination of genetic and environmental factors. The genetic factors include the precise DNA sequence of gene regulatory regions such as promoters, enhancers, and splice sites. Polymorphisms in the DNA are thus expected to contribute some of the differences in gene expression among individuals of the same species. Expression levels are also affected by environmental factors, including temperature, stress, light, and signals that lead to changes in the levels of hormones and other signaling substances. For this reason, RNA analysis provides information not only about the genetic potential of an organism, but also about changes in functional state (M. Schena and R. W. Davis, DNA Microarrays: A Practical Approach (Oxford University Press, New York, 1999) 1-16).
The second step in gene expression is the synthesis of protein from mRNA. A unique protein is encoded by each mRNA, such that every three nucleotides of mRNA encode one amino acid of the polypeptide chain, with the linear order of the nucleotides represented as a linear sequence of amino acids. Once synthesized, the protein assumes a unique three-dimensional conformation that is determined largely by the primary amino acid sequence. Proteins impart the functional instructions of the genome by performing a wide range of biochemical activities including roles in gene regulation, metabolism, cell structure, and DNA replication.
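The triplet relationship between mRNA and protein described above can be sketched in a few lines. The example below assumes the standard genetic code and a reading frame that starts at the first nucleotide; the function name and the input sequence are illustrative and do not come from the source.

```python
# Standard genetic code, built from the conventional U, C, A, G ordering.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base U
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))   # "*" marks a stop codon


def translate(mrna):
    """Translate an mRNA string into one-letter amino acids, codon by codon."""
    protein = []
    # Step through the sequence three nucleotides at a time.
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop codon ends the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCUUGGUAA"))   # hypothetical 12-nucleotide message
```

The sketch ignores start-codon scanning, splicing, and post-translational modification, all of which affect the protein actually produced.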
Individuals in a population may have differences in protein activity due to polymorphisms that either alter the primary amino acid sequence of the proteins or perturb steady state protein levels by altering gene expression. Similar to mRNA levels, protein levels can also change in response to changes in the environment; moreover, protein levels are also subject to translational and post-translational control, which do not affect mRNA levels directly (Schena and Davis, 1999). Proteomics analysis provides data on when or if a predicted gene product is actually translated, the level and type of post-translational modification it may undergo, and its relative concentration compared with other proteins (Humphrey-Smith and Blackstock, J. Protein. Chem. 16: 537-544 (1997)). After DNA is transcribed into mRNA, the exons may be spliced in different ways before being translated into proteins. Following the translation of mRNA by ribosomes, proteins are usually post-translationally modified by the addition of different chemical groups such as carbohydrate, lipid and phosphate groups, as well as through the proteolytic cleavage of specific peptide bonds. These chemical modifications are crucial to modulating protein function but are not directly coded for by genes. Furthermore, both mRNA and protein are continually being synthesized and degraded, and thus final levels of protein are not easily obtainable by measuring mRNA levels (Patton, J. Chromatogr. 722: 203-223 (1999); Patton et al., J. Biol. Chem. 270: 21404-21410 (1995)). So while mRNA levels are often extrapolated to indicate the levels of expressed proteins, it is not surprising that there is little correlation between the abundance of mRNA species and the actual amounts of proteins that they code for (Anderson and Seilhamer, Electrophoresis 18: 533-537; Gygi et al., Mol. Cell. Biol. 19: 1720-1730 (1999)).
A growing body of evidence suggests that changes in gene and protein expression may correlate with the onset of a given human disease (Schena and Davis, 1999). Proteomic analysis of disease tissues should allow the identification of proteins whose expression is altered in a given illness. Many small molecules may also alter protein expression at a global level. Combining information about altered expression in a disease state with the changes that result from treatment with a small molecule would provide valuable information about classes of molecules that may be effective in combating a given disease. Proteomics thus has a role in processes such as lead compound screening and optimization, toxicity, pharmacodynamics, and drug efficacy.
A pivotal component of proteomics is its ability to quantify vast numbers of proteins accurately and reproducibly. Typically, proteomics entails the simultaneous separation of proteins from a biological sample, and the quantitation of the relative abundance of the proteins resolved during the separation. Proteomics currently relies heavily on two-dimensional (2-D) gel electrophoresis. However, obtaining information concerning global protein expression using 2-D gels is technically difficult, and semiautomated procedures to carry out this process are in their infancy (Patton, Biotechniques 28: 944-957 (2000)). Furthermore, the commonly used stains for evaluating protein expression in 2-D gels (such as Coomassie Blue, colloidal gold and silver stain) do not provide the requisite dynamic range to be effective in this capacity. These stains are linear over only a 10- to 40-fold range, whereas the abundance of individual proteins differs by as much as four orders of magnitude (Brush, The Scientist 12:16-22, 1998; Wirth and Romano J. Chromatogr 698: 123-143 (1995)). In addition, low abundance proteins, such as transcription factors and kinases that are present in 1-2000 copies per cell, often represent species that perform important regulatory functions. The accurate detection of such low-abundance proteins is an important challenge to proteomics. Methods have recently been introduced to directly quantify the relative abundance of proteins in two different samples by mass spectrometry. However, the linear dynamic range of these methods has been demonstrated over only a four- to ten-fold range (Gygi et al. 1999; Oda et al., Proc. Natl. Acad. Sci USA 96: 6591-6596 (1999)).
It has been noted recently that developing microarray technologies would make possible the simultaneous, ultra-sensitive measurement of hundreds or even thousands of substances in a small sample (Ekins, Clin. Chem. 44: 2015-2030 (1998)). This approach has been difficult to reduce to practice, however, because the extremely small volumes (about 0.5-5 nl) of sample used to create spots on these microarrays make it necessary to utilize methods of analyte detection that are extremely sensitive. Rolling Circle Amplification (RCA) driven by DNA polymerase can replicate circular oligonucleotide probes with either linear or geometric kinetics under isothermal conditions (Lizardi et al., Nature Genet. 19: 225-232 (1998)). If a single primer is used, RCA generates in a few minutes a linear chain of hundreds or thousands of tandemly-linked DNA copies of a target which is covalently linked to that target. Generation of a linear amplification product permits both spatial resolution and accurate quantitation of a target. DNA generated by RCA can be labeled with fluorescent oligonucleotide tags that hybridize at multiple sites in the tandem DNA sequences. RCA can be used with fluorophore combinations designed for multiparametric color coding (Speicher et al., Nature Genet. 12:368-375 (1996)), thereby markedly increasing the number of targets that can be analyzed simultaneously. RCA technologies can be used in solution, in situ and in microarrays. In solid phase formats, detection and quantitation can be achieved at the level of single molecules (Lizardi et al., 1998).
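The way a single-primer RCA product offers many hybridization sites for a detection tag can be illustrated with a toy sequence model. The sketch below assumes the product is simply the complement of the circle repeated once per trip of the polymerase around the circle; the circle sequence, the tag, and the function names are hypothetical, and no reaction kinetics are modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rolling_circle_product(circle, rounds):
    """Tandem-repeat strand made after `rounds` trips around the circle."""
    # Each trip appends one more complementary copy of the circle.
    return reverse_complement(circle) * rounds


def count_tag_sites(product, tag):
    """Count the positions where a detection tag can hybridize to the product."""
    target = reverse_complement(tag)   # the tag pairs with its complement
    count = 0
    start = product.find(target)
    while start != -1:
        count += 1
        start = product.find(target, start + 1)
    return count


circle = "ACGTTGCAAGGCTTAC"   # hypothetical 16-base circle, linearized for display
tag = circle[2:10]            # hypothetical tag matching part of the circle
product = rolling_circle_product(circle, rounds=5)
print(len(product), count_tag_sites(product, tag))
```

In this model the number of tag sites grows in step with the number of tandem copies, which is the property that makes the amplified DNA a quantifiable signal.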
It is therefore an object of the present invention to provide a method for detecting small quantities and concentrations of analytes.
It is a further object of the present invention to provide a method for detecting small quantities and concentrations of multiple analytes in samples.
It is a further object of the present invention to provide a method for amplifying the signal of an analyte to be detected.
It is a further object of the present invention to provide an automated method for detecting small quantities and concentrations of multiple analytes in samples.
It is a further object of the present invention to provide a method for profiling the presence of multiple analytes in a sample.
It is a further object of the present invention to provide a method for comparing profiles of the presence of multiple analytes in different samples.
It is a further object of the present invention to provide a method for assessing the interaction of compounds with molecules of interest.
It is a further object of the present invention to provide a method for detecting small quantities and concentrations of proteins and peptides.
It is a further object of the present invention to provide a method for detecting small quantities and concentrations of multiple proteins and peptides in samples.
It is a further object of the present invention to provide a method for amplifying the signal of a protein or peptide to be detected.
It is a further object of the present invention to provide an automated method for detecting small quantities and concentrations of multiple proteins and peptides in samples.
It is a further object of the present invention to provide a method for profiling the presence of multiple proteins and peptides in a sample.
It is a further object of the present invention to provide a method for comparing profiles of the presence of multiple proteins and peptides in different samples.
It is a further object of the present invention to provide a method for assessing the interaction of compounds with proteins and peptides of interest.
It is a further object of the present invention to provide compositions for detecting small quantities and concentrations of analytes.
It is a further object of the present invention to provide compositions for detecting small quantities and concentrations of proteins and peptides.
Disclosed are compositions and methods for detecting small quantities of analytes such as proteins and peptides. The method involves associating a nucleic acid primer with the analyte and subsequently using the primer to mediate rolling circle replication of a circular DNA molecule. Amplification of the DNA circle is dependent on the presence of the primer. Thus, the disclosed method produces an amplified signal, via rolling circle amplification, from any analyte of interest. The amplification is isothermal and can result in the production of a large amount of nucleic acid from each primer. The amplified DNA remains associated with the analyte, via the primer, and so allows spatial detection of the analyte.
The disclosed method is preferably used to detect and analyze proteins and peptides. In preferred embodiments, multiple proteins can be analyzed using microarrays with which multiple different proteins or analytes are directly or indirectly associated (if they are present in the sample being tested). A rolling circle replication primer is then associated with the various proteins using a conjugate of the primer and a specific binding molecule, such as an antibody, that is specific for the protein to be detected. Rolling circle replication primed by the primers results in production of a large amount of DNA at the site in the array where the proteins are immobilized. The amplified DNA serves as a readily detectable signal for the proteins. Different proteins in the array can be distinguished in several ways. For example, the location of the amplified DNA can indicate the protein involved if different proteins are immobilized at pre-determined locations in the array. Alternatively, each different protein can be associated with a different rolling circle replication primer which in turn primes rolling circle replication of a different DNA circle. The result is distinctive amplified DNA for each different protein. The different amplified DNAs can be distinguished using any suitable sequence-based nucleic acid detection technique.
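The two ways of distinguishing proteins described above, by array location or by the identity of the amplified DNA circle, both reduce to a lookup. The sketch below is only an illustration under assumed data: the protein names, spot coordinates, circle labels, signal values, and threshold are all hypothetical.

```python
def identify_by_location(layout, spot_signals, threshold):
    """Map spots with signal at or above threshold to the protein at that spot."""
    return {
        layout[position]: signal
        for position, signal in spot_signals.items()
        if position in layout and signal >= threshold
    }


def identify_by_circle(circle_to_protein, detected_circles):
    """Map each detected amplified circle sequence label to its protein."""
    return sorted(
        circle_to_protein[circle]
        for circle in detected_circles
        if circle in circle_to_protein
    )


# Hypothetical array: (row, column) -> protein immobilized at that spot.
layout = {(0, 0): "protein_A", (0, 1): "protein_B", (1, 0): "protein_C"}
# Hypothetical fluorescence readings per spot, in arbitrary units.
spot_signals = {(0, 0): 950.0, (0, 1): 12.0, (1, 0): 430.0}
print(identify_by_location(layout, spot_signals, threshold=100.0))

# Hypothetical pairing of one distinct DNA circle with each protein.
circle_to_protein = {"circle_1": "protein_A", "circle_2": "protein_B"}
print(identify_by_circle(circle_to_protein, ["circle_2", "circle_1"]))
```

Location-based decoding needs only one detection label, while circle-based decoding allows several proteins to be resolved at the same location.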
Another preferred embodiment of the disclosed method involves comparison of the proteins expressed in two or more different samples. The information generated is analogous to the type of information gathered in nucleic acid expression profiles. The disclosed method allows sensitive and accurate detection and quantitation of proteins expressed in any cell or tissue. The disclosed method also allows the same analyte(s) from different samples to be detected simultaneously in the same assay.
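A comparison of two samples can be summarized as a per-protein ratio of signals. The sketch below assumes a log2 ratio, a common convention for expression profiles that the source does not itself specify; the protein names and signal values are hypothetical.

```python
import math


def compare_profiles(sample_a, sample_b):
    """Return log2(signal in B / signal in A) for proteins measured in both."""
    ratios = {}
    for name in sorted(sample_a.keys() & sample_b.keys()):
        a, b = sample_a[name], sample_b[name]
        if a > 0 and b > 0:            # a ratio is undefined for zero signal
            ratios[name] = math.log2(b / a)
    return ratios


# Hypothetical signals (arbitrary units) for two samples.
control = {"protein_A": 100.0, "protein_B": 50.0, "protein_C": 80.0}
treated = {"protein_A": 200.0, "protein_B": 50.0, "protein_D": 10.0}
print(compare_profiles(control, treated))
```

A value of 1 indicates a two-fold increase in the second sample and 0 indicates no change; proteins detected in only one sample are left out and would need separate handling.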