The disclosed processes and kits relate generally to the field of proteomics and molecular medicine, and more specifically to processes using mass spectrometry to determine the identity of a target polypeptide.
In recent years, the molecular biology of a number of human genetic diseases has been elucidated by the application of recombinant DNA technology.
More than 3000 diseases are known to be of genetic origin (Cooper and Krawczak, "Human Genome Mutations" (BIOS Publ. 1993)), including, for example, hemophilias, thalassemias, Duchenne muscular dystrophy, Huntington's disease, Alzheimer's disease and cystic fibrosis, as well as various cancers such as breast cancer. In addition to mutated genes that result in genetic disease, certain birth defects are the result of chromosomal abnormalities, including, for example, trisomy 21 (Down's syndrome), trisomy 13 (Patau syndrome), trisomy 18 (Edward's syndrome), monosomy X (Turner's syndrome) and other sex chromosome aneuploidies such as Klinefelter's syndrome (XXY).
Other genetic diseases are caused by an abnormal number of trinucleotide repeats in a gene. These diseases include Huntington's disease, prostate cancer, spinocerebellar ataxia 1 (SCA-1), Fragile X syndrome (Kremer et al., Science 252:1711-14 (1991); Fu et al., Cell 67:1047-58 (1991); Hirst et al., J. Med. Genet. 28:824-29 (1991)), myotonic dystrophy type I (Mahadevan et al., Science 255:1253-55 (1992); Brook et al., Cell 68:799-808 (1992)), Kennedy's disease (also termed spinal and bulbar muscular atrophy; La Spada et al., Nature 352:77-79 (1991)), Machado-Joseph disease, and dentatorubral and pallidoluysian atrophy. The aberrant number of triplet repeats can be located in any region of a gene, including a coding region, a non-coding region of an exon, an intron, or a regulatory element such as a promoter. In certain of these diseases, for example, prostate cancer, the number of triplet repeats is positively correlated with prognosis of the disease.
Evidence indicates that amplification of a trinucleotide repeat is involved in the molecular pathology in each of the disorders listed above. Although some of these trinucleotide repeats appear to be in non-coding DNA, they clearly are involved with perturbations of genomic regions that ultimately affect gene expression. Perturbations of various dinucleotide and trinucleotide repeats resulting from somatic mutation in tumor cells also can affect gene expression or gene regulation.
Additional evidence indicates that certain DNA sequences predispose an individual to a number of other diseases, including diabetes, arteriosclerosis, obesity, various autoimmune diseases and cancers such as colorectal, breast, ovarian and lung cancer. Knowledge of the genetic lesion causing or contributing to a genetic disease allows one to predict whether a person has or is at risk of developing the disease or condition and also, at least in some cases, to determine the prognosis of the disease.
Numerous genes have polymorphic regions. Since individuals have any one of several allelic variants of a polymorphic region, each can be identified based on the type of allelic variants of polymorphic regions of genes. Such identification can be used, for example, for forensic purposes. In other situations, it is crucial to know the identity of allelic variants in an individual. For example, allelic differences in certain genes such as the major histocompatibility complex (MHC) genes are involved in graft rejection or graft versus host disease in bone marrow transplantation. Accordingly, it is highly desirable to develop rapid, sensitive, and accurate methods for determining the identity of allelic variants of polymorphic regions of genes or genetic lesions.
Several methods are used for identifying allelic variants or genetic lesions. For example, the identity of an allelic variant or the presence of a genetic lesion can be determined by comparing the mobility of an amplified nucleic acid fragment with a known standard by gel electrophoresis, or by hybridization with a probe that is complementary to the sequence to be identified. Identification, however, can be accomplished only if the nucleic acid fragment is labeled with a sensitive reporter function, for example, a radioactive (32P, 35S), fluorescent or chemiluminescent reporter. Radioactive labels can be hazardous and the signals they produce can decay substantially over time. Non-radioactive labels such as fluorescent labels can suffer from a lack of sensitivity and fading of the signal when high intensity lasers are used. Additionally, labeling, electrophoresis and subsequent detection are laborious, time-consuming and error-prone procedures. Electrophoresis is particularly error-prone, since the size or the molecular weight of the nucleic acid cannot be correlated directly to its mobility in the gel matrix because sequence specific effects, secondary structures and interactions with the gel matrix cause artifacts in its migration through the gel.
Mass spectrometry has been used for the sequence analysis of nucleic acids (see, for example, Schram, Mass Spectrometry of Nucleic Acid Components, Biomedical Applications of Mass Spectrometry 34:203-287 (1990); Crain, Mass Spectrom. Rev. 9:505-554 (1990); Murray, J. Mass Spectrom. Rev. 31:1203 (1996); Nordhoff et al., J. Mass Spectrom. 15:67 (1997)). In general, mass spectrometry provides a means of "weighing" individual molecules by ionizing the molecules in vacuo and making them "fly" by volatilization. Under the influence of electric and/or magnetic fields, the ions follow trajectories depending on their individual mass (m) and charge (z). For molecules with low molecular weight, mass spectrometry is part of the routine physical-organic repertoire for analysis and characterization of organic molecules by the determination of the mass of the parent molecular ion. In addition, by arranging collisions of this parent molecular ion with other particles such as argon atoms, the molecular ion is fragmented, forming secondary ions by collisionally activated dissociation (CAD); the fragmentation pattern/pathway very often allows the derivation of detailed structural information. Many applications of mass spectrometric methods are known in the art, particularly in the biosciences (see Meth. Enzymol., Vol. 193, "Mass Spectrometry" (McCloskey, ed.; Academic Press, NY 1990); McLafferty et al., Acc. Chem. Res. 27:297-386 (1994); Chait and Kent, Science 257:1885-1894 (1992); Siuzdak, Proc. Natl. Acad. Sci., USA 91:11290-11297 (1994)), including methods for producing and analyzing biopolymer ladders (see, International PCT application No. WO 96/36732; U.S. Pat. No. 5,792,664). Despite the effort to apply mass spectrometry methods to the analysis of nucleic acid molecules, however, there are limitations, including physical and chemical properties of nucleic acids.
Nucleic acids are very polar biopolymers that are difficult to volatilize.
Accordingly, a need exists for methods to determine the identity of a nucleic acid molecule, particularly genetic lesions in a nucleic acid molecule, using alternative methodologies. Therefore it is an object herein to provide processes and compositions that satisfy this need and provide additional advantages.
Processes and kits for determining the identity of a target polypeptide by mass spectrometry are provided. The processes include the steps of determining the molecular mass of a target polypeptide or a fragment or fragments thereof by mass spectrometry, and then comparing the mass to a standard, whereby the identity of the polypeptide can be ascertained. Identity includes, but is not limited to, identifying the sequence of the polypeptide, identifying a change in a sequence compared to a known polypeptide, and other means by which polypeptides and mutations thereof can be identified. Selection of the standard will be determined as a function of the information desired.
One process for determining the identity of a target polypeptide includes the steps of a) obtaining a target polypeptide; b) determining the molecular mass of the target polypeptide by mass spectrometry; and c) comparing the molecular mass of the target polypeptide with the molecular mass of a corresponding known polypeptide. By comparing the molecular mass of the target with a known polypeptide having a known structure, the identity of the target polypeptide can be ascertained. As disclosed herein, the polypeptide is obtained by methods including transcribing a nucleic acid encoding the target polypeptide into RNA and translating the RNA into the target polypeptide. If desired, transcription of the nucleic acid or translation of the RNA, or both, can be performed in vitro.
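The comparison in step c) can be illustrated with a minimal sketch. It assumes an unmodified linear polypeptide, average residue masses, and an arbitrary 0.5 Da tolerance; the sequences and the names `average_mass` and `compare_to_standard` are hypothetical and are not part of the disclosure.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per chain accounts for the free amino and carboxyl termini


def average_mass(sequence):
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER


def compare_to_standard(observed_mass, reference_sequence, tolerance=0.5):
    """Return (matches, shift) for an observed mass against a known polypeptide.

    The tolerance is an assumed placeholder; a real value depends on the instrument.
    """
    shift = observed_mass - average_mass(reference_sequence)
    return abs(shift) <= tolerance, shift


reference = "MGSSHHHHHHAQK"  # hypothetical known polypeptide
variant = "MGSSHHHHHHVQK"    # hypothetical target carrying an Ala -> Val change

same, _ = compare_to_standard(average_mass(reference), reference)
differs, shift = compare_to_standard(average_mass(variant), reference)
print(same, differs, round(shift, 4))
```

In this sketch the sign and size of the shift (about +28.05 Da for Ala to Val) hint at which substitution distinguishes the target from the standard, although a mass shift alone does not always identify a substitution uniquely.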
A process as disclosed herein also can include a step of amplifying a nucleic acid encoding the target polypeptide prior to step a), for example, by performing the polymerase chain reaction (PCR) using a forward primer and a reverse primer. The forward primer or the reverse primer can contain an RNA polymerase promoter such as an SP6 promoter, T3 promoter, or T7 promoter. In addition, a primer can contain a nucleotide sequence for a transcription start site. A primer also can encode a translation START (ATG) codon. Accordingly, a target polypeptide can be translated from a nucleic acid that is not naturally transcribed or translated in vivo, for example, by incorporating a START codon in the nucleic acid to be translated, thereby providing a translation reading frame. Furthermore, a primer can contain a nucleotide sequence, or complement thereof, encoding a second peptide or polypeptide, for example, a tag peptide such as a myc epitope tag, a Haemophilus influenza hemagglutinin peptide tag, a polyhistidine sequence, a polylysine sequence or a polyarginine sequence. A process as disclosed herein can be performed in vivo, for example, in a host cell such as a bacterial host cell transformed with a nucleic acid encoding a target polypeptide or a eukaryotic host cell such as a mammalian cell transfected with a nucleic acid encoding a target polypeptide.
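The primer layout described above (promoter, then a START codon that sets the reading frame, then target-specific sequence) can be sketched in silico as follows. The spacer, the target sequence and the function names are hypothetical; the T7 promoter string is the commonly used 20-nucleotide form, and the standard genetic code is assumed.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # transcription starts at the first G of the final GGG

# Standard genetic code in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def forward_primer(target_5prime, spacer="AGACC"):
    """Promoter + spacer + START codon + target-specific 3' portion.

    The spacer is an arbitrary placeholder chosen so that it contains no ATG.
    """
    return T7_PROMOTER + spacer + "ATG" + target_5prime


def translate_from_start(dna):
    """Translate from the first ATG downstream of the promoter to the first stop codon."""
    start = dna.index("ATG", len(T7_PROMOTER))
    peptide = []
    for i in range(start, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


target = "CAGCAGCAGCAGCAGTAA"  # hypothetical region: five CAG repeats followed by a stop codon
amplicon = forward_primer(target[:12]) + target[12:]  # sense strand of the amplified product
print(translate_from_start(amplicon))
```

Because the START codon is supplied by the primer, the repeat region is read in the CAG frame even though the hypothetical target has no initiation codon of its own.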
A process as disclosed is performed using a mass spectrometric analysis, including for example, matrix assisted laser desorption ionization (MALDI), continuous or pulsed electrospray ionization, ionspray, thermospray, or massive cluster impact mass spectrometry and a detection format such as linear time-of-flight (TOF), reflectron time-of-flight, single quadrupole, multiple quadrupole, single magnetic sector, multiple magnetic sector, Fourier transform ion cyclotron resonance, ion trap, and combinations thereof such as MALDI-TOF spectrometry. An advantage of using a process as provided is that no radioactive label is required. Another advantage is that relatively short polypeptides can be synthesized from a target nucleic acid, thus providing an accurate measurement of molecular weight by mass spectrometry, as compared to analysis of the nucleic acid itself.
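For the electrospray formats listed above, a polypeptide typically appears as a series of multiply protonated ions, and the neutral mass can be recovered from two adjacent peaks of that series. The following sketch assumes positive-ion mode with protons as the only charge carriers; the function names and the 10,000 Da example are illustrative only.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass_from_adjacent_peaks(mz_low, mz_high):
    """Deduce the charge and neutral mass from two adjacent charge states.

    mz_high carries charge z and mz_low carries charge z + 1, so
    (mz_low - PROTON) / (mz_high - mz_low) equals z.
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    return z, z * (mz_high - PROTON)


# Hypothetical 10,000 Da polypeptide observed at charge states 10+ and 11+.
peak_10 = mz(10000.0, 10)
peak_11 = mz(10000.0, 11)
charge, mass = neutral_mass_from_adjacent_peaks(peak_11, peak_10)
print(charge, round(mass, 3))
```

In MALDI-TOF spectra, where singly charged ions usually dominate, the same `mz` relation with a charge of 1 gives the expected peak position directly.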
An RNA molecule encoding a target polypeptide can be translated in a cell-free extract, which can be a eukaryotic cell-free extract such as a reticulocyte lysate, a wheat germ extract, or a combination thereof; or a prokaryotic cell-free extract, for example, a bacterial cell extract such as an E. coli S30 extract. If desired, translation and transcription of a target nucleic acid can be performed in the same cell-free extract, for example, a reticulocyte lysate or a prokaryotic cell extract.
A target polypeptide generally is isolated prior to being detected by mass spectrometric analysis. For example, the polypeptide can be isolated from a cell or tissue obtained from a subject such as a human. The target polypeptide can be isolated using a reagent that interacts specifically with the target polypeptide, for example, an antibody that interacts specifically with the target polypeptide, or the target polypeptide can be fused to a tag peptide and isolated using a reagent that interacts specifically with the tag peptide, for example, an antibody specific for the tag peptide. A reagent also can be another molecule that interacts specifically with the tag peptide, for example, metal ions such as nickel or cobalt ions, which interact specifically with a hexahistidine (His-6) tag peptide.
A target polypeptide can be immobilized to a solid support, such as a bead or a microchip, which can be a flat surface or a surface with structures made of essentially any material commonly used for fashioning such a device. A microchip is useful, for example, for attaching moieties in an addressable array. Immobilization of a target polypeptide provides a means to isolate the polypeptide, as well as a means to manipulate the isolated target polypeptide prior to mass spectrometry.
Methods are provided for sequencing an immobilized target polypeptide, including sequencing from the carboxyl terminus or from the amino terminus. Furthermore, methods of determining the identity of each of the target polypeptides in a plurality of target polypeptides by multiplexing are provided.
In particular embodiments, post-translational capture and immobilization of a target polypeptide via a cleavable linker are provided in order to orthogonally sequence a polypeptide. These methods can include: 1) obtaining the target polypeptide; 2) immobilizing the target polypeptide to a solid surface; 3) treating the immobilized target polypeptide with an enzyme or chemical in a time dependent manner to generate a series of deleted fragments; 4) conditioning the cleaved polypeptide fragments; 5) cleaving the linker and thereby releasing the immobilized fragments; 6) determining the mass of the released fragments; and 7) aligning the masses of each of the polypeptide fragments to determine the amino acid sequence. Variants of these methods in which one or more steps are combined or eliminated are also contemplated.
In one embodiment, the second step includes immobilizing the amino terminal portion of the polypeptide to a solid support via a photocleavable linker. In a more preferred embodiment, the solid support is activated as described in FIG. 2 and allowed to react with the amino group of a target polypeptide.
In another embodiment, the second step includes immobilizing the carboxy terminal portion of the polypeptide to a solid support via a photocleavable linker. In a more preferred embodiment, a photocleavable linker is a linker that can be cleaved from the solid support with light. In a more preferred embodiment, the solid support is activated as described in FIG. 3 and allowed to react with the carboxy group of a target polypeptide.
In another embodiment, the second step includes immobilizing either the carboxy or the amino termini of a group of different polypeptides to a solid support in an array format via a photocleavable linker. In a more preferred embodiment, discrete areas of a silicon surface are activated with the chemistry described in FIG. 2, thereby forming an array composed of from 2 to 999 positions.
In another embodiment, the second step includes immobilizing the amino terminal portion of the polypeptide to a solid support via a cleavable linker. In a more preferred embodiment, a cleavable linker is a silyl linker that can be cleaved from the solid support. In a more preferred embodiment, the solid support is activated as described in FIG. 2 and allowed to react with the amino group of a target polypeptide.
In another embodiment, the second step includes immobilizing the carboxy terminal portion of the polypeptide to a solid support via a cleavable linker. In a more preferred embodiment, a cleavable linker is a silyl linker that can be cleaved from the solid support. In a more preferred embodiment, the solid support is activated as described in FIG. 3 and allowed to react with the carboxy group of a target polypeptide.
In another embodiment, the second step includes immobilizing either the carboxy or the amino termini of a group of different polypeptides to a solid support in an array format via a cleavable linker. In a more preferred embodiment, discrete areas of a silicon surface are activated with the chemistry described in FIG. 2, thereby forming an array, preferably composed of from 2 to 999 positions.
In another embodiment, the third step includes immobilizing the amino terminal end of the target polypeptide(s) to the solid support and treating with an exopeptidase. In a preferred embodiment, exopeptidase digestion is carried out in a time dependent manner to generate a nested group of immobilized polypeptide fragments of varying lengths. In a more preferred embodiment, the exopeptidase is selected from a group of one or more mono-peptidases and polypeptidases including carboxypeptidase Y, carboxypeptidase P, carboxypeptidase A, carboxypeptidase G and carboxypeptidase B.
In another embodiment, the exopeptidase is selected from a group of one or more mono-peptidases and polypeptidases including aminopeptidases such as alanine aminopeptidase, leucine aminopeptidase, pyroglutamate peptidase, dipeptidyl peptidase, microsomal peptidase and other enzymes that progressively digest the amino terminal end of a polypeptide.
In another embodiment, the third step comprises a step where exopeptidase digestion is carried out under reaction conditions that remove any secondary or tertiary structure that would otherwise leave the terminal residues of the polypeptide inaccessible to exopeptidases. In a preferred embodiment, the reaction conditions expose the terminus of a target polypeptide(s) to temperatures over about 70° C. and below about 100° C. In a more preferred embodiment, the exopeptidase is a thermostable carboxypeptidase or aminopeptidase. In another preferred embodiment, the reaction conditions expose the terminus of a target polypeptide(s) to high ionic strength conditions. In a more preferred embodiment, the exopeptidase is a salt tolerant carboxypeptidase or aminopeptidase.
In another embodiment, the second step includes conditioning of the polypeptide after enzymatic treatment or purification. In a more preferred embodiment, methods of conditioning include methods that prepare the polypeptide or polypeptide fragments in a manner that generally improves mass spectrometric analysis. In a more preferred embodiment, conditioning may include cation exchange.
Kits containing components useful for determining the identity of a target polypeptide based on a process as disclosed herein also are provided. Such a kit can contain reagents for in vitro transcription and/or translation of the amplified nucleic acid to obtain the target polypeptide; optionally, a reagent for isolating the target polypeptide; and instructions for use in determining the identity of a target polypeptide by mass spectrometric analysis. The kits may also include, for example, forward or reverse primers capable of hybridizing to a nucleic acid encoding the target polypeptide and amplifying the nucleic acid. Such kits also can contain an organic or inorganic solvent, for example, a salt of ammonium, or a reagent system for volatilizing and ionizing the target polypeptide prior to mass spectrometric analysis. In addition, a kit can contain a control nucleic acid or polypeptide of known identity. A kit also can provide, for example, a solid support for immobilizing a target polypeptide, including, if desired, reagents for performing such immobilization. A kit further can contain reagents useful for manipulating a target polypeptide, for example, reagents for conditioning the target polypeptide prior to mass spectrometry or reagents for sequencing the polypeptide. A kit as disclosed herein is useful for performing the various disclosed processes and can be designed, for example, for use in determining the number of nucleotide repeats of a target nucleic acid or whether a target nucleic acid contains a different number of nucleotide repeats relative to a reference nucleic acid.
A target polypeptide can be encoded by an allelic variant of a polymorphic region of a gene of a subject, or can be encoded by an allelic variant of a polymorphic region that is located in a chromosomal region that is not in a gene. A process as disclosed herein can include a step of determining whether the allelic variant is identical to an allelic variant of a polymorphic region that is associated with a disease or condition, thereby indicating whether a subject has or is at risk of developing the disease or condition associated with the specific allelic variant of the polymorphic region of the gene. The disease or condition can be associated, for example, with an abnormal number of nucleotide repeats, for example, dinucleotide, trinucleotide, tetranucleotide or pentanucleotide repeats. Since trinucleotide repeats, for example, can be very long, determination of the number of trinucleotide repeats by analyzing the DNA directly would not be straightforward. Since a process for determining the identity of a target polypeptide as disclosed herein is based on the analysis of a polypeptide, particularly a polypeptide encoded essentially by trinucleotide repeats, determination of the number of trinucleotide repeats will be more accurate using the disclosed processes and kits. A disease or condition that can be identified using a disclosed process or kit includes, for example, Huntington's disease, prostate cancer, Fragile X syndrome type A, myotonic dystrophy type I, Kennedy's disease, Machado-Joseph disease, dentatorubral and pallidoluysian atrophy, and spino bulbar muscular atrophy; as well as aging, which can be identified by examining the number of nucleotide repeats in telomere nucleic acid from a subject.
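As an illustration of counting repeats from a polypeptide mass, the sketch below assumes a target in which a CAG repeat tract is translated as a run of glutamine residues between two known flanking sequences. The flanking sequences, the function names and the use of average masses are assumptions made for the example.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per chain for the free termini


def average_mass(sequence):
    """Average molecular mass (Da) of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence) + WATER


def repeat_count(observed_mass, n_flank, c_flank, repeat_residue="Q"):
    """Estimate how many repeat-encoded residues lie between two known flanks.

    Returns the rounded count and the fractional residual; a residual far
    from zero suggests the mass is not explained by the repeat alone.
    """
    flank_mass = average_mass(n_flank + c_flank)
    estimate = (observed_mass - flank_mass) / AVG_RESIDUE_MASS[repeat_residue]
    count = round(estimate)
    return count, estimate - count


# Hypothetical target: flanks "MAT" and "PPK" around 21 glutamines (21 CAG repeats).
observed = average_mass("MAT" + "Q" * 21 + "PPK")
count, residual = repeat_count(observed, "MAT", "PPK")
print(count, abs(residual) < 1e-6)
```

Each additional CAG repeat adds one glutamine residue, about 128.13 Da, so alleles that differ by a single repeat are separated by a fixed and comparatively large mass increment.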
The disease or condition also can be associated with a gene such as genes encoding BRCA1, BRCA2, APC; a gene encoding dystrophin, β-globin, Factor IX, Factor VIIc, ornithine-d-amino-transferase, hypoxanthine guanine phosphoribosyl transferase, or the cystic fibrosis transmembrane receptor (CFTR); or a proto-oncogene.
A process or a kit as disclosed herein can be used to genotype a subject by determining the identity of one or more allelic variants of one or more polymorphic regions in one or more genes or chromosomes of the subject. For example, the one or more genes can be associated with graft rejection and the process can be used to determine compatibility between a donor and a recipient of a graft. Such genes can be MHC genes, for example. Genotyping a subject using a process as provided herein can be used for forensic or identity testing purposes and the polymorphic regions can be present in mitochondrial genes or can be short tandem repeats.
A disclosed process or kit also can be used to determine whether a subject carries a pathogenic organism such as a virus, bacterium, fungus or protist. A process for determining the isotype of a pathogenic organism also is provided. Thus, depending on the sequence to be detected, the processes and kits disclosed herein can be used, for example, to diagnose a genetic disease or chromosomal abnormality; a predisposition to or an early indication of a gene influenced disease or condition, for example, obesity, atherosclerosis, diabetes or cancer; or an infection by a pathogenic organism, for example, a virus, bacterium, parasite or fungus; or to provide information relating to identity, heredity or compatibility using, for example, mini-satellite or micro-satellite sequences or HLA phenotyping.
A process as disclosed herein provides a means for determining the amino acid sequence of a polypeptide of interest. Such a process can be performed, for example, by using mass spectrometry to determine the identity of an amino acid residue released from the amino terminus or the carboxyl terminus of a polypeptide of interest. Such a process also can be performed, for example, by producing a nested set of carboxyl terminal or amino terminal deletion fragments of a polypeptide of interest, or peptide fragment thereof, and subjecting the nested set of deletion fragments to mass spectrometry, thereby determining the amino acid sequence of the polypeptide.
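The nested-set approach can be sketched as follows: the mass difference between consecutive deletion fragments identifies the residue removed at each step. The example assumes monoisotopic masses, a carboxyl terminal ladder such as a carboxypeptidase would produce, and a 0.01 Da tolerance; the peptide and the function names are hypothetical. Leucine and isoleucine have identical masses, so the table carries a single entry for both.

```python
# Monoisotopic residue masses (Da); "L" stands for both Leu and Ile, which are isobaric.
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def mono_mass(sequence):
    """Monoisotopic mass (Da) of an unmodified linear polypeptide."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in sequence) + WATER


def read_ladder(fragment_masses, tolerance=0.01):
    """Return the residues removed between consecutive fragments, longest first.

    A mass difference that matches no single residue, or more than one, is
    reported as "X".
    """
    masses = sorted(fragment_masses, reverse=True)
    removed = []
    for heavier, lighter in zip(masses, masses[1:]):
        delta = heavier - lighter
        hits = [aa for aa, m in MONO_RESIDUE_MASS.items() if abs(m - delta) <= tolerance]
        removed.append(hits[0] if len(hits) == 1 else "X")
    return "".join(removed)


peptide = "ASDFGHK"  # hypothetical polypeptide of interest
# Carboxyl terminal deletion fragments of length 7 down to 3.
ladder = [mono_mass(peptide[:n]) for n in range(len(peptide), 2, -1)]
removed = read_ladder(ladder)           # residues in order of removal from the C terminus
c_terminal_sequence = removed[::-1]     # reversed to read amino to carboxyl
print(removed, c_terminal_sequence)
```

For an amino terminal ladder the same function applies, and the order of removal already reads from the amino terminus, so no reversal is needed.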
A process of determining the amino acid sequence of a polypeptide of interest can be performed, for example, using a polypeptide that is immobilized, reversibly, if desired, to a solid support. In addition, such a process can be performed on a plurality of such polypeptides, which can be, for example, a plurality of target polypeptides immobilized in an addressable array on a solid support such as a microchip, which can contain, for example, at least 2 positions, and as many as 999 positions, or 1096 positions, or 9999 positions, or more. In general, a target polypeptide, or the amino acids released therefrom, is conditioned prior to mass spectrometry, thereby increasing resolution of the mass spectrum. For example, a target polypeptide can be conditioned by mass modification. In addition, the amino acid sequences of a plurality of mass modified target polypeptides can be determined by mass spectrometry using a multiplexing format.