1. Introduction
The following description includes information that may be useful in understanding the present invention. It is not an admission that any such information is prior art, or relevant, to the presently claimed inventions, or that any publication specifically or implicitly referenced is prior art.
2. Background
Efficient, high fidelity detection and analysis of biomolecules (e.g., nucleic acids, proteins, carbohydrates, and lipids) represent a major challenge in biology. These challenges are particularly acute in the context of analyzing biological samples, which by their nature are extremely complex, both in terms of the number of different molecular species present, as well as with regard to the numbers of molecules of the various particular species. Due to this complexity, extremely sensitive and selective methods are required in order to generate valid, reproducible results. Further complicating matters is the need to achieve such results in a commercially viable way, e.g., in terms of cost, time, etc.
The importance of adequately addressing these challenges is perhaps best considered in the context of the large-scale detection and analysis of nucleic acids, which store the genetic information of all living organisms (e.g., animals, plants, and microorganisms). Briefly, genetic information is generally encoded in deoxyribonucleic acid (DNA), although certain viruses comprise genomes made of ribonucleic acid (RNA). In humans, a complete haploid genome comprises about three billion nucleotides, and contains about 35,000 genes spread across 24 chromosomes (twenty two autosomes and two sex chromosomes). Naturally occurring DNA and RNA molecules are enzymatically synthesized as linear polymers of nucleotides, which differ from each other only in terms of the bases included in particular nucleotides. In DNA, four different deoxyribonucleotides are found, designated “A”, “G”, “C”, and “T” due to the inclusion of an adenine, guanine, cytosine, or thymine base in the particular deoxyribonucleotide. Similarly, RNA is comprised of four different ribonucleotides, designated “A”, “G”, “C”, and “U” due to the inclusion of an adenine, guanine, cytosine, or uracil base in the nucleotide. In nature, genomic DNA is typically double-stranded, with one DNA strand being hybridized to the other in an anti-parallel fashion according to canonical Watson-Crick base pairing, where the A's on one strand always hydrogen bond with T's on the other strand, and G's always pair with C's. The same base-pairing rules apply with RNA, except that in RNA, U replaces T and thus pairs with A (in either DNA or RNA).
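By way of illustration only, the base-pairing rules described above can be sketched in a few lines of Python. The function names and the example sequence are hypothetical and are not taken from this specification; the sketch assumes an unmodified DNA strand containing only A, C, G, and T.

```python
# Canonical Watson-Crick pairs for DNA: A pairs with T, and G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the anti-parallel complementary DNA strand, written 5' to 3'."""
    # The partner strand runs in the opposite direction, so the input is
    # read in reverse and each base is swapped for its pairing partner.
    return "".join(DNA_PAIRS[base] for base in reversed(strand.upper()))


def to_rna_alphabet(strand: str) -> str:
    """Write a DNA sequence in the RNA alphabet, where U replaces T."""
    return strand.upper().replace("T", "U")


if __name__ == "__main__":
    example = "ATGCCG"  # illustrative sequence only
    print(reverse_complement(example))  # CGGCAT
    print(to_rna_alphabet(example))     # AUGCCG
```

Applying reverse_complement twice returns the original strand, which reflects the symmetry of the pairing rules.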
In nature, the nucleotide sequence of a particular nucleic acid is not random, and it is the particular sequence of nucleotides that distinguishes one member of a species from another member of the same species, as well as one gene from another. Generally, each gene codes for a specific protein, although some genes ultimately encode several proteins due to differential splicing of messenger RNAs transcribed from the same gene. In any event, after a protein-encoding gene is expressed by transcription and translation, the encoded protein fulfills a specific function within a living cell.
It is known that for a given gene, or genetic locus, one or more different alleles may exist. Alleles for a given gene differ from one another by differences in the nucleotide sequence of each allele. Alleles of a given gene may arise from a substitution of one nucleotide for another at a given nucleotide position. Alternatively, allelic differences may be due to the insertion or deletion of one or more nucleotides in the different alleles. As a result of such differences in protein-encoding regions of a gene, the proteins encoded by the different alleles may differ in size and/or amino acid sequence. With regard to proteins that are enzymes, differences in amino acid sequence can result in differences in catalytic rates, substrate specificity, co-factor requirements, cellular localization, stability, pH optima, etc., some or all of which may be relevant, for example, in the context of disease detection, prevention, and treatment (e.g., the suitability of administering a particular drug to a particular patient in view of drug/protein interactions). On the other hand, if the difference(s) between alleles is(are) due to changes in a regulatory region of the gene, the level of expression of the proteins encoded by the particular alleles may differ, even markedly.
Changes in the nucleotide sequence of a genomic nucleic acid molecule occur as a result of mutations, where during replication copying of a template nucleic acid does not result in exact duplication of the template nucleic acid. Mutations can also occur during DNA repair, such that one or both strands of a DNA duplex differs in nucleotide sequence when compared before and after a repair reaction. As mentioned above, mutations during replication or repair include the deletion, insertion, and/or substitution of one or more nucleotides in one or both strands of a double-stranded DNA. Mutations that involve a substitution of one nucleotide for another (e.g., A for G) are termed “point mutations” since they occur at a particular nucleotide position. In protein coding regions, a point mutation can be a “missense” mutation, which results in a change in the amino acid encoded by the particular codon in which the mutation occurred; a “nonsense” mutation, where the change results in the codon changing from one that encodes an amino acid to one that codes for a stop codon and thereby leads to a truncated protein; or a silent mutation, which results in the codon coding for the same amino acid as before. Again, mutations can also occur in non-coding regions, as well. While such mutations do not alter the amino acid sequence of the protein encoded by the gene, they may affect regulation of the expression of the gene, the stability of the DNA or RNA molecule, etc.
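The three classes of point mutation described above can be illustrated with a minimal Python sketch. The function name is hypothetical, and the codon table is deliberately partial: it lists only the four standard-genetic-code codons needed for the example, so it is an assumption-laden illustration rather than a general classifier.

```python
# Partial codon table (standard genetic code); only the codons used in
# this illustration are listed. "*" marks a stop codon.
CODON_TABLE = {
    "GAG": "E",  # glutamate
    "GAA": "E",  # glutamate
    "GTG": "V",  # valine
    "TAG": "*",  # stop
}


def classify_point_mutation(original: str, mutated: str) -> str:
    """Classify a single-nucleotide substitution within one codon."""
    differences = sum(a != b for a, b in zip(original, mutated))
    if len(original) != 3 or len(mutated) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one position")
    before, after = CODON_TABLE[original], CODON_TABLE[mutated]
    if before == "*":
        raise ValueError("original codon must encode an amino acid")
    if after == before:
        return "silent"    # same amino acid as before
    if after == "*":
        return "nonsense"  # codon now signals a stop
    return "missense"      # a different amino acid is encoded


if __name__ == "__main__":
    for mutated in ("GAA", "GTG", "TAG"):
        print("GAG ->", mutated, ":", classify_point_mutation("GAG", mutated))
```

In this sketch, GAG to GAA is silent, GAG to GTG is missense, and GAG to TAG is nonsense.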
Whether a particular mutation persists over time in the gene pool is determined by the process of natural selection, where changes that, over time, improve reproductive fitness survive, and those that do not disappear. Regardless of evolutionary effects and as noted above, mutations can result in proteins with altered, or, in some cases, even lost biochemical activities, which, in turn, can cause disease, an adverse reaction to a particular drug, etc. Similarly, mutations can cause aberrant regulation of gene expression, which can also lead to disease, altered drug sensitivity, etc. due to the relative over- or under-abundance of one or more particular gene products.
Diseases caused by mutation, whether inherited or originating in the DNA of a particular subject, are said to be “genetic diseases” or the like. More than 4,000 genetic diseases are currently known to result from allelic differences, including hemophilias, thalassemias, Duchenne Muscular Dystrophy (DMD), Huntington's Disease (HD), Alzheimer's Disease, Cystic Fibrosis (CF), and sickle cell anemia. In addition to diseases caused by mutations that give rise to disease-associated alleles, genetic diseases can also be caused by larger genetic abnormalities, such as translocations, duplications, and deletions of some or all of a particular chromosome. Examples of such abnormalities include Trisomy 21 (the cause of Down's Syndrome), Trisomy 13 (which causes Patau Syndrome), Trisomy 18 (which causes Edwards Syndrome), Monosomy X (the cause of Turner's Syndrome), and other sex chromosome aneuploidies such as XXY (which causes Klinefelter's Syndrome). Further, it is known that certain DNA sequences predispose an individual to any of a number of diseases, such as diabetes, arteriosclerosis, obesity, various autoimmune diseases, and cancer (e.g., colorectal, breast, ovarian, lung, and prostate cancer), and can also predict how a patient will respond to a particular drug (i.e., will s/he respond at all, and, if so, will the response be a positive or adverse reaction?). Genetic differences also have relevance in the area of organ and tissue transplantation, as a failure to “match” HLA (human leukocyte antigen) types can lead to organ or tissue rejection. Due to the genetic variation between individuals within a given species, DNA sequences can also serve as “fingerprints” to detect or identify different individuals, assess paternity or other aspects of relatedness among members of a species, etc.
Given the growing importance of nucleic acid analysis in a variety of fields, several methods for detecting and characterizing DNA have been developed. For example, nucleic acid sequences can be identified by comparing by gel electrophoresis the mobility of an amplified nucleic acid fragment with a known standard or by hybridization with a probe oligonucleotide that is complementary to the sequence to be identified. Detection, however, can only be accomplished if the nucleic acid fragment is labeled with a sensitive reporter function (e.g., a molecule that includes a radioactive isotope (e.g., 3H, 32P, or 35S) or that is fluorescent or chemiluminescent). Radioactive labels, however, can be hazardous, the signals they produce decay over time, and they require special disposal procedures. Non-isotopic labels (e.g., fluorescent labels) typically suffer from a lack of sensitivity and fading, particularly when high intensity lasers are used. Additionally, procedures that involve labeling, electrophoresis, and subsequent detection are laborious, time-consuming, and error-prone.
Mass spectrometry, on the other hand, allows individual molecules (e.g., nucleic acids, peptides, and proteins) to be “weighed” by ionizing the molecules in vacuo and making them “fly” by volatilization. Under the influence of combinations of electric and magnetic fields, the ions follow trajectories depending on their individual mass (m) and charge (z). Mass spectrometry has long been part of the routine physical-organic repertoire for analysis and characterization of low molecular weight organic molecules. Due to the analytical advantages of mass spectrometry in providing high detection sensitivity, accuracy of mass measurements, detailed structural information, and speed, as well as on-line data transfer to a computer, considerable effort has been devoted to the use of mass spectrometry for the structural analysis of nucleic acids. See, e.g., U.S. Pat. Nos. 6,706,530; 6,635,452; 6,602,662; 6,589,485; 6,569,385; 6,566,055; 6,558,902; 6,468,748; 6,436,635; 6,428,955; 6,300,076; 6,277,573; 6,268,144; 6,268,131; 6,258,538; 6,235,478; and 6,225,450. Today, advanced techniques for the ionization/desorption of samples containing large biomolecules such as polynucleotides have been developed, including electrospray/ionspray, and particularly, matrix-assisted laser desorption/ionization (MALDI). MALDI mass spectrometry typically uses a time-of-flight (TOF) configuration to analyze mass.
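As a rough illustration of how a time-of-flight configuration relates flight time to mass and charge, consider an idealized linear instrument in which an ion of mass m and charge z is accelerated through a potential U and then drifts over a field-free length L. Equating the kinetic energy gained (z times the elementary charge times U) with one half m v squared gives a flight time proportional to the square root of m/z. The Python sketch below assumes this idealized model; the function name, the 20 kV accelerating potential, and the 1 m drift length are illustrative assumptions, not values taken from this specification.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per unified atomic mass unit


def flight_time(mass_da: float, charge: int,
                accel_voltage: float = 20_000.0,
                drift_length: float = 1.0) -> float:
    """Idealized flight time, in seconds, for a linear TOF analyzer.

    Assumes all ions start at rest, gain kinetic energy z*e*U, and then
    drift at constant velocity over drift_length (in meters).
    """
    mass_kg = mass_da * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * accel_voltage
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return drift_length / velocity


if __name__ == "__main__":
    for mass in (1000.0, 4000.0):
        print(f"{mass:.0f} Da, z=1: {flight_time(mass, 1) * 1e6:.2f} microseconds")
```

Under these assumptions, a singly charged ion four times heavier than another takes twice as long to reach the detector, which is the basis for separating species by mass in such an analyzer.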
Another key advantage offered by mass spectrometry is that it provides a great ability to multiplex, i.e., it allows for many different molecules to be specifically and sensitively distinguished in a single analysis. Recently, systems that employ nonvolatile releasable tag molecules that contain releasable mass labels have been described. See, e.g., U.S. Pat. No. 6,635,452. In such systems, one or more detectable, nonvolatile mass labels, each specific for a particular target nucleic acid, are released from probe molecules that specifically hybridize to particular nucleotide sequences. Mass spectrometry-based detection of a particular mass label thus provides indirect detection of the target molecule correlated with the particular mass label. Because of the sensitivity afforded by mass spectrometry, tens, hundreds, and even thousands of different probe species, each having a different releasable mass label, can be used in a single multiplexed reaction. Such systems, however, require the release of the detectable, nonvolatile mass labels from the probes. Thus, there remains the opportunity to develop other, perhaps even more efficient systems that allow for the simultaneous detection of a large number of different target biomolecules (e.g., nucleic acid molecules and/or proteins) in a biological sample. This will allow for the systematic, large-scale analysis of multiple target molecules with predetermined properties and/or functions.
3. Definitions
Before describing the instant invention in detail, several terms used in the context of the present invention will be defined. In addition to these terms, others are defined elsewhere in the specification, as necessary. Unless otherwise expressly defined herein, terms of art used in this specification will have their art-recognized meanings.
The term “allele” or “allelic variant” refers to alternative forms of a particular gene that occupy the same locus or position on homologous chromosomes or extrachromosomal DNA. When a subject having a diploid genome has two identical alleles of a gene, the subject is said to be homozygous for the gene or allele. When a subject has two different alleles of a gene, the subject is said to be heterozygous for the gene. Alleles of a specific gene can differ from each other by one or more nucleotides, in terms of the number of nucleotides and/or nucleotide identity at specific nucleotide positions, as a result of, for example, nucleotide substitutions, deletions, and/or insertions. Thus, an allele can also be a mutant form of a gene.
The term “amino acid” refers to naturally occurring and non-naturally occurring amino acids, as well as any modified amino acid that may be synthesized or, alternatively, obtained from a natural source.
An “amplicon” is a nucleic acid molecule generated in a nucleic acid amplification reaction, and which is derived from a target nucleic acid. An amplicon contains a target nucleic acid sequence that may be of the same or opposite sense as the target nucleic acid. An amplicon can also contain sequences not present in the nucleic acids from which the amplicon was derived.
An “amplification primer” or “primer” means an oligonucleotide capable of hybridizing to a primer binding site (i.e., a sequence of nucleobases complementary to the base sequence of the primer) and acting as a primer and/or a promoter template (e.g., for synthesis of a complementary strand, thereby forming a functional promoter sequence) for the initiation of nucleic acid synthesis. If the primer is designed to also encode a sequence to initiate RNA synthesis (e.g., a promoter), it is termed a “promoter-primer,” and it preferably contains, in addition to a region for hybridizing to a primer binding site, a base sequence that is non-complementary to the target nucleic acid but which is recognized by an RNA polymerase, such as a T7, T3, or SP6 RNA polymerase. An amplification primer may contain a 3′ terminus that is modified to prevent or lessen the rate or amount of primer extension (see, e.g., U.S. Pat. No. 5,766,849). Preferably, two or more different primers are used in amplification processes. A “universal” primer refers to a primer designed to hybridize to a primer binding site that is independent of the sequence to be amplified. As a result, universal primers are particularly useful in multiplex amplification reactions, wherein a number of different target sequences can be amplified using a single pair of universal primers.
The term “biological sample” refers to material obtained from any living (or formerly living) source (e.g., human, animal (e.g., mammals such as bovine, canine, equine, feline, ovine, and porcine animals, fish, birds, etc.), plant, bacteria, fungi, protist, or virus) and which contains one or more nucleic acids and/or populations of other target biomolecules. Biological samples can be made of solid materials (e.g., tissue, cell pellets, biopsies, etc.), or biological fluids (e.g., urine, blood, saliva, amniotic fluid, mouth wash, lymph, sweat, sputum, mucous, tears, etc.). Biological samples represent a sub-genus of “samples”, which can be any sample of material containing one or more target molecules that can be detected and/or analyzed using one or more target detection reagents according to the invention.
A “biomolecule” refers to a molecule that occurs naturally in a biological system (e.g., an organism). Representative classes of biomolecules include nucleic acids, proteins, peptides, antibodies, enzymes, carbohydrates, lipids, metals, and toxins. A “target” biomolecule is a biomolecule targeted by a target detection reagent of the invention.
The term “coding region” of a compomer template refers to a region that encodes a compomer or a cleavage substrate, as the case may be.
Two single stranded nucleic acid molecules are “complementary” when over at least a portion of their respective lengths there is a region of sufficient size (i.e., a number of nucleobase subunits, e.g., nucleotides) to allow sufficient hydrogen bonding between the two nucleic acids to stabilize a duplex formed by hybridization of the two nucleic acids. Thus, for the purpose of this invention, a first nucleic acid is deemed to be perfectly complementary to a second nucleic acid when each base in the first polynucleotide is paired with a complementary base in the second polynucleotide over the region of intended complementarity, which can include all or only a portion of either or both of the two nucleic acid molecules. As will be appreciated, two single-stranded nucleic acid molecules can also be less than perfectly complementary over the region of intended complementarity and still exhibit sufficient complementarity to allow hybridization between the nucleic acids under stringent hybridization conditions.
“Complement” is used as a synonym for a nucleic acid that is complementary to another nucleic acid.
A “compomer” is a molecule synthesized in a target detection assay from a compomer template to indirectly indicate the presence of a particular target molecule in a sample being assayed. Compomers are comprised of one or more subunits. Particularly preferred subunits for compomer polymerization are nucleobase subunits.
A “compomer template” refers to that portion of a target detection reagent of the invention that encodes a compomer.
A compomer is said to be “correlated with” a target molecule when it is known beforehand that detection of a given compomer species means that the corresponding target molecule was present in the sample being assayed. Such a correlation is due to the design of the target detection reagent, as the target detection moiety is known to specifically react with the particular target. Thus, that specific interaction allows subsequent generation of the compomer encoded by the target detection reagent. As such, a target molecule's corresponding compomer species is/are said to be “correlated with” the particular target molecule, such that detection of a particular compomer indirectly indicates the presence of the corresponding target molecule in the sample under analysis.
A “contiguous span” of molecules refers to a region within a linear polymer wherein the molecules from which the polymer was synthesized are of the same type. For instance, a contiguous span of ribonucleotides refers to a polynucleotide (or portion thereof) wherein the nucleotides within the span are all ribonucleotides. Other nucleotides, such as deoxyribonucleotides, are not included in the contiguous span, although they may be included elsewhere in the polynucleotide if the polymer comprises more nucleotides than just the contiguous span of ribonucleotides.
A “defined characteristic” refers to a known characteristic that allows one compomer species to be detected and distinguished from another. Defined characteristics include defined chemical compositions, defined masses, defined lengths, defined sizes, defined sequences, and defined structures. Having a “defined chemical composition” means that the identity of each base of the compomer is known. Having a “defined molecular formula” means that the number and identity of each atom comprising the molecule is known. As a result, the mass, or mass range (due to isotopic variation) of the molecule may also be defined, i.e., the molecule has a “defined mass”. For example, a specific molecular mass can be determined by summing the masses of the atoms represented in the molecule's chemical formula (e.g., C6H12O6). A “mass range” reflects the range of masses that molecules having the same chemical formula may have due to the inclusion of different isotopes. Having a “defined length” or “defined size” means that it is known how many subunits comprise a particular compomer. For example, a compomer that contains ten nucleotides is said to have a length of ten nucleotides. A “defined sequence” means that the compomer has a specific sequence of nucleobases, which sequence can be determined by any suitable technique (e.g., by hybridization, sequencing, etc.). A “defined structure” means that a compomer has a three-dimensional structure (e.g., an epitope) that can be recognized by a reagent (e.g., an antibody) specifically reactive with the structure. As will be appreciated, in some cases a compomer may be classified, and thus detected, by one or more different methods, each of which is based on analysis of a particular characteristic. For example, compomers comprised of nucleobase subunits will have defined chemical compositions, masses (or mass ranges), sequences, and lengths. Accordingly, they can be detected by a variety of elemental-, mass-, sequence-, and length-based detection methods. When appropriate detection systems are employed, compomers having a unique defined characteristic (e.g., a unique defined mass, chemical composition, etc.) may readily be distinguished from other compomer species.
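The summation of atomic masses mentioned above for a formula such as C6H12O6 can be sketched as follows. The function names are hypothetical, the parser handles only simple formulas without parentheses or charges, and the atomic masses are rounded standard values supplied for illustration rather than values taken from this specification.

```python
import re

# Rounded standard atomic weights (natural isotopic mixtures), in daltons.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}

# Masses of the most abundant isotope of each element, in daltons.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
                     "O": 15.99491462, "P": 30.97376200}


def parse_formula(formula: str) -> dict:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_mass(formula: str, table: dict = AVERAGE_MASS) -> float:
    """Sum the masses of all atoms represented in the formula."""
    return sum(table[symbol] * n for symbol, n in parse_formula(formula).items())


if __name__ == "__main__":
    print(round(molecular_mass("C6H12O6"), 3))                     # 180.156
    print(round(molecular_mass("C6H12O6", MONOISOTOPIC_MASS), 4))  # 180.0634
```

The difference between the two results gives a sense of why a population of molecules sharing one chemical formula spans a mass range rather than a single mass.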
A “gene” refers to a particular genetic locus, or region in a DNA molecule, that encodes a gene product (i.e., polypeptide or RNA molecule). In addition to the structural coding region(s), a gene may include non-coding regions, including introns, transcribed but untranslated regions, and regulatory elements upstream and downstream of the coding regions. Depending on the context, a “gene” may optionally comprise sequences of nucleotides required for expression of the gene (e.g., promoters, enhancers, etc.).
The term “genotype” refers to the identity of the alleles for at least some of the genes in a subject's genome. “Genotyping” a sample refers to determining the specific allele or the specific nucleotide at a particular location carried by a subject (in all or only some of its cells). Thus, a genotype may refer to one or more specific alleles.
A “hybrid” or “duplex” refers to a molecule comprised of two linear polymers hybridized over at least a portion of their respective lengths to form a stable hybrid or duplex molecule. In a hybrid, each linear polymer is comprised of nucleobase subunits. Examples of such polymers include single-stranded RNA and DNA molecules comprising naturally occurring and/or modified nucleobases and/or backbone chemistries. The double-stranded regions of hybrids are sufficiently stable such that they can be maintained for the desired purpose or manipulation, for example, to serve as a primer that can be catalytically extended, such that duplexes can be separated from single-stranded molecules, if desired, etc.
“Hybridization” refers to the ability of two completely or partially complementary nucleic acid strands to come together under specified hybridization assay conditions in a parallel or preferably antiparallel orientation to form a stable structure having a double-stranded region. The two constituent strands of this double-stranded structure, sometimes called a hybrid or duplex, are held together by hydrogen bonds. Although these hydrogen bonds most commonly form between nucleotides containing the bases adenine and thymine or uracil (A and T or U) or cytosine and guanine (C and G) on single nucleic acid strands, base pairing can also form between bases which are not members of these “canonical” pairs, as is known in the art.
The term “isotopically defined” refers to a population of molecules of the same chemical formula wherein one or more of the atomic species that comprise the molecules have a more restricted isotopic distribution (due to isotopic enrichment or depletion) than occurs in nature. For example, carbon typically has several naturally occurring isotopes (e.g., 12C6, 13C6, and 14C6), each of which has a different number of neutrons (6, 7, and 8, respectively). When referring to isotopes of a particular element, the formula “AXZ” is used, where “X” is the chemical symbol for the atom, “Z” is the atomic number (equal to the number of protons in one atom of the element), and “A” is the number of protons and neutrons combined for the particular isotope. The relative abundances for some of the naturally occurring isotopes of C, H, N, and O have been reported (see, e.g., Bievre and Taylor (1993), Int. J. Mass. Spectrom. Ion Phys., vol. 123:149). For carbon, the relative abundances (expressed as a percentage) of the 12C6 and 13C6 isotopes are 98.90 and 1.10, respectively. For hydrogen, the relative abundances of the 1H1 and 2H1 isotopes are 99.985 and 0.015, respectively. The relative abundances of 14N7 and 15N7 isotopes of nitrogen are 99.634 and 0.366, respectively, whereas the oxygen isotopes 16O8, 17O8, and 18O8 have a relative abundance of 99.762, 0.038, and 0.200, respectively. From the foregoing, a population of molecules of a particular species (e.g., a nucleoside such as adenosine) would be isotopically defined with respect to carbon if the relative abundance of the carbon atom isotopes 12C6 and 13C6 in the population were 99.90 and 0.10, respectively. 
Thus, for molecules comprised of several atomic species one or more of which has more than one naturally occurring isotope, it may be desirable to synthesize the molecule using atoms wherein the most prevalent isotope is enriched, i.e., more of it is present in relative terms as compared to the less prevalent isotope(s) of that element, or a less (or least) prevalent isotope is depleted. Methods for isotopic enrichment and depletion are known in the art.
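The effect of isotopic enrichment on a population of molecules can be illustrated with a short calculation using the carbon abundances quoted above. The sketch assumes that each carbon position is filled independently according to the stated abundance, which is a simplification, and the function name is hypothetical. Adenosine (C10H13N5O4) is used because it contains ten carbon atoms.

```python
def all_light_fraction(n_atoms: int, light_abundance_percent: float) -> float:
    """Fraction of molecules in which all n atoms of one element are the
    lightest isotope, assuming independent incorporation at each position."""
    return (light_abundance_percent / 100.0) ** n_atoms


CARBONS_IN_ADENOSINE = 10  # adenosine is C10H13N5O4

# Natural carbon: 98.90% of the lightest isotope (value quoted in the text).
natural = all_light_fraction(CARBONS_IN_ADENOSINE, 98.90)

# Isotopically defined carbon: 99.90% of the lightest isotope (the text's example).
enriched = all_light_fraction(CARBONS_IN_ADENOSINE, 99.90)

if __name__ == "__main__":
    print(f"natural carbon:  {natural:.3f}")
    print(f"enriched carbon: {enriched:.3f}")
```

Under these assumptions, roughly nine in ten adenosine molecules contain only the lightest carbon isotope at natural abundance, rising to about ninety-nine in one hundred after enrichment, so the mass distribution of the population narrows accordingly.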
A “label” refers to a molecule that allows a molecule attached to the label to be detected by a direct or indirect method. Here, “direct” detection refers to detection methods that do not require the interaction of another molecule with the label moiety for detection. Labels that can be directly detected include radioisotopes, luminescent molecules, fluorescent molecules, and other molecules whose presence can be detected directly. “Indirect” detection refers to methods that require one or more other molecules to interact with the label moiety in order for detection to occur. Labels that can be indirectly detected include one member of a high affinity binding pair (e.g., one of biotin and streptavidin, an antigen and one or more antibodies (or antibody fragments) specific therefor, etc.).
A “library” refers to a collection of two or more different molecular species. In the context of compomers, a library comprises a plurality of different compomer species. Typically, each compomer species correlates with a different target molecule, it being understood that a “different target molecule” can mean genetic or structural variants of the same molecule (e.g., a gene or polypeptide) as well as target molecules that are different genes or polypeptides encoded by different genes. In the context of target detection reagents, a library comprises two or more different target reagent species. In any event, one member of a library differs from another due to differences in target binding moieties and/or compomer templates.
In the context of this invention, the terms “multiplex”, “multiplexing”, and the like refer to the ability to detect and/or analyze multiple target biomolecule species in a single assay. For example, a plurality of different target detection reagents, each specific for a different species of target biomolecule, can be used to analyze a biological sample in a single assay. If some or all of the targeted biomolecule species are present in the sample, the results of the assay will so indicate. Thus, multiplexing greatly increases assay efficiency. Typically, multiplexing allows for the analysis of more than about 10, preferably more than about 50, 100, 250, 500, or 1,000, and even more preferably more than 1,000, different species of target biomolecules in a single assay. Of course, the number of target molecule species that can be detected in a given multiplexed assay will depend on such factors as, for example, the chemical composition of the compomers encoded by the various target detection reagents employed, the type of detector used, the sensitivity of the detector, etc.
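As a hedged illustration of how compomer chemical composition can bound the degree of multiplexing, the sketch below counts the distinct base compositions available to a compomer of a given length built from four kinds of nucleobase subunit. It assumes, purely for illustration, that compomers are distinguished by composition alone (as in mass-based detection, where subunit order does not change the mass) and that every composition is resolvable by the detector; neither assumption is stated in this specification, and the function name is hypothetical.

```python
import math


def distinct_compositions(length: int, subunit_types: int = 4) -> int:
    """Number of ways to choose `length` subunits from `subunit_types`
    kinds when order is ignored (combinations with repetition)."""
    return math.comb(length + subunit_types - 1, subunit_types - 1)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, distinct_compositions(n))
```

For a ten-subunit compomer this gives 286 compositions; in practice the usable number would be lower wherever two compositions have masses too close for the detector to resolve.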
The term “mutated gene” refers to an allelic form of a gene that is capable of altering the phenotype of a subject having the mutation relative to a subject that does not have the mutated gene. If a subject must be homozygous for this mutation to have an altered phenotype, the mutation is said to be recessive. If one copy of the mutated gene is sufficient to alter the phenotype of the subject, the mutation is said to be dominant. If a subject has one copy of the mutated gene and has a phenotype that is intermediate between that of a homozygous subject and a heterozygous subject (for that gene), the mutation is said to be co-dominant. The term “mutation” as used herein refers to a difference in nucleotide sequence at a particular genetic location (e.g., nucleotide position in a gene) between or among different genomes or individuals that has a frequency below 1%.
Herein, the term “nucleic acid” refers to double- or single-stranded polymeric molecules made from naturally-occurring ribo- and deoxyribonucleotides (e.g., RNA, mRNA, rRNA, tRNA, small nuclear RNAs, DNA, cDNA, and RNA/DNA copolymers), as well as modified/non-natural nucleic acids, often known as nucleic acid mimics. Examples of nucleic acid mimics include those having phosphodiester modifications or replacements, including phosphorothioate, methylphosphonate, boranophosphate, amide, ester, and ether inter-subunit linkages, as well as complete subunit replacements with molecules such as cleavable linkages (e.g., photocleavable nitrophenyl moieties) and nucleobase subunits other than nucleosides and nucleotides. A “target” nucleic acid is a nucleic acid containing a target nucleic acid sequence.
A “nucleotide sequence” refers generally to the linear sequences of nucleobases that comprise a particular nucleic acid molecule. Unless otherwise indicated, nucleotide sequences are written 5′ to 3′. A “target” nucleotide sequence refers to a particular portion of the nucleotide sequence of a nucleic acid molecule present in a sample that is targeted by, and is thus substantially complementary with, the oligonucleotide portion of the corresponding target detection reagent.
“Nucleic acid amplification” refers to a method for increasing the number of particular nucleic acid molecules. Nucleic acid amplification according to the present invention may be either linear or exponential, although exponential amplification is preferred.
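The practical difference between linear and exponential amplification can be seen in a minimal numerical sketch. The model below is an idealization with hypothetical function names: in the linear case only the original templates are copied in each cycle, whereas in the exponential case the products of each cycle serve as templates in the next. The per-cycle efficiency parameter is an assumption introduced for illustration.

```python
def linear_copies(initial: int, cycles: int, copies_per_cycle: int = 1) -> int:
    """Molecules present after linear amplification: only the original
    templates are copied, so the total grows by a fixed amount per cycle."""
    return initial + initial * copies_per_cycle * cycles


def exponential_copies(initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Molecules present after exponential amplification: every molecule
    present is copied each cycle with the given efficiency (1.0 = doubling)."""
    return initial * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for cycles in (10, 20, 30):
        print(cycles, linear_copies(10, cycles), exponential_copies(10, cycles))
```

Starting from ten templates, thirty idealized cycles yield 310 molecules in the linear case and more than ten billion in the exponential case.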
A “nucleobase” refers to a base (i.e., a purine or a pyrimidine) capable of forming hydrogen bonds with a complementary base to form a base pair. Bases include adenine (“A”), cytosine (“C”), guanine (“G”), hypoxanthine, orotic acid, thymine (“T”), uracil (“U”), and xanthine. Base pairs include the canonical Watson-Crick DNA base pairs A:T, T:A, G:C, C:G, and in RNA, U replaces T. A “nucleobase subunit” refers to a particular monomeric subunit of a linear polymer, wherein the subunit comprises a nucleobase linked to a scaffold that permits subunit polymerization such that the resulting single-stranded polymer presents the nucleobases therein oriented such that the polymer can form a stable, double-stranded hybrid with a complementary nucleic acid molecule (e.g., a naturally occurring target nucleic acid molecule in a biological sample). Nucleosides and nucleotides represent preferred examples of nucleobase subunits useful in practicing the invention. Nucleobases may also be modified to include one or more molecules of known chemical composition in order to provide for mass modification. Such mass-modifying moieties are termed “mass tags”, and the resulting mass-modified nucleobases, or nucleobase subunits, are termed “mass-tagged nucleobases” and “mass-tagged nucleobase subunits”, respectively.
A “nucleoside” is a molecule that comprises a purine or pyrimidine base attached to a sugar moiety (e.g., a β-D-ribose or a β-D-2-deoxyribose) via an N-glycosidic linkage between the C-1 of the sugar and the N-9 (in the case of purine bases) or N-1 (in the case of pyrimidine bases). The sugar moiety is 2′-deoxyribose in the case of a deoxyribonucleotide and ribose in the case of a ribonucleotide. Analogs of deoxyribose and ribose can also be used, including 2′,3′-deoxy as well as a vast array of other nucleotide mimics that are well-known in the art. Mimics include chain-terminating nucleotides, such as 3′-O-methyl, halogenated base or sugar substitutions; alternative sugar structures including non-sugar, alkyl ring structures. Representative examples of nucleosides include adenosine, cytidine, guanosine, inosine, orotidine, thymidine, uridine, and xanthosine. A “nucleoside subunit” refers to a particular nucleoside of a polynucleotide.
A “nucleotide” refers to a nucleoside having one or more phosphate groups esterified to the 5′-carbon atom of its sugar moiety. Nucleotides may either be naturally occurring or synthetic. Representative examples of nucleotides useful in the practice of the invention include adenosine mono-, di-, and tri-phosphate; cytidine mono-, di-, and tri-phosphate; guanosine mono-, di-, and tri-phosphate; inosine; orotidine; thymidine mono- and tri-phosphate; uridine mono-, di-, and tri-phosphate; and xanthosine.
An “oligonucleotide” is a polymer made up of two or more nucleoside and/or nucleobase subunits coupled together, for example, by the polymerization of nucleotides. An oligonucleotide may be comprised of nucleobase subunits that include, for example, nucleobases found in DNA and/or RNA and analogs thereof. When the nucleobase subunits are nucleosides, the sugar groups of the nucleoside subunits may be ribose, deoxyribose, or analogs thereof, including, for example, ribonucleosides having a 2′-O-methyl substitution to the ribofuranosyl moiety. The nucleobase subunits may be joined by linkages such as phosphodiester linkages, modified linkages, or by linkages between non-nucleotide moieties which do not prevent hybridization of the oligonucleotide to its complementary target nucleic acid sequence. Modified linkages include those linkages in which a standard phosphodiester linkage is replaced with a different linkage, such as a phosphorothioate linkage or a methylphosphonate linkage. The nucleobase subunits may be joined, for example, by replacing the natural deoxyribose phosphate backbone of DNA with a pseudo-peptide backbone, such as a 2-aminoethylglycine backbone that couples the nucleobase subunits by means of a carboxymethyl linker to the central secondary amine. DNA analogs having a pseudo-peptide backbone are commonly referred to as “peptide nucleic acids” or “PNAs” (see, e.g., U.S. Pat. No. 5,539,082). Other non-limiting examples of oligonucleotides or oligomers contemplated by the present invention include nucleic acid analogs containing bicyclic and tricyclic nucleoside and nucleotide analogs referred to as “locked nucleic acids,” “locked nucleoside analogues,” or “LNAs” (see, e.g., U.S. Pat. No. 6,083,482). Any nucleic acid analog is contemplated by the present invention, provided that the modified oligonucleotide can hybridize to a target nucleic acid under stringent hybridization assay conditions or amplification conditions.
Oligonucleotides having a defined sequence of nucleobase subunits may be produced by techniques known to those of ordinary skill in the art, such as by chemical synthesis or other suitable methods.
An oligonucleotide is “substantially complementary” to its corresponding target nucleic acid molecule when it contains at least 6, and preferably at least 8, 9, 10, 11, 12, 13, 14, 15, or more contiguous nucleobases that are at least 80% complementary, preferably at least 90% complementary, and most preferably 100% complementary, to a contiguous span of nucleotides in the corresponding target nucleic acid. Those skilled in the art will readily appreciate modifications that could be made to the hybridization assay conditions at various percentages of complementarity to permit hybridization of the oligonucleotide to the target sequence while preventing unacceptable levels of non-specific hybridization. The degree of complementarity is determined by comparing the order of nucleobases making up the two regions over which complementarity is being compared, and does not take into consideration other structural differences which may exist between the two nucleic acids, provided the structural differences do not prevent hydrogen bonding between complementary bases. The degree of complementarity between two nucleic acids can also be expressed in terms of the number of nucleobase mismatches present in the regions being compared, which may range from 0 to 4, preferably 0 to 2, nucleobase mismatches.
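By way of illustration only, the comparison described above can be sketched in Python. The function names, the ungapped position-by-position comparison, and the default thresholds (6 nucleobases, 80%) are assumptions chosen to mirror the minimum values recited in the definition; they do not represent a required implementation.

```python
# Illustrative sketch only; names and the ungapped comparison are assumptions.
# Canonical Watson-Crick pairs, with U pairing with A as in RNA.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
                ("A", "U"), ("U", "A")}

def count_mismatches(oligo: str, target_span: str) -> int:
    """Count mismatched positions between two equal-length regions.

    Both sequences are written 5' to 3'. Because the strands pair in an
    anti-parallel fashion, the oligo is compared to the reversed target span.
    """
    if len(oligo) != len(target_span):
        raise ValueError("regions being compared must have equal length")
    facing = zip(oligo.upper(), reversed(target_span.upper()))
    return sum((a, b) not in WATSON_CRICK for a, b in facing)

def percent_complementarity(oligo: str, target_span: str) -> float:
    """Percentage of positions that form a canonical base pair."""
    if not oligo:
        raise ValueError("cannot compare empty regions")
    mismatches = count_mismatches(oligo, target_span)
    return 100.0 * (len(oligo) - mismatches) / len(oligo)

def is_substantially_complementary(oligo: str, target_span: str,
                                   min_length: int = 6,
                                   min_percent: float = 80.0) -> bool:
    """Apply the length and percentage thresholds to one contiguous region."""
    return (len(oligo) >= min_length
            and percent_complementarity(oligo, target_span) >= min_percent)
```

In this sketch, a 10-nucleobase oligonucleotide with two mismatches against its target span is 80% complementary and therefore meets the least stringent threshold, whereas three mismatches (70%) would not.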
A “patentable” composition, process, machine, or article of manufacture according to the invention means that the subject matter satisfies all statutory requirements for patentability at the time the analysis is performed. For example, with regard to novelty, non-obviousness, or the like, if later investigation reveals that one or more claims encompass one or more embodiments that would negate novelty, non-obviousness, etc., the claim(s), being limited by definition to “patentable” embodiments, specifically excludes the unpatentable embodiment(s). Also, the claims appended hereto are to be interpreted both to provide the broadest reasonable scope, as well as to preserve their validity. Furthermore, if one or more of the statutory requirements for patentability are amended or if the standards change for assessing whether a particular statutory requirement for patentability is satisfied from the time this application is filed or issues as a patent to a time the validity of one or more of the appended claims is again analyzed, the claims are to be interpreted in a way that (1) preserves their validity and (2) provides the broadest reasonable interpretation under the circumstances.
A “plurality” means more than one.
The term “polymorphism” refers to the occurrence of two or more alternative genomic sequences or alleles between or among different genomes or individuals. Thus, “polymorphic” refers to the coexistence of more than one form of a gene or portion (e.g., allelic variant) thereof. A portion of a gene of which there are at least two different forms, i.e., two different nucleotide sequences, is referred to as a “polymorphic region” of a gene. A polymorphic region can comprise as little as a single nucleotide, the identity of which differs in different alleles. A “single nucleotide polymorphism” or “SNP” is a single base pair change. Typically, a single nucleotide polymorphism occurs as the result of a replacement of one nucleotide by another nucleotide at the polymorphic site. Deletion or insertion of a single nucleotide can also give rise to single nucleotide polymorphisms. A polymorphic region can also involve multiple contiguous nucleotides, as in substitutions, rearrangements, insertions, and deletions of several nucleotides, although these polymorphisms are less common.
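As a non-limiting illustration, polymorphic regions arising from substitutions can be located by comparing two allele sequences position by position. The following Python sketch assumes the two alleles are already aligned and of equal length, so it does not handle insertions or deletions; the function name and the half-open (start, end) interval convention are assumptions made for the example.

```python
# Illustrative sketch only; handles substitutions between pre-aligned,
# equal-length alleles and ignores insertions and deletions.
def polymorphic_regions(allele_a: str, allele_b: str) -> list:
    """Return half-open (start, end) intervals, 0-based, where the alleles differ.

    An interval of length 1 corresponds to a single nucleotide substitution;
    longer intervals correspond to multiple contiguous differing nucleotides.
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("this sketch requires equal-length, aligned alleles")
    regions = []
    start = None  # start index of the differing run currently being extended
    for i, (a, b) in enumerate(zip(allele_a.upper(), allele_b.upper())):
        if a != b:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(allele_a)))
    return regions
```

For example, comparing "ACGTTGCA" with "ACGTCGCA" yields the single interval (4, 5), i.e., a single nucleotide substitution at the fifth position.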
A “polynucleotide” refers generally to a linear polymer of nucleotides, although if the polymer contains one or more nucleobase subunits other than a nucleotide or nucleoside, for purposes of the invention it shall still be considered a polynucleotide. Preferred polynucleotides are those in which the various subunits are linked by internucleotide 5′-3′ phosphodiester linkages. Polynucleotides include single- and double-stranded DNA and RNA molecules, including those where one or both strands are generated recombinantly or synthetically.
A “polypeptide” refers to a molecule comprising a polymer of amino acid residues (which include native and non-native amino acid residues). Thus, polypeptides include peptides and proteins, including native and engineered proteins, enzymes, antibodies, antibody fragments, and protein conjugates. In preferred embodiments, polypeptides are antibodies, antibody fragments, enzymes, receptors, receptor ligands, regulatory proteins, nucleic acid-binding proteins, hormones, or protein product of a display method, such as a phage display method or a bacterial display method.
The term “preferentially hybridize” means that under stringent hybridization assay conditions, complementary nucleic acids (or complementary portions of nucleic acids that also contain non-complementary portions) hybridize to form stable hybrids. Preferential hybridization can be measured using standard techniques. Preferably, there is at least about a 10-fold difference in hybridization between one nucleic acid species and its complementary nucleic acid, as compared with a non-complementary nucleic acid, more preferably at least about a 100-fold difference, and most preferably at least about a 1,000-fold difference. Preferably, the reaction conditions are such that hybridization between non-complementary nucleic acids in a test sample is no more than the background signal level.
A “probe” refers to a molecule that minimally comprises at least one target binding moiety. Probes may thus comprise two or more target binding moieties that may be linked to form the probe. For example, a particular probe may comprise two oligonucleotides which, when hybridized to their respective target molecules, become juxtaposed such that they can be linked (e.g., ligated) to form a complete probe molecule. Probes (or their constituent parts) may also contain other components, including labels and tags. Tags serve as moieties that allow the molecules to which they are attached to be isolated from other molecules present in a mixture (e.g., a solution).
A “promoter” means the minimal DNA sequence sufficient to direct transcription of a polypeptide encoded by a DNA molecule to which the promoter is operably linked, i.e., there is a functional linkage between the promoter and the coding sequence (e.g., a compomer-encoding region) such that the coding sequence can be transcribed by an RNA polymerase. In general, a “promoter” refers to a variety of nucleic acid control sequences that can direct transcription of a nucleic acid. As used herein, a promoter includes the necessary nucleic acid sequences for RNA polymerase binding, transcription initiation, and elongation. Promoters can be either prokaryotic or eukaryotic in origin, with bacteriophage promoters such as the T7, T3, and SP6 promoters being preferred. Eukaryotic promoters include, among others, promoters from CMV, SV40, retroviruses, and adenoviruses. A promoter also optionally includes distal enhancer or repressor elements that can be located as much as several thousand base pairs from the start site of transcription. Promoters also include “consensus” promoters, which do not occur naturally but can be designed, for example, by comparing the promoter sequences of genes transcribed at high levels to develop a promoter sequence that reflects a “consensus” base (typically the nucleotide most frequently represented at the particular nucleotide position among the sequences being compared) at at least one, preferably some, and most preferably all, of the nucleobase subunits comprising the promoter.
The term “reacting conditions” means reaction conditions that permit molecules that specifically interact with each other to preferentially interact. Reacting conditions include temperature, solute concentrations, pH, ionic conditions, etc. Stringent hybridization conditions are representative reacting conditions in the context of nucleic acid hybridization.
The term “reactive group” refers to a chemical moiety of a larger molecule that is capable of reacting with a reactive group of another molecule using a specific chemistry.
The terms “separated”, “purified”, “isolated”, and the like mean that one or more components of a sample contained in a sample-holding vessel are or have been physically removed from, or diluted in the presence of, one or more other sample components present in the vessel. Sample components that may be removed or diluted during a separating or purifying step include proteins, carbohydrates, lipids, inhibitors, non-target nucleic acids, and unbound probe molecules. With target capture procedures, target nucleic acids bound to immobilized capture probes are preferably retained in the sample during the separating or purifying step.
The term “species” is used herein in various contexts, e.g., compomer species, target molecule species, nucleotide species, etc. In each context, the term refers to a population of chemically indistinct molecules of the sort referred to in the particular context. For example, a “compomer species” is a population of compomers having the same chemical composition, and thus effectively the same mass. Of course, due to the occurrence of isotopic variation in molecules having an identical chemical structure, molecules within a given species may have slightly different masses, and thus the “mass” for a given molecular species (e.g., a compomer) in fact represents a small mass range. Depending on factors such as the level of multiplexing in a given assay, the sensitivity of the analytical system being used, etc., it may be desired to synthesize compomers from isotopically defined subunits (e.g., ribonucleotide triphosphates) to more tightly define the small mass range of a particular compomer and thereby enhance the resolution of mass peaks that appear in spectra resulting from analysis of the sample.
Herein, “stable” refers to an interaction between two molecules (e.g., the strands of a nucleic acid duplex over their regions of complementarity) that is sufficiently stable such that the molecules can be maintained for the desired purpose or manipulation. For example, a “stable” interaction between a primer and its cognate primer binding site refers to one that will allow the primer to be extended under reaction conditions suited for primer extension reactions.
The phrases “stringent hybridization assay conditions,” “hybridization assay conditions,” “stringent hybridization conditions,” “stringent conditions”, and the like mean reaction conditions that permit complementary nucleic acids (e.g., an oligonucleotide, or a target sequence binding region of an oligonucleotide that further comprises other regions, and a nucleic acid having a base sequence complementary thereto) to preferentially hybridize. Stringent hybridization assay conditions may vary depending upon various factors, including the GC content and length of the regions of complementarity between the nucleic acids and the degree of similarity between the complementary sequences and other sequences that may be present in the sample. Hybridization conditions include the temperature and the composition of the hybridization reagents or solutions.
A “subunit” refers to a portion of a larger molecule. Thus, a polymer is comprised of two or more subunits. Exemplary subunits include individual amino acids, nucleobase subunits, nucleosides in a DNA or RNA and individual nucleotides used to synthesize a nucleic acid or oligonucleotide, as well as subunit multimers (e.g., molecules that comprise two, three, four, or more subunits, e.g., nucleotides) that can be used, for example, as intermediates in oligonucleotide or peptide synthesis. In other contexts, if an oligonucleotide contains two distinct regions, e.g., a target binding moiety and a compomer template, each of the distinct regions may be referred to as a subunit of the oligonucleotide.
A “tag” is a moiety that can be attached to or included as part of another molecule to facilitate separation of tagged molecules from non-tagged molecules in an assay. Representative examples of molecules that may be tagged include target detection reagents, cleavage substrates, and compomers.
A “target binding moiety” refers to a molecule capable of specific molecular recognition. Molecules capable of specific molecular recognition are capable of specific binding interactions with other molecules. In particular, a target binding moiety is the portion of a target detection reagent according to the invention that is capable of specifically interacting with and binding to a target molecule. Preferred target binding moieties are comprised of polynucleotides (e.g., oligonucleotides) and polypeptides (e.g., antibodies and antibody fragments), as well as aptamers (i.e., synthetic nucleic acid molecules that specifically bind to or otherwise interact with other molecules, including proteins and small molecules), and small molecules (i.e., naturally occurring or synthetic organic molecules having a molecular mass of less than about 10,000 Da that specifically bind to or otherwise interact with a biomolecule species of interest, for example, a target protein).
The term “target molecule” or “target” refers to a molecule the presence, absence, or abundance of which is to be determined. Preferred targets are biomolecules, including polypeptides and nucleic acid molecules.
A “target nucleic acid” refers to a nucleic acid molecule containing a target nucleic acid sequence, which sequence is typically comprised of nucleotides. Target nucleic acids can be single or double-stranded. In double-stranded molecules, the strands are preferably separated over at least that portion including the target nucleotide sequence in order to facilitate hybridization of the target binding moiety of a target detection reagent specific for the particular target nucleotide sequence.
By “target nucleic acid sequence,” “target nucleotide sequence,” “target sequence,” or “target region” is meant a specific deoxyribonucleotide or ribonucleotide sequence comprising all or part of the nucleotide sequence of a target nucleic acid molecule.
A “target sequence binding region” refers to a nucleic acid molecule, e.g., an oligonucleotide, that has a base sequence sufficiently complementary to its target nucleic acid sequence to form, for example, an oligonucleotide:target hybrid stable for detection under stringent hybridization assay conditions. Typically, a target sequence binding region comprises at least about 6 nucleobase subunits, preferably from 6 to about 500 or 1,000 nucleobase subunits.
A “transcription unit” refers to a molecule that encodes a compomer or a cleavage substrate according to the invention. A transcription unit serves as the template for synthesizing a compomer according to the invention. Synthesis of compomers preferably occurs by transcription of the compomer-encoding region of the transcription unit. Thus, transcription units preferably at least include a functional promoter and a compomer-encoding region.
In the context of this invention, “unique” refers to a molecular species that differs in one or more distinguishable ways from the other molecular species present. Preferably, in the context of compomers, each compomer species generated in a particular reaction will be unique as compared to each of the other compomer species produced in the reaction. Thus, even if all of the compomer species present in a given reaction are to be analyzed, for example, based on a single defined characteristic (e.g., mass), the mass (or mass range) of each compomer species will be sufficiently different from the other compomer species present such that it can be detected and resolved in the context of the particular assay. In the context of target molecules, a “unique target molecule” refers to a target molecule species that can be distinguished from each of the other target molecule species in a given reaction. As will be appreciated, a single gene (or other genetic locus comprising a contiguous span of nucleotides, preferably from about 10 to about 1 million or more nucleotides) may contain multiple sites that can be independently targeted by different target detection reagent species (which species differ from one another due to different target detection moieties, and preferably due also to different compomer template species that encode distinguishable compomer species).
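By way of example only, whether a set of compomer species is unique with respect to mass can be checked by confirming that no two species have masses closer together than the separation the particular assay can resolve. The Python sketch below is an assumption-laden illustration: the function name, the species labels, the mass values, and the minimum separation are all hypothetical inputs, and a single mass value stands in for the small mass range of each species.

```python
# Illustrative sketch only; all names and numeric values are hypothetical.
def adjacent_conflicts(masses: dict, min_separation: float) -> list:
    """Return pairs of species whose masses are too close to resolve.

    masses maps a species label to its mass in daltons. Species are sorted by
    mass and only neighbors are compared, which suffices to decide whether
    every species is resolvable: if any two species conflict, at least one
    pair of neighbors in sorted order also conflicts.
    """
    ordered = sorted(masses.items(), key=lambda item: item[1])
    return [(name_1, name_2)
            for (name_1, mass_1), (name_2, mass_2) in zip(ordered, ordered[1:])
            if mass_2 - mass_1 < min_separation]

# Hypothetical masses (Da) for three compomer species in one reaction.
example = {"compomer_1": 3000.0, "compomer_2": 3016.0, "compomer_3": 3017.0}
conflicts = adjacent_conflicts(example, min_separation=5.0)
```

An empty result indicates that every species in the set is unique by mass at the stated separation; in the hypothetical example, compomer_2 and compomer_3 differ by only 1 Da and are reported as a conflict.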