This invention relates to a method of identifying gene transcription patterns in a cell or tissue.
Expressed Sequence Tag (EST) programs have provided DNA sequence information for a substantial proportion of expressed human genes (Fields, C. et al., Nature Genetics 7: 345-346 (1994)) in the human genome. However, DNA sequence information alone is insufficient for a complete understanding of gene function and regulation.
Because only a fraction of the full genetic repertoire is expressed in a cell at any given time, and because gene expression affects cell phenotype, tools to qualitatively and quantitatively monitor gene transcription are needed.
Classical qualitative and quantitative techniques such as northern blotting and nuclease protection assays are accurate and quantitative, but cannot provide information quickly enough to generate global gene expression profiles.
More recent approaches include sequence analysis of random isolates from cDNA libraries, Polymerase Chain Reaction (PCR) and hybridization-array-based methodologies, but each of these methods has limitations.
High-density microarray hybridization of RNA or cDNA corresponding to known genes (Ramsay, G. Nature Biotechnology 16: 40-44 (1998)) is a fast method for parallel analysis of global gene expression. This method, however, is limited to known genes and the number of genes in a single microarray is limited as well.
Sequencing of random isolates from cDNA libraries to generate ESTs provides quantitative results, but is a daunting task. (Adams, M. D., et al., Science 252: 1651-1656 (1991); Adams, M. D. et al., Nature 355: 632-634 (1992)). Within cDNA libraries, the frequency of a cDNA clone should be proportional to the steady-state amount of that transcript in the RNA population of the cell or tissue from which the RNA was derived. (Okubo K. et al., Nature Genetics 2: 173-179 (1992); Lee, N. H. et al., Proc Natl Acad Sci USA 92: 8303-8307 (1995)). This approach, however, requires DNA sequencing efforts beyond the capacity of most laboratories.
PCR-based methods can generate DNA fragments from mRNA pools which differ in size and sequence, enabling their separation and identification to form an expression profile. Profiles from different cell or tissue populations can be compared to detect differentially expressed genes. This method has been used to establish databases of mRNA fragments (Williams, J. G. K., Nucl. Acids Res. 18:6531 (1990); Welsh, J., et al. Nucl. Acids Res., 18:7213 (1990); Woodward, S. R., Mamm. Genome, 3:73 (1992); Nadeau, J. H., Mamm. Genome 3:55 (1992)). Some have sought to adapt these methods to compare mRNA populations between two or more samples (Liang, P. et al. Science 257:967 (1992); see also Welsh, J. et al., Nucl. Acid Res. 20:4965 (1992); Liang, P., et al., Nucl. Acids Res., 3269 (1993), and WO 95/13369, published May 18, 1995). Differential Display (Liang P. and Pardee, A. B. Science 257: 967-971 (1992)) and Amplified Fragment Length Polymorphism (AFLP) (Vos, P. et al., Nucleic Acids Res. 23: 4407-4414 (1995)), for example, can provide gene expression information at the appropriate speed and scale, but these methods can suffer from a lack of precision and reproducibility due to their susceptibility to quantitative PCR artifacts.
Recently, a variation of PCR for a random cDNA sequencing approach was described by Velculescu et al. (Velculescu, V. E. et al., Science 270:484 (1995)). This technique, called Serial Analysis of Gene Expression (SAGE), generates short, defined sequences from cDNAs which are randomly ligated in a tail-to-tail fashion and amplified by PCR to form “di-tags”. These di-tags are then concatenated into arrays which are cloned and analyzed by DNA sequencing. Because each sequencing template contains identifiable tags corresponding to many genes, the potential throughput of SAGE exceeds that of traditional cDNA sequencing, allowing gene transcription profiling in many laboratories.
However, the results of SAGE, like those of any other PCR-based process, are influenced by factors other than starting template abundance. Sequence-specific differences in “amplification efficiency” are known to give rise to artifactual differences in product yield. That is, the quantity of PCR product may differ in the absence of real differences in starting template. For example, amplification of the same template preparation produces product yields that can vary by as much as 6-fold (Gilliand et al. PCR Protocols. Academic Press, pp 60-69 (1990)). Hence, any PCR-based method that attempts to infer starting template abundance from the quantity of product generated by amplification requires stringent co-amplification controls.
Thus, there is a need for a simple and reproducible method for detecting and quantifying gene transcription, identifying genes, and gene transcription patterns and frequency in individual cells or tissues, which is free from PCR and other artifacts, provides for unknown genes, and yet is fast enough to allow speedy detection and comparison between samples.
In order to circumvent the problems found in the art, we have developed a cDNA tag-based technique called TALEST (Tandem Arrayed Ligation of Expressed Sequence Tags) that avoids PCR amplification artifacts. The technique provides a ~25-fold increase in throughput relative to random cDNA sequencing approaches to gene expression profiling.
This invention provides an improved method of obtaining short DNA “tag” sequences which allows for determination of the relative abundance of a gene transcript within a given mRNA population.
This invention provides a method of obtaining an array of tags.
This invention provides a method of identifying patterns of gene transcription.
This invention provides a method of detecting differences in gene transcription between two or more mRNA populations.
This invention provides a method of determining the frequency of individual gene transcription in an mRNA population.
This invention provides a method of screening for the effects of a drug on a cell or tissue.
This invention provides a method of detecting the presence of a stress, whether disorder, disease, the onset or proceeding of development or differentiation, exogenous substance (chemical, cofactor, biomolecule or drug), condition (including environmental conditions, such as heat, osmotic pressure, or the like), receptor activity (whether due to a ligand in a receptor or otherwise), or aberrant cellular condition (including mutation, unusual copy number or the like) in a target organism.
This invention provides a method of isolating a gene.
This invention provides a kit for obtaining a tag or an array of tags.
To aid the skilled artisan in understanding this invention the following definitions are provided, where they deviate from the terms commonly used in the art.
“A pattern of gene transcription” as used herein, means the set of genes within a specific tissue or cell type that are transcribed or expressed to form RNA molecules. Which genes are expressed in a specific cell line or tissue and at what level the genes are expressed will depend on factors such as tissue or cell type, stage of development of the cell, tissue, or target organism and whether the cells are normal or transformed cells, such as cancerous cells. For example, a gene is expressed at the embryonic or fetal stage in the development of a specific target organism and then becomes non-expressed as the target organism matures. Or, as another example, a gene is expressed in liver tissue but not in brain tissue of an adult human. In another example, a gene is expressed at low levels in normal lung tissue but is expressed at higher levels in diseased lung tissue.
“A punctuating restriction endonuclease” as used herein, means a restriction endonuclease having a probability of recognizing a sequence within each copy of cDNA. Preferably, the punctuating endonuclease recognizes a sequence consisting of less than six bases. More preferably the punctuating endonuclease recognizes a sequence consisting of four bases. Most preferably, the punctuating endonuclease is MspI, HaeIII or Sau3aI, or any isoschizomer thereof.
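By way of illustration only, the preference for a short recognition sequence can be made concrete with a rough estimate of how likely a given site is to occur at least once in a cDNA. The sketch below assumes a uniform, independent distribution of the four bases, which real cDNAs do not strictly follow, and the function name `site_probability` is hypothetical.

```python
def site_probability(cdna_length: int, site_length: int = 4) -> float:
    """Approximate probability that a specific recognition site occurs
    at least once in a random sequence of the given length.

    Assumption: bases are uniform and independent, and overlapping
    start positions are treated as independent trials.
    """
    # Number of positions at which the site could start.
    positions = max(cdna_length - site_length + 1, 0)
    # Chance that any one position matches the site exactly.
    p_site = 0.25 ** site_length
    return 1.0 - (1.0 - p_site) ** positions


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(n, round(site_probability(1500, n), 3))
```

Under these assumptions a four-base site is expected about once every 256 bases, so it is very likely to appear in a cDNA of typical length, whereas a six-base site is often absent altogether.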
“A type IIs restriction endonuclease” as used herein, means a restriction endonuclease allowing DNA cleavage at a site in the DNA distant from the recognition sequence for the restriction endonuclease. Preferably, the type IIs restriction endonuclease recognizes four to seven bases and cleaves the adjacent DNA 10-18 bases 3′ to the recognition sequence, also preferably greater than 10 bases. Hence “distant” means within the range of type IIs or type IIs-like restriction endonucleases. Most preferably, the type IIs restriction endonuclease is BsgI, BseRI, FokI or BsmFI.
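The notion of cleavage at a distance from the recognition sequence may be sketched as follows. This is a simplified model, not a description of any enzyme's exact behavior: it considers only the top strand, ignores the staggered cut on the bottom strand, and uses the BsgI recognition sequence GTGCAG with a reach of 16 bases as illustrative defaults. The function name `type_iis_cut` is hypothetical.

```python
from typing import Optional, Tuple


def type_iis_cut(seq: str, site: str = "GTGCAG",
                 reach: int = 16) -> Optional[Tuple[str, str]]:
    """Cut the top strand `reach` bases 3' of the first recognition site.

    Returns (left_fragment, right_fragment), or None when the site is
    absent or the sequence is too short to be cut at that distance.
    """
    idx = seq.upper().find(site)
    if idx == -1:
        return None
    # The cut falls downstream of the site, not within it.
    cut = idx + len(site) + reach
    if cut > len(seq):
        return None
    return seq[:cut], seq[cut:]


if __name__ == "__main__":
    example = "AA" + "GTGCAG" + "C" * 16 + "TTTT"
    print(type_iis_cut(example))
```

The left fragment retains the recognition sequence together with the downstream bases, which is the property the method relies on when the recognition sequence sits in an adapter and the cut falls within the cDNA.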
“A 5′ cloning restriction endonuclease” as used herein, means a restriction endonuclease having a corresponding methylase or other protection means that can protect any DNA from cleavage by the enzyme. Preferably, the 5′ cloning restriction endonuclease recognizes a sequence of between about four and ten bases. Most preferably, the 5′ cloning restriction endonuclease is EcoRI, BamHI or HindIII.
“A 3′ cloning restriction endonuclease” as used herein, means a restriction endonuclease having a recognition sequence that appears infrequently in the human genome. Preferably, the 3′ cloning restriction endonuclease recognizes a sequence consisting of six or more bases, preferably more than six bases. More preferably the 3′ cloning restriction endonuclease recognizes a sequence consisting of eight or more bases containing a CG dinucleotide within it. More preferably, the 3′ cloning restriction endonuclease provides a cleavage end that does not easily ligate to the cleavage end generated by the 5′ cloning restriction enzyme. An example of a most preferred 3′ cloning restriction endonuclease is NotI.
“A 3′-most cDNA fragment” as used herein, means a fragment of double-stranded cDNA, transcribed from an mRNA population of interest from an oligo dT primer, which is preferably biotinylated, consisting of that portion of the full length cDNA between the 3′-most punctuating restriction endonuclease site and the 3′ terminus of the cDNA. The 3′-most cDNA fragment may be isolated on a solid phase matrix containing streptavidin so as to ease separation of the fragment from other cDNA fragments produced by digestion with the punctuating restriction endonuclease.
“A first cDNA construct” as used herein, means a cDNA construct comprising the 3′-most cDNA fragment ligated to a 5′ adapter. A 5′ adapter is ligated to the 5′ end of the cDNA fragment providing the first cDNA construct. It is preferred that this 5′ adapter would provide suitable recognition sites for endonucleases envisioned to be used therein, and it is also preferable to provide sufficient molecular weight for resolution from tags; of course, these requirements change with choice of enzyme, staining method for resolution, and the like. The 5′ adapter may be biotinylated allowing the first cDNA construct to be captured on a solid phase matrix containing streptavidin so as to ease its isolation.
“A second cDNA construct” as used herein, means a cDNA construct provided by cleaving the first cDNA construct with a type IIs restriction endonuclease which recognizes a sequence within the 5′ adapter but cuts the DNA within the cDNA fragment.
“A third cDNA construct” as used herein, means a cDNA construct comprising the second cDNA construct and a 3′ adapter. A 3′ adapter is ligated to the 3′ end of the second cDNA construct providing the third cDNA construct. Preferably, the third cDNA construct is biotinylated and may be captured on a solid phase matrix containing streptavidin so as to ease its isolation.
“A fourth cDNA construct” as used herein, means a cDNA construct provided by cleaving the third cDNA construct with the 5′ cloning restriction endonuclease and the 3′ cloning restriction endonuclease, which recognize sites located in the 5′ and 3′ adapters, respectively.
“A 5′ adapter” as used herein, means an adapter consisting of a double-stranded polydeoxyribonucleotide containing a recognition sequence for a type IIs restriction endonuclease. The 5′ adapter is ligated to the 5′ end of the cDNA fragment(s) generated by cleavage of cDNA with the punctuating restriction endonuclease. Preferably, the 5′ adapter further contains a single-stranded overhang sequence compatible with the overhang sequence produced by cleavage of a cDNA fragment with the punctuating restriction endonuclease. (“Overhang,” as used herein, is defined as the effect of having a double stranded DNA, or a DNA/RNA strand that, while largely double stranded, has at one or both ends one or more unpaired or dangling bases on one or both strands, which would be paired, but for the fact that there is no complement on the other strand. Preferably, these occur where one strand has an overhang on one end and the other strand has an overhang on the other end of the double stranded DNA.) More preferably, the 5′ adapter further contains a recognition sequence for a 5′ cloning restriction endonuclease located 5′ to the recognition sequence for the type IIs restriction endonuclease. In a 5′ adapter, the 5′ cloning restriction endonuclease recognition sequence is located greater than about four, preferably greater than about 10, more preferably greater than about twenty, most preferably greater than about 30, and preferably less than about 90, more preferably less than about 70, most preferably less than about 60, nucleotides 5′ to the type IIs restriction endonuclease cleavage sequence. By ligating to a cDNA fragment, a 5′ adapter re-creates a recognition sequence for the punctuating restriction endonuclease.
The sense strand of the 5′ adapter preferably has the sequence shown in SEQ ID NO. 1, and the antisense strand of the 5′ adapter preferably has the sequence shown in SEQ ID NO. 2.
“A 3′ adapter” as used herein, means an adapter consisting of a double-stranded polydeoxyribonucleotide for ligation to the 3′ end of a second cDNA construct providing a third cDNA construct. Preferably, a 3′ adapter comprises a degenerate single-stranded end compatible with all possible ends of the second cDNA construct produced by a digestion with the type IIs restriction endonuclease. More preferably, a 3′ adapter further contains a recognition sequence for the punctuating restriction endonuclease located 3′ to the degenerate end. In the 3′ adapter, the recognition sequence of the punctuating restriction endonuclease is preferably adjacent to the degenerate single-stranded end compatible with ends of the second cDNA construct produced by the type IIs restriction endonuclease. Preferably, a 3′ adapter further comprises a recognition sequence for a 3′ cloning restriction endonuclease located 3′ to the recognition sequence for the punctuating restriction endonuclease. Preferably, the 3′ adapter contains one or more biotin molecules located 3′ to the recognition sequence for the 3′ cloning restriction endonuclease where the 3′ restriction endonuclease is not ... The sense strand of the 3′ adapter comprises all or part of SEQ ID NO. 3, and more preferably is SEQ ID NO. 3, and the antisense strand of the 3′ adapter comprises all or part of SEQ ID NO. 4, and more preferably is SEQ ID NO. 4.
“A tag” as used herein, means a DNA sequence consisting of:
(1) preferably double-stranded 10-14 deoxyribonucleotides corresponding to a cDNA sequence located proximal to the 3′-most punctuating restriction endonuclease site in the cDNA,
(2) a double-stranded base-pair flanking both ends of the cDNA-derived sequence which is itself derived from the recognition sequence from the punctuating restriction endonuclease, and optionally
(3) single-stranded overhang sequences derived from the punctuating restriction endonuclease generating self-compatible cohesive ends.
Before amplification, one tag represents one copy of mRNA and no two same tags are created from one copy of mRNA. A tag comprises cDNA sequences. Preferably, the tags are ligated together using DNA ligase. Treatment of tag ends may be either blunt or cohesive with overhangs. For reasons the skilled artisan will appreciate, an overhang is preferred. More preferably the tags, when ligated together, regenerate the recognition sequence for the punctuating restriction endonuclease allowing the recognition of discrete cDNA-derived sequences owing to their separation by the punctuating sequence.
“A tag sequence” as used herein, means a DNA sequence comprising at least one tag sequence. An array of tags includes a number of tags, having one or more sequences. The tag sequence can be included in a linear oligonucleotide, in a vector or the like.
“A cDNA tag library” as used herein, means a cDNA library prepared in a vector comprising (1) a ligated fragment of a 5′ adapter cleaved with a 5′ cloning restriction endonuclease, (2) a ligated fragment of a 3′ adapter cleaved with a 3′ cloning restriction endonuclease, and (3) a tag. A cDNA tag library is cloned into a cloning vector and can be amplified in a host cell.
“An array of tags” as used herein, means tags ligated with their cohesive ends or blunt ends so as to form a double-stranded DNA sequence. The array of tags is also referred to as “a concatemer”, which means concatenated tags. The array of tags can be included in a vector. Preferably, an array of tags comprises cDNA sequences interspersed by recognition sequences for a punctuating restriction endonuclease. The ligated arrays of tags comprise approximately at least 10 tags, preferably at least 30 tags, more preferably at least 40 tags and less than 70 tags, more preferably less than 60 tags, most preferably less than about 51 tags. More preferably, an array of tags begins with and ends with a recognition sequence for the punctuating restriction endonuclease. Most preferably, the recognition sequence for the punctuating restriction endonuclease is located between each tag.
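Because the recognition sequence for the punctuating restriction endonuclease lies between each tag, the sequence read from an array can be resolved into individual cDNA-derived sequences by splitting on that sequence. The following is a minimal sketch under stated assumptions: the punctuation sequence is taken to be CCGG (the MspI recognition sequence), only one strand is read, tag orientation is not resolved, and the function name `split_array` is hypothetical.

```python
from typing import List


def split_array(array_seq: str, punctuation: str = "CCGG",
                min_len: int = 10, max_len: int = 14) -> List[str]:
    """Split a sequenced array of tags on the punctuation sequence.

    Pieces outside the expected 10-14 base range (for example the empty
    strings produced by the leading and trailing punctuation) are dropped.
    """
    pieces = array_seq.upper().split(punctuation)
    return [p for p in pieces if min_len <= len(p) <= max_len]


if __name__ == "__main__":
    array = "CCGG" + "AAAAAAAAAAAA" + "CCGG" + "ACGTACGTACG" + "CCGG"
    print(split_array(array))
```

The length filter is a design choice for this sketch; a practical analysis might instead record out-of-range pieces for inspection rather than discard them.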
“A punctuation sequence” as used herein, means a sequence formed by ligating two ends digested with a punctuating restriction endonuclease as detected in a sequence of an array of tags which punctuates DNA nucleotide sequences.
“A clamp” as used herein, means a base-pair derived from the recognition sequence from a punctuating restriction endonuclease which remains attached to the cDNA-derived sequence when a tag is generated by digestion with the punctuating restriction endonuclease. The base composition of a clamp preferably consists of guanine (G) and cytosine (C), and is referred to as “a GC-clamp”, which means a clamp consisting of G and C. The function of a GC-clamp is to enhance thermal stability of a tag by increasing the number of hydrogen bonds which hold the anti-parallel strands of a tag together after digestion with the punctuating restriction endonuclease.
“GC rich” as used herein, means a sequence in which the percentage of bases which are G or C is more than 40%, preferably more than 50%.
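The GC rich criterion can be checked directly from a sequence. The sketch below is illustrative only; the function names `gc_fraction` and `is_gc_rich` are hypothetical, and the threshold parameter simply encodes the 40% and 50% figures given in the definition.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_gc_rich(seq: str, threshold: float = 0.40) -> bool:
    """True when strictly more than `threshold` of the bases are G or C."""
    return gc_fraction(seq) > threshold


if __name__ == "__main__":
    for s in ("CCGG", "ATGC", "ATATGC"):
        print(s, gc_fraction(s), is_gc_rich(s), is_gc_rich(s, 0.50))
```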
“Correspond,” as used herein, means that at least a portion of one nucleic acid molecule is either complementary to or identical to a second nucleic acid molecule. Thus, a cDNA molecule may correspond to the mRNA molecule where the mRNA molecule was used as a template for reverse transcription to produce the cDNA molecule. Similarly, a genomic sequence of a gene may correspond to a cDNA sequence where portions of the genomic sequence are complementary or identical to the cDNA sequence.
“Hybridize” as used herein, means the formation of a base-paired interaction between nucleotide polymers. The presence of base pairing implies that a fraction of the nucleotides (e.g., at least 80% of a group of adjacent bases in a nucleotide) in each of two nucleotide sequences are complementary to the other according to the commonly accepted base pairing rules. The exact fraction of the nucleotides which must be complementary in order to obtain stable hybridization will vary with a number of factors, including nucleotide sequence, salt concentration of the solution, temperature, and pH.
“Stringent conditions” as used herein, means conditions in which stable hybridization of complementary oligonucleotides is maintained, but mismatches are not (Sambrook et al., Molecular Cloning (1989); see, for example, 11.46, RNA hybrids, and 9.51, RNA:DNA hybrids). Preferably, stringent condition means incubation at 25-65° C. in 1-6×SSC. More preferably, stringent condition means incubation at 42-65° C. in 4-6×SSC.
“A probe” as used herein, means an oligonucleotide or a vector containing a tag or tag-derived (that is, derived from all or part of a tag) sequence, used to hybridize to a pool of RNA or DNA and detect nucleic acids of interest by any of a variety of methods known to those skilled in the art.
“A vector” as used herein, means an agent into which DNA of this invention can be inserted by ligation into the DNA of the agent allowing replication of both the insert and agent in a suitable host cell. Examples of classes of vectors can be plasmids, cosmids, and viruses (e.g., bacteriophage). A cloning vector is used for cloning DNA sequences comprising a tag sequence to form a cDNA tag library. More preferably, as a cloning vector, pUC18 and pUC19 are used. Preferably, the endogenous recognition sites for a punctuating restriction endonuclease within the cloning vector have been destroyed by site-directed mutagenesis. A sequencing vector is used to clone tags or arrays of tags in preparation for DNA sequence analysis. As a sequencing vector, pUC18 and pUC19 are preferred.
This invention provides a method of obtaining a tag comprising the steps of:
(a) providing a double-stranded cDNA,
(b) cleaving the double-stranded cDNA with a punctuating restriction endonuclease providing a cDNA fragment,
(c) ligating to the cDNA fragment a 5′ adapter which is blunt or preferably contains a single-stranded overhang compatible with the punctuating restriction endonuclease. Such ligation produces a first cDNA construct in which the recognition sequence for the punctuating restriction endonuclease is regenerated. The 5′ adapter also contains a recognition sequence for a type IIs restriction endonuclease which allows DNA cleavage at a site in the cDNA fragment distant from the recognition sequence for the type IIs restriction endonuclease,
(d) cleaving the first cDNA construct with the type IIs restriction endonuclease providing a second cDNA construct; preferably this construct has 10-14 base-pairs of cDNA-derived sequence and is flanked at its 3′ end by a random single-stranded overhang and at its 5′ end by the recognition sequence of the punctuating enzyme as well as additional sequence derived from the 5′ adapter,
(e) ligating to the second cDNA construct a 3′ adapter which, where the adapter has overhangs, contains degenerate single-stranded overhangs compatible with all possible overhangs present in the second cDNA construct. The 3′ adapter also contains a recognition sequence for the punctuating restriction endonuclease which is preferably located immediately proximal to the degenerate single-stranded end if used. Hence, ligation of the 3′ adapter to the second cDNA construct generates a cDNA construct in which a cDNA-derived sequence of 10-14 bases is flanked at both ends by the recognition sequence for the punctuating restriction endonuclease, as well as additional sequence located at either end, providing a third cDNA construct,
(f) digesting the third cDNA construct with a 5′ and 3′ cloning restriction endonuclease to provide a fourth cDNA construct which is ligated into a like digested cloning vector (“like digested” means digested to provide ends which can be ligated to the other ends of the vector or construct) and amplified by growth in a suitable host to form a tag library, and
(g) optionally isolating vector DNA from the tag library and digesting the DNA with the punctuating restriction endonuclease to release the tag from the vector, and
(h) determining the nucleotide sequences of the tag(s).
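For a cDNA of known sequence, the tag that steps (a) through (h) would be expected to yield can be predicted in silico, which is one way a sequenced tag might be matched back to a known gene. The sketch below is a simplification under stated assumptions: the punctuating site is taken to be CCGG (MspI), the tag is taken as a fixed 12 bases immediately 3′ of the 3′-most site on the sense strand, and the function name `extract_tag` is hypothetical.

```python
from typing import Optional


def extract_tag(cdna: str, site: str = "CCGG",
                tag_len: int = 12) -> Optional[str]:
    """Predict the cDNA-derived tag adjacent to the 3'-most punctuating site.

    Returns None when the site is absent or when fewer than `tag_len`
    bases lie between the site and the 3' end of the sequence.
    """
    seq = cdna.upper()
    # rfind locates the last (3'-most) occurrence of the site.
    idx = seq.rfind(site)
    if idx == -1:
        return None
    start = idx + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


if __name__ == "__main__":
    cdna = "AT" + "CCGG" + "TTTT" + "CCGG" + "ACGTACGTACGT" + "A" * 8
    print(extract_tag(cdna))
```

Only the 3′-most site contributes a tag in this model, consistent with one tag representing one copy of mRNA.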
Preferably, this invention provides a method of obtaining an array of tags, comprising the steps of:
(a) providing double-stranded cDNA from an mRNA using a biotinylated oligo dT primer,
(b) cleaving the double-stranded cDNA with a punctuating restriction endonuclease which cleaves within the cDNA providing a population of a cDNA fragment,
(c) ligating to the cDNA fragment a 5′ adapter comprising,
i) a single-stranded end compatible with ends produced by cleavage with the punctuating endonuclease;
ii) a recognition sequence for a type IIs restriction endonuclease located 5′ to the single-stranded end;
iii) a recognition sequence for a 5′ cloning restriction endonuclease located 5′ to a recognition sequence for the type IIs restriction endonuclease, providing a first cDNA construct,
(d) isolating the first cDNA construct by affinity capture (such as chromatography, loose beads, magnetic media or the like), preferably on solid phase streptavidin,
(e) cleaving the first cDNA construct from the solid phase and/or streptavidin with the type IIs restriction endonuclease, providing a second cDNA construct,
(f) ligating to the second cDNA construct a 3′ adapter comprising,
i) a degenerate single-stranded end compatible with any ends produced by the type IIs restriction endonuclease;
ii) a recognition sequence for the punctuating endonuclease located 3′ to the degenerate end;
iii) a recognition sequence for a 3′ cloning restriction endonuclease located 3′ to a recognition sequence for the punctuating endonuclease, providing a third cDNA construct,
(g) digesting the third cDNA construct with the 5′ and 3′ cloning restriction endonucleases providing a fourth cDNA construct;
(h) inserting the fourth cDNA construct into a cloning vector digested with the 5′ and 3′ cloning restriction endonucleases,
(i) replicating the vector DNA in a suitable host strain,
(j) isolating the vector DNA,
(k) digesting the vector DNA with the punctuating endonuclease providing a tag comprising cDNA sequences and GC rich clamps,
(l) ligating the tags providing arrays of tags comprising at least 10 tags and GC rich clamps,
(m) optionally inserting the arrays of tags into a sequencing vector,
(n) optionally determining the nucleotide sequences of the arrays of tags.
Double-stranded cDNA is prepared from the target mRNA pool by standard methods using oligo-dT primer. The oligo-dT primer is preferably biotinylated. Preferably, the double-stranded cDNA is treated with methylase or other protection means for a 5′-cloning restriction endonuclease and/or a 3′-cloning restriction endonuclease to protect any internal endonuclease recognition sites.
Double-stranded cDNA is cleaved with a punctuating endonuclease under any known conditions providing a cDNA fragment. The punctuating restriction endonuclease cleaves within the cDNA fragment.
A synthetic, double-stranded adapter molecule with a single-stranded overhang compatible with the punctuating restriction endonuclease is ligated to the cDNA fragment. The 3′-most cDNA fragment is then isolated by affinity capture, preferably using biotin and streptavidin, on a solid phase which is extensively washed to remove free 5′ adapter, providing a first cDNA construct. The 5′ adapter introduces a recognition sequence for a type IIs restriction endonuclease, preferably BsgI, immediately 5′ to the ligated cDNA fragment. Preferably, the 5′ adapter contains a recognition sequence for a 5′ cloning restriction endonuclease at its 5′ terminus to facilitate later cloning.
Cleavage of the adapter-ligated, preferably solid-phase bound cDNA fragment with the type IIs restriction endonuclease releases into the solution phase a linear DNA fragment consisting of the adapter itself and additional nucleotides of unknown cDNA sequence separated from the adapter by the punctuation sequence providing a second cDNA construct.
A second cDNA construct is then ligated to a 3′ adapter molecule which, if an overhang, such as a two base overhang, is used, would have a 16-fold degenerate overhang at the 5′ end of the 3′ adapter which renders it compatible with all possible cDNA overhang sequences released by the type IIs restriction endonuclease, providing a third cDNA construct. Preferably, the 3′ adapter contains a recognition sequence for a 3′-cloning restriction endonuclease. This adapter introduces a recognition sequence for the punctuating restriction endonuclease to the 3′ end of the second cDNA construct, such that the construct contains a cDNA-derived “tag” sequence flanked at both ends by punctuation sequence produced by a 5′ and a 3′ adapter.
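The 16-fold degeneracy follows from the four possible bases at each of the two overhang positions (4 × 4 = 16). By way of illustration, the full set of overhangs a degenerate adapter pool would need to cover can be enumerated as below; the function name `degenerate_overhangs` is hypothetical.

```python
from itertools import product
from typing import List


def degenerate_overhangs(length: int = 2) -> List[str]:
    """List every possible single-stranded overhang of the given length."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


if __name__ == "__main__":
    overhangs = degenerate_overhangs(2)
    print(len(overhangs), overhangs)
```

The same enumeration shows how quickly degeneracy grows with overhang length: a three-base overhang would require a 64-fold degenerate pool.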
The third cDNA construct is digested with the punctuating endonuclease under conditions known to a person skilled in the art, thus providing a tag. Digestion with the punctuating endonuclease provides a tag which comprises cDNA sequences with a recognition sequence for a punctuating endonuclease at its ends. The resulting tag can be inserted, for example, into a cloning vector for amplification in microorganisms. After amplification, the tag sequence is determined by any known method.
Alternatively, instead of digesting a third cDNA construct with the punctuating restriction endonuclease, the resulting third cDNA construct is digested with the 5′ and 3′-cloning restriction endonucleases providing a fourth cDNA construct. The third cDNA construct can be digested initially with either the 5′ or the 3′ restriction endonuclease, or digested with both restriction endonucleases simultaneously. In this case the fourth cDNA construct is isolated by any known method including gel electrophoresis or the like, to resolve it from dimers of the adapters which are also formed in the ligation reaction. These manipulations result in a 5′ and 3′-cloning restriction endonuclease-tailed DNA fragment containing a cDNA tag flanked at both ends by the punctuation sequence. The resulting fourth cDNA construct is isolated from the isolation means, or resolving means, by known methods, such as eluting from a gel and recovery by ethanol precipitation.
Before inserting the fourth cDNA construct into a cloning vector (digested with the 5′ and 3′-cloning restriction endonuclease), it is preferred that any endogenous punctuating endonuclease restriction sites in the vector have been removed by site-directed mutagenesis. As a cloning vector, pUC18 or pUC19 is preferably used. The cloning vector is digested with a 5′ and 3′-cloning restriction endonuclease and a fourth cDNA construct is inserted into the cloning vector.
The cloning vectors are replicated using any method known in the art. Preferably, the cloning vector comprising cDNA construct is amplified in a host cell such as, but not limited to, E. coli by first transforming E. coli with the vector comprising cDNA construct, growing the transformed cells, and isolating the cloning vector from cell culture.
Preferably, cultured host cells are collected by centrifugation and then plasmid DNA is prepared from the precipitate using any known procedures to isolate the vector DNA.
The plasmid DNA is digested with the punctuating restriction endonuclease to release the tags. Each tag is a DNA fragment consisting of a 10-14 base-pair sequence derived from the cDNA. The resulting tag is flanked at both ends, preferably by compatible single-stranded overhangs, which are derived from the recognition sequence for the punctuating endonuclease. When the punctuating endonuclease is MspI, tags have GC single-stranded 3′ overhangs and CG single-stranded 5′ overhangs. A GC clamp prevents the melting of tags at ambient temperatures and the attendant bias against AT-rich sequences. The tag fragments are isolated away from the plasmid backbone by acrylamide gel electrophoresis, eluted from the gel and recovered by ethanol precipitation in preparation for ligation.
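The concern about short tags melting at ambient temperature can be illustrated with a rough calculation. The sketch below uses the Wallace rule, a common rule of thumb for short oligonucleotides (about 2 °C per A/T pair plus 4 °C per G/C pair). It is not part of the disclosed method; the function name `wallace_tm` and both example tags are hypothetical, and the estimate ignores salt and strand concentration.

```python
def wallace_tm(seq):
    """Rough melting temperature (deg C) of a short duplex by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    # 2 deg C per A/T pair, 4 deg C per G/C pair
    return 2 * at + 4 * gc


# Made-up 10-bp tags, used only to contrast base composition.
at_rich_tag = "ATATTTAATA"
gc_rich_tag = "GCGGCCGACG"

print("AT-rich tag:", wallace_tm(at_rich_tag), "deg C")
print("GC-rich tag:", wallace_tm(gc_rich_tag), "deg C")
```

Under this rule an all-A/T 10-bp tag comes out at about 20 °C, close to ambient temperature, which is consistent with the stated risk of losing AT-rich tags when the ends are not stabilized.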
Tags are ligated together via their compatible ends to form arrays of tags. The arrays of tags are isolated by agarose gel electrophoresis.
Arrays of tags are inserted into a sequencing vector in preparation for DNA sequence analysis. As a sequencing vector, any vector is useful. Preferably, pGEM® (Promega Corp., Madison, Wis.), pBluescript® (Stratagene, La Jolla, Calif.), pUC18 or pUC19 is used. More preferably, pUC19 is used. Each array preferably consists of 10-14 base-pair tag sequences separated from each other and from the plasmid backbone by the defined 4-base punctuation sequence.
Any known procedures are used for sequencing analysis to determine the nucleotide sequences of the tag or the arrays of tags.
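Once an array has been sequenced, the individual tags can be recovered by splitting the read at the punctuation sequence. The following is a minimal sketch, assuming MspI as the punctuating endonuclease (recognition sequence CCGG) and a read supplied as a plain string. The function name `split_array`, its parameters and the example read are hypothetical, and the sketch does not handle sequencing errors or reads obtained from the opposite strand.

```python
def split_array(read, punctuation="CCGG", min_len=10, max_len=14):
    """Return the tag sequences lying between consecutive punctuation sites."""
    pieces = read.upper().split(punctuation)
    # The first and last pieces are flanking vector sequence, not tags.
    inner = pieces[1:-1]
    # Keep only fragments in the expected tag size range.
    return [p for p in inner if min_len <= len(p) <= max_len]


# Made-up read: vector flank, two 12-bp tags, one short spurious fragment, vector flank.
example_read = (
    "AAAT" + "CCGG" + "ACGTACGTACGT" + "CCGG" + "TTTTGGGGAAAA"
    + "CCGG" + "ACT" + "CCGG" + "GATC"
)

for tag in split_array(example_read):
    print(tag)
```

Filtering by length discards fragments that fall outside the 10-14 base-pair range, such as adapter remnants or partially digested inserts.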
This method allows mRNAs with a number of copies to be detected in a given cell population. By comparing gene transcription profiles among cells, this method can be used to identify individual genes whose transcription is associated with a pathological phenotype.
Using high throughput DNA sequencing, the method of this invention also permits the generation of a global gene transcription profile. Thus, this invention provides a simple and rapid method of obtaining sufficient data to use in an information system known to those of skill in the art to obtain a global gene transcription profile and identify genes of interest.
Accordingly, this invention can be used to identify differential gene transcription patterns among two or more cells or tissues. Thus, using the methods of this invention one can identify a gene or genes that are transcribed in any given cell type, tissue, or target organism at a different level from that in another cell type, tissue, or target organism.
The methods of this invention can be used to identify differential gene transcription patterns at different stages of development in the same cell-type or tissue-type, and to identify changes in gene transcription patterns in diseased or abnormal cells. Further, this invention can be used to detect changes in gene transcription patterns due to changes in environmental conditions or to treatment with drugs. To do so, patterns of gene transcription are compared using double-stranded cDNA obtained from different mRNA populations of interest.
This invention also provides a method of identifying patterns of gene transcription, comprising steps of:
(a) providing tags from sources of interest according to this invention, and
(b) identifying patterns of gene transcription.
Tags are prepared from an mRNA population of interest according to this invention and their sequences are determined using conventional procedures.
Sequences of resulting tags are compared to known sequence databases to identify patterns of gene transcription.
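The comparison of tags against known sequences can be sketched as a simple substring search. This is an illustration only, assuming a small in-memory dictionary stands in for a sequence database; the names `annotate_tags` and `known_sequences` and the example sequences are hypothetical. A real database such as GenBank would be queried with a dedicated search tool rather than this loop.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(table)[::-1]


def annotate_tags(tags, known_sequences):
    """Map each tag to the names of known sequences containing it on either strand."""
    hits = {}
    for tag in tags:
        tag = tag.upper()
        rc = reverse_complement(tag)
        hits[tag] = [
            name
            for name, seq in known_sequences.items()
            if tag in seq.upper() or rc in seq.upper()
        ]
    return hits


# Made-up stand-in for a sequence database.
known_sequences = {
    "geneA": "TTTACGTACGTACGTAAA",
    "geneB": "GGGTTTTCCCCAAAAGGG",
}

result = annotate_tags(["ACGTACGTACGT", "TTTTGGGGAAAA", "CCCCCCCCCCCC"], known_sequences)
for tag, genes in result.items():
    print(tag, genes if genes else "no match (possibly a novel transcript)")
```

Tags without a match are kept in the result with an empty list, since unmatched tags are the candidates for genes not yet represented in the database.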
This invention also provides a method of detecting a difference in gene transcription between two or more mRNA populations, comprising steps of:
(a) identifying patterns of gene transcription from a first mRNA population according to this invention,
(b) identifying patterns of gene transcription from a second mRNA population according to this invention, and
(c) comparing the patterns of gene transcription from (a) and (b).
Preferably, the first mRNA population is obtained from a normal cell or tissue. Patterns of gene transcription from a first mRNA population are then identified. Preferably, the second mRNA population and/or any additional mRNA population is obtained from a target organism having a disease or disorder, cells or tissues at different developmental stages, different tissues or organs of the same target organism or different target organisms and patterns of gene transcription are identified.
The patterns obtained from the first and second mRNA populations are compared and the difference is observed. In addition, patterns from other mRNA populations can be compared to those initially derived. This method is also useful in identifying genes modulated by development, disorders, drugs, stress, disease or the like.
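The comparison step can be sketched as follows, assuming each pattern has already been reduced to a table of tag counts. The fold-change threshold, the pseudocount and the name `differential_tags` are assumptions made for illustration and are not prescribed by the method; any other suitable comparison could be substituted.

```python
def differential_tags(counts_a, counts_b, min_fold=2.0, pseudocount=0.5):
    """Return tags whose normalized abundance differs by at least min_fold."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both populations must contain at least one tag")
    flagged = {}
    for tag in set(counts_a) | set(counts_b):
        # Normalize to the number of tags sequenced; the pseudocount
        # avoids division by zero for tags absent from one population.
        rate_a = (counts_a.get(tag, 0) + pseudocount) / total_a
        rate_b = (counts_b.get(tag, 0) + pseudocount) / total_b
        fold = rate_b / rate_a
        if fold >= min_fold or fold <= 1 / min_fold:
            flagged[tag] = fold
    return flagged


# Made-up tag counts for a first (e.g. normal) and second (e.g. diseased) population.
normal = {"T1": 10, "T2": 5, "T3": 5}
diseased = {"T1": 10, "T2": 1, "T4": 9}

flagged = differential_tags(normal, diseased)
for tag in sorted(flagged):
    print(tag, round(flagged[tag], 2))
```

A fold value above 1 indicates a tag that is more abundant in the second population, and a value below 1 indicates one that is less abundant.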
This invention also provides a method of determining the relative frequency of a particular gene's transcription compared to other genes transcribed into an mRNA population, comprising the steps of:
(a) providing an array of tags of interest,
(b) sequencing the array of tags, and
(c) determining relative frequency of any or all tags.
An array of tags is prepared according to this invention from a cDNA library derived from an mRNA population.
The array of tags can be sequenced by any known method, such as the sequencing-by-hybridization method (see, for example, U.S. Pat. No. 5,202,231, hereby incorporated by reference).
Once sequencing of tags is accomplished, determining the frequency of tags is done by any method available. For example, this can be done manually or using a suitable algorithm and/or a computer searchable database.
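As one example of a suitable algorithm, the relative frequency of each tag can be obtained by counting occurrences and dividing by the total number of tags sequenced. This is a minimal sketch; the function name `tag_frequencies` and the example tags are hypothetical.

```python
from collections import Counter


def tag_frequencies(tags):
    """Return {tag: relative frequency}, ordered from most to least frequent."""
    counts = Counter(tag.upper() for tag in tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()}


# Made-up list of tags as they might be read from several arrays.
observed = ["ACGTACGTACGT", "ACGTACGTACGT", "TTTTGGGGAAAA", "ACGTACGTACGT"]

for tag, freq in tag_frequencies(observed).items():
    print(tag, freq)
```

Because each tag is derived from one transcript, these frequencies serve as an estimate of the relative abundance of the corresponding mRNAs in the population.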
This invention also provides a method of screening for a disease, a disorder, or like stress as defined above, including the effects of a drug on a cell or tissue, comprising the steps of:
(a) identifying patterns of gene transcription in the normal cell according to this invention,
(b) identifying patterns of gene transcription in the presence of a stress or the like according to this invention, and
(c) comparing the patterns of gene transcription from (a) and (b).
For example, the patterns of gene transcription in cells cultured with the drug and in cells cultured without the drug are compared to determine whether the drug changes the gene transcription profile. This method yields information on (1) markers useful in the diagnosis of disease or other stress as defined above, such as by blood test or the like, and/or (2) target enzymes or proteins for treatment, thus providing an aid in drug design or development.
This invention also provides a method of detecting the presence of a disease, or other stress as defined above in a target organism comprising the steps of:
(a) providing the tag sequence of a gene that is differentially expressed (either expressed more abundantly, that is, increased expression, or expressed less abundantly, that is, decreased expression; in other words, the disease or other stress as defined above modulates expression in some way) in a normal cell or tissue according to this invention,
(b) hybridizing a cDNA library obtained from a first target organism with the tag,
(c) hybridizing a cDNA library obtained from a second normal or diseased (or affected by other stress as defined above) target organism with the tag sequence, and
(d) comparing the level of transcription of the gene in the first target organism with the level of transcription of the gene in the second target organism.
Any known method can be employed to compare the level of transcription of the gene in the first target organism with that in the second target organism, and thereby to detect the presence of a disease or other stress as defined above in the first target organism.
This invention also provides a method of isolating a gene, comprising the steps of:
(a) providing a probe comprising a tag sequence of interest according to this invention,
(b) probing a cDNA library of interest, and
(c) isolating a gene.
Any tag of interest can be used to provide a probe comprising a tag sequence of interest. This probe can be used to find a new gene, detect a gene in a cell, or detect a mutation. A probe comprising a tag sequence is prepared using a synthetic oligonucleotide or a vector comprising a tag sequence. A probe is preferably labeled for detection, such as by radioisotope, fluorescence or the like, or for isolation, such as by biotin, streptavidin or the like.
The nucleotide sequence of a tag is compared with known nucleotide sequences to determine which gene to isolate. Known nucleotide sequences can be obtained from any source using sequence databases, such as GenBank.
This invention also provides a kit for obtaining a tag or an array of tags comprising:
(a) a 5′ adapter,
(b) a 3′ adapter,
(c) appropriate vectors, including cloning and/or sequencing vectors, and
(d) appropriate restriction endonucleases, as disclosed herein.
The kit can further include a reaction buffer and/or a cDNA library and other such components.
Hence, this invention provides a rapid and accurate means to quantitatively analyze the gene transcription profile of interest and to compare profiles between different sources for a host of reasons.
The method also assists in new gene discovery because the 10-14-bp tag sequences generated by this invention can serve as hybridization probes to facilitate the isolation of interesting tagged genes whose function is not yet known. Isolation of genes using such tags is well understood in the art.