The present invention discloses techniques for simply and efficiently sorting expressed genes into nonredundant groups of cDNA molecules reverse-transcribed from any source of eukaryotic RNA. These groups of cDNA molecules can themselves be used for genetic analyses according to methods in the art, or they can be further sorted according to the techniques of the present invention. By applying these techniques one can obtain a collection of nonredundant subgroups of cDNA molecules, with every expressed-gene transcript from the original mRNA sample uniquely represented in its own subgroup. The method further allows one to reach a stage in which each expressed-gene transcript is found in one tube, i.e., “one gene per well.” Uses of the present invention include the isolation, identification and analysis of genes, the analysis and diagnosis of disease states, the study of cellular differentiation, and gene therapy.
The production of cDNA or gene libraries has involved cloning by the use of cloning vectors placed in host organisms such as bacteria or yeast. These libraries suffer from redundancy: they contain either multiple copies of particular cDNA sequences, or multiple cDNA fragments from each expressed gene, or both. This redundancy persists in all of the current normalization procedures. The presence in a collection of cDNAs of multiple copies of particular cDNA sequences, and/or multiple cDNA fragments from each expressed gene, can result in pointless duplication of research efforts and other significant inefficiencies.
U.S. Pat. No. 5,707,807, Molecular Indexing For Expressed Gene Analysis, concerns the creation of subgroups of DNA by repeated digestions with a number of restriction enzymes, followed by ligation with adaptors having a common primer template, PCR amplification and, finally, comparison of patterns of PCR products separated by polyacrylamide-gel electrophoresis. The method of this patent creates groups of DNA molecules. However, because each PCR step indiscriminately amplifies all ligated DNA molecules in each sample, the method has a limited capacity to sort DNA into nonredundant groups.
P. Unrau and K. V. Deugau, Non-cloning amplification of specific DNA fragments from whole genomic DNA digests using DNA ‘indexers’, Gene 145:163-169 (1994) concerns characterizing fragments of digested DNA by the sequences of their cohesive ends and their lengths, optionally aided by PCR. However, each PCR step indiscriminately amplifies all ligated DNA molecules in each sample, and amplifies numerous DNA fragments per gene. The method does not yield nonredundant groups of genes.
U.S. Pat. No. 5,728,524, Process For Categorizing Nucleotide Sequence Populations, concerns obtaining groups of DNA molecules by using pools of adaptors ligated to digested DNA, followed by PCR. Each PCR step amplifies numerous DNA fragments per gene. The method fails to produce nonredundant groups of genes.
D. R. Smith, Ligation-mediated PCR of Restriction Fragments from Large DNA Molecules, concerns a general method for PCR amplification of type IIs restriction fragments by ligation of adaptors with degenerate end sequences complementary to cohesive ends of digested DNA fragments. Each PCR step amplifies numerous DNA fragments per gene. The method fails to produce nonredundant groups of genes.
U.S. Pat. No. 5,871,697, Method And Apparatus For Identifying, Classifying, Or Quantifying DNA Sequences In A Sample Without Sequencing, concerns classifying DNA sequences by making extensive use of comparative databases and fragment-length and restriction-digest information. The patent concerns DNA digestion and ligation of adaptors with priming sequences specific for a particular restriction enzyme. The method in this patent does not aim at the production of nonredundant groups of genes.
The present invention provides novel techniques for producing a cDNA or gene library without redundancy. These techniques sort DNA on a sequence-dependent basis into nonredundant groups. At the same time, however, these techniques eliminate the need to determine any of the DNA sequences prior to sorting and identifying genes.
One object of the present invention is providing a method of sorting cDNA or genes into nonredundant groups, which can then be analyzed by various techniques known in the art. One of many such techniques is the cDNA microarray method in which the cDNA clones derived from the present invention are used to produce the array that is then examined by hybridization to determine differential gene expression. Another technique is differential display of gel-electrophoresis patterns involving mRNA sources to analyze biological models such as disease states or cellular differentiation. In application to this technique the groups derived from the present invention can be used for differential display of gel-electrophoresis patterns.
Another object of the present invention is providing a method of obtaining a collection of nonredundant subgroups of cDNA molecules, with every expressed-gene transcript from an original mRNA sample uniquely represented in its own subgroup, i.e. “one gene per well.” Such isolated genes have a wide variety of uses, notably including gene therapy and analysis of the human genome.
The present invention provides a method of sorting genes and/or gene fragments comprising the following steps (herein called “Method I”):
(1) preparing ds cDNA molecules from mRNA molecules by reverse transcription, using a poly-T primer optionally having a general primer-template sequence upstream from the poly-T sequence, yielding ds cDNA molecules having the poly-T sequence, optionally having the general primer-template sequence;
(2) digesting the ds cDNA molecules with a restriction enzyme that produces digested cDNA molecules with cohesive ends having overhanging ssDNA sequences of a constant number of arbitrary nucleotides;
(3) ligating to the digested cDNA molecules a set of dsDNA oligonucleotide adaptors, each of which adaptors has at one of its ends a cohesive-end ssDNA adaptor sequence complementary to one of the possible overhanging ssDNA sequences of the digested cDNA, at the opposite end a specific primer-template sequence specific for the ssDNA adaptor complementary sequence, and in between the ends a constant sequence that is the same for all of the different adaptors of the set;
(4) amplifying by separate polymerase chain reactions the ligated cDNA molecules, utilizing for each separate polymerase chain reaction a primer that anneals to the cDNA poly-T sequence optionally having the cDNA general primer-template, and a primer from a set of different specific primers that anneal to the cDNA specific primer-template sequences; and
(5) sorting the amplified cDNA molecules into nonoverlapping groups by collecting the amplification products after each separate polymerase chain reaction, each group of amplified cDNA molecules determined by the specific primer that annealed to the specific primer-template sequence and primed the polymerase chain reaction.
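The bookkeeping behind steps (2) through (5) can be sketched in a few lines of Python. The sketch is illustrative only and is not part of the disclosed method: the function names, the example sequences and the choice of a four-nucleotide overhang are assumptions, and each digested molecule is modelled simply as an (overhang, remainder) pair standing for the poly-T-bearing fragment of one transcript.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def all_overhangs(length):
    """Enumerate every possible overhang of the given length (4**length of them)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def sort_by_overhang(fragments, length):
    """Assign each (overhang, remainder) fragment to the group of its overhang.

    One group stands for one separate PCR: the adaptor whose cohesive end is
    complementary to that overhang, together with its matching specific primer.
    """
    groups = {overhang: [] for overhang in all_overhangs(length)}
    for overhang, remainder in fragments:
        if overhang not in groups:
            raise ValueError(f"unexpected overhang: {overhang!r}")
        groups[overhang].append(remainder)
    return groups


# Hypothetical digested fragments (made-up sequences, for illustration only).
fragments = [("ACGT", "TTGACC"), ("GGCA", "CATTAG"), ("ACGT", "GGATCC")]
groups = sort_by_overhang(fragments, length=4)

# Cohesive end of the adaptor that would pair with the overhang GGCA.
adaptor_end = reverse_complement("GGCA")
```

Under these assumptions a four-nucleotide overhang calls for a complete set of 4^4 = 256 adaptors, and every fragment falls into exactly one group, which is what makes the groups nonoverlapping.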
One embodiment of the present invention according to the principles of Method I comprises a complete set of oligonucleotide adaptors and specific primers, containing an oligonucleotide adaptor and a specific primer complementary to each of the possible overhanging ssDNA sequences of the digested cDNA.
Another embodiment of the present invention according to the principles of Method I further comprises:
(6) amplifying the sorted nonredundant groups of cDNA molecules by nesting polymerase chain reaction, each amplification utilizing a primer that anneals to the cDNA poly-T sequence optionally having the cDNA general primer-template sequence, as well as one of a set of nesting primers with the following general formula
5′-|sequence complementary to the constant sequence of the oligonucleotide adaptors|-NIx-|1-5 nucleotides complementary to one of the possible sequences of 1-5 nucleotides immediately upstream from the overhanging ssDNA sequence on the cDNA|-3′
where N is an arbitrary nucleotide; I is inosine; and x=1,2,3 or 4, being one fewer than the constant number of nucleotides in the overhanging ssDNA sequences; and
(7) sorting the amplified cDNA molecules into nonredundant subgroups by collecting the amplification products after each separate nesting polymerase chain reaction, each nonredundant subgroup of cDNA molecules determined by the particular nested primer that complemented the 1-5 nucleotides immediately upstream from the overhanging ssDNA sequence on the cDNA.
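The general formula for the nesting primers can be made concrete with a short Python sketch. It is a hedged illustration rather than a primer-design tool: the function name `nesting_primers` and the constant sequence `GCTAGCAT` are made up for the example, and the primers are written as plain strings in which `N` stands for an arbitrary nucleotide and `I` for inosine.

```python
from itertools import product


def nesting_primers(constant, overhang_length, k):
    """Build the set of nesting primers 5'-constant-N-I*x-tail-3'.

    x is one fewer than the overhang length (so x = 1..4), and the tail runs
    over every possible sequence of k nucleotides (k = 1..5).
    """
    if not 2 <= overhang_length <= 5:
        raise ValueError("overhang length must be 2..5 so that x is 1..4")
    if not 1 <= k <= 5:
        raise ValueError("the specific 3' tail must be 1..5 nucleotides long")
    x = overhang_length - 1
    spacer = "N" + "I" * x  # N plus x inosines: as long as the overhang itself
    return [constant + spacer + "".join(tail)
            for tail in product("ACGT", repeat=k)]


# Made-up constant sequence, used only to show the shape of the primers.
CONSTANT = "GCTAGCAT"
primers = nesting_primers(CONSTANT, overhang_length=4, k=2)
```

In this sketch the `N` plus `I` spacer has the same length as the overhang, so only the last k nucleotides of each primer discriminate between templates, and a tail of k nucleotides splits each group into at most 4^k subgroups.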
Another embodiment of the present invention according to the principles of Method I further comprises conducting further polymerase chain reactions with further nesting primers complementary to the next immediately upstream cDNA nucleotides, thereby sorting the amplified cDNA molecules further into nonredundant subgroups.
A preferred embodiment of the present invention according to the principles of Method I further comprises conducting further polymerase chain reactions with further nesting primers complementary to the next immediately upstream cDNA nucleotides until each nonredundant subgroup contains cDNA molecules all of essentially the same sequence, with every expressed-gene transcript in the mRNA sample uniquely represented in one of the nonredundant subgroups.
The present invention also concerns a method of sorting genes and/or gene fragments comprising the following steps (herein called “Method II”):
(1) preparing ds cDNA molecules from mRNA molecules by reverse transcription, using a poly-T primer optionally having a general primer-template sequence upstream from the poly-T sequence, yielding ds cDNA molecules having the poly-T sequence, optionally having the general primer-template sequence;
(2) digesting the ds cDNA molecules with a first restriction enzyme that produces digested cDNA molecules with cohesive ends having first overhanging ssDNA sequences of a constant number of arbitrary nucleotides;
(3) ligating to the digested cDNA molecules a set of dsDNA oligonucleotide adaptors, each of which adaptors has at one of its ends a cohesive-end ssDNA adaptor sequence complementary to one of the possible first overhanging ssDNA sequences of the digested cDNA, at the opposite end a specific primer-template sequence specific for the ssDNA adaptor complementary sequence, and in between the ends a constant sequence that is the same for all of the different adaptors of the set, and that contains a recognition site for a second restriction enzyme that can cleave the ligated cDNA molecules at a point further from the ligated oligonucleotide adaptor than the overhanging ssDNA sequences of the digested cDNA, and can create cohesive ends having second overhanging ssDNA sequences of a constant number of arbitrary nucleotides;
(4) amplifying by separate polymerase chain reactions the ligated cDNA molecules, utilizing for each separate polymerase chain reaction a primer that anneals to the cDNA poly-T sequence optionally having the cDNA general primer-template, and a primer from a set of different specific primers that anneal to the cDNA specific primer-template sequences; and
(5) sorting the amplified cDNA molecules into nonoverlapping groups by collecting the amplification products after each separate polymerase chain reaction, each group of amplified cDNA molecules determined by the specific primer that annealed to the specific primer-template sequence and primed the polymerase chain reaction.
One embodiment of the present invention according to the principles of Method II comprises using a complete set of oligonucleotide adaptors and specific primers, containing an oligonucleotide adaptor and a specific primer complementary to each of the possible first overhanging ssDNA sequences of the digested cDNA.
Another embodiment of the present invention according to the principles of Method II further comprises:
(6) digesting the sorted nonredundant groups of cDNA molecules with the second restriction enzyme, cleaving the ligated cDNA molecules at a point further from the ligated oligonucleotide adaptor than the overhanging ssDNA sequences of the digested cDNA, and creating cohesive ends having second overhanging ssDNA sequences of a constant number of arbitrary nucleotides;
(7) ligating to the digested cDNA molecules a set of nesting dsDNA oligonucleotide adaptors, each of which adaptors has at one of its ends a cohesive-end ssDNA adaptor sequence complementary to one of the possible second overhanging ssDNA sequences of the digested cDNA, at the opposite end a specific primer-template sequence unique for the ssDNA adaptor complementary sequence, and in between the ends a constant sequence that is the same for all of the different adaptors of the set, and that contains the recognition site for the second restriction enzyme;
(8) amplifying by separate polymerase chain reactions the ligated cDNA molecules, utilizing for each separate polymerase chain reaction a primer that anneals to the cDNA poly-T sequence optionally having the cDNA general primer-template, and a primer from a set of different specific primers that anneal to the cDNA specific primer-template sequences; and
(9) sorting the amplified cDNA molecules into nonredundant subgroups by collecting the amplification products after each separate polymerase chain reaction, each subgroup of amplified cDNA molecules determined by the specific primer that annealed to the specific primer-template sequence and primed the polymerase chain reaction.
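The successive rounds of digestion, ligation and amplification in steps (1) through (9) can be pictured as reading a sorting "address" off each cDNA, one overhang per round. The Python sketch below is a toy model under stated assumptions: the function name `sorting_address`, the parameter `advance` (how far the second enzyme's cut moves into the cDNA each round) and the example sequences are hypothetical, and the model looks only at one strand, ignoring the staggered cut on the opposite strand.

```python
def sorting_address(cdna, overhang_length, rounds, advance):
    """Return the tuple of overhangs exposed in successive rounds.

    cdna is the strand read from the first cohesive end inward. Each round
    exposes overhang_length nucleotides located `advance` positions further
    from the adaptor than in the previous round.
    """
    if advance < overhang_length:
        raise ValueError("each cut must lie beyond the previous overhang")
    address = []
    for round_index in range(rounds):
        start = round_index * advance
        overhang = cdna[start:start + overhang_length]
        if len(overhang) < overhang_length:
            break  # fragment too short to be sorted any further
        address.append(overhang)
    return tuple(address)


# Two made-up transcripts that share the first overhang but not the second.
first = sorting_address("ACGTTTGACCGA", overhang_length=4, rounds=2, advance=4)
second = sorting_address("ACGTGGCATTAG", overhang_length=4, rounds=2, advance=4)
```

Here both sequences fall into the same group after the first round and are separated into different subgroups by the second, which is the effect the nesting adaptors are meant to achieve.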
One embodiment of the present invention according to the principles of Method II further comprises using a complete set of nesting dsDNA oligonucleotide adaptors, containing an oligonucleotide adaptor complementary to each of the possible second overhanging ssDNA sequences of the digested cDNA.
Another embodiment of the present invention according to the principles of Method II further comprises conducting further polymerase chain reactions using further nesting oligonucleotide adaptors, optionally with different restriction enzymes and recognition sites, thereby sorting the amplified cDNA molecules further into nonredundant subgroups.
A preferred embodiment of the present invention according to the principles of Method II further comprises conducting further ligations with further nesting oligonucleotide adaptors, optionally with different restriction enzymes and recognition sites, until each nonredundant subgroup contains cDNA molecules all of essentially the same sequence, with every expressed-gene transcript in the mRNA sample uniquely represented in one of the nonredundant subgroups.
The present invention also provides a method (Method III) of sorting genes and/or gene fragments comprising the steps of:
(1) preparing ds cDNA molecules from mRNA molecules by reverse transcription, using a poly-T primer having a general primer-template sequence upstream from the poly-T sequence that includes a recognition sequence for a restriction enzyme, yielding ds cDNA molecules having the poly-T sequence, having the general primer-template sequence;
(2) dividing the cDNA into N pools, wherein N is 1 to 25, and digesting the ds cDNA molecules with different restriction enzymes that produce digested cDNA molecules with cohesive ends having overhanging ssDNA sequences of a constant number of arbitrary nucleotides;
(3) ligating to the digested cDNA molecules of each pool a set of dsDNA oligonucleotide adaptors, each of which adaptors has at one of its ends a cohesive-end ssDNA adaptor sequence complementary to one of the possible overhanging ssDNA sequences of the digested cDNA, at the opposite end a specific primer-template sequence specific for the ssDNA adaptor complementary sequence, and in between the ends a constant sequence that is the same for all of the different adaptors of the set;
(4) amplifying by separate polymerase chain reactions the ligated cDNA molecules of each pool, utilizing for each separate polymerase chain reaction a primer that anneals to the cDNA poly-T sequence optionally having the cDNA general primer-template, and a primer from a set of different specific primers that anneal to the cDNA specific primer-template sequences;
(5) sorting the amplified cDNA molecules from each pool into non-overlapping groups by collecting the amplification products after each separate polymerase chain reaction, each group of amplified cDNA molecules determined by the specific primer that annealed to the specific primer-template sequence and primed the polymerase chain reaction, wherein each of the restriction enzymes digests the N separate cDNA pools into 64 or 256 non-redundant sub-groups;
(6) digesting cDNA fragments in each non-redundant sub-group of the cDNA pools with different restriction enzymes and further purifying the digested cDNA fragments by removing the small end fragments produced by the cleavage.
This invention also provides a method of making sub-libraries of ligation sets by ligating the restriction-enzyme-digested fragments generated by Method III into a plasmid vector that has recognition sequences for said restriction enzymes and has been predigested with these enzymes, to make 64×N or 256×N sets of ligations, wherein N is 1 to 25.
This invention further provides a method of making sub-libraries of bacterial colonies, wherein the ligation sets, generated in the method of making sub-libraries of ligation sets, are transformed into bacteria and plated onto bacterial growth plates to produce bacterial colonies containing each of the 64×N or 256×N non-redundant subgroups of cDNA fragments, wherein N is 1 to 25.
In one embodiment of Method III, N is two and the restriction enzymes in step (1) comprise AscI and another similar rare-cutting restriction enzyme.
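The 64×N and 256×N figures can be checked with simple arithmetic. The Python sketch below assumes, as an interpretation rather than a statement from the text, that 64 and 256 correspond to complete adaptor sets for three- and four-nucleotide overhangs (4^3 and 4^4); the function name `sublibrary_count` is hypothetical.

```python
# Groups per pool for a complete adaptor set: 4**3 = 64 and 4**4 = 256.
GROUPS_PER_POOL = {3: 4 ** 3, 4: 4 ** 4}


def sublibrary_count(overhang_length, pools):
    """Return the number of ligation sets for N pools (N = 1..25)."""
    if overhang_length not in GROUPS_PER_POOL:
        raise ValueError("only 3- or 4-nucleotide overhangs are modelled here")
    if not 1 <= pools <= 25:
        raise ValueError("N must be between 1 and 25")
    return GROUPS_PER_POOL[overhang_length] * pools


# Counts for the N = 2 embodiment and for the upper bound N = 25.
counts = {(n, length): sublibrary_count(length, n)
          for n in (1, 2, 25) for length in (3, 4)}
```

With N equal to two, this gives 128 or 512 ligation sets; at the upper limit of N equal to 25, it gives 1600 or 6400.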