1. FIELD OF THE INVENTION
2. BACKGROUND OF THE INVENTION
2.1. CHARACTERISTICS OF DISEASE AND OTHER PHENOTYPES
2.2. GENE IDENTIFICATION BY POSITIONAL CLONING
2.2.1. LINKAGE MAPPING
2.2.2. CHROMOSOMAL LOCALIZATION
2.2.3. FURTHER REFINEMENT
2.2.4. FROM LOCUS TO GENE
2.3. MISMATCH REPAIR
3. SUMMARY OF THE INVENTION
4. BRIEF DESCRIPTION OF THE DRAWINGS
5. DETAILED DESCRIPTION OF THE INVENTION
5.1. GENETIC HETEROGENEITY
5.1.1. GENETIC HETEROGENEITY IN CELL LINES
5.1.2. GENETIC HETEROGENEITY IN TISSUES
5.2. TWO APPROACHES FOR THE VGIDSM METHOD
5.2.1. FIRST APPROACH: CELL LINES OR SOLE TISSUE SAMPLE
5.2.2. SECOND APPROACH: SAMPLES FROM ORGANISMS HAVING CONSANGUINITY
5.3. MISCELLANEOUS METHODS USED IN CONJUNCTION WITH THE VGIDSM METHOD
5.3.1. DNA AMPLIFICATION
5.3.2. ADJUSTING STRINGENCY
5.4. PHENOTYPE SELECTION TO OPTIMIZE THE VGIDSM METHOD
5.4.1. TISSUE SAMPLE COLLECTION
5.4.2. CELL CULTURE
5.5. TROUBLESHOOTING THE VGIDSM METHOD
5.6. ASSAYS FOR PHENOTYPE SELECTION
5.7. DISEASES, DISORDERS, AND OTHER PHENOTYPES
5.8. LINKING OLIGONUCLEOTIDES TO SPECIFIC BINDING LIGANDS
5.9. ANTIBODIES AND DERIVATIVES THEREOF
5.10. ANTIBODY COLUMNS FOR SORTING NUCLEIC ACIDS
5.11. DETECTION OF ANTIBODIES AGAINST PEPTIDE-LABELED OLIGONUCLEOTIDES
6. EXAMPLE: USE OF THE VGIDSM METHOD TO IDENTIFY hDinP GENES
6.1. INTRODUCTION
6.2. MATERIALS AND METHODS
6.3. RESULTS
6.4. DISCUSSION
6.5. APPLICATION OF THE VGIDSM METHOD TO A COMPLEX, MULTISTAGE SYSTEM
6.5.1. EXAMPLE OF A COMPLEX SYSTEM-BREAST CANCER
6.5.2. ANALYSIS OF A COMPLEX SYSTEM USING MULTIPLEX VGIDSM
The present invention relates generally to the field of genomics. More particularly, the present invention relates to a method for gene identification beginning with user-selected input phenotypes. The method is referred to generally as the ValiGeneSM Gene Identification method, or the VGIDSM method. The method employs nucleic acid mismatch binding protein chromatography to effect a molecular comparison of one phenotype with others. Genes are identified as having a specified function, or as causing or contributing to the cause or pathogenesis of a specified disease, or as associated with a specific phenotype, by virtue of their selection by the method. Identified genes may be used in development of reagents, drugs and/or combinations thereof useful in clinical or other settings for prognosis, diagnosis and/or treatment of diseases, disorders and/or conditions. The method is equally suited for gene identification for agricultural, bio-engineering, medical, veterinary, and many other applications. When more than two source populations of nucleic acids are simultaneously compared, the method may be referred to as multiplex VGIDSM.
Identification of a particular genotype responsible for a given phenotype is an essential goal underlying gene-based medicine because it affords a rational departure point for the development of successful strategies for disease management, therapy and even cure. While, by one recent estimate, only two percent (2%) of the human genome has yet been sequenced, perhaps more than 50% of expressed human genes are at least partially represented in existing databases (Duboule, D., Oct. 24, 1997, Editorial: The Evolution Of Genomics, Science 278, 555). It is therefore quite clear that understanding functional interactions among the products of expressed genes represents the next great challenge in medicine and biology. This pursuit has been referred to as “functional genomics,” although this term is perhaps too broad to have a clear meaning (Hieter, P. and Boguski, M., Oct. 24, 1997, Functional Genomics: It's All How You Read It, Science 278, 601-602). Nevertheless, it is the prevailing view that functional genomics generally describes “ . . . a transition or expansion from the mapping and sequencing of genomes . . . to an emphasis on genome function.” (Id.). Further, this new emphasis will require “ . . . creative thinking in developing innovative technologies that make use of the vast resource of structural genomics information.” Perhaps the best definition of functional genomics is “ . . . the development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making use of the information provided by structural genomics.” (Id., emphasis added).
One of the major advantages of the present invention is the circumvention of large-scale sequencing in determining functional relationships among genes. The VGIDSM method of the present invention is a straightforward yet very powerful genetic comparison or subtraction technique. Functional information is obtained from global (i.e. genome-wide) expressed gene comparison of two or more user-defined phenotypes using mismatch binding protein chromatography. With the VGIDSM method, disease genes may be identified over a time period of weeks, unlike the years required to succeed using positional cloning.
2.1. Characteristics of Disease and Other Phenotypes
Genetic diseases and other genetically-determined phenotypes, irrespective of mode of inheritance, can be due to single or multiple lesions (i.e. mutations) affecting one gene or more than one gene simultaneously. Genetic heterogeneity (i.e. a difference in DNA sequence), by definition, characterizes all diseases which have a genetic component. Genetic diseases can be further categorized among four broad genotypic groups, as described below.
A mono-allelic disease is characterized as having a mutation in a single allele of a single gene. This disease group is the simplest in terms of genetic analysis since mono-allelic diseases arise, by definition, from a unique lesion affecting a single gene. Mono-allelic diseases have also been described as displaying “molecular monomorphism,” which is another way of saying that a single molecular defect in a single gene accounts for the disease phenotype. Since such genetic lesions are unique, they are invariably “causative” of the disease in question. For a mono-allelic disease, only a few affected individuals need to undergo genetic analysis to attribute a given mutation to a disease phenotype. That is, large familial studies are not required to identify the disease-causing gene. Only a few examples of such diseases are known. One example is sickle cell anemia, which is due to a single base substitution (i.e. A→T) in the gene encoding hemoglobin. This base substitution changes the respective codon from GAG to GTG, ultimately resulting in a glutamate-to-valine amino acid substitution at position six of the hemoglobin β chain molecule and the characteristic, devastating sickle-shaped erythrocyte.
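The sickle cell substitution described above can be traced in a few lines of code. The following is only an illustrative sketch: the codon table is deliberately partial (it holds just the two codons named in the text), and the function names `substitute` and `translate_codon` are hypothetical, introduced here for illustration.

```python
# Partial genetic code: only the two codons discussed in the text.
CODON_TABLE = {"GAG": "Glu", "GTG": "Val"}


def substitute(codon: str, position: int, new_base: str) -> str:
    """Return the codon with the base at `position` (0-based) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def translate_codon(codon: str) -> str:
    """Look up the amino acid encoded by a DNA codon in the partial table."""
    return CODON_TABLE[codon]


normal = "GAG"
mutant = substitute(normal, 1, "T")  # A -> T at the middle position of the codon

print(normal, translate_codon(normal))
print(mutant, translate_codon(mutant))
```

A single-base change at the middle position of the codon is enough to change the encoded amino acid, which is why one lesion can account for the whole phenotype in a mono-allelic disease.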
A polyallelic disease is characterized as having several different mutations arising independently in a single gene. Here, each independent mutation event gives rise to a different disease allele. A significant proportion of all genetic disease is thought to result in this way. Because such de novo mutations are so frequent, polyallelism is a very common characteristic of genetic disease. Duchenne's muscular dystrophy (DMD), Becker's myopathy, and cystic fibrosis (CF) are well-known examples of polyallelic diseases (see e.g. McKusick, Mendelian Inheritance in Man, Catalog of Autosomal Dominant, Autosomal Recessive, and X-Linked Phenotypes, 10th Edition, 1992, The Johns Hopkins University Press, Baltimore, Md.). Polyallelism may arise in at least two ways. First, each new case of a disease may arise from an independent mutation event in the target gene. For example, in DMD, at least 30% of cases present novel mutations in the dystrophin gene which differ from all previously-characterized mutations. Second, selective fixation of different founder-effect mutations contributes to the occurrence of polyallelism. One example of this is the β-thalassemias, in which the world population of affected individuals presents remarkably high polyallelism, but local populations are characterized by limited allelic heterogeneity.
Non-allelic genetic disease is characterized as having more than one candidate gene. Here, a genetic disease which is clinically well-defined may be due to a lesion (mutation) of any one gene among several candidate genes. For example, osteogenesis imperfecta is caused by lesion of any one of five distinct type 1 collagen genes. However, the identification of candidate genes for a non-allelic genetic disease is made more difficult when the several candidate genes, unlike the collagen genes, are not related in sequence. For example, pituitary dwarfism is physiologically due to hypofunction of the anterior pituitary gland. In a minority of pituitary dwarfism cases, the causative lesion has been traced to the gene complex elaborating growth hormone (Kaplan and Delpech, 1993, in Molecular Biology and Medicine, 2nd ed., Médecine-Sciences Flammarion, Paris, Chap. 12, pp. 307-308). In the vast majority of cases, however, these genes are perfectly normal and the causative disease loci are not even linked to the growth hormone complex (as demonstrated by polymorphism linkage studies, Id.). Therefore, other unidentified genes comprising alleles not related to growth hormone account for the majority of pituitary dwarfism cases. Such non-allelic diseases clearly require more than just linkage analysis to identify all of the involved genes. The VGIDSM method of the present invention provides a rapid, rational way of approaching this problem.
A polygenic disease is characterized as having several abnormal genes acting concurrently to produce a pathologic phenotype. This group includes many genetic diseases often described as “multifactorial disorders.” Examples include diabetes mellitus, hypertension, atherosclerosis, autoimmune disorders, and many others. For the majority of polygenic diseases, the metabolic complexities are so great that a rational basis on which candidate genes could be identified may not have existed before the invention set forth herein. In the few instances where a candidate gene has been suggested, this knowledge has still proven largely inadequate to identify susceptible individuals, or to explain pathogenesis.
The last two groups of genetic disorders described above (i.e. non-allelism and polygenism) represent the greatest challenge currently facing human and veterinary medicine. Because of an absence of sufficient biochemical and physiological data, credible candidate genes have largely gone unidentified. This absence of credible candidate genes has, in turn, ruled out the possibility of identifying susceptible individuals and attempting preventive intervention before symptoms appear. The invention set forth herein provides one way to overcome these limitations by identifying credible candidate genes.
2.2. Gene Identification by Positional Cloning
There are several known methods available to identify candidate disease genes, and to further select genes among identified candidates, which are systematically associated with a given pathology. These include various methods for differential expression analysis (e.g. differential display, serial analysis of gene expression or SAGE), and positional cloning methods. In the positional cloning approach, the initial steps are quite similar or identical; most often, it is only the final steps that differ (see e.g. Rommens et al., 1989, Science 245, 1059-1065; Duyk et al., 1990, Proc. Natl. Acad. Sci. U.S.A. 87, 8995-8999). The major drawbacks of positional cloning methods generally include: (a) the slow pace of discovery, often requiring several years for success; (b) the high complexity of the techniques involved, requiring highly-trained individuals who must pay painstaking attention to detail to get satisfactory results; (c) the labor-intensive nature of the techniques, often requiring enormous amounts of sequencing; and (d) the extreme expense associated with any slow, complex, labor-intensive effort. Positional cloning can be considered as four discrete steps which are well-known in the art. Each of these steps is briefly described below.
2.2.1. Linkage Mapping
The first step in using positional cloning for disease gene identification consists of a search for genetic linkage between a locus implicated in pathogenesis and a number of genotypic polymorphic markers. This step requires segregation analysis in affected families. Linkage mapping takes advantage of the fact that the closer two genetic loci are to each other, the smaller the chance that a recombination event will separate them. Therefore, the aim is to find a specific fragment of genomic DNA bordered by two known markers systematically present in all affected members of a family, but rarely present in the unaffected members. If such a genomic fragment can be identified, the pathogenic locus will be found located between the markers.
Linkage mapping presents difficulties that vary according to the mode of inheritance of a disease. In an ideal linkage map, all bearers of an abnormal gene will be identified. In the case of an autosomal dominant disease, this is only theoretically possible if: (a) all bearers show the diseased phenotype (i.e. penetrance is complete); and (b) disease manifestation is precocious. In the case of autosomal recessive disorders, it is only possible to detect the homozygotes (all affected) and the obligate heterozygotes (the parents). It is therefore essential to have access to families where there are at least two living, homozygous affected siblings when mapping an autosomal recessive disorder.
In a few lucky cases of linkage map construction and analysis, specific chromosomes can be easily ruled out as carrying the diseased gene of interest. In these rare instances, the gene search quickly becomes more focused. For example, DMD is a recessive disorder which is very rare in females. As a result, the search for the DMD gene could safely be limited to the X chromosome. However, in the majority of cases, such a simplified approach is not at all available. A case in point is CF, where it took five years of intensive effort just to identify the chromosome associated with the disease.
2.2.2. Chromosomal Localization
The genomic fragment identified in the preceding step is often very large (i.e. several million bases) and entirely unknown in terms of the number and identity of genes it encodes. Therefore, it is often essential to localize the genomic fragment to a specific chromosome in order to take advantage of other known markers which may not yet be associated with the fragment. Chromosomal localization may be carried out by utilization of polymorphic markers (e.g. microsatellites) identified on genomic DNA or large genomic fragments cloned into yeast artificial chromosomes (YACs) that have been assigned to specific human chromosomes. Chromosomal localization may also be effected by fluorescently labeling a large (e.g. 100 kilobase) identified genomic fragment for hybridization and karyotype analysis (Dauwerse et al., 1992, Hum. Mol. Genet. 1, 593-598).
2.2.3. Further Refinement
Once the identified genomic fragment has been localized to a specific chromosome, the largest possible number of polymorphic markers is used to bracket the smallest possible region (i.e. locus) encoding the gene of interest. This step can yield genomic fragments that are still very large, i.e. one-half to one million bases long. Since the average length of a gene is on the order of seventy thousand bases, such a region is very likely to encode many different genes. Furthermore, this approach does not allow one to distinguish between monogenic and polygenic disorders. If an apparent lack of genetic heterogeneity cannot be clinically determined, then the actual degree of heterogeneity must be assessed by systematic comparison of different families. In this very-frequent case, the results from each family must be analyzed separately to determine whether they are consistent with a “single locus” hypothesis. This is a complex problem since genetic heterogeneity may be clinically undetectable (e.g. pituitary dwarfism, see above). Alternatively, apparent clinical heterogeneity may lead to the erroneous conclusion that different genes are involved when, in fact, different allelic forms of the same gene are involved (e.g. DMD and Becker's myopathy, see above).
2.2.4. From Locus to Gene
Having defined a genetic locus for a disease-associated gene using the above methods, there is much work left to be done before the gene itself is ultimately identified. The identification problem encompasses two major difficulties. First, it is necessary to generate new markers for further map refinement. The new markers must be located as close as possible to, and ultimately in, the gene concerned. Second, it is necessary to demonstrate that the identified gene is actually responsible for the disease. These two tasks require the utilization, in parallel, of a wide variety of methods. Two of the most commonly followed approaches are briefly described below.
Exon trapping involves the cloning of short fragments generated from an entire identified locus into retroviral vectors which have been engineered to reveal the presence of exons (i.e. coding sequences) within a short fragment. Any positive clones (i.e. clones containing an exon) function as new markers and must next be sequenced and mapped back to the locus in order to define the relative position of each. The exon trapping approach is enormously labor-intensive in that it requires massive amounts of DNA sequencing and produces a substantial number of false positives and false negatives. Of course, the exon map generated includes exons from any gene within the locus and is not specific to exons from the disease gene of interest. Accordingly, further work is required.
Complementary DNA (cDNA) subtraction assays utilize cDNA libraries constructed from cells of an affected individual and from cells of a healthy individual. The procedure has two successive phases. In phase one, the cDNA inserts from the healthy individual are immobilized on a membrane and used to trap (subtract) the homologous cDNA inserts present in the affected individual's library. In phase two, the procedure is inverted: i.e. the cDNA inserts from the library of the affected individual are immobilized and used to subtract homologous inserts from the healthy cDNA library. Therefore, these two phases yield cDNA fragments that are entirely unique to the affected or to the healthy individual, respectively. Any fragment homologous (similar but not identical) to a sequence present in the immobilized library remains trapped. Accordingly, this approach often results in a complete loss of the gene of interest.
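The loss of a disease allele during subtraction can be illustrated with a toy model. The sketch below is not a description of any laboratory protocol: the sequences are invented, homology is approximated (as an assumption) by equal length and at most two mismatched positions, and the names `hamming`, `is_homologous` and `subtract` are hypothetical.

```python
from typing import Set


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def is_homologous(a: str, b: str, max_mismatches: int = 2) -> bool:
    """Toy homology test (assumption): same length and few mismatches."""
    return len(a) == len(b) and hamming(a, b) <= max_mismatches


def subtract(library: Set[str], immobilized: Set[str]) -> Set[str]:
    """Keep only fragments with no homologous partner on the membrane."""
    return {
        frag
        for frag in library
        if not any(is_homologous(frag, probe) for probe in immobilized)
    }


# Invented sequences: the second affected fragment is a point-mutant allele
# of the second healthy fragment; the third is expressed only when affected.
healthy = {"ACGTACGTAC", "TTGACCGGTA"}
affected = {"ACGTACGTAC", "TTGACCGCTA", "GGGCCCAAAT"}

phase_one = subtract(affected, healthy)  # fragments unique to the affected library
phase_two = subtract(healthy, affected)  # fragments unique to the healthy library

print(sorted(phase_one))
print(sorted(phase_two))
```

In this model the point-mutant allele is trapped by its healthy counterpart and never appears in either output, which mirrors the drawback noted above: only wholly unique fragments survive subtraction.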
Clones obtained by the exon trapping or cDNA subtraction approaches are then used for direct hybridization to: (a) yeast artificial chromosome overlapping segments (YAC contigs) covering the locus of interest; (b) mRNA preparations obtained from affected and healthy individuals; and/or (c) enriched genomic libraries obtained from the same affected and healthy individuals. Any positive hybridization signals are then further analyzed by sequencing.
At the last step in positional cloning, i.e. gene identification, one is often confronted with results that cannot precisely pinpoint the relevant gene. In this instance, the only approach remaining is to entirely sequence and analyze the smallest genomic region of the defined locus, which may still range from 300 to 700 kilobases. The problematic nature of positional cloning for disease gene identification is further highlighted below in noting a few of the realities associated with the approach.
Positional cloning projects are so labor intensive that they have been undertaken, in most instances, only by large consortia of international research groups comprising at least three laboratories per consortium. Each laboratory of such a consortium, in turn, is typically composed of five or more researchers devoting essentially all of their time and effort to the project. For example, identification of the CF gene took a total of eight years, finding the gene for polycystic kidney disease type 1 (PKD1) took six years, and finding the ataxia-telangiectasia gene took over five years. Many other examples could be recited, and many positional cloning efforts have yet to identify the target gene. Notably, these are all monogenic diseases, i.e. only one gene is responsible for the disease and it is the same gene in all cases of the disease.
The difficulties are amplified in the context of polygenic or multifactorial disorders. Here, very little progress has been made in gene identification. For example, after over fifteen years of intensive searching by a considerable number of research teams, the genetic causes of diabetes mellitus (type I and type II) remain largely unknown. The same can be said for chronic renal failure (CRF), multiple sclerosis (MS), atherosclerosis, and many others. This list names only a few of the most prevalent polygenic or multifactorial disorders.
One of the major reasons for this state of affairs is that, in the absence of any information allowing the testing of likely candidate genes, it is necessary to first map the loci associated with the disorder to specific chromosomal regions before having a chance of isolating the genes concerned by positional cloning (see above). Of course, it would be considerably simpler to forego mapping entirely and work from mRNA transcripts of genes expressed in affected tissues. However, this approach has proven virtually impossible using past methods. This is due, at least in part, to the fact that tissues and cells express a great many genes. Furthermore, genes associated with pathologies are often expressed at very low levels. Therefore, the few relevant disease mRNA transcripts may be lost among an enormous number of other transcripts. Still further adding to the identification problem, the disease transcripts may differ widely among affected individuals. These intrinsic shortcomings of past positional and subtraction methodologies are such that very small quantities of mRNA cannot be used.
The VGIDSM method for gene identification set forth herein provides a simple solution to this enormous problem. It allows one to identify phenotype-associated genes, in monogenic as well as polygenic contexts, in a matter of weeks rather than years and at greatly reduced expense.
2.3. Mismatch Repair
DNA mismatch repair genes comprise one of several mechanisms by which high fidelity DNA replication is maintained in cells under physiologic conditions. Many investigators over the years have manipulated one or more of these genes to achieve various ends. First described in bacteria, the mismatch repair system comes into play when the product of the MutS gene recognizes and binds to a mispaired base pair (see Cox, E. C., 1997, MutS, Proofreading And Cancer, Genetics 146, 443-446). MutS works in concert with the products of the MutH and MutL genes; these three proteins together form the so-called MutHLS mismatch repair system. A recent review has provided a detailed description of this system in eukaryotes (see Kolodner, R., 1996, Biochemistry And Genetics Of Eukaryotic Mismatch Repair, Genes Dev. 10, 1433-1442).
Hereditary nonpolyposis colon cancer (HNPCC) arises from mutations in the hMSH2 gene, the human homolog of the bacterial MutS gene, as shown by two laboratories in 1993 (see Fishel, R. et al., 1993, The Human Mutator Gene Homolog MSH2 And Its Association With Hereditary Nonpolyposis Colon Cancer, Cell 75, 1027-1038; Leach, F. S. et al., 1993, Mutations Of A MutS Homolog In Hereditary Nonpolyposis Colorectal Cancer, Cell 75, 1215-1225). The human MSH2 protein also functions via binding to DNA mismatches (Fishel, R. et al., 1994, Binding Of Mismatched Microsatellite DNA Sequences By The Human MSH2 Protein, Science 266, 1403-1405; Fishel, R. et al., 1994, Purified Human MSH2 Protein Binds To DNA Containing Mismatched Nucleotides, Cancer Res. 54, 5539-5542). Another human homolog of bacterial MutS has recently been linked to cancer susceptibility (Edelmann, W. et al., Nov. 14, 1997, Mutation In The Mismatch Repair Gene Msh6 Causes Cancer Susceptibility, Cell 91, 467-477).
Traditionally, manipulation of the mismatch repair system has been employed in a variety of ways. For example, a method for in vitro recombination of mismatches has been described which takes advantage of MutS-deficient E. coli (Resnick, M. A. and Radman, M., Aug. 2, 1994, System For Isolating And Producing New Genes, Gene Products And DNA Sequences, U.S. Pat. No. 5,334,522). Others have described using the MutS protein to detect DNA mismatches in vitro with antibodies (Wagner, R. E., Jr. and Radman, M., Apr. 2, 1997, Method For Detection Of Mutations, European Patent EP 0 596 028 B1). Still others have used the inability of the system to repair loops of five nucleotides or greater in vivo to design a system capable of detecting a single mismatch in a DNA fragment as large as 10 kilobases (see Faham, M. and Cox, D. R., 1997, A Novel in vivo Method To Detect DNA Sequence Variation, Genome Research 5, 474-482).
This invention provides a method for identifying a gene or allele, or several genes or alleles, underlying a phenotype-of-interest. In this regard, genes or alleles are identified as having a specified function, or as causing or contributing to the cause or pathogenesis of a specified disease, or as associated with a specific phenotype, by virtue of their selection by the method.
This invention is based, at least in part, on the recognition that comparison of a population of nucleic acid molecules with one or more other populations of nucleic acid molecules, so as to isolate genes underlying specific phenotypic traits, is greatly facilitated by first taking steps to ensure internal homogenization of one or more of the populations to be compared before performing the external comparison of two or more populations. In this regard, internal homogenization is effected by a first round of hybridization and sorting of matched from mismatched DNA duplexes. Similarly, external comparison is effected by a second round of hybridization and sorting of matched from mismatched DNA duplexes, as described in detail hereinbelow.
This invention provides a method for identifying one or more genes underlying a defined phenotype comprising the following steps in the order stated: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of two source populations of nucleic acids; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the two source populations, the retained molecules in step (b) comprising the one or more genes underlying the defined phenotype.
Further, this invention provides a method for identifying one or more genes underlying a defined phenotype comprising the following steps in the order stated: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within a first source population of nucleic acids; and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization between the first source population and a second source population of nucleic acids, the retained molecules in step (b) comprising the one or more genes underlying the defined phenotype.
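The two ordered steps can be pictured with a small computational model. The sketch below follows the formulation in which both source populations are homogenized, and it rests on stated assumptions: each population is represented as a mapping from a locus name to the set of sequence variants present, and step (a) is idealized so that any locus able to form a mismatched duplex within one population is treated as fully removed. The names `Population`, `remove_internal_mismatches` and `retain_external_mismatches`, and all sequences, are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: locus name -> sequence variants in the population.
Population = Dict[str, Set[str]]


def remove_internal_mismatches(pop: Population) -> Dict[str, str]:
    """Step (a), idealized: keep only loci that cannot form a mismatched
    duplex within the population, i.e. loci with a single variant."""
    return {
        locus: next(iter(variants))
        for locus, variants in pop.items()
        if len(variants) == 1
    }


def retain_external_mismatches(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Step (b): hybridize between populations and retain the duplexes
    whose two strands differ in sequence."""
    shared = sorted(first.keys() & second.keys())
    return {
        locus: (first[locus], second[locus])
        for locus in shared
        if first[locus] != second[locus]
    }


# Invented data: geneC is internally heterogeneous in the first population.
first_source: Population = {
    "geneA": {"ACGT"},
    "geneB": {"TTGA"},
    "geneC": {"GGCA", "GGCT"},
}
second_source: Population = {
    "geneA": {"ACGT"},
    "geneB": {"TTCA"},
    "geneC": {"GGCA"},
}

homogenized_first = remove_internal_mismatches(first_source)
homogenized_second = remove_internal_mismatches(second_source)
candidates = retain_external_mismatches(homogenized_first, homogenized_second)

print(candidates)
```

Performing step (a) first means that geneC, whose variation exists inside a single population, does not appear among the candidates; only geneB, which differs between the populations, is retained.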
Nucleic acid sample populations may be derived from many different sources. In one embodiment, the first and second source populations each are nucleic acid populations derived from at least two individuals having consanguinity. In another embodiment, the first and second source populations each are nucleic acid populations derived from more than two individuals having consanguinity. In one embodiment, the first and second source populations each are nucleic acid populations derived from two to six individuals having consanguinity. In another embodiment, the first and second source populations each are nucleic acid populations derived from three individuals having consanguinity. In still another embodiment, each source population is a cell line.
Further, nucleic acid sample populations may be manipulated in various ways so as to facilitate gene identification. In one embodiment, the source populations are normalized cDNA libraries to facilitate identification of rare transcripts. In another embodiment, the source populations are linearized cDNA libraries to facilitate hybridization. In still another embodiment, the source populations are normalized and linearized.
Still further, nucleic acid sample populations may be manipulated in various ways so as to facilitate removal of undesired cDNAs. In one embodiment, the two source populations are of DNA, the DNA of a source population is labeled, and the hybridization in step (b) is carried out using an excess of labeled DNA. In another embodiment, the excess of labeled DNA is a three-fold excess.
Genes underlying virtually any defined phenotype may be identified using the method of the invention. In a preferred embodiment, the defined phenotype is selected from the group consisting of a plant resistance phenotype, a microorganism resistance phenotype, cancer, osteoporosis, obesity, type II diabetes, and a prion-related disease. Additional examples of preferred defined phenotypes follow immediately below.
Defined plant phenotypes include but are not limited to resistance to herbicides, resistance to insect predators, resistance to fungal infections, increased yields, resistance to frost, resistance to dehydration, enhanced stem strength, and many others.
Defined microorganism phenotypes include but are not limited to susceptibility or resistance to antibiotics, detoxification of liquids, soils, solids, and/or gases contaminated by pollutants or toxic compounds (e.g. dioxin, nitrous oxides, carbon monoxide, sulfur dioxide, free radicals, and so on).
Defined animal and/or veterinary phenotypes include but are not limited to resistance to neurological disorders such as prion-related diseases, infectious disorders (e.g. porcine plague), foot-and-mouth disease, and many others.
Defined human phenotypes include but are not limited to susceptibility to cancer, autoimmune diseases, neurological disorders, metabolic disorders (e.g. diabetes, obesity), systemic diseases (e.g. osteoporosis), and many others.
This invention provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a second cDNA library is derived. The method comprises the steps of (a) hybridizing insert DNA from the first cDNA library with itself, (b) hybridizing insert DNA from the second cDNA library with itself, (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein, (d) contacting the DNA hybridized in step (b) with a second immobilized mismatch binding protein, (e) separating unbound DNA from bound DNA contacted in step (c), (f) separating unbound DNA from bound DNA contacted in step (d), (g) labeling unbound DNA separated in step (f) with a label capable of binding a partner molecule or agent immobilized on a substrate, (h) hybridizing labeled DNA with unbound DNA separated in step (e), (i) contacting DNA hybridized in step (h) with a third immobilized mismatch binding protein, (j) separating unbound DNA from bound DNA contacted in step (i), (k) contacting unbound DNA separated in step (j) with the partner molecule or agent immobilized on the substrate capable of binding the label, and (l) separating unbound DNA from bound DNA contacted in step (k), which unbound DNA separated in step (l) encodes one or more identified genes underlying the defined phenotype.
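Steps (h) through (l) of the method just described can be sketched as two successive filters. The model below is a simplification under stated assumptions: the inputs are taken to be the already homogenized pools from steps (e) and (f), with one sequence per locus; labeled DNA is in excess, so a first-library strand anneals to a labeled strand whenever one exists for its locus; and duplexes formed solely from labeled strands are omitted because the label column would retain them in any case. The type `Duplex`, the function names, and the sequences are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Duplex(NamedTuple):
    locus: str
    first_seq: str         # strand from the first (unlabeled) library
    partner_seq: str       # strand it annealed to
    partner_labeled: bool  # True if the partner came from the labeled second library


def hybridize(first: Dict[str, str], labeled_second: Dict[str, str]) -> List[Duplex]:
    """Step (h), toy model: pair each first-library strand with a labeled
    strand of the same locus if present, otherwise with its own complement."""
    duplexes = []
    for locus, seq in first.items():
        if locus in labeled_second:
            duplexes.append(Duplex(locus, seq, labeled_second[locus], True))
        else:
            duplexes.append(Duplex(locus, seq, seq, False))
    return duplexes


def mismatch_column_unbound(duplexes: List[Duplex]) -> List[Duplex]:
    """Steps (i)-(j): mismatched duplexes bind; return the flow-through."""
    return [d for d in duplexes if d.first_seq == d.partner_seq]


def label_column_unbound(duplexes: List[Duplex]) -> List[Duplex]:
    """Steps (k)-(l): labeled duplexes bind; return the flow-through."""
    return [d for d in duplexes if not d.partner_labeled]


# Invented data: geneX is expressed only in the first library.
first_pool = {"geneA": "ACGT", "geneB": "TTGA", "geneX": "CCAT"}
labeled_pool = {"geneA": "ACGT", "geneB": "TTCA"}

after_h = hybridize(first_pool, labeled_pool)
after_j = mismatch_column_unbound(after_h)
identified = label_column_unbound(after_j)

print([d.locus for d in identified])
```

In this model the final flow-through contains only perfectly matched, unlabeled duplexes, that is, sequences present in the first library with no counterpart in the second.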
Further, this invention provides a method for identifying one or more genes underlying a defined phenotype from organisms having consanguinity. The method comprises the steps of (a) hybridizing insert DNA from a first collection of cDNA libraries derived from organisms having the defined phenotype with itself, (b) contacting DNA hybridized in step (a) with a first immobilized mismatch binding protein, (c) separating unbound DNA from bound DNA contacted in step (b), (d) labeling unbound DNA separated in step (c) with a label capable of binding a partner molecule or agent immobilized on a substrate, (e) hybridizing DNA labeled in step (d) with insert DNA from a second collection of cDNA libraries derived from organisms not having the defined phenotype, (f) contacting DNA hybridized in step (e) with a second immobilized mismatch binding protein, (g) separating unbound DNA from bound DNA contacted in step (f), (h) contacting unbound DNA separated in step (g) with the partner molecule or agent immobilized on the substrate capable of binding the label, and (i) separating unbound DNA from bound DNA contacted in step (h), which unbound DNA separated in step (i) encodes identified genes underlying the defined phenotype. This paragraph sets forth a preferred embodiment in which the DNA labeled in step (d) corresponds to undesired material labeled for removal.
Still further, this invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a second cDNA library is derived. The method comprises the steps of (a) hybridizing insert DNA from the first cDNA library with itself, (b) hybridizing insert DNA from the second cDNA library with itself, (c) contacting the DNA hybridized in step (a) with a first immobilized mismatch binding protein, (d) contacting the DNA hybridized in step (b) with a second immobilized mismatch binding protein, (e) separating unbound DNA from bound DNA contacted in step (c), (f) separating unbound DNA from bound DNA contacted in step (d), (g) labeling unbound DNA separated in step (f) with a label capable of binding a partner molecule or agent immobilized on a substrate, (h) hybridizing DNA labeled in step (g) with unbound DNA separated in step (e), (i) contacting DNA hybridized in step (h) with a third immobilized mismatch binding protein, (j) separating unbound DNA from bound DNA contacted in step (i), (k) releasing bound DNA separated in step (j) from the third immobilized mismatch binding protein, (l) contacting DNA released in step (k) with the partner molecule or agent immobilized on the substrate capable of binding the label, (m) denaturing DNA contacted in step (l), and (n) separating unbound DNA from bound DNA denatured in step (m), which unbound DNA separated in step (n) encodes one or more identified alleles underlying the defined phenotype.
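The allele-selection logic of steps (a) through (n) may likewise be illustrated with a toy set model. The following sketch is illustrative only. It assumes that each library is a mapping from a gene name to the set of variants present, that the mismatch binding protein retains exactly those duplexes whose strands differ, and that denaturation in step (m) frees the unlabeled strand while the labeled strand remains on the substrate. The function names `self_sort` and `identify_alleles`, and the example gene names, are hypothetical.

```python
def self_sort(library):
    """Toy model of self-hybridization followed by mismatch-protein sorting.

    Assumption: only genes represented by a single variant pass through
    as unbound DNA.
    """
    return {gene: next(iter(variants))
            for gene, variants in library.items()
            if len(variants) == 1}


def identify_alleles(first_library, second_library):
    """Toy model of steps (g) through (n) for two libraries."""
    unlabeled = self_sort(first_library)   # unbound DNA of step (e)
    labeled = self_sort(second_library)    # unbound DNA of step (f), labeled in (g)

    alleles = {}
    for gene, variant in unlabeled.items():
        partner = labeled.get(gene)
        if partner is not None and partner != variant:
            # Mismatched hybrid: retained in step (i), released in step (k),
            # held through the labeled strand in step (l). After denaturing
            # in step (m), the unlabeled strand is recovered in step (n).
            alleles[gene] = variant
    return alleles


# Hypothetical input: only geneB differs between the two libraries.
first = {"geneA": {"A1"}, "geneB": {"B1"}, "geneC": {"C1"}}
second = {"geneA": {"A1"}, "geneB": {"B2"}}
print(identify_alleles(first, second))
```

Here the recovered strand is the first-library variant of the gene that differs between the two sources, which corresponds to the unbound DNA of step (n).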
Yet still further, this invention provides a method for identifying one or more alleles underlying a defined phenotype from organisms having consanguinity. The method comprises the steps of (a) hybridizing insert DNA from a first collection of cDNA libraries derived from organisms having the defined phenotype with itself, (b) contacting DNA hybridized in step (a) with a first immobilized mismatch binding protein, (c) separating unbound DNA from bound DNA contacted in step (b), (d) labeling unbound DNA separated in step (c) with a label capable of binding a partner molecule or agent immobilized on a substrate, (e) hybridizing DNA labeled in step (d) with insert DNA from a second collection of cDNA libraries derived from organisms not having the defined phenotype, (f) contacting DNA hybridized in step (e) with a second immobilized mismatch binding protein, (g) separating unbound DNA from bound DNA contacted in step (f), (h) releasing bound DNA separated in step (g) from the second immobilized mismatch binding protein, (i) contacting DNA released in step (h) with the partner molecule or agent immobilized on the substrate capable of binding the label, (j) denaturing DNA contacted in step (i), and (k) separating bound DNA from unbound DNA denatured in step (j), which bound DNA separated in step (k) encodes one or more identified alleles underlying the defined phenotype.
The cDNA library collections will vary according to the specific attributes of the sample source. In one embodiment, the first and second cDNA library collections each are nucleic acid populations derived from at least two individuals having consanguinity. In another embodiment, the first and second cDNA library collections each are nucleic acid populations derived from more than two individuals having consanguinity. In one embodiment, the first and second cDNA library collections each are nucleic acid populations derived from two to six individuals having consanguinity. In another embodiment, the first and second cDNA library collections each are nucleic acid populations derived from three individuals having consanguinity.
A nucleic acid sample population may be left unlabeled or labeled with a unique label in various ways. In one embodiment, labeling is effected by polymerase chain reaction using a 5′-biotinylated primer. In another embodiment, labeling is effected by polymerase chain reaction using a 5′-peptide-labeled primer. In a preferred embodiment, labeling using a 5′-biotinylated primer is performed when using one unlabeled sample population and one labeled sample population. In another preferred embodiment, labeling using a 5′-peptide-labeled primer is performed when multiplexing, i.e. when using three or more nucleic acid sample populations.
A labeled nucleic acid sample population may be sorted in various ways. In one embodiment, the substrate for binding the biotin label is streptavidin. In another embodiment, the substrate for binding the peptide label is an antibody. In still another embodiment, the antibody is an anti-peptide antibody. In yet still another embodiment, the anti-peptide antibody is monoclonal.
A variety of wild-type and recombinant, engineered mismatch binding proteins may be used to effect sorting (i.e. binding and release) of DNA duplexes containing mismatches. In one embodiment, the mismatch binding protein is E. coli MutS. In another embodiment, the mismatch binding protein is hMSH2. In still another embodiment, the mismatch binding protein is an hMSH2-hMSH6 protein complex.
This invention provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a second cDNA library is derived. The method comprises the steps of (a) amplifying insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying insert DNA from the second cDNA library by polymerase chain reaction, (c) hybridizing DNA amplified in step (a) with itself, (d) hybridizing DNA amplified in step (b) with itself, (e) contacting DNA hybridized in step (c) with a first immobilized MutS, (f) contacting DNA hybridized in step (d) with a second immobilized MutS, (g) separating unbound DNA from bound DNA contacted in step (e), (h) separating unbound DNA from bound DNA contacted in step (f), (i) amplifying unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) amplifying and labeling unbound DNA separated in step (h) by polymerase chain reaction using 5′-biotinylated primers, (k) hybridizing DNA amplified and labeled in step (j) with DNA amplified in step (i), (l) contacting DNA hybridized in step (k) with a third immobilized MutS, (m) separating unbound DNA from bound DNA contacted in step (l), (n) contacting unbound DNA separated in step (m) with immobilized streptavidin, and (o) separating unbound DNA from bound DNA contacted in step (n), which unbound DNA separated in step (o) encodes one or more identified genes underlying the defined phenotype.
Further, this invention provides a method for identifying one or more genes underlying a disease phenotype from healthy and affected individuals having consanguinity. The method comprises the steps of (a) amplifying insert DNA from a first collection of cDNA libraries derived from affected individuals by polymerase chain reaction, (b) hybridizing DNA amplified in step (a) with itself, (c) contacting DNA hybridized in step (b) with a first immobilized MutS, (d) separating unbound DNA from bound DNA contacted in step (c), (e) amplifying and labeling unbound DNA separated in step (d) by polymerase chain reaction using 5′-biotinylated primers, (f) amplifying insert DNA from a second collection of cDNA libraries derived from healthy individuals by polymerase chain reaction, (g) hybridizing DNA amplified and labeled in step (e) with DNA amplified in step (f), (h) contacting DNA hybridized in step (g) with a second immobilized MutS, (i) separating unbound DNA from bound DNA contacted in step (h), (j) contacting unbound DNA separated in step (i) with immobilized streptavidin, and (k) separating unbound DNA from bound DNA contacted in step (j), which unbound DNA separated in step (k) encodes one or more identified genes underlying the disease phenotype.
Still further, this invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a second cDNA library is derived. The method comprises the steps of (a) amplifying insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying insert DNA from the second cDNA library by polymerase chain reaction, (c) hybridizing DNA amplified in step (a) with itself, (d) hybridizing DNA amplified in step (b) with itself, (e) contacting DNA hybridized in step (c) with a first immobilized MutS, (f) contacting DNA hybridized in step (d) with a second immobilized MutS, (g) separating unbound DNA from bound DNA contacted in step (e), (h) separating unbound DNA from bound DNA contacted in step (f), (i) amplifying unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) amplifying and labeling unbound DNA separated in step (h) by polymerase chain reaction using 5′-biotinylated primers, (k) hybridizing DNA amplified and labeled in step (j) with DNA amplified in step (i), (l) contacting DNA hybridized in step (k) with a third immobilized MutS, (m) separating unbound DNA from bound DNA contacted in step (l), (n) releasing bound DNA separated in step (m) from the third immobilized MutS, (o) contacting DNA released in step (n) with immobilized streptavidin, (p) denaturing DNA contacted in step (o), and (q) separating unbound DNA from bound DNA denatured in step (p), which unbound DNA separated in step (q) encodes one or more identified alleles underlying the defined phenotype. In one embodiment, releasing bound DNA from the third immobilized MutS in step (n) is carried out using ATP or proteinase K.
Yet still further, this invention provides a method for identifying one or more affected alleles underlying a disease phenotype from healthy and affected individuals having consanguinity. The method comprises the steps of (a) amplifying insert DNA from a first collection of cDNA libraries derived from affected individuals by polymerase chain reaction, (b) hybridizing DNA amplified in step (a) with itself, (c) contacting DNA hybridized in step (b) with a first immobilized MutS, (d) separating unbound DNA from bound DNA contacted in step (c), (e) amplifying and labeling unbound DNA separated in step (d) by polymerase chain reaction using 5′-biotinylated primers, (f) amplifying insert DNA from a second collection of cDNA libraries derived from healthy individuals by polymerase chain reaction, (g) hybridizing DNA amplified and labeled in step (e) with DNA amplified in step (f), (h) contacting DNA hybridized in step (g) with a second immobilized MutS, (i) separating unbound DNA from bound DNA contacted in step (h), (j) releasing bound DNA separated in step (i) from the second immobilized MutS, (k) contacting DNA released in step (j) with immobilized streptavidin, (l) denaturing DNA contacted in step (k), and (m) separating bound DNA from unbound DNA denatured in step (l), which bound DNA separated in step (m) encodes one or more identified affected alleles underlying the disease phenotype. In one embodiment, releasing bound DNA from the second immobilized MutS in step (j) is carried out using ATP or proteinase K.
Yet still further, this invention provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) hybridizing insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating unbound DNA from bound DNA contacted individually in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a different label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing DNA separately labeled in step (d), (f) contacting DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating unbound DNA from bound DNA contacted in step (f).
Still further, this invention provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) amplifying insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing each separate population of DNA amplified in step (a) with itself, (c) contacting each separate population of DNA hybridized in step (b) individually with immobilized MutS, (d) separating unbound DNA from bound DNA contacted in step (c), (e) labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a distinct 5′-peptide-labeled primer capable of binding a partner molecule immobilized on a substrate, (f) hybridizing DNA labeled in step (e), (g) contacting DNA hybridized in step (f) with immobilized MutS, and (h) separating unbound DNA from bound DNA contacted in step (g).
Further, this invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) hybridizing insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating unbound DNA from bound DNA contacted in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a distinct label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing DNA labeled in step (d), (f) contacting DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating unbound DNA from bound DNA contacted in step (f).
Still further, this invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) amplifying insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing DNA amplified from each library in step (a) with itself, (c) contacting DNA from each library hybridized in step (b) individually with an immobilized mismatch binding protein, (d) separating unbound DNA from bound DNA contacted in step (c), (e) amplifying and labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a distinct 5′-peptide-labeled primer, (f) hybridizing DNA amplified and labeled in step (e), (g) contacting DNA hybridized in step (f) with an immobilized mismatch binding protein, (h) separating unbound DNA from bound DNA contacted in step (g), (i) releasing bound DNA separated in step (h), and (j) separating DNA released in step (i) into single strands.
Still further, this invention provides a method for identifying one or more alleles underlying a defined phenotype comprising the following steps in the order stated: (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of a plurality of source populations of nucleic acids; (b) retaining mismatched duplex nucleic acid molecules formed from hybridization among the plurality of source populations; (c) separating mismatched strands retained in step (b), which separated strands comprise one or more alleles underlying the defined phenotype.
This invention provides a method for identifying one or more genes underlying a defined phenotype. The method comprises the steps of (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of a plurality of source populations of nucleic acids, and (b) retaining mismatched duplex nucleic acid molecules formed from hybridization among the plurality of source populations, the retained molecules in step (b) comprising the one or more genes underlying the defined phenotype. In one embodiment, the plurality of source populations comprises at least one normalized cDNA library. In another embodiment, the plurality of source populations comprises at least one linearized cDNA library. In yet another embodiment, the plurality of source populations consists of DNA, the DNA of each of the source populations being labeled with a different label, and the hybridization in step (b) is carried out using an excess of labeled DNA from one or more source populations. In one embodiment, the excess of labeled DNA is a three-fold excess. Yet in another embodiment, each of the source populations is derived from a cell line.
This invention also provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) hybridizing insert DNA from the first cDNA library with itself, (b) hybridizing insert DNA from each library of the plurality of additional cDNA libraries with itself, (c) contacting the DNA hybridized in step (a) with an immobilized mismatch binding protein, (d) contacting each separate population of DNAs hybridized in step (b) individually with an immobilized mismatch binding protein, (e) separating unbound DNA from bound DNA contacted in step (c), (f) separating unbound DNA from bound DNA contacted individually in step (d), (g) labeling each separate population of the unbound DNA separated in step (f) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (h) hybridizing DNA separately labeled in step (g) with unbound DNA separated in step (e), (i) contacting DNA hybridized in step (h) with an immobilized mismatch binding protein, (j) separating unbound DNA from bound DNA contacted in step (i), (k) contacting unbound DNA separated in step (j) with the partner molecule of each different label, and (l) separating unbound DNA from bound DNA contacted in step (k), which unbound DNA separated in step (l) encodes one or more identified genes underlying the defined phenotype. In one embodiment, one or more of the cDNA libraries is normalized. In another embodiment, one or more of the cDNA libraries is linearized. In yet another embodiment, labeling is carried out by polymerase chain reaction using a 5′-peptide-labeled primer. In yet another embodiment, at least one immobilized partner molecule is an antibody. In still another embodiment, the antibody is an anti-peptide antibody.
In yet another embodiment, the hybridization in step (h) is carried out using an excess of labeled DNA. In yet another embodiment, the excess of labeled DNA is a three-fold excess. In yet another embodiment, an immobilized mismatch binding protein is MutS. In one embodiment, the defined phenotype is selected from the group consisting of a plant phenotype, a microorganism phenotype, and a pathologic phenotype. In another embodiment, the defined phenotype is a pathologic phenotype that is selected from the group consisting of cancer, osteoporosis, obesity, type II diabetes, and a prion-related disease.
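The multiplex selection logic of steps (a) through (l) may be illustrated with a toy set model. The following sketch is illustrative only. It assumes that each library is a mapping from a gene name to the set of variants present, that each additional library carries its own distinguishable label, and that labeled DNA is in sufficient excess for every unlabeled strand having a complementary labeled strand to pair with one. The function names `self_sort` and `identify_genes_multiplex`, and the label names `peptide1` and `peptide2`, are hypothetical.

```python
def self_sort(library):
    """Toy model of self-hybridization followed by mismatch-protein sorting.

    Assumption: only genes represented by a single variant pass through
    as unbound DNA.
    """
    return {gene: next(iter(variants))
            for gene, variants in library.items()
            if len(variants) == 1}


def identify_genes_multiplex(first_library, additional_libraries):
    """Toy model of multiplex sorting.

    additional_libraries maps a distinguishable label to a library.
    """
    unlabeled = self_sort(first_library)
    labeled = {label: self_sort(library)
               for label, library in additional_libraries.items()}

    identified = {}
    for gene, variant in unlabeled.items():
        # Labels of all populations that contain a partner for this gene.
        partners = [label for label, population in labeled.items()
                    if gene in population]
        if not partners:
            # No labeled partner in any population: the DNA is neither
            # retained by the mismatch binding protein nor captured by
            # any partner molecule of a label.
            identified[gene] = variant
    return identified


# Hypothetical input with two additional, separately labeled libraries.
first = {"geneA": {"A1"}, "geneB": {"B1"}, "geneC": {"C1"}}
additional = {
    "peptide1": {"geneA": {"A1"}},
    "peptide2": {"geneB": {"B2"}},
}
print(identify_genes_multiplex(first, additional))
```

In this model a gene is recovered only if it has no counterpart in any of the additional libraries, which corresponds to the unbound DNA of step (l).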
This invention further provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) amplifying insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying insert DNA from each of the plurality of additional cDNA libraries by polymerase chain reaction, (c) hybridizing DNA amplified in step (a) with itself, (d) hybridizing each separate population of DNA amplified in step (b) with itself, (e) contacting DNA hybridized in step (c) with immobilized MutS, (f) contacting each separate population of DNA hybridized in step (d) individually with immobilized MutS, (g) separating unbound DNA from bound DNA contacted in step (e), (h) separating unbound DNA from bound DNA contacted in step (f), (i) amplifying unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) labeling each separate population of unbound DNA separated in step (h) by polymerase chain reaction using a primer having a distinguishable 5′-peptide-label capable of binding a partner molecule immobilized on a substrate, (k) hybridizing DNA amplified in step (i) with DNA labeled in step (j), (l) contacting DNA hybridized in step (k) with immobilized MutS, (m) separating unbound DNA from bound DNA contacted in step (l), (n) contacting unbound DNA separated in step (m) with one or more partner molecules capable of binding the distinguishable 5′-peptide-labeled primers, and (o) separating unbound DNA from bound DNA contacted in step (n), which unbound DNA separated in step (o) encodes one or more identified genes underlying the defined phenotype.
This invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) hybridizing insert DNA from the first cDNA library with itself, (b) hybridizing insert DNA from each of the plurality of additional cDNA libraries with itself, (c) contacting DNA hybridized in step (a) with an immobilized mismatch binding protein, (d) contacting each separate population of DNA hybridized in step (b) individually with an immobilized mismatch binding protein, (e) separating unbound DNA from bound DNA contacted in step (c), (f) separating unbound DNA from bound DNA contacted in step (d), (g) labeling each separate population of unbound DNA separated in step (f) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (h) hybridizing DNA labeled in step (g) with unbound DNA separated in step (e), (i) contacting DNA hybridized in step (h) with an immobilized mismatch binding protein, (j) separating unbound DNA from bound DNA contacted in step (i), (k) releasing bound DNA separated in step (j) from the immobilized mismatch binding protein, (l) contacting DNA released in step (k) with one or more partner molecules capable of binding the distinct labels, (m) denaturing DNA contacted in step (l), and (n) separating unbound DNA from bound DNA denatured in step (m), which unbound DNA separated in step (n) encodes one or more identified alleles underlying the defined phenotype. In one embodiment, at least one cDNA library is normalized. In another embodiment, at least one cDNA library is linearized. In one embodiment, labeling is carried out by polymerase chain reaction using 5′-peptide-labeled primers. In another embodiment, at least one immobilized partner molecule is an antibody.
In another embodiment, the antibody is an anti-peptide antibody. In another embodiment, the hybridization in step (h) is carried out using an excess of labeled DNA. In another embodiment, the excess of labeled DNA is a three-fold excess. In another embodiment, at least one of the immobilized mismatch binding proteins is MutS.
This invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) amplifying insert DNA from the first cDNA library by polymerase chain reaction, (b) amplifying insert DNA from each of the plurality of additional cDNA libraries by polymerase chain reaction, (c) hybridizing DNA amplified in step (a) with itself, (d) hybridizing DNA amplified from each library in step (b) with itself, (e) contacting DNA hybridized in step (c) with immobilized MutS, (f) contacting each population of DNA hybridized in step (d) individually with immobilized MutS, (g) separating unbound DNA from bound DNA contacted in step (e), (h) separating unbound DNA from bound DNA contacted in step (f), (i) amplifying unbound DNA separated in step (g) by polymerase chain reaction using unlabeled primers, (j) amplifying and labeling each population of unbound DNA separated in step (h) by polymerase chain reaction using a distinguishable 5′-peptide-labeled primer, (k) hybridizing DNA amplified and labeled in step (j) with DNA amplified in step (i), (l) contacting DNA hybridized in step (k) with immobilized MutS, (m) separating unbound DNA from bound DNA contacted in step (l), (n) releasing bound DNA separated in step (m) from immobilized MutS, (o) contacting DNA released in step (n) with one or more immobilized antibodies specific for each distinguishable 5′-peptide-labeled primer, (p) denaturing DNA contacted in step (o), and (q) separating unbound DNA from bound DNA denatured in step (p), which unbound DNA separated in step (q) encodes one or more identified alleles underlying the defined phenotype. In one embodiment, releasing bound DNA from immobilized MutS in step (n) is carried out using ATP or proteinase K.
In another embodiment, the method further comprises a step of using the one or more genes or alleles identified to carry out a prognosis or a diagnosis. In one embodiment, the one or more genes or alleles identified, or an encoded protein thereof, is a target for drug intervention. In another embodiment, the plurality of source populations is in the range of three to twelve source populations. In yet another embodiment, the plurality of source populations is in the range of three to six source populations. In another embodiment, the plurality of source populations consists of four source populations.
This invention provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) hybridizing insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating unbound DNA from bound DNA contacted individually in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing DNA separately labeled in step (d), (f) contacting DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating unbound DNA from bound DNA contacted in step (f).
This invention provides a method for identifying one or more genes underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) amplifying insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing each separate population of DNA amplified in step (a) with itself, (c) contacting each separate population of DNA hybridized in step (b) individually with immobilized MutS, (d) separating unbound DNA from bound DNA contacted in step (c), (e) labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a primer having a distinguishable 5′-peptide-label capable of binding a partner molecule immobilized on a substrate, (f) hybridizing DNA labeled in step (e), (g) contacting DNA hybridized in step (f) with immobilized MutS, and (h) separating unbound DNA from bound DNA contacted in step (g).
This invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) hybridizing insert DNA from each cDNA library with itself, (b) contacting each separate population of DNA hybridized in step (a) individually with an immobilized mismatch binding protein, (c) separating unbound DNA from bound DNA contacted in step (b), (d) labeling each separate population of unbound DNA separated in step (c) with a distinguishable label capable of binding a partner molecule immobilized on a substrate, (e) hybridizing DNA labeled in step (d), (f) contacting DNA hybridized in step (e) with an immobilized mismatch binding protein, and (g) separating unbound DNA from bound DNA contacted in step (f).
This invention provides a method for identifying one or more alleles underlying a defined phenotype displayed by a cell or individual from which a first cDNA library is derived, but not displayed by a cell or individual from which a plurality of additional cDNA libraries is derived. The method comprises the steps of (a) amplifying insert DNA from each cDNA library by polymerase chain reaction, (b) hybridizing DNA amplified from each library in step (a) with itself, (c) contacting DNA from each library hybridized in step (b) individually with an immobilized mismatch binding protein, (d) separating unbound DNA from bound DNA contacted in step (c), (e) amplifying and labeling each separate population of unbound DNA separated in step (d) by polymerase chain reaction using a distinct 5′-peptide-labeled primer, (f) hybridizing DNA amplified and labeled in step (e), (g) contacting DNA hybridized in step (f) with an immobilized mismatch binding protein, (h) separating unbound DNA from bound DNA contacted in step (g), (i) releasing bound DNA separated in step (h), and (j) separating DNA released in step (i) into single strands.
This invention provides a method for identifying one or more alleles underlying a defined phenotype. The method comprises the steps of (a) removing mismatched duplex nucleic acid molecules formed from hybridization within each of a plurality of source populations of nucleic acids, (b) retaining mismatched duplex nucleic acid molecules formed from hybridization among the plurality of source populations, and (c) separating mismatched strands retained in step (b), which separated strands comprise one or more alleles underlying the defined phenotype.