The present invention relates to the field of biotechnology and describes methods of identification and cloning of nucleic acid differences between polynucleotides from different sources, origins, environments or different physiological situations.
The nucleotide sequence of a given gene may differ between individuals within a single species, between cells within a single individual, or between the two chromosomes within the same cell. Such differences may result from genetic variation or from environmentally induced changes in DNA through insertions, deletions or point mutations, or from the acquisition of foreign DNA or RNA through infection by bacteria, molds, fungi or viruses. For example, the sudden acquisition by a pathogen of resistance to a given drug may be caused by the deletion of a sequence from, or the acquisition of a new sequence in, its genome. Alternatively, pathogenesis may result from insertions or deletions of genomic regions. For instance, the fragile X syndrome, the most common cause of inherited mental retardation, is partly due to an insertion of multiple CGG trinucleotides in the 5′ untranslated region of the fragile X mRNA, resulting in the inhibition of protein synthesis via ribosome stalling (Feng et al., Science 268:731-4, 1995). Alterations in nucleotide sequences can have profound effects on cells. For example, many tumors and many genetic diseases result from alteration, or mutation, of particular nucleotide sequences. Mutations in nucleotide sequences that encode proteins can result in production of proteins with altered polypeptide sequences and, in some instances, altered biological activities. Changes in the activity of a single protein can sometimes have profound effects on the physiology of an entire organism.
In order to develop effective preventive, diagnostic and therapeutic methods for treatment of cancer and hereditary diseases, we must first identify the genetic mutations that contribute to disease development. Typically, mutations are identified in studies of cloned genes whose normal sequences are already known (see, for example, Suzanne et al., Science 244:217, 1989; Kerem et al., Science 245:1073, 1989). That is, a gene is first identified as being associated with a disorder, and particular sequence changes that correlate with the diseased state are subsequently identified.
In addition to variations in genomic DNA, variation of nucleotide sequence may also occur between the different messenger RNA molecules transcribed from a single gene. Indeed, the pre-mRNAs of some genes may be spliced in various ways to produce different mRNAs, thus leading to the synthesis of protein isoforms that may exhibit different functions. Such alternative splicing may depend on the cell type, the stage of development, or the chemical or physical environment of the cell. Alternative splicing of pre-mRNAs is a powerful and versatile regulatory mechanism that can affect quantitative control of gene expression and lead to functional diversification of proteins.
The prevalence of alternative splicing as a mechanism for regulation of gene expression makes it a very likely target for alterations leading to human disease. The splicing machinery can be altered in several circumstances. For example, a gene mutation can disturb the splicing profile by inactivating physiological splicing sites or uncovering cryptic splicing sites. More particularly, genetic point mutations could alter or eliminate the splice junctions and prevent normal splicing, yielding either aberrantly truncated transcripts or transcripts containing an exon which is normally deleted and/or missing another exon which is normally present.
Multiple examples of splicing alterations are associated with diseases or related disorders. Indeed, 15% of the gene mutations associated with diseases alter the process of RNA splicing. Many cancer-associated genes are alternatively spliced and their expression leads to the production of multiple splice variants (Mercatante and Kole, Pharmacol Ther 2000, 85:237-43). Although the functions of most of these variants are not well-defined, some have antagonistic activities related to regulated cell death mechanisms. In a number of cancers and cancer cell lines, the ratio of splice variants is frequently shifted so that the anti-apoptotic splice variant predominates. Therefore, characterization of these splice variants can lead to the identification of new therapeutic targets and the design of new drugs and new means of diagnosis.
A variety of techniques have been used to identify sequence variations in nucleic acids. For example, Restriction Fragment Length Polymorphism (RFLP) analysis detects restriction sites generated by mutations or alterations in nucleotide sequences (see Kan et al., Lancet ii:910, 1978); Denaturing Gradient Gel Electrophoresis and Single Stranded DNA Electrophoretic Mobility Studies identify nucleotide sequence differences through alterations in the mobility of bands in electrophoresis gels (see Myers et al., Nature 313:495, 1985; Orita et al., Proc. Natl. Acad. Sci. USA 86:2766, 1989); Chemical Cleavage analysis identifies mismatched sites in heteroduplex DNA (see Cotton, Proc. Natl. Acad. Sci. USA 85:4397, 1988); and RNase Cleavage analysis identifies mismatched sites in RNA-DNA or RNA-RNA heteroduplexes (see Myers et al., Science 230:1242, 1985; Maniatis et al. U.S. Pat. No. 4,946,773).
A significant problem with each of the above-described methods for identifying nucleic acid sequence differences is that prior knowledge of the gene of interest is generally required.
Three methods have recently been developed to detect, and subsequently identify, nucleic acid differences without prior knowledge of the gene presenting such a difference. These methods rely on the fact that complementary strands of related polynucleotides are able to anneal to each other, forming double stranded molecules except at the nucleic acid difference, and thus forming heteroduplexes. If the difference consists of a single nucleotide difference or a small insertion or deletion, a mismatched duplex is formed. If the difference comprises a large nucleotide region, a duplex with an internal single stranded region is formed.
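The distinction drawn above between a mismatched duplex and a duplex with an internal single stranded region can be illustrated with a minimal Python sketch. It is not part of the disclosed methods. It assumes that both related sequences are written in the same orientation (the actual duplex would form between one strand and the complement of the other), that matching stretches anneal perfectly, and that an unpaired stretch of at least `min_loop` nucleotides counts as "large"; the function names and the value of `min_loop` are hypothetical choices made for illustration only.

```python
from difflib import SequenceMatcher


def sequence_differences(seq_a, seq_b):
    """Return (segment_a, segment_b) for each stretch where the sequences differ.

    Both sequences are assumed to be given in the same orientation; the
    segments are the parts that would be left unpaired in a heteroduplex.
    """
    matcher = SequenceMatcher(None, seq_a, seq_b, autojunk=False)
    return [
        (seq_a[i1:i2], seq_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def expected_duplex(seq_a, seq_b, min_loop=6):
    """Classify the duplex expected when the two related sequences anneal.

    min_loop is an illustrative threshold (an assumption), not a value
    prescribed by the text above.
    """
    differences = sequence_differences(seq_a, seq_b)
    if not differences:
        return "homoduplex"
    longest = max(max(len(a), len(b)) for a, b in differences)
    if longest >= min_loop:
        return "internal single-stranded region"
    return "mismatched duplex"


if __name__ == "__main__":
    reference = "ATGGCCAAGT" + "CGTACGGATC"
    with_insertion = "ATGGCCAAGT" + "TTTTCCCCGGGGAAAA" + "CGTACGGATC"
    print(sequence_differences(with_insertion, reference))
    print(expected_duplex(with_insertion, reference))
```

In this toy model, the unpaired segment reported for the insertion-bearing sequence corresponds to the internal single stranded region of the heteroduplex, while the identical flanking stretches correspond to its double stranded arms.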
The WO 99/36575 patent application, the disclosure of which is hereby incorporated by reference in its entirety, discloses methods in which mismatched duplex nucleic acid molecules formed from hybridization within two source populations of nucleic acids are isolated from the rest of the sample using an enzyme able to bind to the mismatched duplex, such as MutS. However, this technique does not apply to heteroduplexes containing internal single stranded regions larger than mismatched regions of a few nucleotides.
The U.S. Pat. No. 5,922,535 patent, the disclosure of which is hereby incorporated by reference in its entirety, discloses a method in which nucleic acid strands from different populations are hybridized with one another so that heteroduplexes are formed. Those heteroduplexes are then cleaved in a heteroduplex-dependent fashion, and the cleavage products are isolated and used to identify the genetic sequences that differ between the nucleic acid populations. The WO 99/46043 patent application, the disclosure of which is hereby incorporated by reference in its entirety, discloses methods in which internal loops of heteroduplexes are retrieved by digestion of the double stranded regions of such heteroduplexes. However, these last two methods do not allow the direct isolation of full-length polynucleotides containing nucleic acid differences, but only of fragments thereof.
The present invention discloses methods to isolate related polynucleotides harboring nucleic acid differences, or fragments thereof, including regions surrounding said nucleic acid differences, wherein said nucleic acid differences consist of insertions, deletions, or replacements of large regions of nucleotides. Such methods are particularly useful for isolating genomic insertions or deletions, alternative splicing events and sequence extension repeats.
One of the advantages of these techniques is that they isolate not only the nucleic acid differences but also the flanking sequences and even the full-length polynucleotides harboring said nucleic acid differences. Such full-length polynucleotides are then available for several applications, for example for cloning and/or sequencing.
The invention relates to methods of isolation of related polynucleotides harboring nucleic acid differences in a polynucleotide sample, said method comprising the selection of heteroduplexes containing at least one internal single stranded region (herein referred to as ISSRHs) with a single stranded trap (herein referred to as SST), wherein said ISSRHs are formed between said related polynucleotides and wherein said internal single stranded regions represent said nucleic acid differences.
In an embodiment of the present invention, said single-stranded trap involves the use of a Recognition Element (RE) having a preferential affinity for single-stranded polynucleotides compared to double stranded polynucleotides. In a preferred embodiment of the present invention, said single-stranded trap involves the use of a Recognition Element (RE) having a preferential affinity for single-stranded DNA compared to double stranded DNA. In a more preferred embodiment, said RE has a preferential affinity for DNA compared to RNA. In a further preferred embodiment, said RE has a preferential affinity for single stranded DNA compared to double stranded DNA and to single stranded RNA under conditions used to select single stranded DNA.
In another preferred embodiment, said RE is an antibody. In another preferred embodiment, said RE is a peptide. In still another preferred embodiment, said RE is a protein. Even more preferably, said RE is a single strand binding protein (SSB). Even more preferably, said RE is selected from the group consisting of the E. coli SSB, the product of gene 32 of phage T4, the adenovirus DBP and the calf thymus UP1. Even more preferably, said RE is the E. coli SSB. In still another preferred embodiment, said RE is a material selected from the group consisting of benzoylated-naphthoylated-DEAE-cellulose (BNDC), methylated albumin on bentonite (MAB) and methylated albumin on Kieselgur (MAK). More preferably, said RE is BNDC.
In one embodiment, said polynucleotide sample contains single-stranded polynucleotides. Preferably, said single-stranded polynucleotides comprise both (+) strands and (−) strands. In another embodiment, said polynucleotide sample contains double-stranded polynucleotides. In an additional embodiment, said polynucleotide sample contains both single-stranded and double-stranded molecules.
In one embodiment, said polynucleotide sample contains DNA. In a preferred embodiment, said polynucleotide sample contains cDNA. In another preferred embodiment, said polynucleotide sample contains genomic DNA. In another embodiment, said polynucleotide sample contains RNA, preferably mRNA. In still another embodiment, said polynucleotide sample contains both DNA and RNA, preferably cDNA and mRNA.
In one embodiment, said polynucleotide sample comprises polynucleotides from a single source or a single environment or a single physiological condition. In another embodiment, said polynucleotide sample comprises a mixture of polynucleotides from samples coming from at least two different sources, environments or physiological conditions.
In one embodiment, said polynucleotide sample comprises polynucleotides derived from a single gene or limited set of genes. In a preferred embodiment, said polynucleotide sample comprises cDNA or mRNA derived from a single gene or limited set of genes. In another embodiment, the polynucleotide sample comprises a complex polynucleotide mixture. In a preferred embodiment, the polynucleotide mixture comprises a cDNA collection, an mRNA collection or both a cDNA and mRNA collection.
More particularly, the invention relates to a method of isolation of related polynucleotides harboring nucleic acid differences in a polynucleotide sample, said method comprising the following steps:
a) obtaining a polynucleotide sample containing said related polynucleotides;
b) annealing polynucleotides present in said sample to allow the formation of ISSRHs between said related polynucleotides; and
c) selecting said ISSRHs using a single-stranded trap.
Optionally, said method comprises an additional step of reducing the size of polynucleotides, preferably by fragmentation, more preferably to a size suitable for single pass DNA sequencing. Preferably the reduction step is performed before step (c), more preferably before step (b).
Optionally, said method comprises an additional step of denaturing said polynucleotides in said sample before the annealing step (b).
Optionally, said method comprises an additional step of removing single-stranded regions other than internal single-stranded regions on ISSRHs, wherein said additional step occurs before step (c).
Optionally, said method comprises an additional step of blunting polynucleotides obtained after step (b), wherein said additional step preferably occurs before step (c), more preferably after the cleaning step.
Optionally, the method comprises an additional step of ligating an oligonucleotide adapter to polynucleotide ends. Preferably, said method comprises an additional step of ligating an oligonucleotide adapter to the ends of polynucleotides after step (b). More preferably, said ligation step is performed after said cleaning step, after said blunting step, or after said cleaning and blunting steps. Optionally, said method comprises an additional step of totally or partially removing adapters from the ends of polynucleotides, preferably after the amplification step, more preferably after the amplification step and before either the cloning step or another cycle of isolation of related polynucleotides containing nucleic acid differences.
Optionally, said method comprises an additional step of amplifying ISSRHs selected by said single stranded trap, preferably using polymerase chain reaction (PCR).
Optionally, said isolation method may be repeated several times, preferably 1, 2, 3 or 5 times.
Optionally, said isolation method comprises a final step of cloning said isolated polynucleotides.
Optionally, said isolation method comprises a final step of identifying said nucleic acid differences of said isolated polynucleotides, preferably using DNA sequencing.
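The relative order of the core and optional steps described above can be summarized with a short Python sketch. It is an illustrative reading of the stated preferences, not a prescribed protocol: the function name, the flag names and the step labels are hypothetical, and only one of the orderings compatible with the text is produced (size reduction and denaturation before annealing; cleaning, blunting and adapter ligation between annealing and selection; amplification, adapter removal, cloning and sequencing after selection).

```python
def build_workflow(*, reduce_size=False, denature=False, clean=False,
                   blunt=False, ligate_adapters=False, amplify=False,
                   strip_adapters=False, clone=False, sequence=False):
    """Return an ordered list of step labels for one isolation cycle.

    All flag names are hypothetical; each flag switches on one of the
    optional steps, and the three core steps are always present.
    """
    if strip_adapters and not ligate_adapters:
        raise ValueError("adapters can only be removed if they were ligated")

    steps = ["obtain polynucleotide sample"]
    if reduce_size:
        steps.append("reduce polynucleotide size")
    if denature:
        steps.append("denature polynucleotides")
    steps.append("anneal to form ISSRHs")
    if clean:
        steps.append("remove non-internal single-stranded regions")
    if blunt:
        steps.append("blunt polynucleotide ends")
    if ligate_adapters:
        steps.append("ligate oligonucleotide adapters")
    steps.append("select ISSRHs with single-stranded trap")
    if amplify:
        steps.append("amplify selected ISSRHs")
    if strip_adapters:
        steps.append("remove adapters")
    if clone:
        steps.append("clone isolated polynucleotides")
    if sequence:
        steps.append("identify nucleic acid differences by sequencing")
    return steps


if __name__ == "__main__":
    for number, step in enumerate(build_workflow(denature=True, clean=True,
                                                 amplify=True), start=1):
        print(number, step)
```

Calling the function with no flags yields only the three core steps, and repeating the isolation method for several cycles would correspond to running such a list more than once.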
In one embodiment, the invention concerns a method of isolation of related DNA molecules harboring nucleic acid differences in a DNA sample, said method comprising the following steps:
a) obtaining a DNA sample containing said related DNA molecules;
b) denaturing DNA molecules in said sample;
c) annealing said denatured DNA molecules to allow the formation of ISSRHs between said related DNA molecules;
d) removing single stranded regions other than internal single stranded regions of ISSRHs;
e) selecting said ISSRHs using a single-stranded trap; and
f) amplifying, using PCR, said ISSRHs selected by said single-stranded trap.
Optionally, said method comprises an additional step of reducing the size of DNA molecules, preferably by fragmentation, more preferably to a size suitable for single pass DNA sequencing. Preferably the reduction step is performed before step (e), more preferably before step (b). Optionally, said method comprises an additional step of blunting polynucleotides obtained after step (c) and before step (e).
In another embodiment, the invention concerns a method of isolation of related DNA molecules harboring nucleic acid differences in a DNA sample, said method comprising the following steps:
a) obtaining a DNA sample containing said related DNA molecules;
b) denaturing DNA molecules in said sample;
c) annealing said denatured DNA molecules to allow the formation of ISSRHs between said related DNA molecules;
d) removing single stranded regions other than internal single stranded regions of ISSRHs;
e) ligating adapters to the ends of said ISSRHs;
f) selecting said ISSRHs using a single-stranded trap; and
g) amplifying, using PCR, said ISSRHs selected by said single-stranded trap.
Optionally, said method comprises an additional step of reducing the size of DNA molecules, preferably by fragmentation, more preferably to a size suitable for single pass DNA sequencing. Preferably the reduction step is performed before step (f), more preferably before step (b). Optionally, said method comprises an additional step of blunting polynucleotides obtained after step (c) and before step (e). Optionally, said method comprises an additional step of removing said adapters totally or partially from the ends of said amplified ISSRHs.
In a preferred embodiment, selection of said ISSRHs in any of the methods of the invention comprises the following steps:
i) mixing said sample with said RE under conditions allowing the binding of said internal single stranded regions within said ISSRHs to said RE and subsequent formation of internal single stranded region containing heteroduplex-recognition element (ISSRH-RE) complexes; and
ii) separating said ISSRH-RE complexes from said sample.
Alternatively, said single stranded trap comprises the following steps:
i) immobilizing said RE;
ii) bringing said immobilized RE into contact with said annealed sample to allow the binding of said internal single stranded regions within said ISSRHs to said RE and subsequent formation of internal single stranded region containing heteroduplex-recognition element (ISSRH-RE) complexes; and
iii) removing the unbound polynucleotides.
Optionally, any selection method of the invention may comprise the additional step of recovering said related polynucleotides from said ISSRH-RE complexes.
More particularly, the invention relates to a method to isolate polynucleotides subjected to alternative splicing, comprising the steps of:
a) obtaining a double stranded cDNA sample containing splicing isoforms;
b) denaturing said cDNA to obtain single stranded cDNA;
c) annealing said single stranded cDNAs under conditions allowing the formation of ISSRHs between single stranded cDNAs from different splicing isoforms, wherein an internal single stranded region comprises said alternative splicing event;
d) removing single stranded regions other than internal single stranded regions of said ISSRHs;
e) ligating an adapter to the ends of blunted cDNAs;
f) selecting said ISSRHs with a SST; and
g) amplifying said selected cDNAs.
Optionally, said method comprises an additional step of blunting polynucleotides obtained after step (c) and before step (e). Optionally, said method comprises an additional step of reduction, wherein the size of polynucleotides is reduced, preferably by fragmentation. Preferably the reduction step is performed before step (c), more preferably before step (b).
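As an illustration of step (c) of the alternative splicing method, the following Python sketch lists, for each pair of splicing isoforms in a sample, the exons that would be expected to remain unpaired when strands from the two isoforms anneal. It is a simplified model under stated assumptions: exons are treated as indivisible labels, shared exons are assumed to anneal completely, and partial-exon differences are ignored. The function name and the exon and isoform names are hypothetical.

```python
from itertools import combinations


def unpaired_exons(isoforms):
    """Map each isoform pair to the exons unique to each member of the pair.

    isoforms: dict mapping an isoform name to its ordered list of exon labels.
    Pairs with identical exon content are omitted, since they would be
    expected to form fully paired duplexes rather than ISSRHs.
    """
    result = {}
    for (name_a, exons_a), (name_b, exons_b) in combinations(isoforms.items(), 2):
        only_a = [exon for exon in exons_a if exon not in exons_b]
        only_b = [exon for exon in exons_b if exon not in exons_a]
        if only_a or only_b:
            result[(name_a, name_b)] = (only_a, only_b)
    return result


if __name__ == "__main__":
    sample = {
        "full": ["E1", "E2", "E3", "E4"],   # all exons retained
        "skip": ["E1", "E2", "E4"],          # exon E3 skipped
        "swap": ["E1", "E2", "E3b", "E4"],   # E3 replaced by an alternative exon
    }
    for pair, (only_a, only_b) in unpaired_exons(sample).items():
        print(pair, only_a, only_b)
```

In this model an exon-skipping pair leaves a single unpaired region on one strand, whereas an exon-replacement pair leaves an unpaired region on each strand.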
In one embodiment, said cDNA sample comprises polynucleotides from a single source, a single environment or a single physiological condition. In another embodiment, said cDNA sample comprises a mixture of polynucleotides from samples coming from at least two different sources, environments or physiological conditions.
In one embodiment, said cDNA sample comprises cDNA derived from a single gene or limited set of genes. In another embodiment, the cDNA sample comprises a complex polynucleotide mixture. In a preferred embodiment, the cDNA mixture comprises a cDNA collection, an mRNA collection or both a cDNA and mRNA collection.
The invention encompasses ISSRH-REs obtainable by any method of the invention. The invention also encompasses ISSRH-REs obtained by any method of the invention.
The invention also encompasses libraries obtained using any of the methods of the invention. Preferably, said library is enriched in related polynucleotides harboring at least one nucleic acid difference. More preferably, said library is enriched in alternative splicing isoforms or alternative splicing events.
The invention encompasses any polynucleotides isolated, or fragments thereof, using any method of the invention. Preferably, said isolated polynucleotides are polynucleotides harboring a nucleic acid difference. In one embodiment, said isolated polynucleotides derive from the same gene by alternative splicing. In a preferred embodiment, said isolated polynucleotides differ by the presence of at least one exon or part of an exon in one polynucleotide compared to the other. In another preferred embodiment, said isolated polynucleotides differ by the replacement of one exon in one polynucleotide by a different exon in the other polynucleotide. In another embodiment, said isolated polynucleotides differ by the insertion, deletion or replacement of a nucleotide sequence on one gene compared to an allelic variant of the same gene.
The invention also encompasses polynucleotides able to hybridize, preferably specifically, to a polynucleotide isolated using any method of the invention, preferably under stringent conditions. Preferably, said polynucleotides are able to hybridize, preferably specifically, to a nucleic acid difference isolated using any method of the invention, preferably under stringent conditions.
In one embodiment, said nucleic acid difference comprises an insertion, deletion, or replacement of at least 6, 8, 10, 12, 15, 18, 20, 25, 50, 75, 100, 150, 200, 300, 500, 1000, 1500, 2000, 3000, 5000, 10000 or 50000 nucleotides. Preferably, said nucleic acid difference comprises an insertion, deletion, or replacement of 10, 12, 15, 18, 20, 25, 50, 75, 100, 150, 200, 300, 500, 1000, 1500, 3000 or 5000 nucleotides. More preferably, said nucleic acid difference comprises an insertion, deletion, or replacement of 12, 15, 18, 20, 25, 50, 75, 100, 150, 200, 300, or 500 nucleotides. Even more preferably, said nucleic acid difference comprises an insertion, deletion, or replacement of 15, 18, 20, 25, 50, 75, 100, or 150 nucleotides.
The invention also encompasses all oligonucleotides, preferably primers and probes, that may be designed to detect a nucleic acid difference using a polynucleotide isolated by any method of the invention.