To better understand the molecular basis of the phenotypic variability and hereditary disorders of a species, the first requirement is to identify the genes which are responsible for this variability. Current techniques to link specific phenotypes to specific chromosomal regions are very powerful, but the methods to identify genes within these regions are laborious and inefficient. Two basic types of methods to identify genes in cloned DNA can be distinguished: those dependent on the expression of the gene in question and those independent of it.
In the first type of gene identification methods, hybridization-based techniques are used to isolate RNA or cDNA which is homologous to the cloned genomic DNA of interest. Such techniques require adequate levels of expression, and thus knowledge of the tissue expressing the gene in question. The methods result in the isolation of any sequence hybridizing to the input DNA. Consequently, the identification of transcribed sequences is indirect and based on homology, which may be incomplete and may therefore yield related genes which in reality map elsewhere in the genome. Thus the obtained sequences have to be matched to the input sequence to prove their identity.
In the second type of methods, the cloned DNA is studied directly for its coding potential. The DNA can be sequenced and analyzed by computer for sequences predicting the presence of genes, e.g. CpG-islands, open reading frames, potential coding regions, splice donor and acceptor sites (delineating exons), promoter/5'-first exons, polyadenylation signal/3'-terminal exons, etc. However, the computer only calculates a likelihood and any `gene` thus identified requires direct experimental proof. Alternatively, the region of interest can be cloned and tested for the presence of sequences which, given a suitable in vivo or in vitro splicing system, can be incorporated as exons in the processed mature RNA. This experimental protocol has been named `exon trapping`.
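The computer analysis mentioned above can be illustrated with a minimal sketch. It is not any published gene-prediction program: it assumes the simplest possible definitions, namely an open reading frame as an ATG followed in frame by a stop codon on the forward strand, and candidate splice donor and acceptor sites as the canonical GT and AG dinucleotides. The function names `find_orfs` and `find_splice_candidates` and the `min_codons` parameter are hypothetical. Such pattern matches only flag possibilities, which is why any `gene` found this way still requires experimental proof.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return forward-strand ORFs as (start, end) pairs, 0-based, end exclusive.

    Assumption: an ORF is an ATG followed in the same frame by a stop codon.
    The length in codons (start and stop codon included) must be >= min_codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


def find_splice_candidates(seq):
    """Return (donors, acceptors): positions of GT and AG dinucleotides.

    Assumption: only the canonical dinucleotides are considered; real
    predictors score the surrounding consensus sequence as well.
    """
    seq = seq.upper()
    donors = [m.start() for m in re.finditer("(?=GT)", seq)]
    acceptors = [m.start() for m in re.finditer("(?=AG)", seq)]
    return donors, acceptors


# Hypothetical 20 bp example sequence, for illustration only.
example = "CCATGAAATTTTAGGTAAGT"
example_orfs = find_orfs(example)
example_donors, example_acceptors = find_splice_candidates(example)
```

Even in this short example the number of candidate splice sites exceeds the number of ORFs, which hints at why dinucleotide matches alone produce many false candidates.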
Different variations of exon trapping have been described (refs. 1-8). In general, cloned genomic fragments (e.g. total cosmid DNA) are subcloned into a special vector, between a splice donor (SD) and a splice acceptor (SA) site. Individual subclones are picked, the DNA is isolated and introduced into a host cell (e.g. COS-cells). During propagation of the host cells, the introduced DNA is replicated and transcribed into RNA. After several days, total cellular RNA is isolated and vector-derived transcripts are amplified by RNA-PCR using vector-specific primers. Exons present in the cloned DNA will have been spliced between the known vector-derived exons and are thus `trapped` in the RNA-PCR products. In a specific variation of exon trapping, designated 3'-exon trapping, the 3'-terminal exon of a gene is specifically isolated based on its ability to provide both a splice acceptor site and a polyadenylation signal (refs. 6, 7).
At present, all these exon trapping variants suffer from major limitations, especially when complex sources of input DNA are used. In such cases, exon trapping becomes a laborious and insensitive technique, which limits its use considerably. These limitations include:
a. The original clones, containing large segments of genomic DNA, need to be further subcloned in plasmid (or retroviral) vectors. To accommodate the insert capacity of these vectors, the inserts typically measure 2 kb or less. This step causes the genes to be fragmented into many separate and disconnected pieces. Consequently, after trapping, the individual exons rarely constitute a complete or even a substantially complete set of all the exons of a gene. Furthermore, any exons thus obtained have to be aligned to reconstruct their original order. This process is slow and inefficient and implies a major loss of information relative to the input material, which contained the exons in the correct order.
b. Due to the small insert sizes after subcloning, most clones in the originally described systems do not contain any exons. Consequently, the RNA-PCR amplification step gives primarily small vector-to-vector products without an insert (`empty` products). These empty products are heavily enriched by the PCR step, which favours the generation of small products. No efficient method has yet been presented to remove such products without simultaneously removing other, bona fide exons.
c. Many of the single exons which are trapped using any of the current methods are small (approximately 120-150 bp) and often give poorly hybridizing probes for subsequent experiments, e.g. to screen cDNA libraries. Furthermore, since the individually trapped exons require the use of cDNA libraries in the next step to further define the gene, the initial advantage of working with an expression-independent system is lost.
d. The vectors often contain internal, so-called `cryptic` splice sites, resulting in a substantial proportion of false positives.
e. Subcloning disrupts the genomic context of the exons, which results in a high background of false positives. Cloning of regions which are never transcribed, or cloning of intronic sequences without their naturally flanking exons, often results in activation of cryptic splice sites or spuriously coincident processing signals and thus leads to recognition of false exons.
f. The vectors can only be used in combination with specific cell lines (e.g. COS-cells), since they require a specific system of replication in the host cell, commonly based on the SV40 origin of replication.
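Limitations a and b can be made concrete with a small numerical sketch. Everything in it is a hypothetical illustration rather than data from the cited work: a 40 kb cosmid insert, four invented exon positions, and subcloning modelled as cutting the insert into consecutive 2 kb pieces (real subcloning yields random, overlapping fragments). The helper names `fragment_region` and `exons_per_fragment` are likewise assumptions.

```python
def fragment_region(region_length, fragment_size):
    """Cut [0, region_length) into consecutive (start, end) fragments.

    Simplifying assumption: fragments tile the region without overlap.
    """
    return [(start, min(start + fragment_size, region_length))
            for start in range(0, region_length, fragment_size)]


def exons_per_fragment(fragments, exons):
    """For each fragment, count the exons lying entirely inside it."""
    return [sum(1 for (ex_start, ex_end) in exons
                if frag_start <= ex_start and ex_end <= frag_end)
            for (frag_start, frag_end) in fragments]


# Hypothetical 40 kb insert carrying one gene with four exons (0-based, end exclusive).
exons = [(1000, 1150), (9000, 9120), (21900, 22050), (35000, 35140)]
fragments = fragment_region(40_000, 2_000)
counts = exons_per_fragment(fragments, exons)

empty_fragments = counts.count(0)   # subclones that can only give `empty` products
intact_exons = sum(counts)          # exons that survive subcloning intact
max_exons_together = max(counts)    # most exons still linked on one subclone

print(len(fragments), empty_fragments, intact_exons, max_exons_together)
```

Under these assumptions most subclones carry no exon at all, one exon is cut by a fragment boundary, and no subclone retains more than one exon, so the original exon order cannot be read from any single trapped product.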
Although 3'-exon trapping is better in some respects, in that it traps larger exons and specifically identifies the end of a gene, it does not provide a solution for the major limitations of the exon trapping technique in general.