The present invention relates generally to methods for producing recombinant nucleotide sequences from noncontiguous nucleotide sequences.
An emerging class of genes is first discovered through interrogation of genomic databases. Many of these genes are not represented in the current expressed sequence tag databases, suggesting that they are expressed at very low levels per cell, by a limited number of cell types, or at restricted times. In the absence of information about a suitable source of mRNA template, conventional methods for isolating a full-length cDNA derived from such genes are of limited utility.
The provision of a nucleic acid molecule encoding a full-length polypeptide is a necessary first step for producing the polypeptide with recombinant technology. Although certain eukaryotic expression systems can produce recombinant proteins encoded by genomic sequences, many genes contain multiple introns with a collective length that renders the expression unit too large to be efficiently inserted into typical plasmid-based expression vectors. Moreover, the presence of repeated elements within intron sequences may promote plasmid instability while the expression vector is being propagated within the bacterial host. The presence of intron sequences in an expression cassette also creates the possibility that recipient mammalian host cells will use cryptic splice donor and acceptor sites within the intron. The use of such alternative splice sites may be natural to the gene, or may be an artifact of the expression host cell. Thus, the use of these cryptic splice sites in the host cell may lead to the production of a different recombinant polypeptide than intended. The splicing mechanisms of mammalian, yeast, and insect cells may also be sufficiently different from each other to preclude accurate or efficient splicing of certain mRNA molecules transcribed from heterologous genes. Finally, the lack of mRNA splicing in bacterial host cells necessitates the removal of all intron sequences within the expression unit for the production of recombinant protein.
A particularly convenient method for the isolation of defined DNA segments uses the polymerase chain reaction (PCR) and suitable pairs of primers. While the use of PCR and pairs of exon-specific primers enables the isolation of exon gene segments, the joining of the resulting exon segments to produce a contiguous polypeptide coding sequence is difficult.
A common method to join PCR-generated DNA segments to other DNA segments incorporates a class II restriction endonuclease cleavage site near the 5′ end of each member of the PCR primer pair. The resulting PCR product incorporates the restriction endonuclease sites at its termini, which, upon digestion, produce suitable cohesive overhangs to promote efficient ligation in the presence of DNA ligase. This method, however, cannot be employed to ligate exon segments, or portions of exon segments, to produce a contiguous polypeptide coding sequence. Exon segments ligated together in this manner would contain a foreign restriction endonuclease recognition sequence between ligated segments, thereby introducing one or more added amino acid residues at each ligation junction.
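The added-residue problem can be illustrated with a short Python sketch. The two exon sequences are hypothetical, and the EcoRI recognition sequence (GAATTC) is used only as a familiar example of a six-base site; neither is taken from this description. The sketch translates a seamless junction and a junction that retains the recognition sequence, using the standard genetic code.

```python
# Illustrative sketch only: exon sequences are hypothetical, and EcoRI
# (GAATTC) is an assumed example of a six-base recognition sequence.

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding strand in frame 1; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


exon_1 = "ATGGCTAAA"   # hypothetical exon segment encoding Met-Ala-Lys
exon_2 = "GGTCTGTAA"   # hypothetical exon segment encoding Gly-Leu-stop
site = "GAATTC"        # recognition sequence regenerated at the ligation junction

seamless = translate(exon_1 + exon_2)
with_site = translate(exon_1 + site + exon_2)

print(seamless)    # intended coding sequence
print(with_site)   # two extra residues encoded by the retained site
```

In this example the six-base site preserves the reading frame but inserts two residues (Glu-Phe) at the junction. A site whose length is not a multiple of three would instead shift the reading frame of the downstream exon.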
At high DNA and DNA ligase concentrations, it is possible to ligate blunt-ended DNA segments together. The so-called blunt-end ligation reaction enables the ligation of PCR-generated exon segments without the need to incorporate restriction endonuclease sites into the primers, and, as a result, the ligated products are free of foreign sequences. This approach, however, has severe limitations as well. Since the blunt-end ligation reaction does not use defined cohesive ends, the number of combinations and permutations of incorrect ligation products increases exponentially with the number of exons to be ligated, rendering this method impractical for general use.
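The growth in the number of possible products can be made concrete with a rough count. The Python sketch below rests on a simplifying model that is an assumption, not part of this description: each of n distinct exon segments appears exactly once in a linear product, any order and either orientation is possible, and a product and its reverse complement are counted as the same double-stranded molecule. Products that omit or repeat a segment, and circular products, are ignored, so the true number of incorrect products is larger.

```python
from math import factorial


def blunt_end_linear_products(n):
    """Count distinct linear products that contain each of n distinct
    segments exactly once (simplified model, see assumptions above)."""
    # n! possible orders times 2**n orientation choices; halve the total
    # because a product and its reverse complement are the same molecule.
    return factorial(n) * 2 ** n // 2


for n in range(1, 7):
    total = blunt_end_linear_products(n)
    # Exactly one arrangement reproduces the intended coding sequence.
    print(f"{n} segments: {total} possible products, {total - 1} incorrect")
```

Under this model, five exon segments already give 1,920 possible full-length arrangements, only one of which is the intended coding sequence.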
Hence, there is a need for a rapid and efficient method to convert an intron-containing gene sequence to a contiguous polypeptide coding sequence free of introns.
The present invention provides improved methods for producing nucleic acid molecules that encode an amino acid sequence of interest, or that comprise at least one regulatory sequence. According to one aspect of the present invention, an amino acid-encoding nucleic acid molecule with a continuous open reading frame is produced from noncontiguous amino acid-encoding nucleotide sequences.
These and other aspects of the invention will become evident upon reference to the following detailed description and the attached drawing. In addition, various references are identified below and are incorporated by reference in their entirety.