I. Field of the Invention
The present invention relates to a multi-functional cloning vector, a vector-primer derived therefrom, and a method of preparing a cDNA bank using the same. The invention provides a method for readily obtaining a cDNA that can be used to produce a large amount of a protein useful as a medicine.
II. Description of the Related Art
Proteins constituting the cells of our body play a central role in maintaining life, serving as the skeleton of cells, catalysts of reactions, and mediators of signal transduction. Thus, finding new proteins and elucidating their structures and functions is important not only for understanding life but also for using proteins as medicines or diagnostic reagents. Although there have been many attempts to use proteins that function in our body as medicines, few have been available because of the difficulty of preparing them in large amounts. Recent progress in genetic engineering has made it easy to find a gene encoding a useful protein in human cells and to express it in bacteria or animal cells in high yield. Using this technique, many proteins have recently been developed as medicines.
Any human protein has potential utility as a medicine because it functions in the body. It is known that many genetic diseases are caused by the loss of a protein that plays an important role in a living cell. Therefore, producing a large amount of a protein and elucidating its function and any relationship to disease are important steps in developing new medicines. If the investigation begins with the purification of a protein showing a target activity, it takes a long time and much labor to obtain a large amount of starting material. To avoid this problem, the inventors have adopted the strategy of analyzing the gene that encodes the amino acid sequence of the protein. In this plan, cDNA (complementary DNA) is synthesized from all of the mRNA that is transcribed from the genome and translated into protein, and then each cDNA is characterized.
The number of genes in the human genome is estimated to be approximately 10^5, which may correspond to the number of human proteins. Determination of the whole sequence of the human genome has recently been discussed around the world under the rubric of the Human Genome Project. If this plan is realized, the amino acid sequences of all proteins encoded in the human genome are expected to be elucidated. It is, however, difficult to determine an amino acid sequence from the nucleotide sequence of genomic DNA alone, because the genome consists of exons, which encode parts of proteins, and introns, which do not encode protein. In the cell, mRNA transcribed from the genome is processed by removal of the introns, and the processed mRNA, containing only exons, is translated into protein on ribosomes. Therefore, if we determine the nucleotide sequence of the processed mRNA, we obtain the amino acid sequence of the protein encoded by that mRNA. Progress in gene recombination techniques has enabled us to convert mRNA to cDNA and to introduce it into E. coli. With this technique we can synthesize a "cDNA vector", which contains cDNA derived from one species of mRNA, and construct a "cDNA library", which is a group of E. coli cells containing cDNA vectors. If we determine the nucleotide sequences of all cDNAs contained in the cDNA library, which correspond to the nucleotide sequences of all mRNAs existing in a cell, we obtain the amino acid sequences of all proteins synthesized in that cell. We call the group of sequence-determined cDNAs the "cDNA bank". If the obtained cDNA vector is introduced into and expressed in suitable animal cells, the expression product may be used for a screening assay. By following the above strategy, we can construct a cDNA bank of human proteins, called the "homo-protein" cDNA bank, in which the amino acid sequence encoded by each cDNA has been determined and each cDNA can be expressed in mammalian cells.
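The relationship between genomic DNA, processed mRNA, and protein described above can be illustrated with a minimal sketch. The toy sequence, the exon coordinates, and the function names `splice` and `translate` are hypothetical and chosen only for illustration; the codon table is the standard genetic code. The sketch assumes upper-case DNA letters and known exon boundaries, which is exactly the information that a processed mRNA (or its cDNA) supplies and that genomic sequence alone does not.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive), discarding the introns."""
    return "".join(genomic[start:end] for start, end in exons)


def translate(coding):
    """Translate from the first ATG until a stop codon or the end of the sequence."""
    start = coding.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(coding) - 2, 3):
        aa = CODON_TABLE[coding[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy gene: exon 1, one intron, exon 2.
EXON1, INTRON, EXON2 = "ATGGCC", "GTAAGTTTTCAG", "AAATGA"
genomic = EXON1 + INTRON + EXON2
exons = [(0, 6), (18, 24)]  # assumed exon boundaries

processed = splice(genomic, exons)
print(processed)             # ATGGCCAAATGA
print(translate(processed))  # MAK
print(translate(genomic))    # MAVSFQK (intron read as if it were coding)
```

Translating the unspliced toy sequence gives a different product from translating the spliced one, which is why the strategy here works from cDNA rather than from genomic DNA.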
Here we describe only the case of human proteins as an example, but this strategy can also be applied to other animal cells and to plant cells, yielding useful information about the proteins of those cells as well.
The success of the above strategy depends on the quality of the cDNA library. From this point of view, cDNA libraries prepared by known methods have many problems. For example, a cDNA fragment must be subcloned into a suitable vector in order to sequence the cDNA, to use it as a probe in screening for a full-length cDNA, or to express it in animal cells. Analyzing many cDNA clones by this subcloning procedure takes a long time and much labor. This problem is solved by developing a novel method in which all of the above procedures can be carried out on the same vector. The minimum requirements for a cDNA vector suited to such a method are that it 1) contain a directional full-length cDNA clone, 2) be easily sequenced, 3) be usable to prepare a probe for various screenings, and 4) be expressible in in vivo or in vitro systems. In this specification, the term "full-length cDNA clone" in requirement 1) means a cDNA clone containing at least the whole coding region of the protein encoded by an mRNA; strictly speaking, however, it denotes a cDNA clone containing the whole sequence of the template mRNA from the 5' end to the poly(A) tail.
The most popular method for preparing cDNA at present is the so-called Gubler-Hoffman method [Gene 25:263-269 (1983)], described below. The first strand of cDNA is synthesized by reverse transcriptase from a poly(A)+ RNA template isolated from cells, using an oligo dT primer. The RNA strand is then replaced by a second DNA strand using E. coli RNase H, E. coli DNA polymerase I, and E. coli DNA ligase. After both ends of the double-stranded cDNA are blunted with T4 DNA polymerase, a suitable oligonucleotide linker DNA is added. The resulting cDNA is then ligated to a phage vector or a plasmid vector and used to transform E. coli. It is difficult to obtain a full-length cDNA by this method because, during second-strand synthesis, the 5' terminal sequence of the mRNA that serves as a primer is lost, and both termini of the cDNA are often trimmed by the exonuclease activity of DNA polymerase I or T4 DNA polymerase. Vectors are known that contain an origin and a promoter for expression in mammalian cells, a replication origin of a single-stranded phage, and an RNA polymerase promoter, for example pCDM8 [B. Seed, Nature 329:840-842 (1987)]. With such a vector, preparation of single-stranded DNA for sequencing, preparation of an RNA probe, synthesis of mRNA for in vitro or in vivo translation, and expression in a mammalian cell are all possible, but the cDNA insertion is not directional. Thus, a cDNA library prepared using the Gubler-Hoffman method does not satisfy requirement 1). Furthermore, pCDM8, which is 4.8 kbp long, is too large to be suitable for cloning a long cDNA.
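The first-strand step described above (an oligo dT primer annealed to the poly(A) tail and extended by reverse transcriptase) can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `first_strand_cdna`, the parameters `tail_len` and `copied`, and the toy mRNA are hypothetical; the primer is assumed to cover the whole poly(A) tail; and `copied` merely models how far the enzyme extends along the template. It does not model second-strand synthesis or exonuclease trimming.

```python
from typing import Optional

# RNA template base -> complementary DNA base.
RNA_TO_DNA = str.maketrans("ACGU", "TGCA")


def first_strand_cdna(mrna: str, tail_len: int, copied: Optional[int] = None) -> str:
    """Return the first-strand cDNA (written 5'->3') primed from the poly(A) tail.

    tail_len: length of the poly(A) tail (assumed fully covered by the oligo dT primer).
    copied:   number of template bases copied upstream of the tail;
              None means extension reaches the 5' end of the mRNA.
    """
    body = mrna[:len(mrna) - tail_len]    # mRNA without its poly(A) tail
    if copied is None:
        copied = len(body)
    template = body[len(body) - copied:]  # 3'-proximal region actually copied
    # The primer forms the 5' end of the cDNA; the copied region is its reverse complement.
    return "T" * tail_len + template.translate(RNA_TO_DNA)[::-1]


# Hypothetical toy mRNA: short body followed by an 8-nucleotide poly(A) tail.
mrna = "GGCAUGGCCAAAUGA" + "A" * 8

full = first_strand_cdna(mrna, tail_len=8)
partial = first_strand_cdna(mrna, tail_len=8, copied=6)
print(full)     # TTTTTTTTTCATTTGGCCATGCC
print(partial)  # TTTTTTTTTCATTT
```

In this toy model, the incompletely extended product lacks the complement of the 5' region of the mRNA, so a clone derived from it cannot be full-length in the strict sense defined above.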
The Okayama-Berg method [Mol. Cell. Biol. 2:161-170 (1982)] is known as a method that gives full-length cDNA clones at high frequency. In this method, a dT-tailed vector-primer is used for synthesis of the first-strand cDNA from poly(A)+ RNA by reverse transcriptase. Following addition of a dC tail, a dG-tailed linker DNA is ligated, and the RNA is then replaced by DNA using E. coli RNase H, E. coli DNA polymerase I, and E. coli DNA ligase. Because dC tailing rarely occurs at the 3' end of an incompletely extended first-strand cDNA, the dC-tailed clones are preferentially full-length. Use of a vector-primer also results in directional insertion of the cDNA. Okayama et al. [Mol. Cell. Biol. 3:280-289 (1983)] have developed an expression system in mammalian cells by using linker DNA containing an origin and a promoter of SV40. Honjo has developed an expression system in Xenopus oocytes for mRNA transcribed in vitro by using linker DNA containing an SP6 promoter [Japanese Patent 62-4291]. However, use of the Okayama-Berg method is limited because it demands a high level of technical skill. In addition, the large number of dG residues in the tail at the 5' terminus often makes it difficult to sequence from the 5' terminus, and it is difficult to prepare probes by this method. Thus, a cDNA library prepared using this method does not satisfy requirements 2) and 3) described above.
Although several improved methods have been reported, none satisfies all four requirements described above. Since the lack of even one requirement makes it difficult to carry out the above strategy, a method satisfying all of the above requirements needs to be developed.