A virus is a microorganism comprising single- or double-stranded nucleic acid (DNA or RNA) contained within a protein (and possibly lipid) shell called a "capsid" or "coat". A virus is smaller than a cell, and it does not contain most of the components and substances necessary to conduct most biochemical processes. Instead, a virus infects a cell and uses the cellular processes to reproduce itself.
The following is a simplified description of how a DNA-containing virus infects a cell; RNA viruses will be disregarded in this introduction for the sake of clarity. First, a virus attaches to or enters a cell, normally called a "host" cell. The DNA from the virus (and possibly the entire viral particle) enters the host cell where it usually operates as a plasmid (a loop of extra-chromosomal DNA). The viral DNA is transcribed into messenger RNA, which is translated into one or more polypeptides. Some of these polypeptides are assembled into new capsids, while others act as enzymes to catalyze various biochemical reactions. The viral DNA is also replicated and assembled with the capsid polypeptides to form new viral particles. These viral particles may be released gradually by the host cell, or they may cause the host cell to lyse and release them. The released viral particles subsequently infect new host cells. For more background information on viruses see, e.g., Stryer, 1981 and Matthews, 1970 (note: all references cited herein, other than patents, are listed with citations after the examples).
As used herein, the term "virus" includes phages and viroids, as well as replicative intermediates. As used herein, the phrases "viral nucleic acid" and "DNA or RNA derived from a virus" are construed broadly to include any DNA or RNA that is obtained or derived from the nucleic acid of a virus. For example, a DNA strand created by using a viral RNA strand as a template, or by chemical synthesis to create a known sequence of bases determined by analyzing viral DNA, would be regarded as viral nucleic acid.
The host range of any virus (i.e., the variety of cells that a type of virus is capable of infecting) is limited. Some viruses are capable of efficient infection of only certain types of bacteria; other viruses can infect only plants, and may be limited to certain genera; some viruses can infect only mammalian cells. Viral infection of a cell requires more than mere entry of the viral DNA or RNA into the host cell; viral particles must be reproduced within the cell. Through various assays, those skilled in the art can readily determine whether any particular type of virus is capable of infecting any particular genus, species, or strain of cells. As used herein, the term "plant virus" is used to designate a virus which is capable of infecting one or more types of plant cells, regardless of whether it can infect other types of cells.
With the possible exception of viroids (which are poorly understood at present), every viral particle must contain at least one gene which can be "expressed" in infected host cells. The expression of a gene requires that a segment of DNA or RNA must be transcribed into or function as a strand of messenger RNA (mRNA), and the mRNA must be translated into a polypeptide. Most viruses have about 5 to 10 different genes, all of which are expressed in a suitable host cell.
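The two steps of expression described above (transcription of DNA into mRNA, then translation of the mRNA into a polypeptide) can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the input is taken to be the coding strand, the standard genetic code is used, translation starts at the first AUG, and promoters and mRNA processing are ignored. The function names and the example sequences are hypothetical and are not taken from any viral genome.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (assumption: T is simply read as U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon (one-letter amino acids)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the polypeptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical toy sequence, for illustration only.
toy_mrna = transcribe("GGATGAAATGA")
toy_peptide = translate(toy_mrna)
```

Here `toy_mrna` is "GGAUGAAAUGA" and `toy_peptide` is "MK": reading begins at the AUG, continues codon by codon, and stops at UGA.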
In order to be expressed in a cell, a gene must have a promoter which is recognized by certain enzymes in the cell. Gene promoters are discussed in some detail in the parent application Ser. No. 458,414, now abandoned, cited above, the contents of which are incorporated herein by reference. Those skilled in the art recognize that the expression of a particular gene to yield a polypeptide is dependent upon two distinct cellular processes. A region at the 5' end of the gene, called the promoter, initiates transcription of the gene to produce an mRNA transcript. The mRNA is then translated at the ribosomes of the cell to yield the encoded polypeptide. Therefore, it is evident that although the promoter may function properly, ultimate expression of the polypeptide depends at least in part on post-transcriptional processing of the mRNA transcript.
Promoters from viral genes have been utilized in a variety of genetic engineering applications. For example, chimeric genes have been constructed using various structural sequences (also called coding sequences) taken from bacterial genes, coupled to promoters taken from viruses which can infect mammalian cells (the most commonly used mammalian viruses are designated as Simian Virus 40 (SV40) and Herpes Simplex Virus (HSV)). These chimeric genes have been used to transform mammalian cells. See, e.g., Mulligan et al, 1979; Southern and Berg, 1982. In addition, chimeric genes using promoters taken from viruses which can infect bacterial cells have been used to transform bacterial cells; see, e.g., the phage lambda P_L promoter discussed in Maniatis et al, 1982.
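As a way of picturing the chimeric genes described above, the following Python sketch models a gene as a promoter segment coupled to a structural (coding) sequence taken from a different source. It is only a bookkeeping illustration: the class names, field names, and placeholder sequences are hypothetical, and no real promoter or gene sequence is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    source: str    # organism or virus the segment was taken from
    sequence: str  # placeholder bases only, NOT real sequence data


@dataclass(frozen=True)
class ChimericGene:
    promoter: Segment
    structural_sequence: Segment

    def is_heterologous(self) -> bool:
        """True when the promoter and the coding sequence come from different sources."""
        return self.promoter.source != self.structural_sequence.source

    def assembled(self) -> str:
        """Promoter placed 5' of (upstream from) the structural sequence."""
        return self.promoter.sequence + self.structural_sequence.sequence


# Hypothetical example: a viral promoter coupled to a bacterial coding sequence.
example_gene = ChimericGene(
    promoter=Segment("viral promoter", "SV40", "NNNNNN"),
    structural_sequence=Segment("bacterial coding sequence", "bacterium", "ATGAAATGA"),
)
```

In this sketch `example_gene.is_heterologous()` is true because the two segments have different sources, and `example_gene.assembled()` simply places the promoter placeholder ahead of the coding placeholder.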
Several researchers have theorized that it might be possible to utilize plant viruses as vectors for transforming plant cells. See, e.g., Hohn et al, 1982. In general, a "vector" is a DNA molecule useful for transferring one or more genes into a cell. Usually, a desired gene is inserted into a vector, and the vector is then used to infect the host cell.
Several researchers have theorized that it might be possible to create chimeric genes which are capable of being expressed in plant cells, by using promoters derived from plant virus genes. See, e.g., Hohn et al, 1982, at page 216.
However, despite the efforts of numerous research teams, prior to this invention no one had succeeded in (1) creating a chimeric gene comprising a plant virus promoter coupled to a heterologous structural sequence and (2) demonstrating the expression of such a gene in any type of plant cell.
Cauliflower Mosaic Virus (CaMV)
The entire DNA sequence of CaMV has been published (Gardner et al, 1981; Hohn et al, 1982). In its most common form, the CaMV genome is about 8000 bp long. However, various naturally occurring infective mutants with deletions of about 500 bp have been discovered; see Howarth et al, 1981. The entire CaMV genome is transcribed into a single mRNA, termed the "full-length transcript", having a sedimentation coefficient of about 35S. The promoter for the full-length mRNA (hereinafter referred to as "CaMV(35S)") is located in the large intergenic region about 1 kb counterclockwise from Gap 1 (see Guilley et al, 1982).
CaMV is believed to generate at least eight proteins; the corresponding genes are designated as Genes I through VIII. Gene VI is transcribed into mRNA with a sedimentation coefficient of 19S. The 19S mRNA is translated into a protein designated as P66, which is an inclusion body protein. Transcription of the 19S mRNA is driven by the 19S promoter, located about 2.5 kb counterclockwise from Gap 1.
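The approximate promoter locations given above can be turned into rough map coordinates with simple modular arithmetic. The Python sketch below treats the genome as a circular map, as the "counterclockwise" wording implies. It assumes a genome length of exactly 8000 bp (the text says "about 8000 bp"), places Gap 1 at coordinate 0, and takes clockwise as the direction of increasing coordinates. These conventions and the function name are assumptions made for illustration, so the results are approximations and not published map positions.

```python
GENOME_LENGTH_BP = 8000  # "about 8000 bp" in the text; rounded for illustration
GAP1_POSITION = 0        # assumption: Gap 1 is used as the map origin


def counterclockwise_from(origin: int, distance_bp: int,
                          length: int = GENOME_LENGTH_BP) -> int:
    """Coordinate reached by moving distance_bp counterclockwise on a circular map.

    Assumption: clockwise is the direction of increasing coordinates, so a
    counterclockwise move subtracts and wraps around modulo the genome length.
    """
    return (origin - distance_bp) % length


# Approximate distances stated in the text: about 1 kb and about 2.5 kb.
promoter_35s = counterclockwise_from(GAP1_POSITION, 1000)
promoter_19s = counterclockwise_from(GAP1_POSITION, 2500)
```

Under these assumptions `promoter_35s` is 7000 and `promoter_19s` is 5500, which places the two promoters roughly 1.5 kb apart on the map.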