There are many approaches that can be adopted to improve the expression of heterologous genes in plants. Indeed, all the elements that make up a gene exert, or can exert, a control function on gene expression, modulating transcription and/or translation. The untranslated sequences present at the 5′ and 3′ ends of the mRNA (called 5′-UTR and 3′-UTR, where UTR stands for “untranslated region”) are no exception and must in fact be considered preferential targets for suitable modifications since, to a large extent, they determine the translation efficiency and the turnover of the mRNA itself. In fact, copious evidence shows that:
- the m7Gppp (5′-cap) structure present at the 5′ end of the mRNA is essential for recruiting the eIF4F complex, which is able to bind the ribosomal 40S subunit (Franks and Lykke-Andersen, 2008);
- through its eIF4G component, the eIF4F complex interacts (via the poly(A)-binding protein, PABP) with the poly(A) tail present at the 3′ end of the mRNA, allowing the mRNA to assume a circular structure (Franks and Lykke-Andersen, 2008);
- the poly(A) tail and the eIF4F complex reduce the enzymatic hydrolysis of the 5′-cap structure and hence prevent rapid degradation of the mRNA by cytoplasmic exonucleases acting on 5′-monophosphate ends (Franks and Lykke-Andersen, 2008);
- the 5′-UTR sequence can contain elements able to influence the formation of the 5′-cap structure, the binding of the cap to the eIF4E factor, the recruitment of the ribosomal 40S subunit, the formation of polysomes, the spontaneous dissociation rate of the 43S complex, and the recognition of the authentic translation start codon AUG;
- the DNA sequence corresponding to the 5′-UTR can also contain binding sites for specific transcription factors, and hence can modify the transcriptional activity of the upstream promoter.
It is therefore evident that the 5′-UTR, also called the leader region, deserves particular attention in plant engineering programs aimed at increasing the expression level of recombinant proteins.
However, for various reasons, it is not at all easy to design high-efficiency leader sequences, even for a person of skill in the art. First, there is great sequence variability among the leader regions of different genes belonging to the same genome or to related genomes. This variability makes it very difficult to identify potential tracts able to confer an improved characteristic on the leader, and practically impossible to predict their possible interactions with the other elements or sequences that make up the 5′-UTR region. Second, the overall length of the leader region should preferably be kept within 100-120 bp, ideally about 80 bp, so as not to increase the frequency of spontaneous dissociation of the 43S complex from the region. This imposes a strict selection of the components actually used in constructing the leader tract, at the expense of others. Third, the leader region should not contain palindromic sequences or a G/C-rich nucleotide composition, so as to prevent the formation of secondary structures in the transcript that cannot be resolved by eIF4A. Finally, a small but significant portion of the sequence (about 10%) cannot vary freely but must contain essential functional elements, specifically the Inr initiator site and the Kozak motif or an equivalent Kozak-like motif.
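The design constraints listed above lend themselves to an automatic pre-screen of candidate leaders before synthesis. The following Python sketch is illustrative only: the 120 bp and 80 bp limits come from the text, while the G/C ceiling (`MAX_GC`), the minimum palindrome arm (`MIN_PALINDROME_ARM`) and all function names are hypothetical choices. An exact-palindrome scan is also only a crude proxy; real secondary-structure assessment would require an RNA folding tool.

```python
from dataclasses import dataclass, field

# Length limits from the text; MAX_GC and MIN_PALINDROME_ARM are assumptions.
MAX_LEN = 120
PREFERRED_LEN = 80
MAX_GC = 0.50
MIN_PALINDROME_ARM = 6

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def palindromes(seq: str, arm: int = MIN_PALINDROME_ARM) -> list[int]:
    """Start indices of windows of length 2*arm equal to their own reverse complement."""
    width = 2 * arm
    return [i for i in range(len(seq) - width + 1)
            if seq[i:i + width] == revcomp(seq[i:i + width])]


@dataclass
class LeaderReport:
    length: int
    gc: float
    palindromes: list[int]
    warnings: list[str] = field(default_factory=list)


def check_leader(seq: str, required: tuple[str, ...] = ()) -> LeaderReport:
    """Flag violations of the leader design constraints (hypothetical helper)."""
    seq = seq.upper()
    report = LeaderReport(len(seq), gc_fraction(seq), palindromes(seq))
    if report.length > MAX_LEN:
        report.warnings.append(f"length {report.length} > {MAX_LEN} bp")
    elif report.length > PREFERRED_LEN:
        report.warnings.append(f"length {report.length} > preferred {PREFERRED_LEN} bp")
    if report.gc > MAX_GC:
        report.warnings.append(f"G/C fraction {report.gc:.2f} > {MAX_GC}")
    if report.palindromes:
        report.warnings.append(f"palindromes at {report.palindromes}")
    for motif in required:  # e.g. an Inr or Kozak-like motif supplied by the designer
        if motif.upper() not in seq:
            report.warnings.append(f"missing motif {motif}")
    return report


print(check_leader("A" * 130).warnings)  # ['length 130 > 120 bp']
```

The `required` argument leaves the choice of essential motifs to the designer, since their exact sequences depend on the promoter and coding sequence used.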
Application WO 2008/080954 describes the combination of repeated CAA elements with repeated CT elements inside 5′-UTR sequences usable to increase the expression of recombinant proteins in plants. It also describes the co-presence of poly(CAA) and poly(CT) with the transcription initiator site (Inr) of the 35S promoter of cauliflower mosaic virus (CaMV; Guilley et al., 1982) and/or with the ACAATTAC octamer from the leader of tobacco mosaic virus (TMV Ω leader; Gallie and Walbot, 1992). In particular, WO 2008/080954 describes a leader sequence called LLTCK that contains all the elements cited above:
1. Inr site of the CaMV 35S gene, for efficient mRNA capping;
2. poly(CAA) region similar to the “translational enhancer” present in the TMV Ω leader (Gallie and Walbot, 1992);
3. sequence rich in CT elements, similar to some plant leaders (Bolle et al., 1996);
4. octamer of the TMV Ω leader.
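The modular organization described above can be represented as an ordered list of named parts, which makes module boundaries explicit when variants are designed. The following Python sketch is hypothetical: the octamer ACAATTAC comes from the text and TCTAGA is the Xba I site that ends LLTCK, but the Inr placeholder, the repeat counts, the module order and the helper names are illustrative assumptions, and the resulting string is not SEQ ID NO: 1 of WO 2008/080954.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    seq: str


def assemble(modules: list[Module]) -> tuple[str, list[tuple[str, int, int]]]:
    """Join modules 5'->3'; return the sequence and (name, start, end) spans, 0-based, end-exclusive."""
    spans, pos = [], 0
    for m in modules:
        spans.append((m.name, pos, pos + len(m.seq)))
        pos += len(m.seq)
    return "".join(m.seq for m in modules), spans


# Hypothetical layout for illustration only; not the patented LLTCK sequence.
design = [
    Module("Inr (placeholder)", "ACACG"),
    Module("poly(CAA)", "CAA" * 8),
    Module("octamer", "ACAATTAC"),
    Module("poly(CT)", "CT" * 5),
    Module("Xba I", "TCTAGA"),
]

leader, spans = assemble(design)
for name, start, end in spans:
    print(f"{name:<18} {start:>3}-{end:<3} {leader[start:end]}")
```

Keeping the spans alongside the sequence means any later scan (repeats, A/T-rich tracts, start triplets) can report which module a hit falls in.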
In WO 2008/080954, the effect of the LLTCK leader was assessed in tobacco by measuring the expression levels of the uidA reporter gene (coding for the enzyme β-glucuronidase, GUS) under the control of the constitutive CaMV 35S promoter, using for comparison the leader of the CaMV 35S gene, which is present in a large number of commercial vectors. The LLTCK leader increased the concentration of the GUS enzyme 8-12-fold relative to the control leader.
There is, however, a need to further increase the efficiency of the 5′-UTR tract for the expression of transgenes, and hence of recombinant proteins, in plants.
In particular, to further increase the efficiency of the 5′-UTR tract for transgene expression in plants beyond the state of the art, it may be useful to take LLTCK as a model or starting point for improvement, since it is the only synthetic high-efficiency leader whose effects on the transcription and translation of genetic information are known.
As mentioned, WO 2008/080954 teaches combining repeated CAA elements with repeated CT elements and identifies a series of factors that make the advantage of this combination more evident.
Each factor is associated with a preferred application. Particular importance is given to the octamer motif ACAATTAC harbored by the TMV Ω leader; according to WO 2008/080954, an efficient leader can be derived by joining tracts of the TMV Ω leader with a region bearing repeated CT motifs.
Within the Ω leader, as described in WO 2008/080954, two different types of repeated sequence can be seen: one is the trinucleotide CAA, repeated 11 times, although not always contiguously; the other is the octamer motif ACAATTAC, repeated 3 times.
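Counting such repeats is easy to get wrong because occurrences can overlap: in ACAATTACAATTAC the octamer occurs twice, the two copies sharing the central AC. A minimal Python sketch, with hypothetical helper names, that counts overlapping occurrences and measures the longest contiguous run of a repeat unit:

```python
import re


def count_motif(seq: str, motif: str, overlapping: bool = True) -> int:
    """Number of occurrences of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    if overlapping:
        # A zero-width lookahead matches at every start position, so hits may overlap.
        return len(re.findall(f"(?={re.escape(motif)})", seq))
    return seq.count(motif)  # str.count skips overlapping hits


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of unit anywhere in seq."""
    seq, unit = seq.upper(), unit.upper()
    k, best = len(unit), 0
    for i in range(len(seq)):
        n = 0
        while seq.startswith(unit, i + n * k):
            n += 1
        best = max(best, n)
    return best


print(count_motif("ACAATTACAATTAC", "ACAATTAC"))                     # 2
print(count_motif("ACAATTACAATTAC", "ACAATTAC", overlapping=False))  # 1
print(longest_tandem_run("CAACAAGCAACAACAA", "CAA"))                 # 3
```

Because CAA has no prefix that is also a suffix, its occurrences can never overlap and both counting modes agree for it; the distinction matters for motifs such as the octamer.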
It has been demonstrated experimentally that both sequences can greatly increase gene expression, acting at the post-transcriptional level.
Although the octamer contains a CAA trinucleotide, the enhancement of gene expression is linked to the presence of the entire octamer, not to the CAA alone.
It is important to note that the octamer contains an A/T-rich tract, AATTA, which in turn includes the ATT triplet.
As a preferred technical solution, the inventors of WO 2008/080954 recommend keeping the octamer sequence ACAATTAC, even though it contains the AATTA sequence and therefore a non-canonical translation start site, ATT.
Evidently, they consider the inclusion of the octamer motif more important, even if this entails introducing an A/T-rich sequence and with it a putative translation start codon. It must be noted that SEQ ID NO: 1 (LLTCK) of WO 2008/080954 contains other A/T-rich sequences, located respectively:
1. immediately downstream of the initiator site (TATTTTTA);
2. inside the poly(CAA) tract (AATA);
3. at the end of this tract, at a site again involving the octamer (ATTA);
4. just downstream of the octamer (TATTT).
Three of these four sequences carry the ATT triplet, like the octamer.
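The A/T-rich tracts and ATT triplets discussed above can be located with a simple scan. In this Python sketch an A/T-rich tract is assumed, for illustration, to be any maximal run of at least four A or T nucleotides; that threshold and the helper names are hypothetical, not a definition taken from WO 2008/080954.

```python
import re


def at_rich_tracts(seq: str, min_len: int = 4) -> list[tuple[int, str]]:
    """Maximal runs of A/T at least min_len long, as (0-based start, tract)."""
    return [(m.start(), m.group())
            for m in re.finditer(rf"[AT]{{{min_len},}}", seq.upper())]


def att_positions(seq: str) -> list[int]:
    """0-based start positions of every ATT triplet, in any frame."""
    s = seq.upper()
    return [i for i in range(len(s) - 2) if s[i:i + 3] == "ATT"]


# The four A/T-rich tracts listed for SEQ ID NO: 1 of WO 2008/080954.
tracts = ["TATTTTTA", "AATA", "ATTA", "TATTT"]
print([t for t in tracts if att_positions(t)])  # ['TATTTTTA', 'ATTA', 'TATTT']
print(at_rich_tracts("CAACAATACC"))             # [(4, 'AATA')]
```

The first call reproduces the count given in the text: three of the four tracts contain ATT, while AATA does not.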
For comparison, the known LLTCK leader sequence is given below, with the A/T-rich regions underlined, the ATT triplets in larger characters and the tract ACAATTAC, corresponding to the octamer motif, in bold:
ACACG TTACAACAATACCAACAACAACAACAACAAACAA  CGTATTTCTCTCTCTAGA
We also note that the known LLTCK sequence does not contain any poly(CAA) region contiguous with a poly(CT) region.
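Whether a poly(CAA) region is contiguous with a poly(CT) region can be checked by locating tandem runs of each unit and comparing their boundaries. In this Python sketch, the minimum of three copies that defines a "region" and the helper names are assumptions made for illustration.

```python
import re


def tandem_regions(seq: str, unit: str, min_copies: int = 3) -> list[tuple[int, int]]:
    """(start, end) spans of runs of at least min_copies back-to-back copies of unit.

    Assumes the unit cannot overlap itself (true for CAA and CT), so a
    left-to-right regex scan finds every maximal run.
    """
    pattern = f"(?:{re.escape(unit.upper())}){{{min_copies},}}"
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq.upper())]


def caa_ct_contiguous(seq: str, min_copies: int = 3) -> bool:
    """True if some poly(CAA) region directly abuts some poly(CT) region."""
    caa = tandem_regions(seq, "CAA", min_copies)
    ct = tandem_regions(seq, "CT", min_copies)
    return any(a_end == c_start or c_end == a_start
               for a_start, a_end in caa
               for c_start, c_end in ct)


print(caa_ct_contiguous("CAA" * 4 + "CT" * 4))               # True
print(caa_ct_contiguous("CAA" * 4 + "ACAATTAC" + "CT" * 4))  # False
```

In the second call the octamer separates the two repeat regions, which is the arrangement the text describes for LLTCK.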
In this case too, although aware of the presence of non-canonical translation start sites within the A/T-rich regions, the inventors of WO 2008/080954 chose to use these regions in constructing an efficient leader such as LLTCK.
Indeed, A/T-rich sequences, specifically those of types 1 and 4 described above, are found not only in the TMV Ω leader but also at the core of the leader of alfalfa mosaic virus (AMV), which is commonly used as a translation enhancer as an alternative to Ω.
For comparison, the sequences of the TMV Ω leader (a) and of the AMV leader (b) are given below, with the A/T-rich regions underlined and the ATT triplets in larger characters:
(a) (SEQ ID NO: 4) ACCTCGAG CAAC CCAACAACAACAAACAACAAACAA  C CT CACC
(b) (SEQ ID NO: 5) ACCTCGAG CTTTCAAATACTTCCATCCC
Regarding the actual significance of ATT triplets in triggering the start of translation at an unwanted point of the mRNA within the leader, it must be noted that the authentic translation start codon (ATG) requires an adequate sequence context to be recognized as such by the translation complex; a person of skill in the art would consider it very likely that an adequate context is equally required for the recognition of non-canonical translation start triplets such as ATT and CTG.
However, the recognition contexts of these triplets are not currently known, so the person of skill cannot establish from the state of the art whether, and to what extent, a given ATT (or CTG) triplet actually functions as a non-canonical translation start site.
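For the canonical ATG codon, context is commonly judged at two positions relative to the first nucleotide of the triplet (numbered +1, with no position 0): a purine (A or G) at -3 and a G at +4, and contexts are commonly graded strong (both present), adequate (one) or weak (neither). The Python sketch below applies that rule to ATG and, purely as an assumption, to ATT and CTG as well; since the recognition contexts of non-canonical triplets are not known, its output for ATT and CTG is only a heuristic flag, not a prediction. The helper names are hypothetical.

```python
START_TRIPLETS = ("ATG", "ATT", "CTG")
GRADES = ("weak", "adequate", "strong")


def context_grade(seq: str, i: int) -> str:
    """Grade the context of the triplet whose first base (+1) is at index i."""
    s = seq.upper()
    purine_at_minus3 = i >= 3 and s[i - 3] in "AG"   # no position 0, so -3 is s[i-3]
    g_at_plus4 = i + 3 < len(s) and s[i + 3] == "G"  # +4 is the base right after the triplet
    return GRADES[int(purine_at_minus3) + int(g_at_plus4)]


def scan_start_sites(seq: str, triplets=START_TRIPLETS) -> list[tuple[int, str, str]]:
    """(index, triplet, grade) for every candidate start triplet, in any frame."""
    s = seq.upper()
    return [(i, s[i:i + 3], context_grade(s, i))
            for i in range(len(s) - 2) if s[i:i + 3] in triplets]


print(scan_start_sites("ACCATGG"))  # [(3, 'ATG', 'strong')]
print(scan_start_sites("CCCATTC"))  # [(3, 'ATT', 'weak')]
```

Such a scan can at most rank candidate triplets for experimental follow-up; it cannot settle whether a given ATT is actually used.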
Given this uncertainty, what determines the choice of using Ω, AMV or leaders derived from them is the experimentally proven positive effect of including the Ω or AMV leader on the level of gene expression.
The person of skill knows, however, that if an ATT or CTG triplet inside the leader were actually interpreted as a translation start codon, a protein other than the programmed one would be produced, which could cause problems of functional and structural bioequivalence, particularly critical for proteins intended for therapeutic use.
The inventors of WO 2008/080954, working mainly in the pharmaceutical field, are aware of the potential risks and, prudently, construct their 5′-UTR sequence by placing all the ATT triplets at mutual distances that are always multiples of 3, so that they share the same reading frame, with a stop codon (TAG) in that frame toward the end of the leader sequence. Even more ingeniously, the LLTCK sequence ends with the Xba I restriction site (TCTAGA), which serves several purposes: it carries the stop codon (TAG), contributes to forming a poly(CT) region, may provide a context favorable to the recognition of an authentic start codon located immediately downstream, and constitutes an extremely useful cloning site 5′ of the desired coding sequence.
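The frame-based safeguard described above can be checked mechanically: every ATT should share one reading frame, each should meet an in-frame stop codon before the end of the leader, and ideally none should be in frame with a coding sequence starting immediately after the leader. A Python sketch under those assumptions (helper names hypothetical, positions 0-based, and the test leader invented rather than taken from LLTCK):

```python
from typing import NamedTuple, Optional

STOPS = {"TAA", "TAG", "TGA"}


class AttSite(NamedTuple):
    pos: int                  # 0-based index of the A of ATT
    frame: int                # pos % 3
    stop: Optional[int]       # first in-frame stop codon downstream, if any
    in_frame_with_cds: bool   # True if a CDS starting right after the leader shares this frame


def att_frame_report(leader: str) -> tuple[list[AttSite], bool, bool]:
    """Return the ATT sites, whether they share one frame, and whether all hit a stop."""
    s = leader.upper()
    sites = []
    for i in range(len(s) - 2):
        if s[i:i + 3] != "ATT":
            continue
        stop = next((j for j in range(i + 3, len(s) - 2, 3) if s[j:j + 3] in STOPS), None)
        sites.append(AttSite(i, i % 3, stop, (len(s) - i) % 3 == 0))
    same_frame = len({site.frame for site in sites}) <= 1
    all_stopped = all(site.stop is not None for site in sites)
    return sites, same_frame, all_stopped


# Invented test leader ending in the Xba I site, whose TAG falls in frame 0 here.
sites, same_frame, all_stopped = att_frame_report("ATTCAAATTCAAC" + "TCTAGA")
for site in sites:
    print(site)
print(same_frame, all_stopped)  # True True
```

Both ATT triplets in the test leader (at 0 and 6) terminate at the TAG inside TCTAGA (index 15), which is the arrangement the text attributes to LLTCK.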
Others skilled in the art proceed differently and simply leave the ATT triplets within their respective A/T-rich sequences.
Indeed, synthetic leaders whose designed sequence carries ATT triplets even out of frame with the authentic reading frame are common.
From the above it may be concluded that, like other patents and publications preceding this description, WO 2008/080954:
1. does not teach removing A/T-rich motifs from 5′-UTR sequences, but rather the exact opposite;
2. does not teach removing ATT triplets from Ω-derived or AMV-derived 5′-UTR sequences, but rather the exact opposite;
3. does not teach how to create contexts favorable to gene expression in the absence of A/T-rich motifs, whether or not they bear ATT triplets;
4. does not teach how to construct variants more efficient than the LLTCK leader used in the examples of WO 2008/080954.
All this considered, the need to remove A/T-rich sequences and ATT triplets is in no way suggested or promoted by the state of the art, either explicitly or implicitly, and is therefore anything but obvious to a person of skill in the art.
Furthermore, since every nucleotide substitution, deletion or addition can potentially generate leaders with unexpected behavior, the effect of such a removal, like that of any other manipulation of the 5′-UTR sequence, is also anything but obvious to a person of skill in the art.
Therefore, the present invention proposes, in a new and inventive manner, the synthesis of 5′-UTR variants endowed with new elements or new combinations of elements, which constitute an advantageous technical solution able to modify and significantly improve the state of the art. The Applicant has devised, tested and embodied the present invention to achieve these and other purposes and advantages.
Unless otherwise defined, all technical and scientific terms used here and hereafter have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Although methods and materials similar or equivalent to those described here can be used in the practice or testing of the present invention, the methods and materials are described hereafter by way of example. In the event of conflict, the present application, including its definitions, shall prevail. The materials, methods and examples are purely illustrative and shall not be construed as limiting.