Protein therapeutics are increasingly used to treat a wide range of conditions. For these products to be safe, reproducible and effective in a patient, a given polynucleotide sequence must be transcribed and translated with high fidelity into the “correct” polypeptide, yielding a product of high homogeneity. This is critical to the pharmaceutical production of human proteins in any expression system or host cell, including Escherichia coli, yeast, and mammalian cells such as Chinese hamster ovary (CHO) cells.
Low-abundance protein sequence variants, whose primary amino acid sequence differs from that encoded by the respective coding region of the polynucleotide used for expression, are found in essentially every expressed protein and can be regarded as product-dependent impurities in a protein therapeutic. These low-abundance sequence variants result from amino acid residue misincorporations caused by nucleotide mismatches during transcription and/or translation. Whether a nucleotide mismatch at the transcriptional and/or translational level leads to the misincorporation of a different amino acid residue, e.g., Gly→Glu or Gly→Asp, depends primarily on the type of codon encoding said amino acid residue in the encoding polynucleotide sequence.
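The codon dependence described above can be illustrated with a minimal sketch. It assumes the standard genetic code written in DNA coding-strand notation, and it makes the simplifying assumption that exactly one base of a codon is misread and that every substitution is possible; it does not model which mismatches actually occur or how often. The function name `single_mismatch_products` is hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code in TCAG order (DNA coding-strand notation, '*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_mismatch_products(codon):
    """Return {amino acid: [variant codons]} reachable by one base substitution.

    Synonymous changes and changes to a stop codon are skipped, because
    neither yields a full-length variant with a misincorporated residue.
    """
    native = CODON_TABLE[codon]
    reachable = {}
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            variant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[variant]
            if aa != native and aa != "*":
                reachable.setdefault(aa, []).append(variant)
    return reachable


if __name__ == "__main__":
    # Compare the four synonymous glycine codons.
    for gly_codon in ("GGT", "GGC", "GGA", "GGG"):
        print(gly_codon, sorted(single_mismatch_products(gly_codon)))
```

Under these assumptions, the glycine codons GGA and GGG can give rise to Glu (via GAA or GAG) but not Asp, whereas GGT and GGC can give rise to Asp (via GAT or GAC) but not Glu, although all four codons encode the same residue.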
Any heterogeneity or deviation from the “correct” or native encoded amino acid sequence of a polypeptide product can lead to significant disadvantages, including lower quality, increased purification effort, altered therapeutic efficacy and/or altered immunogenicity, e.g., increased immunogenicity. Two protein products obtainable by expression in any expression system may differ substantially in the aforementioned characteristics even where their respective coding sequences are highly similar or almost identical but differ in at least one codon encoding the same amino acid at the same position of the protein. Thus, protein products which are not obtained by expression of exactly the same coding sequence can show different immunological properties, for example, induction of antibody formation directed against said proteins, and/or different pharmacological properties, for example, a differing half-life or differing pharmacokinetics.
For these reasons, it is desirable to have a method available which could be used either to provide a coding sequence which is more prone or less prone to amino acid misincorporation, or to provide the unknown coding sequence of a known, previously characterized and/or marketed protein product, in order to exactly match the degree or type of amino acid residue misincorporation of said known product. The latter aspect of such a method is particularly relevant in the development of biosimilar pharmaceuticals, where the product's protein sequence is fully known, or can be readily determined experimentally by routine methods, but the encoding polynucleotide sequence or a part thereof is proprietary, non-disclosed or otherwise unknown. The degree or type of impurities in the form of low-level sequence variants can be highly significant in an original protein therapeutic and/or a corresponding biosimilar therapeutic.
Yu et al., Anal. Chem. 2009, 81, 9282-9290 describes mass spectrometry-based analytics and molecular mechanisms resulting in the formation of low-level sequence variants of polypeptides comprising misincorporated amino acids. Yu et al. explains that the majority of low-level sequence variants of polypeptides comprising at least one misincorporated amino acid result from at least one non-Watson-Crick base mismatch during transcription or translation. Yu et al., however, does not teach optimizing the coding sequence of a polynucleotide encoding the polypeptide, e.g., by identification and selection of at least one codon of an unknown second polynucleotide coding for the same polypeptide but differing in at least said one codon. In other words, Yu et al. does not envisage reverse engineering of at least one codon of an unknown coding sequence.
More generally, deducing a coding RNA or DNA sequence from the encoded protein sequence has so far not been deemed possible, owing to the central dogma of molecular biology concerning the flow of genetic information within any living organism and owing to the degeneracy of the genetic code.
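The effect of degeneracy on such a deduction can be made concrete with a short calculation. The following sketch assumes the standard genetic code and simply multiplies the number of synonymous codons available at each position of a peptide; the function name `count_coding_sequences` is hypothetical and used for illustration only.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code in TCAG order (DNA coding-strand notation, '*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of synonymous codons per amino acid (one-letter code).
DEGENERACY = Counter(CODON_TABLE.values())


def count_coding_sequences(peptide):
    """Count the distinct coding sequences that encode the given peptide."""
    return prod(DEGENERACY[aa] for aa in peptide)


if __name__ == "__main__":
    # Gly has 4 codons, Leu and Ser have 6 each: 4 * 6 * 6 candidates.
    print(count_coding_sequences("GLS"))
```

Because the count is a product over all residues, it grows exponentially with peptide length, so the protein sequence alone does not identify which coding sequence was actually used.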
Thus, there is a need in the biopharmaceutical industry for methods for optimizing a coding sequence encoding a polypeptide by reverse-engineering of at least one codon of an unknown coding polynucleotide sequence or parts thereof which encodes a known protein product.
Further, the immunogenic activity of biopharmaceuticals, and of protein therapeutics in particular, is a problem commonly encountered in the biotech industry. Specifically, the presence of T-cell epitopes in a biopharmaceutical and the occurrence of anti-drug antibodies (ADA) have now been described for a number of protein drugs, demonstrating that the T-cell epitope content is a significant factor contributing to antigenicity (Shankar et al. Nat Biotechnol 2007, 25(5):555-61; Nechansky & Kircheis Expert Opin Drug Discov 2010, 5(11):1067-1079; Harding et al. MAbs 2010, 2(3):256-65). One key determinant of T-cell activation (T-helper cells; CD4+ cells) is the binding strength of T-cell epitopes to major histocompatibility complex class II (MHC II, or HLA class II) molecules (Weber et al. Adv Drug Deliv Rev 2009, 61(11):965-76). Considerable effort has been undertaken to estimate and reduce the immunogenic potential of protein therapeutics by predicting potential T-cell epitopes using in silico tools (Roque-Navarro et al. Hybrid Hybridomics 2003, 22(4):245-57; Tangri et al. Current Medicinal Chemistry 2002, 9:2191-9; Mateo et al. Hybridoma 2000, 19(6):463-71; De Groot et al. Vaccine 2009, 27:5740-7). Based on the predicted MHC II binding strength of a peptide sequence, it is possible to make an informed decision about the likelihood that the peptide sequence will provoke an immune response. Nevertheless, prediction, alteration, or reduction of the immunogenic potential of polypeptide pharmaceuticals has so far been limited to the “main” or native polypeptide obtainable by expression and/or purification procedures, i.e., to the amino acid sequence encoded by the respective coding sequence of a polynucleotide according to the genetic code used by the organism or host organism expressing such polypeptide.
The above-cited prior art, however, did not appreciate or address the problem that the primary amino acid sequence can vary in a considerable portion of the total amount of a polypeptide drug due to mistranscriptional and/or mistranslational events resulting in one or more amino acid residue misincorporation(s) within low-level sequence variants of the main or native polypeptide. Nor has it been recognized in the art that such amino acid residue misincorporation(s) may significantly affect the immunogenic potential of the respective polypeptide pharmaceuticals. Accordingly, codon selection based on the immunogenic potential of the misincorporated amino acid (and the resulting peptide sequence) has so far not been performed or suggested. More specifically, the immunogenic potential of amino acid misincorporations within protein drugs has not been included in prior art attempts to alter, i.e., decrease or increase, the immunogenic potential of a polypeptide, e.g., to reduce the formation of ADA.
To conclude, there is a general need for improved methods for providing codon-optimized nucleotide sequences which encode polypeptides having an altered, in particular a decreased, immunogenic potential. Specifically, there is a need for methods which are useful in obtaining optimized coding sequences encoding polypeptide drugs that, when introduced into a subject, elicit a decreased ADA response as compared to polypeptides expressed from the non-optimized polynucleotide.