Due to the complexity of proteins and their biological production, characterization of protein pharmaceuticals (“biologics”) poses much more demanding analytical challenges than does characterization of small molecule drugs. Biologics are prone to production problems such as sequence variation, misfolding, variant glycosylation, and post-production degradation including aggregation and modifications such as oxidation and deamidation. These problems can lead to loss of safety and efficacy, so the biopharmaceutical industry would like to identify and quantify variant and degraded forms of the product down to low concentrations, and to obtain tertiary structure information. Because of the rapidly increasing power of mass spectrometry (MS), an MS-based platform for comprehensive measurement of almost all of a drug's relevant physical characteristics is now conceivable. A crucial piece of such a platform is data analysis software designed to address the needs of the biopharmaceutical industry.
At every stage in the development and manufacture of a protein pharmaceutical, there is a need to characterize recombinantly produced protein molecules. This need arises in new product development, biosimilar (generic) product development, and in quality assurance for existing products. With the first generation of protein drugs just emerging from patent protection, and generic manufacturers rushing to enter the marketplace, assays and regulatory guidelines for biosimilarity have become a matter of some urgency. Over 30 branded biologics with worldwide sales exceeding $50B will come off patent in 2011-2015, and the biosimilars market is expected to grow to about $4B by 2015.
Quality assurance for monoclonal antibodies, as an example, must consider primary structure, higher order structure, glycosylation and heterogeneity. Primary structure analyses can include total mass (as measured by MS), amino acid sequence (as measured by orthogonal peptide mapping with high resolution MS and MS/MS sequencing), disulfide bridging (as measured by non-reducing peptide mapping), free cysteines (as measured by Ellman's assay or peptide mapping), and thioether bridging (as measured by peptide mapping, SDS-PAGE, or CGE). Higher order structure can be analyzed using CD spectroscopy, DSC, H-D-exchange, and FT-IR. Glycosylation analysis requires identification of glycan isoforms (by NP-HPLC-ESI-MS, exoglycosidase digestion, and/or MALDI TOF/TOF), sialic acid (by NP-HPLC, WAX, HPAEC, RP-HPLC) and aglycosylation (by CGE and peptide mapping). Heterogeneity analyses must take into consideration C- and N-terminal modifications, glycation of lysine, oxidation, deamidation, aggregation, disulfide bond shuffling, and amino acid substitutions, insertions and deletions. The large variety of assays and techniques gives some idea of the daunting analytical challenge. As early as 1994, Russell Middaugh of Merck Research Laboratories (Middaugh, 1994) called for a single comparative analysis in which “a number of critical parameters are essentially simultaneously determined”. We believe that mass spectrometry (MS) now largely answers this call, because it can cover most of the physicochemical properties required for molecular analysis.
One of the problems with MS-based assays, however, is the lack of high-quality data analysis software. Unlike slow gel-based peptide mapping, which allows human visual comparison, MS generally relies on automatic data analysis, due to the huge numbers of spectra (often >10,000/hour), the high accuracy of the measurements (often in the 1-10 ppm range), and the complexity of spectra (hundreds of peaks spanning a dynamic range >1000). There are a large number of programs for “easy” MS-based proteomics, for example, SEQUEST, Mascot, and X!Tandem, but these programs were not designed for deep analysis of single proteins, and are incapable of difficult analytical tasks such as characterizing mutations, glycopeptides, or metabolically altered peptides. Moreover, the programs just named are all identification tools and must be coupled with other programs such as Rosetta Elucidator (now discontinued), Scaffold, or Thermo Sieve for differential quantification. There are also specialized tools such as PEAKS for de novo sequencing, along with a host of academic tools. The confusing array of software tools poses an obstacle to biotech companies adopting MS-based assays.
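To make the 1-10 ppm accuracy figure concrete, the following is a minimal illustrative sketch of how a mass error in parts per million is conventionally computed and compared against a tolerance. The function names `ppm_error` and `within_tolerance`, the default 10 ppm tolerance, and the example m/z values are assumptions chosen for illustration only; they are not part of the methods described herein or of any program named above.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Return the signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """Return True if the absolute ppm error does not exceed tol_ppm.

    The 10 ppm default is a hypothetical value picked from the upper end
    of the 1-10 ppm range quoted in the text.
    """
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical peak: observed 5 mDa above a theoretical m/z of 1000.
    print(round(ppm_error(1000.005, 1000.0), 3))
    print(within_tolerance(1000.005, 1000.0))
    print(within_tolerance(1000.02, 1000.0))
```

At an m/z of 1000, a 10 ppm window corresponds to only 0.01 m/z units, which suggests why matching hundreds of peaks per spectrum at this accuracy is left to software rather than visual inspection.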
The methods and systems described herein free up the time of technical staff for additional projects while reducing staff frustration with the analysis process. Prior to the present methods and systems, sequence variant analysis (SVA) used a cumbersome combination of several existing software tools, supplemented with the use of spreadsheet macros. In contrast, described herein is an integrated approach providing a single user-friendly dashboard where one can identify false positives and quantify true positives efficiently. This gives greater confidence to the user and drastically reduces the time required to distinguish true from false positive identifications. Drug substance analyses are generally on the critical path of development, and projects are often gated by the analysis of a production run. Any time saving that leads to earlier commercialization of a drug brings significant monetary benefits to the company, not to mention the therapeutic benefits of bringing novel treatments to the patients as early as possible.
Described herein are methods and apparatuses (including systems, devices, user interfaces, software, and the like) that may address the needs discussed above, including by interactively allowing a user to distinguish signal from noise even in noisy spectra.