Therapeutic vaccines based on tumor-specific neoantigens hold great promise as a next generation of personalized cancer immunotherapy.1-3 Cancers with a high mutational burden, such as non-small cell lung cancer (NSCLC) and melanoma, are particularly attractive targets of such therapy given the relatively greater likelihood of neoantigen generation.4,5 Early evidence shows that neoantigen-based vaccination can elicit T-cell responses6 and that neoantigen-targeted cell therapy can cause tumor regression under certain circumstances in selected patients.7
One question for neoantigen vaccine design is which of the many coding mutations present in subject tumors can generate the “best” therapeutic neoantigens, e.g., antigens that can elicit anti-tumor immunity and cause tumor regression.
Initial methods have been proposed incorporating mutation-based analysis using next-generation sequencing, RNA gene expression, and prediction of MHC binding affinity of candidate neoantigen peptides8. However, these proposed methods can fail to model the entirety of the epitope generation process, which contains many steps (e.g., TAP transport, proteasomal cleavage, and/or TCR recognition) in addition to gene expression and MHC binding9. Consequently, existing methods are likely to suffer from low positive predictive value (PPV) (FIG. 1A).
Indeed, analyses of peptides presented by tumor cells performed by multiple groups have shown that <5% of peptides that are predicted to be presented using gene expression and MHC binding affinity can be found on the tumor surface MHC10,11 (FIG. 1B). This low correlation between binding prediction and MHC presentation was further reinforced by recent observations that restricting the analysis to neoantigens predicted to bind MHC did not improve the accuracy of predicting checkpoint inhibitor response over the number of mutations alone.12
This low PPV of existing methods for predicting presentation poses a problem for neoantigen-based vaccine design. If vaccines are designed using predictions with a low PPV, most patients are unlikely to receive a therapeutic neoantigen and fewer still are likely to receive more than one (even assuming all presented peptides are immunogenic). Thus, neoantigen vaccination with current methods is unlikely to succeed in a substantial number of subjects having tumors (FIG. 1C).
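The effect of a low PPV on vaccine design can be illustrated with a simple calculation. The sketch below is illustrative only and rests on assumptions that are not taken from this description: each vaccine peptide is treated as presented independently with probability equal to the PPV, the vaccine is assumed to contain 10 peptides, and the function name `prob_at_least` is hypothetical. The 5% figure is the upper bound quoted above for current predictions.

```python
from math import comb


def prob_at_least(k: int, n: int, ppv: float) -> float:
    """Probability that at least k of n vaccine peptides are presented.

    Assumption (not from the source): each peptide is presented
    independently with probability `ppv`, i.e. a binomial model.
    """
    if not 0.0 <= ppv <= 1.0:
        raise ValueError("ppv must be between 0 and 1")
    # Sum the binomial probabilities of seeing fewer than k presented peptides.
    p_fewer = sum(
        comb(n, i) * ppv**i * (1.0 - ppv) ** (n - i)
        for i in range(min(k, n + 1))
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    n_peptides = 10  # assumed vaccine size, for illustration only
    ppv = 0.05       # upper bound on PPV quoted in the text
    for k in (1, 2):
        print(f"P(at least {k} presented) = {prob_at_least(k, n_peptides, ppv):.3f}")
```

Under these assumptions, roughly 40% of patients would receive at least one presented peptide and roughly 9% would receive two or more. Different vaccine sizes or PPV values change the numbers, and the independence assumption is a simplification.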
Additionally, previous approaches generated candidate neoantigens using only cis-acting mutations, and largely neglected to consider additional sources of neo-ORFs, including mutations in splicing factors, which occur in multiple tumor types and lead to aberrant splicing of many genes13, and mutations that create or remove protease cleavage sites.
Finally, standard approaches to tumor genome and transcriptome analysis can miss somatic mutations that give rise to candidate neoantigens due to suboptimal conditions in library construction, exome and transcriptome capture, sequencing, or data analysis. Likewise, standard tumor analysis approaches can inadvertently promote sequence artifacts or germline polymorphisms as neoantigens, leading to inefficient use of vaccine capacity or auto-immunity risk, respectively.