High-performance liquid chromatography (HPLC) of biological macromolecules has been a field of intensive research and an inspiration for several generations of separation scientists. Peptide LC separation science matured both theoretically and practically during the 1980s and 1990s.1-3 When recent advances in mass spectrometry led to significant breakthroughs in protein/peptide analysis (often referred to as the “omics” era), the chromatographic counterpart was ready to accommodate these new challenges. Indeed, the optimal separation conditions for peptide mixtures are “common knowledge” and are used with only slight variations in many proteomics laboratories: a 0-60% acetonitrile gradient with trifluoroacetic acid (TFA) as the ion-pairing modifier is recommended for the separation of peptide mixtures.4
At the same time, the basic questions of LC separation techniques are approached very differently by “classic” HPLC studies and “modern” proteomics/peptidomics applications. For LC specialists, peptide separation selectivity is a primary criterion, since detection options were traditionally limited mostly to spectrophotometric detection. Separation conditions were varied in an attempt to achieve complete peak resolution. Conversely, the use of powerful mass-spectrometric detection in proteomics allows the simultaneous identification of co-eluting species, shifting the emphasis to providing better separation efficiency and an optimal sampling rate (peak capacity) for mass detectors.
Despite a certain neglect of the fundamental questions of selectivity optimization, selectivity prediction will become an important part of proteomics protocols in the future. Peptide retention (i.e. selectivity) prediction can be used as an additional filter to increase the confidence of protein/peptide identification in bottom-up proteomics studies,5-7 to decrease the time required for analysis,8 and even to direct the choice of the optimal separation system for a particular sample.9 Recent developments in the field were fuelled by the abundant availability of peptide sequence-retention data sets. A number of peptide retention prediction models have been reported in the past few years,10-18 building on earlier work from the 1980s and 1990s.16-19 These recent efforts added to the knowledge of peptide separation accumulated by classical HPLC. Specifically, there were definite improvements in understanding the ion-pairing separation mechanism,10 its influence on the apparent hydrophobicity of amino acid side chains (especially at the N- and C-termini),20 the effect of residue position within the peptide chain,13 and the propensity to form helical structures.11 This information resulted in significant improvements in peptide retention prediction accuracy: correlations with R² values of ~0.98 have been demonstrated,11,13 while the “classical” additive approach cannot exceed R² ~0.93 in real proteomics samples.
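The limitation of the “classical” additive approach mentioned above can be illustrated with a minimal sketch. The retention coefficients below are hypothetical placeholders chosen only for illustration and are not taken from any published scale; the function name is likewise an assumption. Because the model simply sums per-residue contributions, it necessarily assigns the same value to peptides that differ only in residue order, which is one reason position-dependent corrections improve prediction accuracy.

```python
# Hypothetical per-residue retention coefficients (illustrative values only,
# not taken from any published hydrophobicity scale).
HYPOTHETICAL_RC = {"G": 0.0, "A": 1.0, "V": 4.0, "L": 9.0, "K": -2.0, "R": -1.0}


def additive_hydrophobicity(sequence, coefficients=HYPOTHETICAL_RC):
    """Sum the per-residue coefficients of a peptide (purely additive model)."""
    try:
        return sum(coefficients[residue] for residue in sequence)
    except KeyError as exc:
        raise ValueError(f"no coefficient for residue {exc.args[0]!r}") from exc


# Two sequence isomers: same composition, different residue order.
h_first = additive_hydrophobicity("GLVK")
h_second = additive_hydrophobicity("VGLK")
print(h_first, h_second, h_first == h_second)
```

An additive model of this kind cannot distinguish the two isomers, whereas real peptides with identical composition may elute at different times.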
One might speculate that the virtually unlimited data sets being filled by MS/MS peptide identification would ultimately render predictive models unnecessary, as the retention characteristics of all possible peptides would eventually be determined experimentally. But there are significant barriers preventing this from happening soon. First, while the majority of proteomics researchers deal with the separation of tryptic peptides, a large number of non-tryptic species are formed in vivo and even during trypsin digestion. Adding chemical and post-translational protein/peptide modifications makes it difficult to estimate the number of species one might encounter in proteomics experiments, let alone to evaluate the probability of detecting all of them. Second, there are a large number of variations in the mobile and stationary phases used in proteomics LC. There are many commercial suppliers of RP separation media, and these products are subject to ongoing improvements and modifications. There are also several variations of mobile phase composition: for example, ESI-compatible, MALDI-compatible and high-pH RP for 2-dimensional schemes. Different separation systems will require the collection of separate data sets. Third, it is unlikely that such a vast amount of information can be collected in one laboratory or even using one LC experimental platform. It will require collective efforts from the proteomics community, making the issue of a “standardized” alignment of LC-MS data critical. Finally, peptides represent a group of “irregular” solutes from the linear-solvent-strength (LSS) point of view.4 In other words, the slopes (S) in the fundamental LSS equation are different for different peptides:

$$\log k = \log k_w - S\,\varphi \qquad (1)$$

where $k$ is the capacity factor at organic solvent volume fraction $\varphi$ and $k_w$ is the capacity factor at $\varphi = 0$.
This has been shown to cause selectivity variations and even reversal of the elution order when column size, gradient steepness or flow rate is changed.21 It will create discrepancies during the transfer of retention information, even between systems with identical stationary/mobile phase combinations.
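The consequence of peptide-specific slopes in equation (1) can be sketched numerically. The example below evaluates the LSS equation for two hypothetical peptides whose log kw and S values are assumed purely for illustration; the function names are also assumptions. It treats the isocratic case only. In gradient elution the effective solvent fraction at which a peptide migrates depends on gradient steepness, column size and flow rate, so the same crossing of retention lines is what can lead to a change in elution order.

```python
def log_k(log_kw, s, phi):
    """LSS equation (1): log k = log kw - S * phi."""
    return log_kw - s * phi


def crossover_phi(solute_a, solute_b):
    """Solvent fraction at which two solutes have equal log k.

    Each solute is a (log_kw, S) tuple. Returns None when the slopes are
    equal, i.e. the retention lines are parallel and never cross.
    """
    log_kw_a, s_a = solute_a
    log_kw_b, s_b = solute_b
    if s_a == s_b:
        return None
    return (log_kw_a - log_kw_b) / (s_a - s_b)


# Hypothetical (log kw, S) pairs, assumed for illustration only.
PEPTIDE_A = (2.0, 20.0)
PEPTIDE_B = (3.0, 30.0)

print("crossover phi:", crossover_phi(PEPTIDE_A, PEPTIDE_B))
for phi in (0.05, 0.15):
    print(phi, log_k(*PEPTIDE_A, phi), log_k(*PEPTIDE_B, phi))
```

Below the crossover fraction the peptide with the larger S is retained more strongly; above it, the order is reversed. For solutes with identical S values the lines are parallel and no reversal can occur.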
All of these issues need to be considered in this new “proteomics/peptidomics” stage of peptide HPLC research. Recently, the peptide separation selectivity of 300 Å and 100 Å sorbents, and of C18 phases and C18 materials with embedded polar groups, was compared. This study included screening a number of commercial C18 phases and switching the ion-pairing modifiers to monitor changes in selectivity.22 Studies of this kind help to identify at least a few groups of phases/conditions within which predictive models and data sets can be considered transferable.
Another important problem is the accurate alignment of LC-MS data. Spiking the samples with a mixture of known peptides, or monitoring the retention times (RT) of redundant species present in different LC-MS runs, is common practice. Petritis et al. followed the retention of six peptides frequently observed in D. radiodurans and S. oneidensis to align 687 different LC-MS/MS runs.23 When dealing with samples from unrelated organisms, spiking the analyzed mixture with known peptides is required. In other work, human transferrin was added to all of the protein mixtures used to collect peptide retention data sets for model optimization. Following tryptic digestion, it produced ~35 easily detectable peptides, which were used to draw linear plots of RT vs. calculated hydrophobicity (H) and to align the data by adjusting for small changes in slopes and intercepts. Later, a simpler digest of horse heart myoglobin was used for the same purpose.11 It was also proposed to use “ideal hydrophobicity” values of these standard peptides, extrapolated from the combined RT vs. H plots for the optimization data set. Finding the retention times of these target peptides in each LC run allows a very robust chromatographic calibration and data alignment. An alternative approach is to use a number of peptides confidently identified by MS/MS to plot RT vs. H as a calibration.15, 24 However, the accuracy of this procedure depends on the elimination of false-positive identifications and on the retention prediction accuracy for the particular set of peptides.
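The RT vs. H calibration described above amounts to fitting a straight line through the standard peptides observed in a run and then using its slope and intercept to place other retention times on the hydrophobicity scale. The following is a minimal sketch of that idea using ordinary least squares; the hydrophobicity values and retention times are hypothetical numbers assumed for illustration, and the function names are not taken from any of the cited works.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rt_to_hydrophobicity(rt, slope, intercept):
    """Map an observed retention time onto the H scale of the calibration."""
    return (rt - intercept) / slope


# Hypothetical standard peptides: assigned H values and the retention
# times (min) at which they were observed in one LC run.
standard_h = [5.0, 10.0, 20.0, 30.0]
observed_rt = [18.0, 28.0, 48.0, 68.0]

slope, intercept = fit_line(standard_h, observed_rt)
print(slope, intercept, rt_to_hydrophobicity(38.0, slope, intercept))
```

Repeating the fit for every run gives run-specific slopes and intercepts, so that retention times from different runs can be compared on the common H scale.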
A peptide retention standard for RP HPLC of peptides was developed for commercial use and contains a mixture of five synthetic C-terminal amide decapeptides of the generic formula: [Ac-Arg-Gly-X-X-Gly-Leu-Gly-Leu-Gly-Lys-Amide] [SEQ ID NO:1].25 Sequence variations of [X-X] are [Gly-Gly], [Ala-Gly], [Val-Gly] and [Val-Val]. The fifth peptide contains the [Ala-Gly] sequence plus a free N-terminal amino group. The molecular weights of these species range from 883 to 995 Da, making them easily detectable by both ESI and MALDI MS techniques. The disadvantage of this mixture is its narrow range of hydrophobicities, ~6% on the acetonitrile percentage scale, whereas tryptic peptides typically elute within a ~40% window. Recently, Eyers et al.26 developed a peptide standard to assess the performance of LC-MS systems as a whole, including both the separation and detection parts. They generated an artificial protein, QCAL, which provides a set of 22 peptides of various sizes and hydrophobicities upon tryptic digestion.
Another issue closely related to retention standard development is how the value of peptide hydrophobicity is expressed and used in chromatographic experiments. For example, the Sequence Specific Retention Calculator (SSRCalc) model plots the dependence of RT on calculated hydrophobicity, with the latter being a product of multiple factors reflecting the influence of amino acid sequence, pI and the propensity to form helical structures.11 This value has no connection to any physical property of the separated species unless it is related to the hydrophobicity of standard peptide(s). An alternative way to represent retention prediction data is the use of normalized retention/elution time (NRT/NET) values.12,23 While this approach normalizes retention to a set of known peptides, these values do not express real chromatographic properties of the species either, unless the standard peptides are well characterized.
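The idea behind normalized retention values can be sketched as a rescaling of each run onto a common 0 to 1 axis defined by two reference peptides. The two-point convention used here is an assumption made for illustration and is not necessarily the procedure used in the cited works; the function name and all retention times are hypothetical.

```python
def normalized_elution_time(rt, rt_early_ref, rt_late_ref):
    """Rescale a retention time so the early reference maps to 0 and the
    late reference maps to 1 (assumed two-point normalization)."""
    if rt_late_ref <= rt_early_ref:
        raise ValueError("late reference must elute after early reference")
    return (rt - rt_early_ref) / (rt_late_ref - rt_early_ref)


# The same hypothetical peptide observed in two runs with different
# gradient lengths; the reference peptides shift accordingly.
net_run_1 = normalized_elution_time(40.0, rt_early_ref=20.0, rt_late_ref=60.0)
net_run_2 = normalized_elution_time(75.0, rt_early_ref=35.0, rt_late_ref=115.0)
print(net_run_1, net_run_2)
```

The resulting number is dimensionless and depends entirely on the choice of reference peptides, which is why such values carry chromatographic meaning only when the standards themselves are well characterized.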
While the general concept of hydrophobicity of peptides and proteins is well understood, this field is known for the large number of hydrophobicity scales.27-29 