The identification of proteins in complex mixtures is the primary goal of the field of proteomics. Proteomics seeks to understand cellular and disease processes by analyzing proteins that number in the tens of thousands and whose concentrations can span up to 10 orders of magnitude (Qian et al., Molecular and Cellular Proteomics 5, 1727-1744 (2006)). The biological samples studied in proteomics vary tremendously and include cultured cell lines, tissues, and bodily fluids, among others. Analyzing the proteomic complexity of such samples remains a major challenge for any study based on global biological analysis. Reducing sample complexity enables identification of a greater number of proteins in a given sample, as well as focused identification of particular classes of proteins against the background of the full complement of proteins present in the sample. One way to reduce sample complexity is selective, site-specific labeling of discrete functional groups on proteins. By increasing proteomic coverage and enabling identification of discrete protein subsets, such selective labeling methodologies allow biological states to be studied comprehensively as a function of time, disease, or biological perturbation.
However, chemical methods for labeling proteins suffer from a lack of specificity because labels are introduced at multiple sites. For example, amine-reactive reagents such as succinimidyl esters can be used to label primary amines, but they label both the ε-amines of lysines and the α-amines of unblocked protein N-termini. One can attempt to achieve labeling specificity by adjusting the pH of the reaction, but this is difficult in practice because the pKa values of α-amines and ε-amines differ by only 2 pH units or less, and a protein normally has multiple lysines but only one N-terminus. Recently, a method using pyridoxal 5′-phosphate for selective labeling of protein α-amines has been proposed, but this reaction is slow and does not label N-terminal serine, threonine, cysteine, tryptophan, or proline residues (Gilmore J. M. et al., Angew. Chem. Int. Ed. 45, 5307-5311 (2006)).
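The pH argument can be made concrete with the Henderson-Hasselbalch equation. The following Python sketch is illustrative only and rests on assumptions that are not part of the text above: an α-amine pKa of 8.0 and an ε-amine pKa of 10.0 (a 2-unit gap, the upper bound stated above), a hypothetical protein with 10 lysines, and equal intrinsic reactivity of every unprotonated amine toward the labeling reagent.

```python
# Hypothetical illustration: the pKa values, lysine count, and equal-reactivity
# assumption are chosen for illustration and are not measured values.
PKA_ALPHA = 8.0     # assumed pKa of the N-terminal alpha-amine
PKA_EPSILON = 10.0  # assumed pKa of a lysine epsilon-amine (2 units higher)
N_LYSINES = 10      # assumed number of lysines in the protein


def fraction_free_base(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of an amine present as reactive R-NH2."""
    return 1.0 / (1.0 + 10 ** (pKa - pH))


def n_terminal_share(pH: float) -> float:
    """Share of all unprotonated amines that sit at the single N-terminus.

    Assumes every free-base amine reacts equally well with the reagent, so
    this share approximates the selectivity of labeling for the N-terminus.
    """
    alpha = fraction_free_base(pH, PKA_ALPHA)
    epsilon = N_LYSINES * fraction_free_base(pH, PKA_EPSILON)
    return alpha / (alpha + epsilon)


for pH in (6.0, 7.0, 8.0, 9.0):
    print(
        f"pH {pH:.1f}: reactive alpha-amine fraction "
        f"{fraction_free_base(pH, PKA_ALPHA):.3f}, "
        f"N-terminal share of reactive amines {n_terminal_share(pH):.3f}"
    )
```

Under these assumptions the N-terminal share never exceeds 1/(1 + N·10^(pKa_α − pKa_ε)) = 1/1.1 ≈ 0.91, and approaching that ceiling requires a pH at which only a small fraction of α-amines is reactive (about 9% at pH 7.0), which slows labeling. At pH 9.0 the single N-terminus and the ten lysines contribute equal amounts of reactive amine.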
Proper cellular function and homeostasis require careful regulation of cellular and extracellular proteins. Protein regulation in cells and tissues is accomplished through a variety of mechanisms, including transcriptional and translational control of synthesis as well as posttranslational modification of proteins. Such posttranslational modifications include phosphorylation, glycosylation, lipidation, ubiquitination, and proteolytic cleavage. Proteolytic processing of proteins, or proteolysis, is carried out by enzymes termed proteases, which are involved in the regulation of a myriad of biological processes. These include the conversion of pre- and pro-proteins into their active forms, blood clotting, regulation of cell cycle progression, regulation of cell migration and cancer metastasis, tissue remodeling during development, programmed cell death and apoptosis, T- and B-cell development, immunity, and memory, among others. Consistent with the complexity of these processes, cells contain many proteases that together process a wide range of substrate proteins. Examples of regulatory proteases include caspases, matrix metalloproteases, cathepsins, calpains, granzymes, and the proteasome, among others. Each of these proteases is involved in specific biological processes that depend on the processing of specific sets of substrate proteins, resulting in either a gain or a loss of substrate function and a concomitant biological phenotype or effect.
As a specific illustration, after receiving a cell death signal, apoptotic cells execute a cellular program that results in widespread and dramatic cellular changes that can include: (1) cell shrinkage and rounding due to the breakdown of the proteinaceous cytoskeleton; (2) the appearance of a dense cytoplasm and tight packing of cell organelles; (3) chromatin condensation into compact patches against the nuclear envelope; (4) discontinuity of the nuclear envelope and DNA fragmentation; (5) breakdown of the nucleus into several discrete chromatin bodies or nucleosomal units due to the degradation of DNA; and (6) blebbing of the cell membrane into irregular buds. Near the conclusion of the apoptotic program, the cell breaks apart into several vesicles called apoptotic bodies, which are then phagocytosed.
Loss of apoptotic regulation is a hallmark of many cancer cells, which continue to divide in a malignant fashion rather than undergoing cell death to eliminate cells that have sustained, for instance, potentially carcinogenic DNA damage. The program of cellular degradation in apoptosis is executed in part by a family of proteases known as the caspases. Given the profound and global cellular changes that occur during apoptosis, one would expect a variety of substrate proteins to be degraded at defined times and locations within a cell to effect this process. Knowledge of the proteins degraded in biological processes such as apoptosis, cancer cell metastasis, or memory would thus have a substantial impact on the development of therapies for conditions such as cancer and memory loss, to name just two examples. However, the identities of the proteins degraded during proteolytic processes such as apoptosis, and the extent of their degradation, are poorly understood. For these and other reasons, new and improved methods are needed for identifying proteins that are protease substrates in a variety of biological processes in health and disease. The present invention satisfies these and other needs by providing a robust method for labeling the N-termini of proteins in complex mixtures.