Proteomics refers to a diverse area of study that includes mass spectrometric analysis of proteins, which are the products of genes, mapping of the interrelationships among them, and structural analysis of proteins, with the ultimate aim of elucidating the functions of specific proteins and of the genes that encode them. Because such work rests on the study of the structures and roles of the functional proteins expressed from the genetic code, determining the primary structure of a protein, that is, its amino acid sequence, should be the primary goal of proteomics research, since it provides the information needed to analyze the tertiary structure of the protein.
A classical method for protein sequencing is Edman degradation, which uses the Edman reagent, phenyl isothiocyanate. In this method, the N-terminus of a protein or peptide is reacted with phenyl isothiocyanate under basic conditions, and the reaction conditions are then made acidic to remove one amino acid from the N-terminus in the form of a thiazolinone derivative. The entire sequence of the protein or peptide can therefore be determined by analyzing the amino acids one step at a time. However, the method is disadvantageous in that it requires high-purity proteins or peptides and is time consuming. Although an automated sequencer that repeats the above reaction cycle has recently been developed, it still takes from 30 minutes to one hour to determine a single amino acid residue. A large amount of sample is also needed when a long amino acid sequence is to be analyzed, because the desired product is never obtained in 100% yield in any one cycle.
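The effect of an incomplete per-cycle yield can be illustrated with a short calculation. The sketch below is illustrative only: the 95% per-cycle yield is a hypothetical value that does not come from this document, and the function names are likewise invented for the example. The time bounds use the 30 minutes to one hour per residue stated above.

```python
from typing import Tuple


def cumulative_yield(per_cycle_yield: float, cycles: int) -> float:
    """Fraction of the starting material still giving signal after `cycles` cycles.

    Assumes the same yield in every cycle, which is a simplification.
    """
    if not 0.0 < per_cycle_yield <= 1.0:
        raise ValueError("per_cycle_yield must be in (0, 1]")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return per_cycle_yield ** cycles


def run_time_hours(cycles: int) -> Tuple[float, float]:
    """Lower and upper bound on run time at 30 to 60 minutes per residue."""
    return (0.5 * cycles, 1.0 * cycles)


if __name__ == "__main__":
    ASSUMED_YIELD = 0.95  # hypothetical per-cycle yield, not taken from the text
    for n in (10, 30, 50):
        low, high = run_time_hours(n)
        remaining = cumulative_yield(ASSUMED_YIELD, n)
        print(f"{n} residues: {remaining:.1%} of sample remains, "
              f"{low:.0f} to {high:.0f} hours of run time")
```

Under these assumptions the usable signal decays geometrically with the number of cycles, which is why long sequences call for a large amount of starting material.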
For these reasons, a method for analyzing protein or peptide sequences with a mass spectrometer was introduced in the 1980s to overcome the disadvantages associated with Edman degradation. This method, called tandem mass spectrometry (hereinafter referred to as tandem MS), separates sample ions according to their mass-to-charge ratios and then fragments an ion of interest by collision with an inert gas such as helium. Observation of the resulting fragment ions provides information for proteomic analysis of the proteins.
Conventionally, applying the tandem MS technique to peptide analysis generates daughter ions through cleavage of certain peptide bonds, and these ions are observed in a daughter-ion mass spectrum. The sample is then characterized by comparing the spectrum against well-established databases of known proteins. However, this method leads to erroneous results when the database is incorrect, and even when the database is correct the results are not absolutely reliable, because they express only the probability that the identified peptide is indeed the desired one. As a result, a variety of attempts have been made to identify proteins by reducing the dependence on databases and focusing mainly on the mass analysis results. Because most peptides, even when pure, exhibit complicated tandem mass spectra, attempts have also been made to simplify the spectra by modifying the peptides through a series of treatments. A method that obtains an amino acid sequence directly from tandem MS data, without any database, is called de novo sequencing.
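The principle behind de novo sequencing can be sketched in a few lines: when consecutive peaks of a single fragment-ion series are available, the mass difference between neighbouring peaks corresponds to one amino acid residue. The sketch below is an idealized illustration rather than a description of any particular published algorithm. It assumes singly charged ions, a complete ladder from one series, no noise peaks, and a hypothetical tolerance of 0.01 Da; the function name and the starting m/z in the demonstration are invented for the example. Isoleucine is omitted from the table because it has the same mass as leucine and cannot be distinguished this way.

```python
from typing import Iterable

# Monoisotopic residue masses in Da. Isoleucine is isobaric with leucine
# and is therefore reported as "L".
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residues_from_ladder(peaks: Iterable[float], tol: float = 0.01) -> str:
    """Read residues from the gaps between consecutive peaks of one ion series.

    Returns one letter per gap, in order of increasing m/z. A gap that matches
    no residue, or more than one, is reported as "?".
    """
    ordered = sorted(peaks)
    letters = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        gap = heavier - lighter
        hits = [aa for aa, mass in RESIDUE_MASS.items() if abs(mass - gap) <= tol]
        letters.append(hits[0] if len(hits) == 1 else "?")
    return "".join(letters)


if __name__ == "__main__":
    # Build an ideal ladder from a hypothetical lightest peak.
    ladder = [175.119]
    for aa in "GASPV":
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    print(residues_from_ladder(ladder))
```

Whether the letters read from the N-terminus or from the C-terminus depends on which ion series the ladder belongs to, which is one reason why spectra containing a single series are easier to interpret.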
WO 02/08767 discloses a technique in which chlorosulfonylacetyl chloride is used to sulfonate the N-terminus of a polypeptide, neutralizing the b-type ions, which are the N-terminal fragments, so that only the y-type ions are observed. However, this method is problematic in that chlorosulfonylacetyl chloride is overly reactive and has two reactive sites, which results in complex products, and the method is, furthermore, completely inapplicable to quantitative analysis. In addition, since the sulfonation reagent also reacts with the epsilon-amino groups of lysine side chains, the method cannot be used for peptides with lysine at the C-terminus, which severely limits the utility of the technique.
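To make the distinction between the two ion series concrete, the following sketch computes idealized singly charged b-type and y-type fragment masses for an unmodified peptide. It is an illustration under stated assumptions and does not reproduce the method of WO 02/08767: the mass added by the sulfonate tag is not modelled, the peptide GASPVK is a hypothetical example, and the function name is invented. Because y-type ions are C-terminal fragments, an N-terminal modification leaves their masses unchanged, so a spectrum in which the b series is suppressed would contain only the "y" list.

```python
from typing import Dict, List

PROTON = 1.007276   # Da, mass of a proton
WATER = 18.010565   # Da, monoisotopic mass of H2O

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ions(peptide: str) -> Dict[str, List[float]]:
    """Return singly charged b and y ion m/z values for an unmodified peptide.

    b_i contains the first i residues (N-terminal fragment);
    y_i contains the last i residues (C-terminal fragment) plus water.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return {"b": b_ions, "y": y_ions}


if __name__ == "__main__":
    ions = fragment_ions("GASPVK")  # hypothetical example peptide
    for series in ("b", "y"):
        print(series, [round(mz, 4) for mz in ions[series]])
```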
In order to solve the above-mentioned problems, efforts have been made to protect the epsilon-amino groups of lysine side chains prior to sulfonation, and also to convert lysine into homoarginine using O-methylisourea in order to enhance ionization efficiency. However, these approaches may cause a significant loss of peptides because of the additional pretreatment step, and they also pose the problem that O-methylisourea may react with the amino group at the N-terminus of the peptide.
Meanwhile, in order to further simplify interpretation of a tandem mass spectrum and to gain more useful information, techniques involving isotopic substitution have been developed.
Smith et al. (Analytical Chemistry, Vol. 74, No. 19, Oct. 1, 2002) present a technique capable of simplifying interpretation of a tandem mass spectrum. According to the technique, yeast is cultured separately in a medium containing 13C isotope-labeled lysine (the 6 carbon atoms constituting lysine are substituted with 13C) and in a medium containing unlabeled lysine, and the resulting cultures are mixed. The mixture is then hydrolyzed with the proteolytic enzyme Lys-C, and the resulting products allow y-type ions to be distinguished from b-type ions, thereby simplifying the tandem mass spectrum. However, this method relies on in vivo labeling and thus is not applicable to experiments involving human subjects, or to cases in which the organisms under analysis have alternative routes for producing lysine in vivo.
In addition, various attempts have been made to design methods capable of performing protein identification and quantitative analysis simultaneously. Gygi et al. (Nature Biotechnology, Vol. 17, October 1999: 994-999) present quantitative analysis of complex protein mixtures using Isotope-Coded Affinity Tags (ICAT), a reagent in which H is substituted with D. However, the reagent reacts with the thiol group of cysteine, and cysteine accounts for only 5 to 10% of the total amino acids in most proteins. Therefore, quantitative analysis using the ICAT reagent and cysteine inevitably exhibits significant error. In addition, because of the difference in hydrogen-bond strength between H and D, the labeled and unlabeled species also differ in retention time during reverse-phase liquid chromatography. These factors introduce errors into the quantitative analysis and markedly lower its reliability.
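The way in which the 13C-lysine labeling described above for Smith et al. separates the two ion series can be sketched as follows. This is one plausible reading of the principle, not the authors' actual procedure. It assumes singly charged fragments, peptides with exactly one lysine located at the C-terminus (as expected after Lys-C digestion without missed cleavages), and a hypothetical matching tolerance of 0.01 Da. Under these assumptions every y-type ion carries the labeled lysine and shifts by the mass of six 13C substitutions, while b-type ions do not shift. The function name and the peak values, which correspond to a hypothetical peptide GAK, are invented for the example.

```python
from typing import Dict, Iterable

# Mass shift from replacing six 12C atoms with 13C (about 6.0201 Da).
C13_SHIFT = 6 * (13.0033548 - 12.0)


def classify_peaks(light: Iterable[float], heavy: Iterable[float],
                   tol: float = 0.01) -> Dict[float, str]:
    """Label each light-spectrum peak as "y", "b", or "unassigned".

    "y": a partner exists in the heavy spectrum at m/z + C13_SHIFT.
    "b": a partner exists in the heavy spectrum at the same m/z.
    """
    heavy_peaks = list(heavy)
    labels = {}
    for mz in light:
        if any(abs(h - (mz + C13_SHIFT)) <= tol for h in heavy_peaks):
            labels[mz] = "y"
        elif any(abs(h - mz) <= tol for h in heavy_peaks):
            labels[mz] = "b"
        else:
            labels[mz] = "unassigned"
    return labels


if __name__ == "__main__":
    # Illustrative fragment peaks for a hypothetical peptide GAK.
    b_series = [58.0287, 129.0658]
    y_series = [147.1128, 218.1499]
    light_spectrum = b_series + y_series
    heavy_spectrum = b_series + [mz + C13_SHIFT for mz in y_series]
    for mz, label in classify_peaks(light_spectrum, heavy_spectrum).items():
        print(f"{mz:9.4f}  {label}")
```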