In recent years, structural and functional analyses of proteins have advanced rapidly as part of post-genome research. As one method for such analyses (proteome analyses), expression analysis or primary structure analysis of a protein using a mass spectrometer is now widely performed. In this context, so-called MSn analysis, which includes the steps of capturing a specific kind of ion and dissociating it by collision-induced dissociation (CID) or a similar process within a quadrupole ion trap or the like, has proven to be a powerful technique.
A general process of identifying a protein by MSn analysis is as follows: A sample of the protein is broken into peptide fragments by a chemical process or enzymatic digestion. The obtained mixture of peptide fragments is subjected to mass spectrometry to obtain a mass spectrum (MS1 spectrum). Subsequently, from the mass spectrum data of the mixture of peptide fragments, a group of isotope peaks originating from a single peptide is selected as the precursor ions. These precursor ions are then dissociated into fragment ions by CID, and mass spectrometry of these fragment ions, i.e. the MS2 analysis, is performed. In the CID process, the bonds along the amino acid sequence of the specific peptide are broken at various positions, so that the peptide is divided into fragments composed of different sets of amino acid residues. Therefore, the obtained MS2 spectrum reflects the amino acid sequence of that specific peptide.
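The way an MS2 spectrum reflects the sequence can be illustrated with a minimal sketch that lists the fragment ions expected when a peptide backbone breaks at each position. The sketch assumes singly charged b-type (N-terminal) and y-type (C-terminal) ions and standard monoisotopic residue masses; the function name `fragment_ions` is hypothetical and is used only for illustration.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added to the C-terminal (y-type) fragment
PROTON = 1.007276   # charge carrier for a singly charged ion


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of m/z values for charge 1+.

    Entry k of each list corresponds to breaking the backbone after
    residue k+1, so b_ions grows and y_ions shrinks along the lists.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for cut in range(1, len(peptide)):
        b_ions.append(sum(masses[:cut]) + PROTON)
        y_ions.append(sum(masses[cut:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")
    for k, (mb, my) in enumerate(zip(b, y), start=1):
        print(f"cut after residue {k}: b = {mb:.4f}, y = {my:.4f}")
```

Each backbone position contributes one b-type and one y-type value, which is why the set of peaks depends on the order of the residues and not only on the composition of the peptide.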
That is to say, the spacings between peaks on the MS2 spectrum (the differences in mass-to-charge ratio between peaks of the same ion series) correspond to the masses of the amino acid residues. Therefore, it is possible to determine the amino acid sequence from the spacings between those peaks. A partial amino acid sequence of the original peptide can be obtained by extracting a sequence tag (i.e. a tag showing a continuous amino acid sequence that can exist in a peptide or protein) from the MS2 spectrum. Furthermore, by subjecting this partial amino acid sequence to an amino acid sequence homology search, such as BLAST® (Basic Local Alignment Search Tool), the protein can be identified. The aforementioned technique of obtaining a partial amino acid sequence of a peptide from the MS2 spectrum is called “de novo sequencing” and is widely used.
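The relation between peak spacings and residue masses can be sketched as follows. This is a simplified illustration, not a full de novo sequencing algorithm: it assumes that the supplied peaks already form one singly charged ion ladder with no gaps, and the function name `read_ladder` and the tolerance value are assumptions made for the example.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tol=0.02):
    """Translate the gaps between consecutive peaks into residues.

    Each gap is compared with every residue mass; residues that cannot
    be told apart by mass (I and L) are reported together as "I/L",
    and a gap that matches no residue is reported as "?".
    """
    ordered = sorted(peaks)
    residues = []
    for low, high in zip(ordered, ordered[1:]):
        gap = high - low
        hits = [aa for aa, m in RESIDUE_MASS.items() if abs(gap - m) <= tol]
        residues.append("/".join(hits) if hits else "?")
    return residues


if __name__ == "__main__":
    # Four peaks of one ladder; the three gaps spell a three-residue tag.
    print(read_ladder([98.06004, 227.10263, 324.15539, 425.20307]))
```

Because isoleucine and leucine have identical masses, a tag read from peak spacings alone cannot distinguish them, which is one reason such tags are usually treated as candidates rather than final sequences.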
In another technique, called the “MS/MS ion search”, the protein is directly identified from the MS2 spectrum by using the mass-to-charge ratios (m/z) of fragment ions (product ions). In the MS/MS ion search, the identification relies on a statistical evaluation of the degree of coincidence between an MS2 spectrum obtained by actual measurement and a virtual CID spectrum. The virtual CID spectrum is created in a computer by calculating the distribution of the mass-to-charge ratios of the peptide fragments that would be obtained by digesting every protein registered in a database with the same enzyme as used for the sample. An expectation value indicating the reliability of the degree of coincidence is also calculated from the molecular weight information of the peptides. Commonly known tools for the MS/MS ion search are “X!Tandem”, which is open-source software, and “Mascot MS/MS ion search”, which is a product of the British manufacturer Matrix Science Ltd.
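The comparison between a measured spectrum and virtual CID spectra can be sketched in a few lines. The sketch below is an illustration under stated assumptions and does not reproduce the scoring of X!Tandem or Mascot: the cleavage rule (after K or R, but not before P) is an assumed trypsin-like rule, only singly charged b-type and y-type ions are generated, and the score is a plain count of shared peaks rather than a statistical measure. All function names and the two-entry database are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def digest(protein):
    """In-silico digestion: cleave after K or R unless P follows (assumed rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and following != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def virtual_spectrum(peptide):
    """Singly charged b- and y-ion m/z values for one peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = [], 0.0
    for m in masses[:-1]:
        prefix += m
        peaks.append(prefix + PROTON)                  # b-type ion
        peaks.append(total - prefix + WATER + PROTON)  # y-type ion
    return sorted(peaks)


def shared_peak_count(observed, theoretical, tol=0.02):
    """Number of theoretical peaks that have an observed peak within tol."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


def search(observed, database, tol=0.02):
    """Score every digested peptide and return hits, best score first."""
    hits = []
    for name, sequence in database.items():
        for peptide in digest(sequence):
            score = shared_peak_count(observed, virtual_spectrum(peptide), tol)
            hits.append((name, peptide, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    toy_db = {"protA": "GASPKVTCR", "protB": "DEFHKMNWR"}  # hypothetical entries
    measured = virtual_spectrum("VTCR")  # stand-in for a measured MS2 peak list
    for hit in search(measured, toy_db):
        print(hit)
```

A raw peak count favors long peptides and ignores peak intensity, which is why practical tools replace it with a statistical score and report an accompanying reliability value.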
To identify proteins, the BLAST system uses only the information of character strings representing amino acid sequences, while the MS/MS ion search uses only the information of the mass-to-charge ratios of product ions. Another searching tool, called the “Sequence Tag search”, identifies proteins by using these two types of information in combination. Similar to the MS/MS ion search, the Sequence Tag search attempts to identify a peptide based on the virtual CID spectra of proteins registered in a database together with the already revealed amino acid sequence information, and shows the result. One commonly known tool for the Sequence Tag search is the “Mascot Sequence Query”, which is also a product of Matrix Science Ltd. (see Non-Patent Document 1). The database search setting screen of Mascot Sequence Query is similar to that of Mascot MS/MS ion search; the difference is that a sequence tag is used as input data in place of a list of peaks collected from an MS2 spectrum (see Non-Patent Document 2).
For example, the tags used in the Sequence Tag search take the form “M tag (M1, Str, M2)”, where M is the mass-to-charge ratio of the precursor ion of the MS2 analysis, M1 is the mass of one ion P1 in the MS2 spectrum, M2 is the mass of another ion P2 in the MS2 spectrum, and Str is a partial amino acid sequence corresponding to the difference between the two ions P1 and P2. That is to say, the Sequence Tag search identifies the peptide by using the mass-to-charge ratio of the precursor ion, the partial amino acid sequence, and the mass-to-charge ratios of the peaks at the starting and ending points of the partial amino acid sequence. Compared to the MS/MS ion search, the Sequence Tag search is characterized in that the peptide can be identified with high reliability even from a small number of peaks. Compared to BLAST, it is characterized in that the protein can be identified even from a shorter amino acid sequence.
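How a tag of the form (M, M1, Str, M2) constrains a candidate peptide can be sketched as follows. This is a simplified illustration and not the matching logic of Mascot Sequence Query: it assumes that M1 and M2 are singly charged b-type ions flanking Str, that the precursor charge is known, and that isoleucine and leucine are written exactly as in the tag. The function name `matches_tag` and its parameters are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def matches_tag(peptide, precursor_mz, charge, m1, tag, m2, tol=0.02):
    """Check a candidate peptide against the tag (precursor_mz, m1, tag, m2).

    Three conditions must hold: the precursor m/z agrees, the tag string
    occurs in the peptide, and the b-type ions just before and just after
    that occurrence agree with m1 and m2 respectively.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    neutral = sum(masses) + WATER
    if abs((neutral + charge * PROTON) / charge - precursor_mz) > tol:
        return False
    start = peptide.find(tag)
    while start != -1:
        b_before = sum(masses[:start]) + PROTON
        b_after = sum(masses[:start + len(tag)]) + PROTON
        if abs(b_before - m1) <= tol and abs(b_after - m2) <= tol:
            return True
        start = peptide.find(tag, start + 1)  # try a later occurrence
    return False


if __name__ == "__main__":
    tag = dict(precursor_mz=400.6872, charge=2, m1=227.1026, tag="PT", m2=425.2031)
    print(matches_tag("PEPTIDE", **tag))
    print(matches_tag("PEPDITE", **tag))  # same composition, different order
```

The second candidate has the same precursor mass as the first, so a mass-only filter could not separate them; the two-residue tag with its flanking masses is enough to do so, which illustrates why a short tag can be sufficient.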
As just described, the Sequence Tag search is an effective technique for protein identification. However, to ensure high identification accuracy, a highly reliable sequence tag must be given to the system. To address this problem, several methods for generating sequence tags for peptide identification have been proposed in recent years. For example, in a method described in Non-Patent Document 3, all ion peaks other than those corresponding to the b+, y+, b++ and y++ fragments are removed from an MS2 spectrum obtained by an MS2 analysis of a triply charged peptide. In other words, all ion peaks other than those forming either a singly charged or doubly charged pair are removed. Then, on the assumption that the remaining peaks are highly reliable, one or more sequence tags that can be derived from these peaks are listed as candidates. Several other methods for generating sequence tags based on an MS2 spectrum have also been proposed. However, it is not always easy to reliably obtain highly reliable sequence tags, because the S/N ratio of MS2 spectra is generally low and the peaks of the product ions often cannot be clearly observed.
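The idea of keeping only peaks that form a pair can be sketched as below. This is one possible reading of the pairing criterion and is not taken from Non-Patent Document 3: it assumes that a "pair" means two peaks of the same charge state (both 1+ or both 2+) whose neutral fragment masses add up to the neutral mass of the precursor peptide, as a b-type and y-type fragment from the same cleavage site would. The function names and the tolerance are hypothetical.

```python
PROTON = 1.007276  # mass of the charge carrier in Da


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at m/z with the given charge."""
    return mz * charge - charge * PROTON


def keep_paired_peaks(peaks, precursor_neutral_mass, charges=(1, 2), tol=0.02):
    """Keep peaks that have a complementary partner at the same charge.

    Two peaks are treated as complementary when their neutral masses,
    computed under the same assumed charge, sum to the neutral mass of
    the precursor peptide within tol. Unpaired peaks are discarded.
    """
    kept = set()
    for z in charges:
        for i, first in enumerate(peaks):
            for second in peaks[i + 1:]:
                total = neutral_mass(first, z) + neutral_mass(second, z)
                if abs(total - precursor_neutral_mass) <= tol:
                    kept.update((first, second))
    return sorted(kept)


if __name__ == "__main__":
    # Two complementary pairs (1+ and 2+) mixed with two unrelated peaks.
    spectrum = [114.0550, 150.5, 227.1026, 287.6396, 300.0, 574.2719]
    print(keep_paired_peaks(spectrum, precursor_neutral_mass=799.3599))
```

The filter relies on both members of a pair being observed, so at a low S/N ratio a genuine product ion whose partner is missing is discarded along with the noise.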