Field of the Invention
The invention relates to a computer program algorithm for identifying and selecting ion peaks from mass spectral raw data and generating a peak list. The invention is directed to an apparatus or a system suitable for performing peak detection automatically such that further protein analysis can be pursued in mass spectrometry.
Description of the Related Art
Peak detection plays an important role in protein identification using mass spectrometry. A peak list provides information on the precursor ions selected for fragmentation to generate tandem mass spectra in a mass spectrometer. The list also provides information on the fragment ions dissociated from the selected precursor ions. This information is entered into a protein analysis program, such as a database (DB) search or de novo sequencing program. The ion peaks in the peak list are matched against amino acid sequences in a database, or used to construct an ion series that best represents the peptide in a de novo sequencing method, and thereafter to identify the protein from the determined peptide sequences.
In general, the peak-picking algorithm in peak detection software separates signal peaks from noise, and an ion is usually represented by its mono-isotopic peak if that peak is sufficiently resolved from the other isotopic peaks in the spectrum. To distinguish ion peaks from noise, signal processing techniques are required to reduce noise and to resolve ion peaks that are buried in noise or overlap with other peaks. Various computer-based mathematical methods have been applied to improve the resolution of overlapping peaks, to fit models that confirm assumptions about the features expected in spectra, and to recover information not directly observed in the spectra because of instrumental limitations.
When a peak detection computer program is used in mass spectrometry, all the signal peaks present in a mass spectrum should be found and confirmed by the program, so that the peak list represents as much as possible of the ion information acquired in the experiment. With existing peak detection software, the derived peak list is often quite short and may contain only a small number of peaks with distinguishing intensity values. Alternatively, it may be long and contain a large number of peaks, including many false positive peaks. For example, FIG. 1 shows a peak at 900 Da (peak A) of significantly high intensity. Most peak detection software can easily find and select it as an ion peak. There is another ion peak at 1001 Da (peak B) of low intensity. This peak is harder to identify than peak A because its intensity is close to that of the noise peaks in the spectrum. Of two existing pieces of peak detection software, one may detect only peak A and miss peak B, whereas the other can detect peak B but also includes noise peaks, such as peak C.
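The trade-off described for FIG. 1 can be sketched with a single intensity threshold. The following is only an illustrative sketch: the intensity values, the m/z position assumed for peak C, and the function name `pick_by_intensity` are hypothetical and are not taken from FIG. 1 or from any existing software.

```python
# Hypothetical toy spectrum loosely modelled on the FIG. 1 description.
# All intensities, and the m/z assumed for peak C, are made-up values.
SPECTRUM = [
    (900.0, 1000.0),   # peak A: strong ion peak
    (950.0, 12.0),     # peak C: noise peak (m/z assumed for illustration)
    (1001.0, 15.0),    # peak B: weak ion peak, close to the noise
    (1050.0, 9.0),     # a further noise peak (assumed)
]


def pick_by_intensity(spectrum, threshold):
    """Return the m/z values whose intensity is at or above the threshold."""
    return [mz for mz, intensity in spectrum if intensity >= threshold]


# A strict threshold keeps only peak A and loses peak B.
strict = pick_by_intensity(SPECTRUM, threshold=100.0)

# A permissive threshold recovers peak B but also admits noise peak C.
permissive = pick_by_intensity(SPECTRUM, threshold=10.0)

print(strict)      # [900.0]
print(permissive)  # [900.0, 950.0, 1001.0]
```

In this toy data no single threshold separates peak B from peak C, which is the situation the paragraph above describes.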
There are also peak detection methods that fit an idealized model to the spectral profile, and there are certainly cases where peak shape is key to finding signal peaks. In these methods, a set of criteria is used to analyse the correlation between the model and the spectral data. This works well when the peak shapes, such as the isotopic peaks in a cluster, are well resolved in the spectra. When the peaks in a spectrum are poorly resolved, however, the analysis may yield an unreliable correlation. It then becomes difficult to pick the correct ion mass peaks, and real ion peaks may go undetected, particularly those with low intensity. In addition, this type of method often requires a longer computing time to process whole spectra.
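One way to picture such a model-fitting criterion is a correlation score between an idealized peak shape and the recorded profile. The sketch below assumes a Gaussian model and a Pearson correlation; both are assumptions chosen for illustration and are not stated to be the criteria used by the cited programs. All numeric values and the names `gaussian` and `pearson` are hypothetical.

```python
import math


def gaussian(x, center, width, height=1.0):
    """Idealized peak shape (assumed Gaussian for this sketch)."""
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Sampling grid around the expected peak position (arbitrary units).
xs = [i * 0.05 for i in range(-40, 41)]

# The idealized model: a narrow peak centred at 0.
model = [gaussian(x, 0.0, 0.2) for x in xs]

# A well-resolved recorded peak with the same shape as the model.
resolved = [gaussian(x, 0.0, 0.2) for x in xs]

# A poorly resolved profile: the same ion, broadened and overlapping
# with a neighbouring peak.
blurred = [gaussian(x, 0.0, 0.8) + gaussian(x, 1.0, 0.8, 0.8) for x in xs]

r_resolved = pearson(model, resolved)
r_blurred = pearson(model, blurred)
print(r_resolved > r_blurred)  # True
```

Under these assumptions the broadened, overlapping profile scores far lower than the resolved one even though it still contains a real ion peak, so a fixed correlation cut-off would tend to reject it.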
The above peak detection processes and programs are described in Du, P.; Kibbe, W. A. and Lin, S. M. (2006) Bioinformatics, 22, 2059-2065; Gras, R.; Muller, M.; Gasteiger, E.; Gay, S.; Binz, P-A.; Bienvenut, W.; Hoogland, C.; Sanchez, J-C.; Bairoch, A.; Hochstrasser, D. F. and Appel, R. D. (1999) Electrophoresis, 20, 3535-3550; and Yang, C.; He, Z. Y. and Yu, W. C. (2009) BMC Bioinformatics, 10:4.
(1) All the Signal Peaks are Contained in the Peak List While Noise Peaks are Eliminated.
Because various instrumental factors influence experimental results, peaks recorded in a spectrum become difficult to identify if the peak shape is distorted from its ideal shape or buried within noise peaks. From a computer program standpoint, it is a real challenge to build an accurate model that reflects these variations. Because of the limitations of the methods used for identifying ion peaks from spectra, if a peak list contains only peaks of significant intensity with resolved shapes, some ion information may be lost. Conversely, more noise peaks are included when ion peaks of low intensity are to be detected. In neither case does the peak list reflect the best analysis result that would normally be expected from the spectra of a mass spectrometry experiment.
A good database search engine selects expected ions from a peak list to match the proposed ions provided in the sequence database. Problems may arise when the peak lists described in the “Background of the Invention” are used. For a short list in which some ion information is lost, the number of ions in the list is not enough to match the correct sequence, and the list easily leads to false hits, whereas for a long list that includes more noise peaks, a wrong ion may be matched by a noise peak. A long list also takes longer to process. The ambiguity in determining a peptide sequence may increase further when those peak lists are used in de novo sequencing software, because a de novo sequencing method usually places a high quality requirement on the peak list.
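The effect of list length on matching can be illustrated with a simple tolerance-based comparison. This is a minimal sketch, not the scoring scheme of any actual search engine: the function `match_ions`, the tolerance of 0.5, and every m/z value below are hypothetical illustration values rather than real peptide fragment masses.

```python
def match_ions(peak_list, theoretical_mz, tolerance=0.5):
    """Return the theoretical ions that have an observed peak within tolerance."""
    return [
        target
        for target in theoretical_mz
        if any(abs(mz - target) <= tolerance for mz in peak_list)
    ]


# Hypothetical fragment ions of the correct and of a wrong candidate sequence.
correct_ions = [200.1, 313.2, 428.2, 541.3]
wrong_ions = [186.1, 313.2, 400.2, 555.3]

# A clean list: every signal peak, no noise peaks.
clean_list = [200.1, 313.2, 428.2, 541.3]
# A short list: most ion information has been lost.
short_list = [313.2]
# A long list: all signal peaks plus three noise peaks (the last three).
long_list = [200.1, 313.2, 428.2, 541.3, 186.3, 400.0, 555.6]

for name, peaks in [("clean", clean_list), ("short", short_list), ("long", long_list)]:
    n_correct = len(match_ions(peaks, correct_ions))
    n_wrong = len(match_ions(peaks, wrong_ions))
    print(name, n_correct, n_wrong)
# clean 4 1
# short 1 1
# long 4 4
```

With these made-up values, only the clean list separates the correct candidate from the wrong one. The short list supports both candidates equally weakly, and in the long list the noise peaks happen to match the wrong candidate's ions.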
(2) Signal Peaks of Low Intensity are Also Detected.
A peak list of sufficient quality contains not only peaks of high intensity but also peaks of low intensity, and keeps the number of false peaks to a minimum. This requires the peak detection method to determine an accurate noise level for the spectrum. This has never been a trivial task, because several uncertain factors affect the distribution of noise. Noise varies depending on which instrument is used and on which mass or intensity ranges are selected. Inappropriate identification of the noise level generates misleading signals in the peak list. Existing software applies various parameters and tolerance values to optimize the selection of ion peaks, so these methods usually require many parameters. The parameters commonly used include signal-to-noise ratio, intensity threshold, local maximum, and peak width. If peak shape and distribution are also considered in peak detection, extra criteria are used to judge whether a proposed model fits the selected peak. These parameters are set in the program or determined by an experienced user and entered through an interface. An optimized combination of parameters may give reasonable results for certain spectra but may not be suitable for other spectra. A test report (Yang et al., 2009) has shown that increasing sensitivity with those peak detection programs brings high false discovery rates. That is, more noise peaks appear in the peak list.
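A parameter-driven picker of the kind described above can be sketched as follows. This is a hedged illustration only: the median-based noise estimate, the function names `estimate_noise` and `pick_peaks`, the default parameter values, and both toy spectra are assumptions, and peak width is left out for brevity.

```python
from statistics import median


def estimate_noise(intensities):
    """Crude noise level: the median intensity (assumes most points are noise)."""
    return median(intensities)


def pick_peaks(intensities, snr=3.0, min_intensity=0.0):
    """Return indices of local maxima that pass the S/N and intensity tests."""
    noise = estimate_noise(intensities)
    picked = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        # Local maximum: higher than the left neighbour, not lower than the right.
        is_local_max = intensities[i - 1] < y >= intensities[i + 1]
        if is_local_max and y >= snr * noise and y >= min_intensity:
            picked.append(i)
    return picked


# Two hypothetical spectra with a strong peak at index 4 and a weak ion peak
# at index 7; the second spectrum has a higher noise floor.
quiet = [1, 2, 1, 2, 50, 2, 1, 8, 1, 2, 1]
noisy = [4, 6, 3, 7, 50, 5, 6, 8, 4, 8, 5]

print(pick_peaks(quiet))            # [4, 7]
print(pick_peaks(noisy))            # [4]
print(pick_peaks(noisy, snr=1.2))   # [4, 7, 9]
```

The same default setting finds both ion peaks in the quiet spectrum but misses the weak one in the noisy spectrum. Lowering the signal-to-noise ratio recovers it only by also admitting the noise peak at index 9, which mirrors the sensitivity versus false discovery trade-off reported above.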
With spectral quality varying between experiments and parameters set for different conditions, it is even more difficult to select parameters that suit all the spectra involved in an analysis, particularly when a combination of peak lists must be generated in a robust way for high-throughput mass spectrometry data.