The identification and/or characterization of significant or useful features is a classic problem in the analysis of indexed data. Often this problem reduces to separating the desired signal from undesired noise. Transient features, specifically peaks, are frequently of interest. For indexed data, a peak appears as a deviation, for example a rise and fall, in the responses over consecutive indices. However, background noise can also produce a deviation in the responses for indexed data.
Traditionally, peak detection has been based upon rejecting responses below a threshold value. Whether manual or automated, selection of a threshold is still an art, requiring arbitrary and subjective operator/analyst-dependent decision making. The effectiveness of traditional peak detection is affected by signal-to-noise ratio, signal drift, and varying baseline signal. Consequently, an operator or analyst may have to apply several thresholds to the responses over different regions of indices to capture as much signal as possible, an approach that is difficult to reproduce, suffers from substantial signal loss, and is subject to operator/analyst uncertainty.
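The effect of a drifting baseline on a single threshold can be illustrated with a minimal sketch. The data, the drift rate, and the function name `above_threshold` are hypothetical and chosen only for illustration; the sketch is not taken from any particular instrument or software package.

```python
def above_threshold(responses, threshold):
    """Return the indices whose response exceeds `threshold`.

    This is the simplest form of threshold rejection: every response
    at or below the threshold is discarded as noise.
    """
    return [i for i, r in enumerate(responses) if r > threshold]


# Hypothetical indexed data: two peaks of equal height (3.0 above the
# baseline) sitting on a baseline that drifts upward by 0.5 per index.
baseline = [0.5 * i for i in range(20)]
signal = [0.0] * 20
signal[4] = 3.0    # early peak
signal[15] = 3.0   # late peak
responses = [b + s for b, s in zip(baseline, signal)]

# A threshold low enough to keep the early peak also accepts the
# drifted baseline at higher indices.
low = above_threshold(responses, 4.0)

# A threshold high enough to reject the drifted baseline loses the
# early peak entirely.
high = above_threshold(responses, 9.6)

print("threshold 4.0:", low)
print("threshold 9.6:", high)
```

In this constructed example no single threshold isolates both peaks, which is why an analyst may resort to different thresholds over different regions of indices.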
For example, in developing statistical analysis methods for MALDI-MS (matrix-assisted laser desorption/ionization--mass spectrometry), current peak detection and characterization algorithms are inadequate. The MALDI-MS process begins with an analyte of interest placed on a sample plate and mixed with a matrix. The matrix is a compound chosen to absorb light of wavelengths emitted by a given laser. Laser light is then directed at the sample and the matrix absorbs the light energy, becoming ionized. The ionization of the matrix results in subsequent ionization of the analyte as analyte ions 100 (FIG. 1). A charge is applied at the detector 104 that attracts the analyte ions 100 through a flight tube 102 to the detector 104. The detector 104 measures the abundance of ions that arrive in short time intervals. The abundance of ions over time is converted to the abundance of ions as a function of mass/charge (m/z) ratio. The ions 100 arrive at the detector 104 in a disperse packet which spans multiple sampling intervals of the detector 104. As a result, the ions 100 are binned so that they are counted over several m/z units, as illustrated in FIG. 2. Current algorithms require the user to specify a detection threshold 200; only peaks 202 exceeding this threshold will be detected and characterized. The detection threshold procedure is conceptually appealing and suggests that m/z values for which no ions are present will read zero relative abundance, while m/z values for which ions are present will result in a peak. In practice, the list of MALDI-MS peaks produced by the instrument depends on how a given user sets the detection threshold 200 on any given day. This required human intervention makes complete automation impossible and induces variability that makes accurate statistical characterization of MALDI-MS spectra difficult.
Operator, instrumental, and experimental uncertainty add noise to the MALDI-MS spectra, further decreasing the effectiveness of current peak detection algorithms. If the user-defined threshold 200 is set too low, noise can erroneously be characterized as a peak. Conversely, if the user-defined threshold 200 is set too high, small peaks might be erroneously identified as noise.
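The trade-off described above can be sketched with a small, hand-constructed spectrum. The intensities, the threshold values, and the function name `detect_peaks` are assumptions made for illustration only; the sketch shows generic local-maximum thresholding and does not reproduce any specific instrument's algorithm.

```python
def detect_peaks(intensities, threshold):
    """Return indices of local maxima whose intensity exceeds `threshold`.

    A point is treated as a local maximum when it is strictly greater
    than its left neighbor and at least as large as its right neighbor.
    """
    peaks = []
    for i in range(1, len(intensities) - 1):
        rises = intensities[i - 1] < intensities[i]
        falls = intensities[i] >= intensities[i + 1]
        if rises and falls and intensities[i] > threshold:
            peaks.append(i)
    return peaks


# Hypothetical binned spectrum: a large peak centered at index 8, a small
# genuine peak centered at index 15, and low-level noise elsewhere.
spectrum = [0, 2, 0, 1, 0, 2, 1, 12, 40, 11,
            1, 0, 2, 0, 5, 9, 4, 0, 1, 0]

too_low = detect_peaks(spectrum, 1.5)    # noise at 1, 5, 12 reported as peaks
too_high = detect_peaks(spectrum, 20)    # small peak at 15 rejected as noise
moderate = detect_peaks(spectrum, 5)     # both genuine peaks, no noise

print("threshold 1.5:", too_low)
print("threshold 20: ", too_high)
print("threshold 5:  ", moderate)
```

Here an intermediate threshold happens to separate the two genuine peaks from the noise, but only because the true peak positions are known in advance; with real spectra the analyst has no such reference, and the reported peak list changes with the chosen value.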
Related to the problem of distinguishing signal from noise is the problem of bounding the uncertainty of the signal. It is well known that replicate analyses of a sample often produce slightly different indexed data.
Thus, there is a need in the art of indexed data collection and analysis for a method of processing indexed data that provides greater confidence in the identification/characterization of spectral feature(s), and/or greater confidence in separating signal from noise with less signal loss, and that is robust and minimizes the adverse effects of low signal-to-noise ratio, signal drift, varying baseline signal, and combinations thereof. In addition, there is a need for a method for characterizing multi-dimensional uncertainty of the signal.