Cancer is a disease in which the functions of normal cells are impaired by the unlimited proliferation of cells. Representative examples include lung cancer, gastric cancer, breast cancer (“BRC”), and colorectal cancer (“CRC”), but cancer can develop in virtually any part of the body. Whereas early cancer diagnosis technology focused on external changes in biological tissues caused by the growth of cancer cells, recent approaches perform diagnosis and detection based on biological materials such as blood, sugar chains (glycans), or DNA, or on trace biological molecules present on cells. Among these, the most common cancer diagnostic methods rely on tissue samples obtained by biopsy, or on imaging technology.
Biopsy has several shortcomings: it causes considerable pain, is costly, and takes a long time until a diagnosis is made. Moreover, if a patient suspected of having cancer does in fact have cancer, the cancer may spread during the biopsy. Further, at sites of the body where biopsy is difficult, a diagnosis is often impossible until the suspicious tissue has been removed by surgery.
Imaging-based diagnosis determines the presence of cancer from X-ray images, nuclear magnetic resonance (NMR) images, or the like, acquired using a contrast agent to which a disease-targeting substance is attached. Its shortcomings include the possibility of misdiagnosis depending on the expertise of the clinician or other personnel who read the images, and a high dependence on the precision of the image-acquisition device. Furthermore, even the most precise device cannot detect a tumor smaller than a few millimeters, which makes early detection unlikely. In addition, during image acquisition the patient is exposed to high-energy electromagnetic radiation, which can itself induce genetic mutations and possibly other diseases, so the number of imaging examinations a patient can undergo is limited.
The presence or absence of disease in the gastric system is generally determined by visual observation through an endoscope. The procedure is painful, and even when an abnormality is observed during the examination, a biopsy is still required to determine accurately whether the lesion is a malignant or benign tumor, a polyp, or the like.
CRC is the third most commonly diagnosed cancer in the world, and its cure depends heavily on the stage of cancer development: CRC is highly curable when detected at an early stage by screening. Although early detection is very important, the symptoms of this cancer are usually not noticeable until the patient notices a change in the color of the stool caused by the presence of blood. Generally, a patient or a person suspected of having CRC first undergoes an endoscopic examination of the large intestine and must then undergo a biopsy to determine the specific disease accurately. Thus, while early detection of CRC is critical, endoscopic examination of the large intestine and biopsy are time-consuming, costly, inconvenient, and painful. A diagnostic method is therefore needed that can considerably reduce the number of subjects who undergo endoscopic examination and biopsy unnecessarily.
Accordingly, patients will benefit from early-stage CRC screening based on a new molecular approach. Genomics, proteomics, and molecular pathology have provided various biomarker candidates with clinical potential. Actively using these candidates to tailor cancer treatment to the stage of the disease and to the individual patient may improve therapeutic outcomes, and much research is therefore needed to apply them in actual clinical practice.
Current CRC screening tests include detection of gross abnormalities by endoscopic examination of the large intestine and the fecal occult blood test (FOBT), which detects blood in the feces. Endoscopic examination of the large intestine has been used as the standard examination in CRC screening, but because it is invasive, the number of patients who can undergo it is limited. Accordingly, many efforts have focused on stool-based tests, which offer advantages such as noninvasiveness, no need for colonic irrigation, and ease of transporting the sample. Fecal markers may include substances that ooze, are excreted, or are exfoliated from the tumor into the feces. For example, the hemoglobin detected by the traditional FOBT has been regarded as an oozing-type marker in large-scale screening programs. However, none of the markers known so far, including hemoglobin, has proven satisfactory.
Meanwhile, mass spectra of ions in blood can be obtained using a matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometer. Mass spectrometry, as generally used in protein research, mainly targets the m/z range of 800 to 2500, because this range corresponds to the masses of the peptides produced when proteins are digested with trypsin. A MALDI-TOF mass spectrometer can also be used to obtain mass spectra of low-mass ions. However, the low-mass region below approximately m/z 800, where ions from the matrix coexist with the analyte ions, has received little research attention.
The extracted low-mass ion spectra can be analyzed with conventional software such as MarkerView™ (version 1.2). The inventors of the present invention used MarkerView™ to analyze the mass spectra of low-mass ions extracted from the sera of a CRC patient group and a normal control group (CONT), as explained in detail below with reference to FIG. 1.
The low-mass ion spectra, in T2D file format, obtained from the set (A1) of serum samples collected from 133 CRC patients and 153 normal controls were imported into MarkerView™ (A11). The import conditions were as follows:
TABLE 1
Parameter                    Value
Mass tolerance               100 ppm
Minimum required response    10.0
Maximum number of peaks      10000
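To illustrate what the import conditions of Table 1 imply, the following is a minimal sketch, not MarkerView™'s actual import routine. It assumes that "mass tolerance" means two m/z values within ±100 ppm are treated as the same ion, and that "minimum required response" is an intensity threshold below which peaks are discarded. The function names `ppm_window`, `same_ion`, and `select_peaks` are hypothetical.

```python
import numpy as np

# Import conditions taken from Table 1
MASS_TOLERANCE_PPM = 100.0
MIN_RESPONSE = 10.0      # assumption: interpreted as a minimum peak intensity
MAX_PEAKS = 10000

def ppm_window(mz, ppm=MASS_TOLERANCE_PPM):
    """Half-width, in m/z units, of a +/- ppm tolerance window centred on mz."""
    return mz * ppm * 1e-6

def same_ion(mz_a, mz_b, ppm=MASS_TOLERANCE_PPM):
    """Hypothetical rule: two peaks belong to the same ion if within the ppm window."""
    return abs(mz_a - mz_b) <= ppm_window(mz_a, ppm)

def select_peaks(mz, intensity, min_response=MIN_RESPONSE, max_peaks=MAX_PEAKS):
    """Drop peaks below min_response, then keep at most max_peaks of the most intense.

    The surviving peaks are returned in their original order.
    """
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = intensity >= min_response          # assumption: threshold is inclusive
    mz, intensity = mz[keep], intensity[keep]
    top = np.sort(np.argsort(intensity)[::-1][:max_peaks])
    return mz[top], intensity[top]
```

For example, at m/z 500 a 100 ppm tolerance corresponds to ±0.05, so peaks at 500.00 and 500.04 would be merged into one ion while a peak at 500.06 would not.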
The imported peak intensities were then normalized (A12). MarkerView™ provides several normalization methods; of these, “Normalization Using Total Area Sums” was used. In this method, the sum of the peak intensities of each sample was computed, the mean of these sums over all samples was obtained, and every peak intensity of a sample was multiplied by a sample-specific scaling factor so that the sample's sum equaled that mean. As a result, the intensity sums of all samples were identical after normalization.
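The total-area-sum normalization described above can be sketched as follows. This is an illustrative re-implementation of the stated arithmetic, not MarkerView™ code; the function name `normalize_total_area` is hypothetical, and rows are assumed to be samples and columns mass ions.

```python
import numpy as np

def normalize_total_area(intensities):
    """Scale each sample (row) so that its intensity sum equals the mean sum over all samples."""
    X = np.asarray(intensities, dtype=float)
    totals = X.sum(axis=1)              # per-sample sum of peak intensities
    target = totals.mean()              # mean of those sums across samples
    scale = target / totals             # one scaling factor per sample
    return X * scale[:, np.newaxis]     # multiply every peak of a sample by its factor
```

For two samples `[[1, 2, 3], [2, 4, 6]]`, the sums are 6 and 12, their mean is 9, and the scaling factors are 1.5 and 0.75, so both rows become `[1.5, 3, 4.5]` with sum 9.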
Next, the normalized peak intensities were Pareto-scaled (A13): for each mass ion, the mean of that ion's normalized intensities over all samples was subtracted from each normalized intensity, and the difference was divided by the square root of that ion's standard deviation.
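Pareto scaling as described can be written in a few lines. This sketch assumes rows are samples and columns are mass ions, and uses the sample standard deviation (`ddof=1`); the choice of `ddof` is an assumption, since the text does not specify it. The function name `pareto_scale` is hypothetical.

```python
import numpy as np

def pareto_scale(X):
    """Centre each mass ion (column) and divide by the square root of its standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)          # assumption: sample standard deviation
    sd = np.where(sd > 0, sd, 1.0)      # guard: constant ions stay at 0 after centring
    return (X - mean) / np.sqrt(sd)
```

Compared with unit-variance (auto) scaling, dividing by the square root of the standard deviation shrinks the dominance of high-intensity ions while amplifying low-intensity noise less.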
Next, discriminant scores (DS) were computed from the Pareto-scaled peak intensities by principal component analysis-based linear discriminant analysis (PCA-DA) (A14). PCA-DA was performed in two stages to obtain factor loadings, which are the weighting factors of the respective mass ions. The Pareto-scaled intensities were multiplied by the corresponding factor loadings, and the products were summed to give the discriminant score of each sample. Because the import conditions of Table 1 allow up to 10,000 peaks and enough samples were imported, 10,000 factor loadings were computed, so each DS was a sum of 10,000 terms.
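The two-stage PCA-DA and the DS summation can be sketched as below. MarkerView™'s internal algorithm is not described here, so this assumes a generic PCA-DA: PCA via singular value decomposition, followed by a Fisher linear discriminant on the principal-component scores, with the discriminant direction back-projected to one factor loading per mass ion. The input is assumed to be mean-centred (as Pareto-scaled data are), and the names `pca_da_loadings` and `discriminant_scores` are hypothetical.

```python
import numpy as np

def pca_da_loadings(X, y, n_components=2):
    """Generic two-stage PCA-DA sketch returning one factor loading per mass ion.

    X: mean-centred (e.g. Pareto-scaled) matrix, shape (samples, ions)
    y: 1 for patient, 0 for control
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Stage 1: principal axes from the SVD of the centred data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T                  # (ions, k) principal axes
    T = X @ P                                # (samples, k) PC scores
    # Stage 2: Fisher linear discriminant on the PC scores
    T1, T0 = T[y == 1], T[y == 0]
    m1, m0 = T1.mean(axis=0), T0.mean(axis=0)
    Sw = (np.atleast_2d(np.cov(T1, rowvar=False)) * (len(T1) - 1)
          + np.atleast_2d(np.cov(T0, rowvar=False)) * (len(T0) - 1))
    w = np.linalg.solve(Sw, m1 - m0)         # discriminant direction in PC space
    return P @ w                             # back-project: one loading per ion

def discriminant_scores(X_scaled, loadings):
    """DS of each sample = sum over ions of (scaled intensity x factor loading)."""
    return np.asarray(X_scaled, dtype=float) @ np.asarray(loadings, dtype=float)
```

Because the data are centred, the scores average to zero over all samples, and the direction `m1 - m0` makes patients score higher on average, which is why a threshold of zero is a natural decision boundary.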
Next, it was determined whether the computed DS was positive (A15); if so, the sample was determined to be positive (A16), and otherwise negative (A17). When applied to CRC, a positive DS was interpreted as belonging to the CRC patient group and a negative DS as belonging to the normal control group.
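The decision rule of steps A15 to A17 reduces to a sign test. In this sketch a DS of exactly zero is treated as negative, following the "if not positive, negative" wording; the function names and the labels "CRC"/"CONT" are illustrative choices.

```python
import numpy as np

def classify_by_sign(ds):
    """Steps A15-A17: DS > 0 -> positive (CRC group); otherwise negative (normal control)."""
    return np.asarray(ds, dtype=float) > 0

def label(ds, pos="CRC", neg="CONT"):
    """Map the sign decision to group labels (label strings are illustrative)."""
    return np.where(classify_by_sign(ds), pos, neg)
```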
FIG. 2 illustrates the distribution of the DS computed by the method of FIG. 1 for the set of 133 clinically diagnosed CRC patients and 153 non-cancer subjects. A confusion matrix can summarize the classification results obtained from the discriminant scores. As used herein, the confusion matrix is defined as the 3×3 block in the lower-right part of Table 2.
TABLE 2
                                 Result of Clinical Study
                                 Patient                Non-patient
PCA-DA Prediction  Patient       True positive (TP)     False positive (FP)    Positive Predictive Value (PPV)
Result             Non-patient   False negative (FN)    True negative (TN)     Negative Predictive Value (NPV)
                                 Sensitivity            Specificity
That is, although a confusion matrix basically consists of the numbers of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) instances, the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are also included for convenience of analysis. Accordingly, the classification results obtained from the discriminant scores of FIG. 2 can be summarized in a confusion matrix as follows:
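TABLE 3
                                 Result of Clinical Study
                                 Patient        Non-patient
PCA-DA Prediction  Patient       131 (TP)       2 (FP)         PPV 98.5%
Result             Non-patient   2 (FN)         151 (TN)       NPV 98.7%
                                 Sensitivity    Specificity
                                 98.5%          98.7%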
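The four summary statistics follow directly from the counts. Reading Table 3 as TP = 131, FP = 2, FN = 2, and TN = 151 (consistent with 133 patients and 153 non-cancer subjects), the percentages can be checked with the standard definitions; the helper name `confusion_metrics` is illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Standard confusion-matrix statistics."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of patients predicted positive
        "specificity": tn / (tn + fp),   # fraction of non-patients predicted negative
        "ppv": tp / (tp + fp),           # fraction of positive predictions that are patients
        "npv": tn / (tn + fn),           # fraction of negative predictions that are non-patients
    }

m = confusion_metrics(tp=131, fp=2, fn=2, tn=151)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
# sensitivity: 98.5%
# specificity: 98.7%
# ppv: 98.5%
# npv: 98.7%
```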
Referring to FIG. 2 and Table 3, the conventional PCA-DA of MarkerView™ gave an excellent discrimination result, with the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) all exceeding 98%.
However, the robustness of the discriminant must be verified before clinical use. That is, the discriminant, which was constructed from a dataset measured once, must maintain good discrimination results for mass spectra measured repeatedly from the same dataset, and it must also discriminate soundly between new CRC patients and non-cancer subjects who were not considered when the discriminant was designed. Repeated measurement of the mass spectra may involve freezing and thawing the sera and mixing the serum anew with methanol/chloroform to obtain an extract. These processes act as disturbances in the statistical analysis of the mass spectra, and clinical implementation is possible only when the discriminant is minimally affected by such disturbances.
To sum up, although the conventional PCA-DA explained above with reference to FIGS. 1 and 2 and Tables 1 to 3 sometimes exhibits a good discrimination result when applied to a specific set of samples, i.e., to an individual training set, its discrimination result was unsatisfactory when applied to the validation set (Table 7). The discriminant performs very well on the training set but appears not to be robust, because the 10,000 mass ions that constitute it include a considerable number of ions that are unnecessary for discriminating between patients and non-patient subjects. Although these ions cause little problem in discriminating the training set, they can confuse the discrimination of the validation set. Accordingly, a process is needed that actively removes mass ions that are unnecessary or that can potentially confuse the discrimination result, thereby identifying only those mass ions that are essential to a good and robust discrimination result.