1. Field of the Invention
The present invention relates to data classification strategies, and more particularly to a robust data classification strategy that applies machine learning tools to mass spectrometry data to create a robust phenotype identifier for detecting cancer.
2. Description of the Related Art
The fields of genomics and proteomics have grown rapidly. However, in spite of much effort, there remains a need for robust, clinically useful predictors that might be adopted by the medical community.
Conventional techniques have been developed to use biological data for phenotype identification and, more particularly, for phenotype prediction for various types of cancer. However, these conventional techniques have limitations due to the current status of the instruments used to obtain the data, a lack of robustness of the selected biomarkers or of the predictive models, poor validation, and a lack of protein biomarker and pathway identification. There exists a need for a robust, accurate and noise-insensitive phenotype identifier to distinguish cancer from non-cancer.
Other difficulties with mass spectrometry data include the large data size (e.g., on the order of tens of thousands to hundreds of thousands of features), the need to distinguish subtle differences between phenotype properties, and the inconsistency among predictions provided by different techniques. These issues make it necessary to devise a technique that integrates over different methods to obtain an ensemble view of the data.
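By way of illustration only, one simple way of integrating over different methods is a per-sample majority vote over the labels each method predicts. The following is a minimal sketch under stated assumptions, not the technique of the present invention: the function name majority_vote, the labels "cancer" and "normal", and the choice to return None on a tie are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several methods by majority vote.

    predictions: list of per-method label lists, one label per sample.
    Returns one label per sample, or None where the top vote is tied.
    (Hypothetical helper for illustration; the tie rule is an assumption.)
    """
    if not predictions:
        raise ValueError("at least one method's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all methods must predict the same number of samples")

    combined = []
    # zip(*predictions) yields, for each sample, the labels from every method.
    for labels in zip(*predictions):
        counts = Counter(labels)
        best_count = max(counts.values())
        winners = [label for label, c in counts.items() if c == best_count]
        # Report a label only when exactly one label has the most votes.
        combined.append(winners[0] if len(winners) == 1 else None)
    return combined


# Three hypothetical methods that disagree on individual samples.
method_a = ["cancer", "normal", "cancer"]
method_b = ["cancer", "cancer", "normal"]
method_c = ["normal", "cancer", "cancer"]
ensemble = majority_vote([method_a, method_b, method_c])
```

In this toy input each method disagrees with the others on one sample, yet the combined vote yields a single label per sample, which is the sense in which an ensemble view can smooth over incoherent individual predictions.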