Various statistical approaches are currently available for analyzing high-dimensional datasets. Commonly used approaches include filter, wrapper and embedded methods. Filter methods evaluate each gene individually by its discriminative power, without considering the combined effect of a group of genes (Dudoit et al., Journal of the American Statistical Association. 97, 2002, 77-87). Wrapper methods use a particular learning method as the feature evaluation measure, selecting the gene subset that minimizes the classification error and building the final classifier from it (Rivals, I. and Personnaz, L. 3, 2003, 1383-1398). Golub et al. (Golub, T. O., et al. Science. 286, 1999, 531-537) also proposed a gene selection approach utilizing support vector machines (SVM) based on recursive feature elimination.
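By way of illustration only, the per-gene scoring performed by a filter method can be sketched as follows. The sketch assumes a ratio of between-class to within-class sums of squares as the discriminative score, which is one common choice and is not asserted to be the exact criterion of any cited work. The function names, gene names and expression values are hypothetical.

```python
from statistics import mean


def bss_wss_score(values, labels):
    """Between-class / within-class sum-of-squares ratio for a single gene.

    Assumed scoring rule for illustration; higher means the gene separates
    the classes better when considered on its own.
    """
    overall = mean(values)
    bss = 0.0  # between-class sum of squares
    wss = 0.0  # within-class sum of squares
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        centre = mean(group)
        bss += len(group) * (centre - overall) ** 2
        wss += sum((v - centre) ** 2 for v in group)
    return bss / wss if wss > 0 else float("inf")


def rank_genes(expression, labels):
    """Rank genes by their individual score, best first.

    Each gene is scored independently, so any combined effect of a group
    of genes is ignored, which is the limitation of filter methods.
    """
    scores = {gene: bss_wss_score(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three genes measured in six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "gene_a": [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],
    "gene_b": [2.0, 3.1, 2.4, 2.2, 2.9, 2.6],
    "gene_c": [0.5, 0.7, 0.6, 1.5, 1.4, 1.6],
}
ranking = rank_genes(expression, labels)
print(ranking)
```

In this toy example, gene_a and gene_c differ clearly between the two classes and are ranked ahead of gene_b, whose values overlap. A wrapper method would instead score candidate gene subsets by the error of a classifier trained on them.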
However, these methods are developed purely from gene expression data and do not utilize any knowledge of biological gene networks. The results they generate have poor accuracy and precision. Accordingly, these methods are not suitable for biological analysis, in particular for determining the relationship between a biological feature and a disease.
There remains a strong need for systems and associated methods for determining an association between biological features, such as gene expression, and a medical condition, which are effective and ensure sufficient accuracy in the case of high-dimensional microarray data.