1. Field of the Invention
The present invention relates generally to the fields of molecular biology and oncology. More particularly, it concerns the classification of gliomas based on the expression of various proteins identified as relevant to various glioma states.
2. Description of Related Art
Gliomas are complex cancers that exhibit different growth characteristics and involve different types of cells. Because the original clone of tumor cells may exist at any stage of cell differentiation, the boundaries between cell lineages can be blurred. Current morphology-based tumor classifications often mix cell lineage features with tumor growth characteristics. The results are subjective, and physicians can disagree as to what kind of tumor cell is involved. To date, gene-based classification has not been successfully applied to gliomas.
Molecular biology provides the potential for an improved method of tumor cell classification. This is based on the premise that all cell phenotypes have their origin in genetics. Thus, the rationale is that a detailed examination of gene expression will be the most accurate representation of a cell's character. Recent successes in the subclassification of neoplasms within a disease group using gene expression profiles provide support for such a belief (Golub et al., 1999; Alizadeh et al., 2000; Bittner et al., 2000).
Thus, the issue is how best to identify the “strong” feature genes that are closely linked to specific phenotypes from among the thousands of genes in gene expression profiles, and whether this information actually improves the classification of tumors. Many technical challenges stand in the way of finding these key links. Algorithms can assist in identifying robust classifiers from very limited data sets. Three criteria have to be met: (a) given a set of variables, a classifier designed from the sample data should provide good classification over the general population; (b) the algorithm should be able to estimate the error of a designed classifier when data are limited; and (c) given a large set of potential variables, the algorithm should be able to select a set of variables as inputs to the classifier from the large number of expression level determinations provided by microarrays.
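By way of illustration only, criteria (b) and (c) can be sketched on synthetic data as follows. The sketch assumes a simple signal-to-noise ranking for variable selection, a nearest-centroid rule as the classifier, and leave-one-out resampling as the error estimate. These are stand-in techniques chosen for brevity; they are not taken from the cited references and are not asserted to be the method of the invention. All function names, parameters, and the toy data are hypothetical.

```python
import random
import statistics


def select_features(samples, labels, k):
    """Criterion (c): rank genes by a simple signal-to-noise score, keep the top k indices."""
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == 0]
        b = [s[g] for s, y in zip(samples, labels) if y == 1]
        spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
        scores.append((abs(statistics.mean(a) - statistics.mean(b)) / spread, g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(train, train_labels, genes, x):
    """Assign x to the class whose mean profile over the selected genes is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [s for s, y in zip(train, train_labels) if y == label]
        centroid = [statistics.mean(m[g] for m in members) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_error(samples, labels, k):
    """Criterion (b): leave-one-out error estimate.

    Feature selection is repeated inside every fold so that the held-out
    sample never influences which genes are chosen.
    """
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_features(train, train_labels, k)
        if nearest_centroid(train, train_labels, genes, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


def make_toy_data(n_per_class=10, n_genes=200, n_informative=5, shift=3.0, seed=0):
    """Hypothetical data: few samples, many genes, only the first few genes informative."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                row[g] += shift * label
            samples.append(row)
            labels.append(label)
    return samples, labels


samples, labels = make_toy_data()
selected = select_features(samples, labels, k=5)
error = loo_error(samples, labels, k=5)
print("selected genes:", sorted(selected))
print("leave-one-out error estimate:", error)
```

With only twenty samples, the estimate can take only the values 0, 0.05, 0.10, and so on, which illustrates why error estimates from small sample sets are coarse and variable. Selecting genes once on all samples before resampling would tend to make the estimate optimistically biased, which is why the selection step is placed inside the loop.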
However, a major roadblock is the small sample size problem inherent in microarray-based classification efforts (Dougherty, 2001). Contributing to this problem are the limited number of human tissues available for study and the cost of such gene expression profiling projects. Because classifiers are designed from observed expression vectors whose randomness arises from both biologic and experimental variability, the design, performance evaluation, and application of classifiers must take this randomness into account. This is especially true when the number of samples (tissue specimens) is small, as is the case in most human tissue-based microarray experiments.