Microarrays have the potential to identify pathways that are altered in disease. Because of this promise, researchers, hospitals, and pharmaceutical companies have pursued the technology aggressively, in the hope that it will lead to an improved understanding of the disease process, better diagnostic protocols, new drugs, and new treatment regimens.
For example, a major application of microarrays has been the analysis of cancer. The focus has been on identifying genes that are altered in the initiation, progression, and metastasis of cancer. Such analysis is confounded by the fact that most cancers are highly heterogeneous, microarray signals are noisy, and there is a large variation in the “normal” levels of most genes. A major cause of the lack of success in treatment is that tumors with similar histopathology signatures have divergent clinical courses and prognoses. Many studies have been motivated by the hope that molecular signatures can be found that correlate better with outcome than the currently prevalent histological categories and treatment protocols. Such signatures would provide insight into processes critical to tumor development and lead to potential therapeutics or targets.
However, identifying the subtle signals that identify disease phenotypes and their progression requires robust techniques. Further, considering the pathological heterogeneity of cancer, it seems reasonable to ask whether these cancer phenotypes are robust and homogeneous and whether they can be reliably stratified into sub-subtypes. Although several different studies have determined a number of markers for each of these phenotypes, the cohorts differ from study to study, and a consensus set of markers that would apply to different types of cancer patients, e.g., breast cancer patients, is still not available.
It would be highly desirable to provide a new robust technique for analyzing microarray data, for example, one that uses principal component analysis and consensus ensemble k-clustering to identify robust clusters and gene markers in the data.
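As a rough illustration of how principal component analysis might be combined with a consensus ensemble of clusterings, the following minimal Python sketch projects samples onto a few principal components, repeatedly clusters random subsamples, and records how often each pair of samples lands in the same cluster. It is a sketch under stated assumptions, not a definitive implementation: plain k-means stands in for the "k-clustering" step, the data are synthetic, and all function names, parameters, and settings (`pca_project`, `kmeans`, `consensus_matrix`, 50 runs, 80% subsampling) are hypothetical choices made for illustration.

```python
import numpy as np

def pca_project(X, n_components):
    """Project samples (rows of X) onto the top principal components."""
    Xc = X - X.mean(axis=0)                      # center each gene (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # rows of Vt are principal axes

def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns one label per row."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave an empty cluster's center unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def consensus_matrix(X, k, n_runs=50, subsample=0.8, seed=0):
    """Fraction of runs in which each pair of samples clusters together,
    counted only over runs in which both samples were drawn."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = max(k, int(subsample * n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)

# Synthetic stand-in for expression data: 20 samples x 100 genes in two groups.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 1.0, size=(10, 100))
group_b = rng.normal(0.0, 1.0, size=(10, 100))
group_b[:, :10] += 5.0                           # ten hypothetical marker genes
X = np.vstack([group_a, group_b])

Z = pca_project(X, n_components=2)               # reduce noise before clustering
C = consensus_matrix(Z, k=2)                     # entries near 1 suggest robust co-clustering
```

Pairs of samples whose consensus value stays close to 1 across the resampled runs are candidates for a robust cluster, while values near 0.5 indicate unstable assignments. In practice the number of components, the number of clusters, and the base clustering method would each need to be chosen and validated for the data at hand.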