In the area of molecular diagnostics, microarray data and proteomics data are increasingly being used to develop new tests for classifying patients. An example of such a test is described in “Multiclass classification of microarray data with repeated measurements: application to cancer” by K. Y. Yeung and R. E. Bumgarner, in Genome Biology, 2003, 4:R83.
Classification of microarray data and proteomics data may relate to, for example, diagnosis and patient stratification. Finding the right biomarkers, for example the right set of genes or proteins, on which to base this classification, and finding the right rule to translate the measurements of these biomarkers into a classification, are of prime importance, as these choices may have a large impact on the classification accuracy. Given the biomarkers and the classification rule, new cases can be classified in a clinical setting or at a general practitioner's practice.
Microarrays are an important tool for biologists because they make it possible to measure thousands of gene expression levels simultaneously per sample. One of the main tasks of microarray classification is to map a set of gene expression measurements, the features, to a given target label, i.e. a patient's class. In contrast to measuring a person's body temperature or height, measuring gene expression levels is very challenging, costly and time-consuming. It is a multi-step process in which many individual procedures have to be performed. Some of these steps involve conditions that cannot be fully controlled, and the resulting measurement variation may cause the classification result to be unreliable.
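By way of illustration only, the mapping from features to a class label, and the way measurement variation can change the outcome, may be sketched as follows. The sketch assumes a nearest-centroid rule, which is a stand-in chosen for brevity and is not a method described above. The class names, the three expression values per sample and the noise vector are all hypothetical.

```python
import math


def centroid(samples):
    """Mean expression vector of a list of equal-length samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def classify(features, centroids):
    """Return the label whose centroid is nearest (Euclidean) to features."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical training data: three gene expression levels per patient.
training = {
    "class_A": [[2.0, 8.0, 1.0], [2.4, 7.6, 1.2]],
    "class_B": [[6.0, 3.0, 1.1], [5.6, 3.4, 0.9]],
}
centroids = {label: centroid(samples) for label, samples in training.items()}

# A new case, and the same case with a hypothetical measurement error
# added to each gene expression level.
measured = [3.9, 5.8, 1.0]
noise = [0.6, -0.6, 0.0]
perturbed = [m + e for m, e in zip(measured, noise)]

label_measured = classify(measured, centroids)
label_perturbed = classify(perturbed, centroids)
print(label_measured, label_perturbed)
```

With these made-up numbers the two calls return different labels, so an error of 0.6 in two of the three measurements is enough to move the case from one class to the other. In practice a classification is typically based on far more features, but the same sensitivity to uncontrolled measurement conditions applies.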