The invention relates to prototype-based algorithms used for pattern recognition, sequence processing and image processing. Clustering techniques are primarily aimed at partitioning a set of data points into subsets and at estimating their density distribution. Clustering is an unsupervised technique, that is, it does not take the labels of the data points into account. Nevertheless, it is an important task that is highly relevant to many applications, such as pattern recognition, sequence processing and image processing. Conventional clustering tools often use known prototype-based algorithms, such as the Self-Organizing Map, k-means clustering and the Neural Gas network algorithm.
Neural algorithms, such as the Neural Gas algorithm, are modeled as a set of neurons, each of which responds to a particular stimulus pattern called a “prototype.” A Neural Gas network maps data points from a (normally) high-dimensional data space D onto a set of N prototypes. The data space D is typically a subset of the real-valued d-tuples (D ⊂ ℝ^d). Each prototype is associated with a weight vector w_i ∈ D (i = 1 … N). In order to use such a network for clustering, the network must first be “trained.” During the training of the Neural Gas network, a sequence of training data points v_j ∈ D (j = 1 … M), drawn according to the data density distribution, is presented to the set of prototypes. Each of these training data points is mapped onto the prototype that is “closest” to it. The distance between a training data point v and a prototype w_i is usually determined by applying the Euclidean norm to the difference between the data vector and the weight vector: d(v, w_i) = ‖v − w_i‖. The weight vectors of the closest prototype and of the prototypes in its neighborhood are then “adapted”, i.e. shifted towards the data vector of the presented training data point, as follows:
$$\Delta w_i = \varepsilon \cdot h(v, w_i) \cdot (v - w_i),$$
wherein ε is the learning rate, w_i is the weight vector of the prototype with index i, v is the data vector of the presented training data point and h(v, w_i) is a neighborhood function. The neighborhood function of the Neural Gas network is usually defined as:
$$h(v, w_i) = \exp\left(-\frac{k(v, w_i)}{\sigma}\right),$$
wherein σ is a scaling parameter for the size of the neighborhood and k(v, w_i) is the so-called rank function, which yields the number of prototypes that are closer to the presented training data point than the prototype with index i. On average, the adaptation rule for the weight vectors follows a gradient dynamics on a potential (cost) function E_NG. An illustrative potential function is:
$$E_{NG} \propto \sum_{v_j} \sum_{w_i} h(v_j, w_i) \cdot d(v_j, w_i),$$
wherein h(v_j, w_i) is the neighborhood function of the Neural Gas network and d(v_j, w_i) is the distance measure between the prototype with index i and the training data point with index j. Each weight vector w_i is thus adapted along the direction in which the cost function E_NG decreases most steeply. The Neural Gas network and its potential function are described in more detail in Martinetz et al.: “Neural-gas network for vector quantization and its application to time-series prediction”, IEEE Transactions on Neural Networks, vol. 4, no. 4, pp. 558-569, 1993, which article is incorporated herein by reference in its entirety.
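The Neural Gas adaptation step and potential function described above can be illustrated by a minimal Python sketch. It assumes plain lists of floats as vectors, the Euclidean distance d(v, w_i) = ‖v − w_i‖, and a rank k(v, w_i) that counts only strictly closer prototypes; the function names are hypothetical, and ε and σ are held fixed rather than annealed.

```python
import math


def rank(v, prototypes, i):
    """k(v, w_i): number of prototypes strictly closer to v than prototype i."""
    d_i = math.dist(v, prototypes[i])
    return sum(1 for w in prototypes if math.dist(v, w) < d_i)


def neighborhood(v, prototypes, i, sigma):
    """h(v, w_i) = exp(-k(v, w_i) / sigma)."""
    return math.exp(-rank(v, prototypes, i) / sigma)


def ng_step(v, prototypes, eps, sigma):
    """One Neural Gas step: w_i <- w_i + eps * h(v, w_i) * (v - w_i).

    All neighborhood values are computed before any prototype moves, so the
    ranks refer to the positions at the time v is presented. Returns a new list.
    """
    hs = [neighborhood(v, prototypes, i, sigma) for i in range(len(prototypes))]
    return [
        [wk + eps * h * (vk - wk) for wk, vk in zip(w, v)]
        for w, h in zip(prototypes, hs)
    ]


def ng_cost(data, prototypes, sigma):
    """E_NG up to a constant factor: sum_j sum_i h(v_j, w_i) * d(v_j, w_i)."""
    return sum(
        neighborhood(v, prototypes, i, sigma) * math.dist(v, w)
        for v in data
        for i, w in enumerate(prototypes)
    )


if __name__ == "__main__":
    # Hypothetical toy data: two small groups of 2-D points.
    data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
    protos = [[1.0, 1.0], [4.0, 4.0]]
    for _ in range(20):
        for v in data:
            protos = ng_step(v, protos, eps=0.1, sigma=0.5)
    print(protos, ng_cost(data, protos, sigma=0.5))
```

Training with this sketch amounts to presenting the data points repeatedly and calling `ng_step` for each; practical implementations typically decrease ε and σ over the course of training, which is omitted here for brevity.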
In addition to conventional algorithms for clustering unlabeled data, there are also known prototype-based classification algorithms that work on labeled data in a supervised scheme. In these algorithms, labeled prototypes are distributed in the data space and trained to detect and represent the different data classes. A trained algorithm (also called a “constructed classifier”) can then be used to assign unlabeled data points to one of the data classes. Important approaches along these lines are Learning Vector Quantization (LVQ) and more recent developments such as Generalized LVQ (GLVQ) and the Supervised Neural Gas network (SNG) by Villmann et al. (“Supervised neural gas for learning vector quantization”, in: 5th German Workshop on Artificial Life, IOS Press, pp. 9-18, 2002).
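Applying a constructed classifier, and training it with the basic LVQ scheme, can be sketched in a few lines. The sketch assumes the Euclidean distance, a winner-take-all assignment, and Kohonen's LVQ1 rule (the closest prototype is attracted if its label matches the data label and repelled otherwise); it only illustrates the supervised prototype idea, not GLVQ or SNG, and the function names are hypothetical.

```python
import math


def winner(v, prototypes):
    """Index of the prototype closest to v (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(v, prototypes[i]))


def classify(v, prototypes, labels):
    """Assign v the class label of its closest prototype."""
    return labels[winner(v, prototypes)]


def lvq1_step(v, c_v, prototypes, labels, eps):
    """LVQ1 update: move the winner toward v if labels match, else away from v.

    Returns a new list of prototypes; the input list is not modified.
    """
    j = winner(v, prototypes)
    sign = 1.0 if labels[j] == c_v else -1.0
    new = [list(w) for w in prototypes]
    new[j] = [wk + sign * eps * (vk - wk) for wk, vk in zip(prototypes[j], v)]
    return new
```

Only the winning prototype moves in LVQ1, which is the main contrast with the neighborhood-based updates of the Neural Gas family.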
The SNG transforms the unsupervised Neural Gas network into a supervised classifier. Here, the prototypes and the training data points are labeled, i.e. each of them is associated with exactly one class label (c_v denotes the label of a data point v, and c_{w_i} the label of the prototype with index i). The cost function of the SNG is modified as follows:
$$E_{SNG} \propto \sum_{v_j} \sum_{w_i} h^*(v_j, w_i) \cdot \operatorname{sgd}\left(\frac{d(v_j, w_i) - d^-}{d(v_j, w_i) + d^-}\right),$$
wherein h* is a modified neighborhood function, sgd(x) = 1/(1 + e^(−x)) denotes the well-known logistic function, d(v_j, w_i) is the distance measure between the prototype with index i and the training data point with index j, and d^− denotes the distance between the training data point with index j and the closest mislabeled prototype. The logistic-function term stems from GLVQ. The modified neighborhood function h* is defined as:
$$h^*(v, w_i) = \begin{cases} \exp\left(-\frac{k(v, w_i)}{\sigma}\right) & \text{if } c_v = c_{w_i}, \\ 0 & \text{if } c_v \neq c_{w_i}, \end{cases}$$
wherein σ is a scaling parameter for the size of the neighborhood and k(v, w_i) is the rank function. Only the closest mislabeled prototype and the prototypes that have the same class label as the presented training data point contribute to the cost function. Minimizing the cost function E_SNG therefore shifts all prototypes having the same label as the presented training data point toward that data point and shifts the closest mislabeled prototype away from it.
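For a single training data point, the SNG cost term and one gradient-descent step on it can be sketched as follows. This is an illustration under stated assumptions, not a reference SNG implementation: the distance measure is taken to be the squared Euclidean distance (so that ∂d/∂w_i = −2(v − w_i)), the rank in h* is computed over all prototypes as in the Neural Gas definition, the dependence of h* on the weights through the piecewise-constant rank is ignored, and all function names are hypothetical.

```python
import math


def sq_dist(v, w):
    """Squared Euclidean distance (assumed choice for the distance measure d)."""
    return sum((a - b) ** 2 for a, b in zip(v, w))


def sgd(x):
    """Logistic function sgd(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def h_star(v, c_v, prototypes, labels, i, sigma):
    """Modified neighborhood h*(v, w_i): zero for prototypes with another label."""
    if labels[i] != c_v:
        return 0.0
    d_i = sq_dist(v, prototypes[i])
    # Assumption: the rank counts all prototypes strictly closer to v than w_i.
    k = sum(1 for w in prototypes if sq_dist(v, w) < d_i)
    return math.exp(-k / sigma)


def sng_step(v, c_v, prototypes, labels, eps, sigma):
    """Return (cost term for v, updated prototypes) after one gradient step."""
    wrong = [i for i, c in enumerate(labels) if c != c_v]
    if not wrong:
        raise ValueError("SNG needs at least one prototype with a different label")
    # Closest mislabeled prototype and its distance d^-.
    j_minus = min(wrong, key=lambda i: sq_dist(v, prototypes[i]))
    d_minus = sq_dist(v, prototypes[j_minus])

    new = [list(w) for w in prototypes]
    cost = 0.0
    push = 0.0  # accumulated step size for repelling the closest mislabeled prototype
    for i, w in enumerate(prototypes):
        h = h_star(v, c_v, prototypes, labels, i, sigma)
        d_i = sq_dist(v, w)
        if h == 0.0 or d_i + d_minus == 0.0:
            continue  # mislabeled prototype, or degenerate case where mu is undefined
        mu = (d_i - d_minus) / (d_i + d_minus)
        s = sgd(mu)
        cost += h * s
        ds = s * (1.0 - s)  # derivative of the logistic function at mu
        denom = (d_i + d_minus) ** 2
        # dE/dw_i = h * ds * (2 d^- / denom) * (-2 (v - w_i)); descending attracts w_i.
        attract = eps * h * ds * 4.0 * d_minus / denom
        new[i] = [wk + attract * (vk - wk) for wk, vk in zip(w, v)]
        # dE/dw^- = h * ds * (-2 d_i / denom) * (-2 (v - w^-)); descending repels w^-.
        push += eps * h * ds * 4.0 * d_i / denom
    w_minus = prototypes[j_minus]
    new[j_minus] = [wk - push * (vk - wk) for wk, vk in zip(w_minus, v)]
    return cost, new
```

Running `sng_step` on a point whose correctly labeled prototype lies nearer than the mislabeled one moves the former slightly closer and the latter slightly farther, matching the qualitative behavior described for E_SNG.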
A further development is the Supervised Relevance Neural Gas network (SRNG) by Villmann et al. (“Supervised Neural Gas and Relevance Learning in Learning Vector Quantization”, in: 4th Workshop on Self-Organizing Maps, Kitakyushu (Japan) 2003, pp. 47-52). Here, the distance measure between training data points and prototypes is extended by introducing relevance parameters that weight each dimension of the data space differently.
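A relevance-weighted distance of the kind used in SRNG can be sketched as below. The particular form, a squared Euclidean distance with one nonnegative relevance factor λ_k per dimension, normalized so that the factors sum to one, and the clip-and-renormalize helper are assumptions made for illustration; the relevance learning rule itself is not shown, and the function names are hypothetical.

```python
def relevance_dist(v, w, lam):
    """Relevance-weighted squared distance: sum_k lam_k * (v_k - w_k) ** 2."""
    if not (len(v) == len(w) == len(lam)):
        raise ValueError("v, w and lam must have the same length")
    return sum(l * (a - b) ** 2 for l, a, b in zip(lam, v, w))


def normalize_relevances(lam):
    """Clip relevances to be nonnegative and rescale them to sum to one."""
    clipped = [max(0.0, l) for l in lam]
    total = sum(clipped)
    if total == 0.0:
        raise ValueError("at least one relevance must be positive")
    return [l / total for l in clipped]
```

With relevances such as [1.0, 0.0], the second dimension is ignored entirely, which shows how relevance factors let the distance emphasize informative dimensions.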
One major assumption of most conventional classification approaches is that class assignments have to be “crisp”, that is, training data points and prototypes must each be uniquely assigned to one class. For the prototypes, the crisp assignment requirement can be relaxed by post-labeling them after unsupervised training according to their responsibility for the training data points, which yields fuzzy assignments. However, at present, there are no supervised prototype-based approaches that work with fuzzy labels during training, although such approaches would be desirable. In real-world applications, especially in the classification of biological data, a clear (crisp) classification of training data points may be difficult or impossible. For example, a patient can generally be assigned to a certain disorder (disease) only in a probabilistic (fuzzy) manner. Hence, classifiers that are able to handle this type of data are of great interest.
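The post-labeling of prototypes mentioned above can be sketched as follows. The sketch assumes that a prototype's responsibility is winner-take-all (the closest prototype is responsible for a data point) and that a prototype's fuzzy label is the class distribution of the data points it is responsible for; both choices and the function name are hypothetical, and soft responsibilities would be an equally plausible variant.

```python
import math
from collections import Counter


def post_label(data, labels, prototypes):
    """Return one fuzzy label (dict: class -> membership) per prototype.

    Prototypes that are responsible for no data point get an empty dict.
    """
    counts = [Counter() for _ in prototypes]
    for v, c in zip(data, labels):
        # Winner-take-all responsibility (assumption): closest prototype only.
        j = min(range(len(prototypes)), key=lambda i: math.dist(v, prototypes[i]))
        counts[j][c] += 1
    fuzzy = []
    for cnt in counts:
        total = sum(cnt.values())
        fuzzy.append({c: n / total for c, n in cnt.items()} if total else {})
    return fuzzy
```

Note that the training itself remains unsupervised and crisp here; the fuzziness only enters afterwards through the label counts.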
Here, the term “biological data” or “biological data points” means any data derived from measuring biological conditions of humans, animals or other biological organisms, including microorganisms, viruses, plants and other living organisms. Biological data may include, but is not limited to, clinical tests and observations, physical and chemical measurements, genomic determinations, proteomic determinations, drug levels, hormonal and immunological tests, neurochemical or neurophysical measurements, mineral and vitamin level determinations, genetic and familial histories, and other determinations that may give insight into the state of the individual or individuals undergoing testing.
Especially in the last decade, mass spectrometry has increasingly been used to investigate biological systems and to generate biological data as defined above. In general, mass spectrometric data can be considered high-dimensional, because every signal in a mass spectrum that relates to a particular mass may be treated as a single dimension. Even if a mass spectrum is preprocessed into a list of peaks by selecting only those signals above a threshold, as is well known in the art, mass spectrometric data may still be high-dimensional and are therefore well suited to analysis by prototype-based algorithms.
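Reducing a spectrum to a peak list by thresholding, and then mapping several peak lists onto a common mass axis so that each mass becomes one dimension, might look like the sketch below. The input format (a list of (mass, intensity) pairs), the fixed-width mass binning and the rule of keeping the maximum intensity per bin are all assumptions, the function names are hypothetical, and real preprocessing pipelines are typically more elaborate.

```python
def peak_list(spectrum, threshold):
    """Keep only (mass, intensity) pairs whose intensity exceeds the threshold."""
    return [(m, inten) for m, inten in spectrum if inten > threshold]


def to_vectors(peak_lists, bin_width):
    """Map peak lists onto a shared axis of mass bins (one dimension per bin).

    Returns (axis, vectors): axis holds the sorted bin indices (bin center is
    index * bin_width); each vector has the max intensity per bin, or 0.0.
    """
    binned = []
    for peaks in peak_lists:
        bins = {}
        for m, inten in peaks:
            key = round(m / bin_width)
            bins[key] = max(bins.get(key, 0.0), inten)
        binned.append(bins)
    axis = sorted(set().union(*binned))
    return axis, [[bins.get(k, 0.0) for k in axis] for bins in binned]
```

Even after thresholding, the shared axis grows with every distinct peak mass across the spectra, which illustrates why such data stay high-dimensional.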