The classification problem can be formulated as a verification scheme where the objective is to determine if a pair of points is positive or negative, i.e. positive if the points belong to the same class, negative otherwise. Given a set of labeled data, one can try to learn the metric that gives a discriminative distance in a given feature space that would perform the best for this task; which is to give a low distance between points of a positive pair and a high distance to points of a negative pairs. A detailed overview of this type of methods can be found in [1].
Global metric learning has become popular to improve classification algorithms, such as the K-nearest neighbors (KNN) classification algorithm [2]. It often consists of estimating the optimal covariance matrix of the Mahanlobis distance that will be used for classification. While these kinds of global metrics have shown impressive improvements for classification, they do not capture local properties in the feature space which may be relevant for complex data distributions. To overcome this difficulty, a two step approach is generally employed [3, 4]. Firstly, a global metric is learned and training points in the feature space are transformed accordingly; secondly, a local metric is estimated in the neighborhood of each transformed training point. These local metrics allow for better adaptiveness to the data but often require an heuristic choice of locality.
The proposed invention is instead a hierarchical global-to-local approach, where the metric is iteratively refined and learned using the data distribution Itself. The approach starts by estimating a global metric and applying the corresponding (metric) transformation to the data. Then the transformed points are clustered to obtain a set of K clusters. These two steps are recursively applied to each cluster until a termination criterion is satisfied on the cluster. such criteria can be e.g. maximal height in the tree, minimal variance of the data points in the cluster or a minimum number of data points in the cluster. This forms a tree with a metric associated to each node.