Discriminators are not entirely suited to decision support. When a datum is presented to a discriminator, the latter proposes a decision on membership of a class (possibly accompanied by a membership index), a class being a set of data sharing analogies. However, the user is generally not an expert in statistical learning and classification. There is then a risk that the decision rendered by the discriminator may be regarded with too much confidence or too much skepticism, some users systematically accepting the automatic decision, others never acting on it.
One solution to remedy this problem is the use of a dimensionality reduction method enabling the data to be represented in a Euclidian space, usually with two or three dimensions, preserving the distances between data. A critical point for the understanding of data by the user is that the data is generally high-dimensional, and therefore unintelligible. Hereinafter, the expression “original data” refers to all of the data enabling construction of the representation and the expression “points of the representation” refers to their equivalents in the representation space. Likewise, the expression “original space” will designate the space of the original data and the expression “representation space” will designate the space in which the representation is constructed, this space being sometimes referred to as a “map”. Dimensionality reduction methods thus enable the relations between data to be summarized in the form of a map on which the position of the points may be described with the aid of a small number of parameters. This gives the user an intuitive view of the organization of the data in the original space. Understanding the distribution of the classes then offers the user a means of making an informed decision. In particular, one popular approach consists in constructing a map of the data in a plane and optimizing the preservation of the distances.
The benefit of this type of approach may be illustrated by an example relating to the recognition of objects that consist of handwritten characters. In this example, the data may consist of 8×8 pixel grayscale imagettes of handwritten digits, in which case an imagette may be seen as a point in a space with 64 dimensions. The data may thus belong to ten classes corresponding to the ten digits from 0 to 9. It is then a question of placing the imagettes in a two-dimensional space formed by the map so that the Euclidian distance between the representations of two imagettes on this map is as close as possible to the distance between the two imagettes themselves in the original space in the sense of a measurement of dissimilarity. Accordingly, the proximity of two imagettes is materialized by the proximity of the points that are associated with them on the map.
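The criterion described above, keeping map distances as close as possible to the original dissimilarities, can be illustrated by a minimal sketch. The function name `raw_stress`, the use of a plain sum of squared gaps, and the toy 4-pixel "imagettes" (instead of 64 pixels) are assumptions made for illustration only; they are not taken from any of the methods discussed here.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def raw_stress(original, mapped, dissimilarity=euclidean):
    """Sum of squared gaps between original dissimilarities and map distances.

    A value of zero means every pairwise distance is preserved exactly.
    """
    total = 0.0
    for i, j in combinations(range(len(original)), 2):
        d_orig = dissimilarity(original[i], original[j])
        d_map = euclidean(mapped[i], mapped[j])
        total += (d_orig - d_map) ** 2
    return total


# Three toy data flattened to 4 values each (hypothetical stand-ins for imagettes).
original = [
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [3.0, 4.0, 12.0, 0.0],
]

# A two-dimensional map that reproduces the pairwise distances 5, 12 and 13.
faithful_map = [[0.0, 0.0], [5.0, 0.0], [5.0, 12.0]]

# A map that squeezes the three points onto a short segment.
poor_map = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]

stress_faithful = raw_stress(original, faithful_map)
stress_poor = raw_stress(original, poor_map)
```

A mapping method of the kind described would search for the point positions that make such a quantity as small as possible; the dissimilarity passed to `raw_stress` need not be Euclidean.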
Most existing methods for constructing a data map are non-supervised methods, i.e. methods that do not take account of the data possibly belonging to classes of data in order to place them on the map. For example, in the case of imagettes of handwritten digits, the data may be divided into ten balanced classes corresponding to the ten digits (0, 1, 2, . . . , 9), each imagette being labeled as belonging to one of these ten classes. A non-supervised method thus leads to a map of the imagettes in which the imagettes are placed without taking account of the digits that they represent. A major drawback of this is that classes may be mixed without this corresponding to a reality specific to the data. In such cases, an essential character of the data set is lost. Moreover, the organization of the representation offered to the user may become relatively illegible.
A classic supervised solution is discriminating factorial analysis (DFA) (see Fisher R. A., “The Use of Multiple Measurements in Taxonomic Problems”, Annals of Eugenics, No. 7, p. 179-188, 1936; Gilbert Saporta, Probabilités, Analyse des données et Statistique, 2006), which is a linear method enabling a supervised representation of the data to be proposed. The object of this method is to find a subspace in which the orthogonal projection of the data provides the best discrimination of the classes, i.e. the method searches for the projection that minimizes the ratio between the intra-class and inter-class variance. This method has two major drawbacks, however. On the one hand, DFA is linear, and is therefore not efficacious if non-linear relations exist between variables. On the other hand, DFA assumes that the data space is Euclidian.
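The ratio of intra-class to inter-class variance mentioned above can be sketched for two-dimensional data projected onto a single direction. This is only an illustration under stated assumptions: the toy data, the helper names, and the brute-force scan over angles are hypothetical, whereas DFA is normally solved in closed form rather than by scanning.

```python
import math


def project(points, direction):
    """Orthogonal projection of each point onto the (normalized) direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [sum(p * d for p, d in zip(pt, direction)) / norm for pt in points]


def intra_inter_ratio(classes, direction):
    """Intra-class variance divided by inter-class variance after projection."""
    projected = [project(pts, direction) for pts in classes]
    n_total = sum(len(v) for v in projected)
    class_means = [sum(v) / len(v) for v in projected]
    grand_mean = sum(sum(v) for v in projected) / n_total
    intra = sum(
        (x - m) ** 2 for v, m in zip(projected, class_means) for x in v
    ) / n_total
    inter = sum(
        len(v) * (m - grand_mean) ** 2 for v, m in zip(projected, class_means)
    ) / n_total
    # If the projected class means coincide, the direction does not discriminate.
    return math.inf if inter == 0 else intra / inter


# Two toy classes, separated mainly along the horizontal axis.
class_a = [(0.0, 0.0), (0.5, 2.0), (0.0, 4.0)]
class_b = [(3.0, 1.0), (3.5, 3.0), (3.0, 5.0)]
classes = [class_a, class_b]

ratio_x = intra_inter_ratio(classes, (1.0, 0.0))
ratio_y = intra_inter_ratio(classes, (0.0, 1.0))

# Brute-force scan over directions, one per degree (illustrative only).
best_angle = min(
    range(180),
    key=lambda deg: intra_inter_ratio(
        classes, (math.cos(math.radians(deg)), math.sin(math.radians(deg)))
    ),
)
best_ratio = intra_inter_ratio(
    classes,
    (math.cos(math.radians(best_angle)), math.sin(math.radians(best_angle))),
)
```

Because the search is restricted to straight-line projections, classes separated by a curved boundary would not be handled well, which is the first drawback noted above.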
A generalization of DFA intended to take account of non-linear relations by using the “kernel trick” has also existed since 1999. This method, known as “Kernel Fisher Discriminant Analysis” (KFD) (Mika S., Rätsch G., Weston J., Schölkopf B., Müller K-R., “Fisher Discriminant Analysis with Kernels”, Neural Networks for Signal Processing, Vol. 9, 1999, p. 41-48), functions in a manner comparable to DFA, but in a space augmented by the kernel used. This method has the usual drawbacks of kernel methods, however. In particular, it is indispensable to choose a kernel, which is not a simple matter, as indicated by the abundant literature on this subject. Moreover, a relatively simple model implicit in the data is assumed. However, there exist numerous data sets to which this assumption does not apply.
A number of “pseudo-supervised” dimensionality reduction methods have also been proposed. They mostly correspond to non-supervised methods in which the distances undergo preprocessing before placement on the map. The following methods may be cited:
“Supervised Curvilinear Components Analysis” (Laanaya H., Martin A., Aboutajine D. and Khenchaf A., “A New Dimensionality Reduction Method for Seabed Characterization: Supervised Curvilinear Component Analysis”, IEEE OCEANS'05 EUROPE, Brest, France, 20-23 Jun. 2005; Laanaya H., Martin A., Khenchaf A. and Aboutajine D. “Une nouvelle méthode pour l'extraction de paramètres: l'analyse en composante curvilinéaire supervisée, Atelier Fouille de données complexes dans un processus d'extraction de connaissance”, Extraction et Gestion des Connaissances (EGC), pp. 21-32, Namur, Belgium, 24-26 Jan. 2007);
“Supervised Locally Linear Embedding” (O. Kouropteva, O. Okun, A. Hadid, M. Soriano, S. Marcos, and M. Pietikainen., “Beyond locally linear embedding algorithm—Technical Report MVG-01-2002”, Machine Vision Group, University of Oulu, 2002; D. de Ridder, O. Kouropteva, and O. Okun., “Supervised locally linear embedding—Lecture Notes in Artificial Intelligence”, 2714:333-341, 2003; D. de Ridder, M. Loog, M. J. T. Reinders, “Local Fisher embedding”, in Proceedings of the 17th International Conference on Pattern Recognition, 2004, pp. 295-298);
“Supervised Isomap (S-isomap)” (S. Weng, C. Zhang, Z. Lin, “Exploring the structure of supervised data by discriminant isometric mapping”, Pattern Recognition 38 (2005) 599-601; Geng X., Zhan D. C. and Zhou Z. H., “Supervised nonlinear dimensionality reduction for visualization and classification”, IEEE Transactions on Systems, Man, and Cybernetics, Part B 35(6): 1098-1107, 2005);
“SE-isomap” (Li C. G. and Guo J., “Supervised isomap with explicit mapping”, in Proceedings in the 1st IEEE International Conference on Innovative Computing, Information and Control, ICICIC '06, Beijing, China, August 2006).
One way or another, these “pseudo-supervised” methods always use a matrix of the modified distances in order to increase artificially the inter-class distances and/or to reduce the intra-class distances. A non-supervised method is then employed that uses the modified distances. Thus the classes are always visually identifiable in the representation, even if the classes are perfectly mixed in the data space. This kind of technique is thus more of a means of visualizing the classes individually than a means of apprehending the spatial organization of the data, the latter being highly degraded by the pre-processing. Moreover, because of the manipulation of distances, the distances in the original and representation spaces are no longer comparable with techniques of this kind. This may prove disadvantageous if the distances themselves make sense to the user, as in the case where they wish to use map evaluation methods (Shepard diagram, etc.) or to place points a posteriori without knowing the class. This latter point is particularly disadvantageous in the field of discrimination decision assistance, i.e. when it is a question of determining the class of a test datum knowing the reference data class.
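The effect described above can be reproduced on a toy case. The sketch below assumes a simple multiplicative inflation of inter-class distances; this rule, the function names, and the factor of 3 are hypothetical and do not reproduce the formula of any of the cited methods. Four points lie on a line with alternating labels, so the classes are perfectly mixed in the data space.

```python
def inflate_inter_class(distances, labels, factor=3.0):
    """Return a copy of the distance matrix with inter-class entries multiplied.

    Hypothetical preprocessing rule: intra-class distances are left unchanged.
    """
    n = len(labels)
    return [
        [
            distances[i][j] * (factor if labels[i] != labels[j] else 1.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def nearest_neighbor_labels(distances, labels):
    """Label of the closest other point, for every point."""
    result = []
    for i in range(len(labels)):
        others = [j for j in range(len(labels)) if j != i]
        closest = min(others, key=lambda j: distances[i][j])
        result.append(labels[closest])
    return result


# Points at positions 0, 1, 2, 3 on a line, labels alternating: fully mixed classes.
positions = [0.0, 1.0, 2.0, 3.0]
labels = ["A", "B", "A", "B"]
original_distances = [[abs(p - q) for q in positions] for p in positions]

modified_distances = inflate_inter_class(original_distances, labels)

neighbors_before = nearest_neighbor_labels(original_distances, labels)
neighbors_after = nearest_neighbor_labels(modified_distances, labels)
```

In the original distances every point is closest to a point of the other class; after the modification every point is closest to a point of its own class. A map built from `modified_distances` would therefore show two separated groups that do not exist in the data, and its distances could no longer be compared with the original ones.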
A non-supervised dimensionality reduction method known as “Data-Driven High Dimensional Scaling” (DD-HDS) (Lespinats S., Verleysen M., Giron A. and Fertil B., “DD-HDS: a tool for visualization and exploration of high dimensional data”, IEEE Trans. Neural Netw., Vol. 18, No. 5, pp. 1265-1279, 2007) was developed to overcome the aforementioned drawbacks. The DD-HDS method suggests, among other things, using a weighting function G enabling more or less importance to be assigned to distances according to whether they are large or small, taking into account the concentration of measure phenomenon. This method makes it possible for example to visualize in spaces with two or three dimensions data from much larger spaces, preserving the spatial organization of the data. This makes it possible to visualize classes if a link exists between the classes and the spatial organization of the data. Unfortunately, as explained hereinafter, in difficult cases it is impossible to avoid making representation errors, whether the method is supervised or not. The differences between the results of the most efficacious methods are generally linked to the position of said errors. Now, in the context of the DD-HDS method, such errors may well impede reading of the map by scrambling an organization linked to the classes. In such a situation, it becomes hazardous to determine the class of an unlabeled datum from its position on the map.
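The concentration phenomenon referred to above can be observed numerically: as the dimension grows, pairwise distances between random points become relatively more alike, so that "large" and "small" distances are harder to tell apart. The sketch below is an illustration only; the uniform random data, the function name `relative_spread`, and the sample size are assumptions, and the code does not implement the weighting function G of DD-HDS.

```python
import math
import random
from itertools import combinations


def relative_spread(dim, n_points=60, seed=0):
    """Standard deviation of pairwise distances divided by their mean.

    Points are drawn uniformly in the unit hypercube of the given dimension.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        for p, q in combinations(points, 2)
    ]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return std / mean


spread_2d = relative_spread(2)    # dimension of a typical map
spread_64d = relative_spread(64)  # dimension of 8x8 imagettes
```

The relative spread in 64 dimensions comes out smaller than in 2 dimensions, which is why a weighting of distances that adapts to the observed distribution, rather than a fixed threshold, is of interest for high-dimensional data.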