1. Field of the Invention
The object of this invention is a polytomous segmentation process, it being understood that this process can also encompass dichotomous segmentation.
2. Discussion of the Background
Segmentation, or treelike distribution of sample units (objects, images, systems, various data, . . . ), is known under the English names "decision tree" or "tree classifier".
Polytomous segmentation is used in the fields of data analysis, shape recognition and discrimination, and even automatic classification of data. Standard segmentation methods are called "dichotomous." Applying these methods to explanatory variables (a single- or multidimensional representation of a measurement) provides a Boolean interpretation of the variable to be explained (the interpretation or type attached to the measurement). The result of these standard segmentation methods appears in the form of binary functions that are simple to use.
In a process of data analysis or discrimination separated into an inference phase (learning) and a decision phase (generalization), these methods offer certain advantages:

During the learning phase, they make it possible to measure the discriminating ability of the explanatory variables and to provide an easily interpreted description of the discriminant analysis.

During the generalization phase, in addition to the extreme rapidity of classification (a finite number of threshold decisions), the user can influence the classification of a sample unit by including in the segmentation criteria a priori knowledge depending on the effect that he desires. The resulting decision rule resembles the methods of decision by successive approximations that are common in human reasoning, and makes it possible to develop between the user and the computer a very clear interactivity that is not found in the other methods of discrimination.

The other discrimination methods, by contrast, have drawbacks. To cite only the most used of these other methods:

standard discriminant analysis provides decision rules expressed in the form of algebraic expressions that are often difficult to use;

the K-nearest-neighbors method is sensitive to the metric used and remains, despite recent work on faster variants, very costly in computation time.
The standard methods, however, exhibit the following drawbacks. A standard segmentation method is based on a binary splitting on the variable or variables of the measurement. When the choice or interpretation to be given to the measurement is binary (choice No. 1 or choice No. 2), these methods are perfectly suited; but when it is necessary to choose among several possible choices (&gt;2), decision conflicts can result when the segmentation is redone several times with different choice pairs, for example:
X is unknown and can belong to class No. 1, No. 2 or No. 3:

1st phase: class No. 1 against class No. 2 + class No. 3; X is assigned to class No. 1

2nd phase: class No. 2 against class No. 1 + class No. 3; X is assigned to classes No. 1 + No. 3

3rd phase: class No. 3 against class No. 1 + class No. 2; X is assigned to class No. 3
Such a situation is due to the assignment errors made during each phase and leads to an uncertainty. Further, multiplying the number of phases multiplies the chances of indecision and increases the cost of a decision (in time and/or in equipment).
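The conflict described above can be sketched in a few lines; the per-phase verdicts below are hypothetical and simply reproduce the three-phase example, not any method from the text:

```python
# Minimal sketch (not from the source) of how repeated dichotomous
# segmentation with different class pairings can yield conflicting
# assignments for a single sample unit X.

# Hypothetical outcome of each one-class-versus-rest phase, mirroring
# the example above: each phase decides whether X falls on the side of
# the isolated class or on the side of the grouped classes.
phase_results = {
    1: {1},     # phase 1: class 1 vs {2, 3} -> X assigned to class 1
    2: {1, 3},  # phase 2: class 2 vs {1, 3} -> X assigned to group {1, 3}
    3: {3},     # phase 3: class 3 vs {1, 2} -> X assigned to class 3
}

# Intersecting the phase verdicts should isolate a single class, but
# phases 1 and 3 contradict each other: the intersection is empty,
# leaving the assignment of X undecided.
candidates = set.intersection(*phase_results.values())
print(candidates)  # -> set(): no consistent class, a decision conflict
```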
Certain variants, at each step, choose the best classes or the best groups of classes (in the sense of a specific criterion) to oppose two by two.
In the latter case, the number of phases is limited to a single one, and the chances of indecision are likewise limited, but the a posteriori interpretation of the analysis is then complicated by the increased number of tests and of segments through which the measurement passes.
A final constraint of the standard methods comes from their nonparametric nature: the search for the optimal split between two classes is performed by counting the data on each side of each candidate split. This makes it difficult, on the one hand, to generalize the method when the statistical properties of the measurements are known, and, on the other hand, leads to prohibitive computation times when the number and size of the measurements become considerable.
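The counting-based split search described above can be illustrated with a minimal sketch; the function and toy data below are illustrative assumptions, not taken from the source, and use misclassification counting as the split criterion:

```python
# Minimal sketch (not from the source) of a nonparametric split search:
# the best threshold between two classes is found by counting, for each
# candidate split, how many samples land on the wrong side.  The cost
# grows with the number of samples and candidate splits, which is the
# drawback of prohibitive computation time noted in the text.

def best_split(values, labels):
    """Return the threshold minimizing the misclassification count,
    sending class 0 left (value < t) and class 1 right (value >= t)."""
    candidates = sorted(set(values))
    best_t, best_errors = None, len(values) + 1
    for t in candidates:
        # count samples on the wrong side of this candidate split
        errors = sum(1 for v, y in zip(values, labels)
                     if (v < t) != (y == 0))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors

# toy one-dimensional measurements for two classes
values = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
labels = [0,   0,   0,    1,   1,   1]
print(best_split(values, labels))  # -> (0.7, 0): a perfect split
```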
These methods are, for example, described in the following works:

BREIMAN L., FRIEDMAN J. H., OHLSEN R. A. and STONE C. J.: "Classification and Regression Trees", Wadsworth, 1984.

CELEUX G. and LECHEVALIER Y.: "Méthodes de segmentation non paramétriques" ["Nonparametric Segmentation Methods"], Revue de statistique appliquée, Vol. XXX, 4, pp. 39-53, 1982.

FRIEDMAN J. H.: "A recursive partitioning decision rule for nonparametric classification", IEEE Trans. Computers, pp. 404-408, April 1977.

GUEGUEN A. and NAKACHE J. P.: "Méthode de discrimination basée sur la construction d'un arbre de décision binaire" ["Discrimination method based on the construction of a binary decision tree"], Revue de statistique appliquée, XXXVI, 1, pp. 19-38, 1988.

BERGONIER H. and BOUCHARENC L.: "Méthode de segmentation basée sur la théorie de l'information" ["Segmentation method based on information theory"], M. DASSAULT Prize, 1967.