The problem posed in this application is that of discriminating between various objects. The variety of objects and backgrounds present in the natural scenes involved is very considerable, and the objects are difficult to discern, all the more so since their distance, and their radial speed when they are mobile, are not known when the acquisitions are carried out under passive imaging. For example, at long distance, boats may look very much like airplanes (similar radial speeds, uniform quasi-rectilinear motion, similar intensity levels, etc.). Moreover, the objects of interest must potentially be processed at long distance, which implies low resolutions and therefore information that is not necessarily rich enough to support a classification decision. Furthermore, the picture-taking conditions (weather, day/night conditions, reflections, dazzle, etc.) modify the signal received from these objects, further complicating the discrimination task.
Classification techniques operate by representing the objects by a set of characteristics (speed, SNR, intensity, shape signatures, etc.). These characteristics define one or more multidimensional characteristic spaces into which the characteristics extracted from the objects are projected, thus forming “clouds of points”, or classes, whose boundaries need to be found. These boundaries are “learnt” on the basis of a set of reference objects, also called the learning set, whose real type is known (that is to say, the nature of the classes is known a priori and without ambiguity). The better the characteristics and the more separated the clouds of points formed by the various classes of interest, the more discriminating the boundaries found. Likewise, the greater the variety of the objects and the larger the number of classes, the more complex it is to characterize them properly and therefore to discriminate them. The rules making it possible to decide an object's membership, or otherwise, of a class arise from this learning.
A computer program whose role is to decide, as a function of the information learnt, to which class a new object provided as input belongs is called a classifier (or expert). The membership class is determined by applying the decision rules (also called the knowledge database), which have themselves been learnt previously on the learning data.
The classification of a new object therefore assumes that the decision rules have previously been formulated.
The formulation of this knowledge database is considered first. It is based on a set of known examples called prototypes. The prototypes are often represented by vectors of characteristics, where each component is a measurement made on the real objects or on one of their qualitative attributes. Each characteristic therefore becomes an axis of a space whose dimension equals the cardinality of the set of characteristics. A prototype is a point projected into this space, and this series of measurements, i.e. the set of characteristics of a prototype, forms a representation of the real object and constitutes its signature. The difficulty here is to find “good” characteristics which subsequently allow the classifier to recognize the various classes of objects easily: such characteristics are then said to be discriminating.
The learning phase consists in splitting (or separating) the representation space by means of boundaries and in assigning class labels to the regions thus formed. The formulation of the knowledge database (or the learning of the classifiers) therefore consists in searching for these decision boundaries. The region in which a vector of characteristics is situated determines its membership class.
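The steps above can be sketched minimally as follows. In this illustrative example, each class is summarized by the centroid of its prototypes in the characteristic space, and the decision regions are the sets of points closer to one centroid than to any other; the feature names (speed, intensity) and all numerical values are hypothetical, and a real system would learn richer boundaries.

```python
def learn_centroids(prototypes):
    """prototypes: dict mapping class label -> list of characteristic vectors."""
    centroids = {}
    for label, vectors in prototypes.items():
        dim = len(vectors[0])
        centroids[label] = [sum(v[i] for v in vectors) / len(vectors)
                            for i in range(dim)]
    return centroids

def classify(x, centroids):
    """The membership region of a class is the set of points nearest its centroid."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))

# Learning set: hypothetical (speed, intensity) signatures for two classes.
prototypes = {
    "boat":     [(4.0, 0.2), (5.0, 0.3), (6.0, 0.25)],
    "airplane": [(120.0, 0.8), (150.0, 0.7), (135.0, 0.9)],
}
centroids = learn_centroids(prototypes)
print(classify((5.5, 0.28), centroids))   # -> boat
```

A new observation is thus classed instantaneously by locating the region of the space in which its signature falls.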
There exist several schemes for defining a certain number of rules indicating membership or otherwise in a class. These schemes can be decomposed into two large families, one using a so-called structural approach and the other a statistical approach.
The structural approach utilizes the topology of the elementary structures of the objects (the shape is described in the form of elementary structures and of the relations between these structures) to define these rules; for example, in syntax recognition, a word is represented by letters arranged in a precise order. Thus decision trees, expert systems and syntax-analysis programs form part of this category of schemes.
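A structural classifier of the decision-tree kind can be sketched as a set of hand-written rules arranged hierarchically. The attributes and thresholds below are purely hypothetical, chosen only to show the form such rules take:

```python
# Illustrative structural classifier: explicit decision rules arranged as a tree.
def classify(obj):
    # obj: dict of measured attributes of a detected object (hypothetical names)
    if obj["speed"] > 50.0:            # fast movers
        if obj["altitude"] > 100.0:
            return "airplane"
        return "vehicle"
    else:                              # slow movers
        if obj["on_water"]:
            return "boat"
        return "pedestrian"

print(classify({"speed": 80.0, "altitude": 3000.0, "on_water": False}))  # -> airplane
print(classify({"speed": 3.0, "altitude": 0.0, "on_water": True}))       # -> boat
```

Each internal node tests one attribute, and each leaf assigns a class label, which makes the rules readable by a human.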
Generally, it is not possible to build a perfect partition of the space such as the one illustrated in the example of FIG. 1, in which the three classes of objects, respectively labeled “et1”, “et2” and “et3” and represented with the aid of two characteristics, the speed and the intensity of the object, are clearly delimited by the three boundaries “boundary1”, “boundary2” and “boundary3”. According to a statistical approach, the decision boundaries are learnt with the help of the learning set (or database), presumed to be statistically representative of the real distribution of the classes; hence the major role played by the reference objects of this database. This approach is based on characteristics having the form of a vector of numerical (generally real) values.
The definition of these membership rules results, as will be seen, from a certain compromise.
An object or an observation to be classed (during a subsequent step, the knowledge database having been established beforehand) therefore becomes a point in the space of characteristics. Knowledge of the spatial distribution of the classes theoretically makes it possible to categorize, and therefore to recognize instantaneously, the objects thus represented. The boundaries separating the classes in the space of characteristics, called decision boundaries, which ensue from these membership rules, must therefore be the result of a certain compromise between the capacity for generalization and that for memorization. Generalization is understood to mean the capacity of a classifier to correctly recognize new observations, whereas memorization is its capacity to properly class the examples which served for its learning. The learning of the statistical classifiers is therefore a search for these decision boundaries.
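The memorization/generalization distinction can be made concrete with a small sketch. A 1-nearest-neighbour rule memorizes its learning set perfectly (every prototype is its own nearest neighbour), whereas its generalization must be measured on observations held out of the learning; all data below are hypothetical toy values.

```python
def nn_classify(x, learning_set):
    """1-nearest-neighbour rule over a list of (vector, label) prototypes."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(learning_set, key=lambda p: sq_dist(x, p[0]))[1]

learning_set = [((1.0, 1.0), "et1"), ((1.2, 0.9), "et1"),
                ((5.0, 5.0), "et2"), ((5.3, 4.8), "et2")]
held_out     = [((1.1, 1.1), "et1"), ((4.9, 5.2), "et2")]

# Memorization: accuracy on the learning examples themselves (always 1.0 here).
memorization = sum(nn_classify(x, learning_set) == y
                   for x, y in learning_set) / len(learning_set)
# Generalization: accuracy on new, held-out observations.
generalization = sum(nn_classify(x, learning_set) == y
                     for x, y in held_out) / len(held_out)
print(memorization, generalization)   # -> 1.0 1.0 on this toy data
```

On noisier data, boundaries that follow the prototypes too closely keep memorization at 1.0 but degrade generalization; hence the compromise described above.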
There exist several types of characteristics, relating to:
- local description: numerous algorithms have been proposed for developing descriptors that are invariant to changes of scale and to affine transformations;
- description of shape: even when the radiometric information is significant, many objects or classes of objects are characterized by their shape. Taking, for example, the class of humans, it is impossible to make do with information regarding gray levels alone, and it is necessary to describe the shape of the silhouettes. This shape can be described, for example, with the help of the 2D spatial derivatives, of the 2D contours or even of the 3D shape;
- description of texture: the description of the texture is combined with non-supervised classification algorithms or, more generally, with algorithms which describe data distributions. It is then possible to obtain an appropriate texture description which is discriminating and invariant to image transformations. The use of a large quantity of learning images makes it possible to model real textures, such as grass and foliage, and therefore to model certain types of images, such as natural outdoor scenes.
The characteristics are generally based on local measurements made on the object to be recognized. Texture descriptors or schemes of the “bag of words” type (J. Ponce, M. Hebert, C. Schmid, and A. Zisserman (eds.), Toward Category-Level Object Recognition, Springer-Verlag, Lecture Notes in Computer Science, Vol. 4170) allow context to be taken into account to some extent; however, these approaches are often computationally expensive.
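The “bag of words” idea can be sketched in a few lines: the local descriptors extracted from an object are quantized to the nearest entry of a learnt codebook of “visual words”, and the object signature is the histogram of codeword occurrences. The codebook and descriptors below are hypothetical 2D values standing in for real local descriptors.

```python
def bag_of_words(descriptors, codebook):
    """Quantize each local descriptor to its nearest codeword; return the histogram."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    histogram = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: sq_dist(d, codebook[k]))
        histogram[nearest] += 1
    return histogram

codebook = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]            # learnt "visual words"
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.9, 0.1), (1.0, 0.9)]
print(bag_of_words(descriptors, codebook))   # -> [1, 2, 1]
```

The resulting histogram is then used as the characteristic vector of the object, at the cost of quantizing every descriptor against the whole codebook, which illustrates why such schemes can be expensive.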
The classes, their labels and the rules of membership in these classes having been established, the step of classifying a new object into one of these classes is now considered; this is a multi-class classification problem. A large number of classes renders such problems difficult to solve and gives rise to high computational complexity.
There exist statistical approaches for solving multi-class problems. Two families of schemes are distinguished:
- conventional schemes, such as the K nearest neighbors or neural networks, which consider all the classes at once; these are multi-class schemes;
- schemes which combine binary classifiers with “one against all” or “one against one” strategies, examples of which are described respectively in “Duda, R., Hart, P., & Stork, D. (2000). Pattern Classification. New York, N.Y.: Wiley-Interscience” and “Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. Advances in Neural Information Processing Systems, The MIT Press, Vol. 10, 507-513”.
In the “one against all” strategy, the similarity between the various classes is not taken into account. There is therefore no guarantee as regards the existence of a discrimination between the classes. This poses a genuine problem for the performance of the classification module. The “one against one” strategy exhaustively decomposes a problem with Q classes into C(Q,2) = Q(Q-1)/2 binary problems. Such a strategy considerably increases the number of classifiers as well as the computation times.
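The combinatorial cost of the “one against one” decomposition can be sketched as follows: one binary classifier per pair of classes, with the final class decided by majority vote. The pairwise decider below is a hypothetical stub, based on a single measured speed, standing in for learnt binary classifiers.

```python
from itertools import combinations

def one_against_one(x, classes, pairwise_decide):
    """Majority vote over the C(Q,2) pairwise binary classifiers."""
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):     # one binary problem per class pair
        votes[pairwise_decide(x, a, b)] += 1
    return max(votes, key=votes.get)

classes = ["boat", "airplane", "vehicle", "pedestrian"]
n_binary = len(list(combinations(classes, 2)))
print(n_binary)   # Q = 4 -> 6 binary classifiers; Q = 20 would already need 190

# Hypothetical stub deciding each pair from one measurement (a speed in m/s).
def decide_by_speed(x, a, b):
    typical = {"boat": 5.0, "airplane": 130.0, "vehicle": 25.0, "pedestrian": 1.5}
    return a if abs(x - typical[a]) < abs(x - typical[b]) else b

print(one_against_one(27.0, classes, decide_by_speed))   # -> vehicle
```

The quadratic growth of the number of binary problems with Q is what drives up the computation times mentioned above.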
In order to improve the separation of the classes and the readability of the classification problem, a structural approach based on a decision tree may be relevant. However, the construction of these trees is difficult, both as regards the choice of the attributes to be used at each node of the tree and as regards the depth of the tree. Moreover, even if such a structure is comprehensible to a human, it does not guarantee good discrimination.
Mixed approaches, combining decision trees and statistical approaches, have recently appeared in the state of the art and propose a cascade of boosted classifiers; it is possible to cite “Viola & Jones (2001). Rapid object detection using a boosted cascade of simple features. IEEE Computer Society Conference on Computer Vision and Pattern Recognition”. The advantage of such schemes is mainly that of minimizing the computation time spent on the simple cases and devoting more processing time to the difficult cases. These approaches are used for binary classification applications (face or pedestrian detection). However, work has been carried out to extend these algorithms to multi-class classification problems. The major drawback of these techniques is the difficulty of understanding and interpreting the manner in which these algorithms operate.
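The cascade principle can be sketched as a sequence of stages of increasing cost, where each stage may reject a candidate immediately, so that little computation is spent on easy cases and only hard cases reach the expensive final stages. The stage tests and thresholds below are hypothetical placeholders for learnt boosted classifiers.

```python
def cascade(candidate, stages):
    """Run the stages in order; an early rejection stops all further computation."""
    for stage in stages:                 # cheap stages first
        if not stage(candidate):
            return False                 # rejected early: simple case, low cost
    return True                          # survived all stages: accepted

# Hypothetical stages over a dict of precomputed measurements.
stages = [
    lambda c: c["contrast"] > 0.1,       # very cheap test, rejects most clutter
    lambda c: c["edge_density"] > 0.3,   # costlier test, run only on survivors
    lambda c: c["shape_score"] > 0.8,    # expensive test, run on few candidates
]

print(cascade({"contrast": 0.5, "edge_density": 0.6, "shape_score": 0.9}, stages))   # -> True
print(cascade({"contrast": 0.05, "edge_density": 0.9, "shape_score": 0.9}, stages))  # -> False
```

Most candidates fail the first, cheapest test, which is how the average computation time is kept low.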
In the field of remote sensing (teledetection), where one seeks to recognize the nature of the natural coverage of the observed scene, there exist applications using a Digital Terrain Model (or DTM) coupled with a classification (supervised or non-supervised). In this case one seeks to recognize the type of natural coverage of the scene (glacier, lake, forest, field, etc.) and not objects of interest in the scene.
In the field of imaging-based surveillance, schemes for classifying targets are little described in the literature. The majority of the approaches presented are based on tracking-evolution models and belief functions. When the discrimination task is difficult, the computation time/performance compromise is difficult to achieve.