The invention relates to a partitioning method of partitioning a set of objects, comprising:
a first iterative partitioning step, which utilizes a first partition formed by classes containing an object and which, at each iteration, combines two classes that maximize a criterion of similarity for creating a new partition used as an initial partition for the next step, until a partition is obtained that contains only a single class,
a selection step of selecting from the obtained partitions a partition which comprises an optimal number of classes.
The invention also relates to a multilevel partitioning method of partitioning a set of objects, which utilizes such a partitioning method.
The invention finally relates to a method of searching a predetermined number of objects which are closest to an example in a tree-like structure of objects created by using such a multilevel partitioning method.
The invention further relates to computer programs for implementing such methods. The invention also relates to equipment which comprises data storage and data processing means, of which notably means for implementing such methods. The invention finally relates to a data transmission system which comprises at least such equipment.
The invention has interesting applications in the field of data classification, notably data relating to audio and/or video, for example, MPEG-7 descriptions.
Data storage and transmission capacities augment considerably, so much so that in many fields including the field of consumer electronics, the user, however, has difficulty in managing the information he or she has at his or her disposal. In this context the methods of partitioning a set of objects become ever more important. They notably apply to the navigation in a set of objects and to the search for objects.
The article  less than  less than Symbolic Clustering using a new dissimilarity measure greater than  greater than  by K. Chidananda Gowda and E. Diday, published in the journal  less than  less than Pattern Recognition greater than  greater than , vol. 24 no. 6, pp. 567 -578, 1991, describes a partitioning method of a set of objects as cited in the introductory paragraph and called  less than  less than Hierarchical Agglomerative Clustering Methodology greater than  greater than .
This partitioning method provides an optimal number of classes. But, the obtained partition is not well adapted to the search for objects.
The invention notably has for its object to propose a partitioning method of partitioning a set of objects particularly well adapted to the search for objects. For this purpose, a method according to the invention and as described in the opening paragraph is characterized in that it comprises:
a step of calculating elements which are representative of classes of the selected partition,
a second iterative partitioning step which utilizes the selected partition as a first current partition and which, at each iteration, creates a new partition by associating each object of said set with the element which represents the current partition it is closest to, and calculates elements which represent classes of the new partition, the new partition forming the current partition for the next iteration, until the elements which are representative of the new partition are identical with the elements which are representative of the current partition.
Thus the method according to the invention comprises determining in a first stage a partition that contains an optimal number of classes, and then modifying the classes so that their contents are adapted to the search for objects.
It should be observed that the article xe2x80x9cAn efficient K-means clustering algorithmxe2x80x9d by K. Alsabti, S. Ranka and V. Singh, published on the occasion of the xe2x80x9cIPPS/SPDP Workshop on High Performance Data Mining, 1998, Orlando, Fla.xe2x80x9d describes a partitioning method called xe2x80x9cK-meansxe2x80x9d which permits to divide a set of objects into a number N of classes known a priori, by choosing N start prototypes. This method is iterative. At each iteration each object is combined with the prototype it is closest to so as to determine classes, after which the prototype of each class is recalculated. The new obtained prototype serves as a start prototype for the next iteration. The final partition is obtained when the convergence is reached, that is to say, when all the new obtained prototypes correspond to the start prototypes at the end of an iteration.
This method makes it necessary to know a priori the number of classes of the partition to be constructed and to have a set of start prototypes.
The invention thus notably comprises the use of a xe2x80x9chierarchical agglomerative clusteringxe2x80x9d method for the initialization of a method of the type xe2x80x9cK-meansxe2x80x9d, so as to provide a partitioning method whose number of classes and the contents of the classes are optimal as regards the search for objects.
In a general manner, the invention can be applied to any type of objects provided that a similarity measure f is defined for this type of objects and that this measure of similarity satisfies the following conditions:
f is an application which associates two objects with a real number,
this real number is identical whatever the order in which the two objects are considered,
the real number associated with two identical objects is higher than the real number associated with two different objects.
In a particularly interesting embodiment of the invention said objects are meta data, that is to say, structures containing a set of data. They are, for example, descriptions of video shots, notably descriptions of the MPEG-7 type. The MPEG-7 standard project defines indeed a certain number of descriptors for video shots (color descriptors, text descriptors, camera movement descriptors . . . ), as well as similarity measures associated with these descriptors. For more details reference is made to the document ISO/IEC JTC1/SC29/WG11 N3521 (July 2000) entitled xe2x80x9cCoding of moving pictures and associated audio informationxe2x80x9d, which refers to the document xe2x80x9cVisual Working Draftxe2x80x9d version 4.0.
A second object of the invention is to propose a multilevel partitioning method of a set of objects. In accordance with the invention such a method comprises:
a partitioning step which utilizes a partitioning method as described above for determining a partition of a group of objects,
a step of determining elements which are representative of classes of the obtained partition,
a step of storing said representative elements in a tree-like structure,
said steps being executed a first time with a group of objects formed by said set, then repeated with groups of objects formed by the classes of the obtained partitions so as to obtain classes that satisfy a predetermined criterion,
a step of storing the last partition in said tree-like structure.
The tree-like structure thus obtained forms a partition of the set of objects which has several levels. This type of partition is particularly advantageous for searching in a set of a large number of objects, because it permits to accelerate the search. Actually, with a single-level partition, when the size of the set of objects increases significantly, this brings about either the increase of the number of classes, or the increase of the number of objects contained in each class. In either one of the two cases one is led to compare the example searched for with a much larger number of objects. The processing time thus increases considerably. On the contrary, with a multilevel partition, the example searched for is only compared with a restricted number of objects at each level of the partition. The increase of the size of the set thus has much less influence on the processing time of the search.
A third object of the invention is to propose a method of searching a predetermined number of objects which are closest to an example in a tree-like structure of objects, which comprises nodes and leaves and which has been created with a multilevel partitioning method as described above is used.
According to the invention such a search method comprises the following steps:
a step of passing through the tree-like structure, starting from a node and going to the leaves while passing by the nodes which contain the representative elements that are closest to the example, to select one or various leaves which correspond to one or various objects,
a step of testing whether the number of selected leaves is lower than said predetermined number of objects,
and, if the number of selected leaves is lower than said predetermined number of objects, a new iteration of said steps starting from the brother node of the node passed through last, which contains the representative element that is closest to said example.