In recent years, the technique of boosting a set of simple or weak classifiers to obtain an overall strong classifier has evolved into a powerful solution, especially in the domain of image object detection. Image object detection is becoming increasingly popular and can be used in a number of different detection scenarios. Examples of such applications include face detection, pedestrian detection, traffic sign detection, and vehicle detection.
Boosting techniques are particularly effective for detecting a single object class. However, when this approach is extended from the detection of a single object class to the detection of multiple object classes, its complexity scales linearly with the number of classes. Many detection applications require the detection of multiple object classes in order to be effective. An example of such an application is vehicle detection, where separate object classes may be defined for cars, trucks, pedestrians and traffic signs. Another example of a detection application that requires multiple object classes is people detection. In particular, if the people are in motion, it is more effective to define people sub-classes based on their different poses or actions, such as sitting, standing and walking.
For the task of object detection in images, a known approach uses a learning/detection framework that is based on boosting. Boosting selects and combines a set

$$H = \{h_1, \ldots, h_T\} \tag{1}$$

of simple or weak classifiers $h_t: X \rightarrow \{+1, -1\}$, each taken from a large set of classifier candidates, to form a final or strong classifier. For the problem of image object detection, $X$ is the set of all image patches, the class $+1$ corresponds to an object and the class $-1$ to a non-object. Given an additional set of weighting factors

$$\alpha = \{\alpha_1, \ldots, \alpha_T\}, \tag{2}$$

object detection is solved by evaluating the strong classifier $h$ on candidate image patches $x \in X$. The decision $h(x)$ is computed from the weighted sum of the weak-classifier decisions, that is,
$$f(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x) - \theta \tag{3}$$

and

$$h(x) = \operatorname{sign}\big(f(x)\big), \tag{4}$$

where $\theta$ is a threshold that allows the user to balance the false-alarm rate against the miss-detection rate. An optimal selection of the weak classifiers $h_t$ and a proper weighting $\alpha_t$ are obtained from an AdaBoost training algorithm.
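To make equations (1) to (4) concrete, the following is a minimal Python sketch of discrete AdaBoost together with the strong-classifier evaluation. It assumes, for illustration only, that each image patch has already been reduced to a numeric feature vector and that the weak classifiers are decision stumps (one feature compared against a threshold); the names `Stump`, `train_adaboost`, `strong_score` and `strong_decision` are hypothetical and are not taken from the described framework. The weighting follows the standard discrete AdaBoost rule $\alpha_t = \tfrac{1}{2}\ln\big((1-\epsilon_t)/\epsilon_t\big)$, where $\epsilon_t$ is the weighted training error of the selected weak classifier.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Stump:
    """Hypothetical weak classifier h_t: X -> {+1, -1} that thresholds one feature."""
    feature: int
    threshold: float
    polarity: int  # +1 or -1

    def __call__(self, x: Sequence[float]) -> int:
        return self.polarity if x[self.feature] >= self.threshold else -self.polarity


def train_adaboost(X: List[Sequence[float]], y: List[int], T: int) -> Tuple[List[Stump], List[float]]:
    """Discrete AdaBoost: select T weak classifiers H (eq. 1) and weights alpha (eq. 2)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    # Candidate pool (assumption): one stump per (feature, observed value, polarity).
    candidates = [Stump(j, xi[j], p) for j in range(len(X[0])) for xi in X for p in (+1, -1)]
    H: List[Stump] = []
    alpha: List[float] = []
    for _ in range(T):
        # Pick the candidate with the lowest weighted error under the current weights.
        best, best_err = candidates[0], float("inf")
        for h in candidates:
            err = sum(wi for xi, yi, wi in zip(X, y, w) if h(xi) != yi)
            if err < best_err:
                best, best_err = h, err
        eps = min(max(best_err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        a = 0.5 * math.log((1 - eps) / eps)
        H.append(best)
        alpha.append(a)
        # Re-weight: misclassified samples gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-a * yi * best(xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return H, alpha


def strong_score(x: Sequence[float], H: List[Stump], alpha: List[float], theta: float = 0.0) -> float:
    """f(x) = sum_t alpha_t * h_t(x) - theta  (eq. 3)."""
    return sum(a * h(x) for h, a in zip(H, alpha)) - theta


def strong_decision(x: Sequence[float], H: List[Stump], alpha: List[float], theta: float = 0.0) -> int:
    """h(x) = sign(f(x))  (eq. 4); sign(0) is taken as +1 in this sketch."""
    return 1 if strong_score(x, H, alpha, theta) >= 0 else -1


if __name__ == "__main__":
    # Toy 1-D "patches": small feature values are non-objects (-1), large ones objects (+1).
    X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
    y = [-1, -1, -1, 1, 1, 1]
    H, alpha = train_adaboost(X, y, T=3)
    print([strong_decision(x, H, alpha) for x in X])
```

Raising `theta` above zero makes the sketch more conservative (fewer false alarms, more misses), and lowering it has the opposite effect, which mirrors the trade-off that $\theta$ controls in equation (3). A practical detector would use a far larger candidate pool, for example image features computed on each patch, rather than raw values.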
This technique has been applied very successfully to the detection of single-class objects, for example faces, vehicles and pedestrians. However, there are situations in which it is desirable to detect objects of multiple classes $\{1, \ldots, L\}$ within one scene, for example the combined detection of cars, trucks, pedestrians and traffic signs in traffic scenes.
Based on the above-described algorithm, a naïve solution would use AdaBoost to train one individual ensemble of weak classifiers $H^{(l)}$ and weights $\alpha^{(l)}$ for each class $l$, that is,

$$\{H^{(1)}, \ldots, H^{(L)}\} = \Big\{\{h_1^{(1)}, \ldots, h_{T_1}^{(1)}\}, \ldots, \{h_1^{(L)}, \ldots, h_{T_L}^{(L)}\}\Big\} \tag{5}$$

and

$$\{\alpha^{(1)}, \ldots, \alpha^{(L)}\} = \Big\{\{\alpha_1^{(1)}, \ldots, \alpha_{T_1}^{(1)}\}, \ldots, \{\alpha_1^{(L)}, \ldots, \alpha_{T_L}^{(L)}\}\Big\}. \tag{6}$$

Because every candidate patch must be evaluated by all $L$ strong classifiers, i.e., by $\sum_{l=1}^{L} T_l$ weak classifiers in total, the memory and computational complexity of the detection task in this approach scales linearly with $L$ and in many cases prevents real-time detection. Such a computationally expensive approach is not a reasonable solution in terms of time, cost or efficiency. There is a need for a detection system that is capable of real-time multi-class detection and that can perform in an efficient manner.
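The linear scaling of the naïve approach in equations (5) and (6) can be made visible by counting weak-classifier evaluations. The following minimal Python sketch assumes that each per-class ensemble is stored as a list of `(h_t, alpha_t)` pairs with its own threshold $\theta^{(l)}$; `detect_all_classes` and `make_stump` are hypothetical names used only for illustration. Each patch passes through all $L$ strong classifiers, so the work per patch is $\sum_{l} T_l$.

```python
from typing import Callable, List, Sequence, Tuple

Weak = Callable[[Sequence[float]], int]   # a weak classifier h_t^(l): X -> {+1, -1}
Ensemble = List[Tuple[Weak, float]]       # [(h_t^(l), alpha_t^(l)), ...] for one class l


def detect_all_classes(x: Sequence[float], ensembles: List[Ensemble],
                       thetas: List[float]) -> Tuple[List[int], int]:
    """Naive multi-class detection: evaluate every per-class strong classifier on x.

    Returns the 1-based class indices whose strong classifier fires on x and the
    number of weak-classifier evaluations that were spent.
    """
    fired: List[int] = []
    evaluations = 0
    for l, (ensemble, theta) in enumerate(zip(ensembles, thetas), start=1):
        score = 0.0
        for h, a in ensemble:
            score += a * h(x)
            evaluations += 1
        if score - theta >= 0:  # h^(l)(x) = sign(f^(l)(x)), with sign(0) taken as +1
            fired.append(l)
    return fired, evaluations


def make_stump(feature: int, threshold: float) -> Weak:
    """Hypothetical weak classifier: +1 if the feature reaches the threshold, else -1."""
    return lambda x: 1 if x[feature] >= threshold else -1


if __name__ == "__main__":
    T = 50            # weak classifiers per class (assumed equal for every class)
    x = [0.5] * 10    # one toy candidate patch as a feature vector
    for L in (1, 2, 4, 8):
        ensembles = [[(make_stump(t % 10, 0.4), 1.0) for t in range(T)] for _ in range(L)]
        _, cost = detect_all_classes(x, ensembles, [0.0] * L)
        print(f"L={L}: {cost} weak-classifier evaluations per patch")
```

With $T = 50$ weak classifiers per class, the sketch reports 50, 100, 200 and 400 evaluations for $L = 1, 2, 4, 8$: doubling the number of classes doubles both the number of stored weak classifiers and the work spent on every candidate patch.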