Feature detection within images and streams of images is becoming an increasingly important function in image acquisition and processing devices.
Face detection and tracking, for example, as described in European Patent No. EP2052347 (Ref: FN-143), is a well-known example of feature detection in image processing. These techniques enable one or more face regions within a scene being imaged to be readily delineated and allow for subsequent image processing based on this information. Such subsequent image processing can include face recognition, which attempts to identify individuals being imaged, for example, for tagging or authentication purposes; auto-focusing, by bringing a detected and/or selected face region into focus; or defect detection and/or correction of the face region(s).
Referring now to FIG. 1, there is shown a block diagram of a conventional template matching engine (TME) 10 for identifying features within an image or portion of an image. The processing steps of the TME are:
1. A detector cascade is loaded into a detectors buffer 12 from system memory (not shown) across a system bus. A detector cascade comprises information for a sequence of stages which are applied to a window within an image to determine if the window contains an object to be detected. The detector cascade, use of which is explained in more detail below, can be arranged to be applied by a classifier 22 to one or more different forms of features extracted from an image. As well as the image intensity (Y) value itself, examples of features which can be employed by a detector cascade include: Integral Image or II2 image, typically employed by a HAAR classifier; Histogram of Gradients (HoG); Census; or Local Binary Patterns (LBP). Details of methods for producing HoG maps are disclosed in PCT Application No. PCT/EP2015/073058 (Ref: FN-398) and U.S. Application No. 62/235,065 filed 30 Sep. 2015 (Ref: FN-0471), and techniques for providing multiple feature maps for a region of interest within an image are disclosed in U.S. Patent Application No. 62/210,243 filed 26 Aug. 2015 (Ref: FN-469).
2. Intensity plane information, for example, a luminance channel, for the input image or image portion is loaded into a Y cache 14 from the system memory across the system bus. (Other image planes could also be used if required.)
3. The image in the Y cache is scanned with a sliding window on various scales, one scale at a time, as follows:
    a. A resampler module 16 resamples the input image to the desired scale (usually processing begins with the most downsampled version of an image to detect the largest features).
    b. The window size employed after the resampler 16 is typically fixed and, depending on the application and implementation, may be 22×22, 32×32 or 32×64 pixels. (Thus the size of object being detected within a given image depends on the degree of downsampling of the image prior to application of a detector cascade.)
    c. The sliding window step between adjacent windows is typically 1 or 2 pixels.
4. For each pixel location of the sliding window, the values for the corresponding locations of the feature maps (channels), such as those referred to above, are calculated by a feature calculator 18. Note that the feature calculator can take into account the fact that consecutive windows overlap, so that it does not re-calculate feature map values that have already been calculated for an image.
5. The feature map values from the feature calculator 18 can be buffered in a features buffer 20.
6. The classifier 22 applies the detector cascade from the detectors buffer 12 to the feature maps for the current window in the features buffer 20 to determine whether or not the window features match an object of interest (e.g. a face). Within the classifier 22, a detector cascade is typically applied stage-by-stage, building a score for a window. A complete detector cascade can have any number of stages; for example, 4096 stages is a common maximum length. (Note that most windows fail after a few detector stages. For example, with a well-trained classifier, 95% of the windows tested fail after 12 stages.)
7. Steps 2 to 6 of the above process can then be repeated from scratch for the next window in the image.
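By way of illustration only, the window scan and stage-by-stage application of a detector cascade described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the TME 10: the names `run_cascade` and `scan`, the representation of a stage as a scoring function paired with a threshold on the accumulated score, and the toy stages are all hypothetical. Resampling and the feature maps are omitted, and the stages operate directly on intensity values.

```python
from typing import Callable, List, Sequence, Tuple

Window = List[List[float]]
# Hypothetical stage layout: (scoring function, minimum accumulated score).
Stage = Tuple[Callable[[Window], float], float]


def run_cascade(window: Window, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Apply the stages in order, building a score for the window.

    Returns (accepted, number_of_stages_evaluated). The window is rejected
    as soon as the accumulated score falls below a stage threshold
    (an assumed rejection rule, chosen only for illustration).
    """
    score = 0.0
    for index, (stage_score, threshold) in enumerate(stages, start=1):
        score += stage_score(window)
        if score < threshold:
            return False, index  # early rejection: later stages are skipped
    return True, len(stages)


def scan(image: Window, stages: Sequence[Stage],
         win_h: int, win_w: int, step: int = 1):
    """Slide a fixed-size window over one scale of the image.

    Returns the (row, col) origins of accepted windows and the total
    number of cascade stages evaluated across all windows.
    """
    hits = []
    stages_run = 0
    rows, cols = len(image), len(image[0])
    for r in range(0, rows - win_h + 1, step):
        for c in range(0, cols - win_w + 1, step):
            window = [row[c:c + win_w] for row in image[r:r + win_h]]
            accepted, used = run_cascade(window, stages)
            stages_run += used
            if accepted:
                hits.append((r, c))
    return hits, stages_run


# Toy 4x4 "image" containing a bright 2x2 block, and two toy stages.
image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
stages = [
    (lambda w: float(sum(map(sum, w))), 2.0),  # stage 1: total brightness
    (lambda w: float(min(map(min, w))), 5.0),  # stage 2: darkest pixel
]
hits, stages_run = scan(image, stages, win_h=2, win_w=2, step=1)
print(hits, stages_run)  # [(1, 1)] 14
```

In this toy example only the window aligned with the bright block passes both stages, and the nine windows together cost 14 stage evaluations rather than 18, because four windows are rejected after the first stage.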
As disclosed in PCT Application No. PCT/EP2015/073058 (Ref: FN-398), it is possible for the feature calculation module 18 to provide the required features buffer 20 for a new window at each clock cycle. The classifier 22, however, typically processes one detector cascade stage per clock cycle, and typically this begins only after the processing pipeline has been filled at the start of each new window, which can again involve a number of clock cycles.
Thus, it will be seen that, while processing one window, the classifier 22 needs to stall the whole pipeline before it (using a backpressure mechanism indicated by the upwards arrows connecting elements 22-14). The classifier 22 is therefore the bottleneck of the process, because the detector cascade stages must be applied in sequence.
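The scale of this bottleneck can be illustrated with a simple cycle-count model. The sketch below is an assumption-laden estimate rather than a measurement of any real device: the function names, the pipeline fill latency of 4 cycles, the figure of 100 stages for surviving windows, and the reading of "fail after 12 stages" as exactly 12 stage cycles are all hypothetical. It takes from the description above only that the classifier applies one stage per clock cycle after a per-window pipeline fill, while the features for a new window could be made available every clock cycle.

```python
from typing import Sequence


def classifier_cycles(stages_per_window: Sequence[int], fill_cycles: int) -> int:
    """Cycles used by a strictly sequential classifier.

    Each window costs the pipeline fill latency plus one cycle per
    cascade stage actually evaluated for that window.
    """
    return sum(fill_cycles + stages for stages in stages_per_window)


def stalled_cycles(stages_per_window: Sequence[int], fill_cycles: int) -> int:
    """Cycles for which the upstream pipeline is held back.

    Assumes the upstream modules could otherwise deliver one window
    per clock cycle, so any classifier time beyond that is a stall.
    """
    total = classifier_cycles(stages_per_window, fill_cycles)
    return total - len(stages_per_window)


# Hypothetical workload: 100 windows, 95 rejected at stage 12,
# 5 surviving for 100 stages each; assumed fill latency of 4 cycles.
workload = [12] * 95 + [100] * 5
print(classifier_cycles(workload, fill_cycles=4))  # 2040
print(stalled_cycles(workload, fill_cycles=4))     # 1940
```

Under these assumed figures, the classifier needs 2040 cycles for windows whose features could have been supplied in 100 cycles, so the upstream modules are stalled for the great majority of the time.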