Many vision-based applications rely on obtaining an accurate estimate of the location and size (scale or extent), i.e., localization, of the objects in a scene. However, detecting objects and estimating their sizes in the 2D plane is difficult, and remains a subject of ongoing research. Many current object detection techniques may err by detecting parts of an object as multiple separate objects instead of generating a single output for the whole object. This object fragmentation can cause inaccuracies in any additional processing that relies on the output of the object detection technique. For example, an application may be counting the number of people in a scene. If the object detection technique detects a person as multiple objects rather than a single object, the count will be inaccurate.
In an effort to determine a more accurate 2D object size and reduce the chances of object fragmentation, some object detection techniques perform complex, computationally intensive processing, such as processing an image at multiple scales using an image pyramid, or using clustering algorithms to group similar features together. These techniques may be effective in some cases, but come at a high cost in terms of processing time, computation, and memory bandwidth. As a result, object detection may be a bottleneck in many applications. Accordingly, improvements in object localization are desirable.
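To illustrate why multi-scale processing is costly, the following sketch shows the general image-pyramid pattern described above: the image is repeatedly downscaled, a detector is run at every level, and the resulting boxes are mapped back to the original coordinate frame. The function names and the detector interface are illustrative assumptions, not from the source; note that the detector runs once per pyramid level, which is the source of the extra compute and memory traffic.

```python
import numpy as np

def build_pyramid(image, min_size=32):
    """Yield (level, factor) pairs, halving resolution at each level
    until the smaller image dimension drops below min_size.
    (Illustrative sketch; a real pyramid would low-pass filter first.)"""
    factor = 1.0
    while min(image.shape[:2]) >= min_size:
        yield image, factor
        image = image[::2, ::2]   # nearest-neighbor 2x downscale
        factor *= 0.5

def detect_multiscale(image, detector, min_size=32):
    """Run `detector` (a callable returning (x, y, w, h) boxes for one
    image) on every pyramid level, mapping each detected box back to
    the original image's coordinate frame."""
    boxes = []
    for level, factor in build_pyramid(image, min_size):
        for (x, y, w, h) in detector(level):
            # Divide by the cumulative factor to undo the downscaling.
            boxes.append((x / factor, y / factor, w / factor, h / factor))
    return boxes
```

For a 128x128 image with `min_size=32`, the detector is invoked three times (at 128, 64, and 32 pixels), so cost grows with the number of levels even before any clustering or merging of the per-level detections.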