Detecting objects in frames of image data is useful in a variety of applications. For example, image data from security systems and vehicle-based cameras can be processed to identify and track the movement of people or objects. Object detection involves classifying image data to identify candidate windows that may include objects of interest. The classification stage typically outputs several candidate windows, the number depending on the classification threshold used, and these candidate windows must then be grouped into a smaller number of locations that represent the actual locations of objects in the image data. Continuing improvements in video processing, however, are needed to support advanced vehicle control and other real-time applications in which identification and tracking of pedestrians or other objects is important.

Non-maxima suppression (NMS) grouping and “mean shift grouping” are two popular window grouping approaches, but neither provides both accurate object identification and tracking and the computational efficiency needed for real-time applications. NMS grouping is well suited to single-frame detection, but when it is used for multi-frame object tracking it provides poor temporal consistency: group locations appear jittery and lack smooth movement. Mean shift grouping has good temporal consistency, but it is computationally inefficient, because it concurrently determines the number of clusters and finds the cluster centers using iterative techniques that typically require many iterations. Accordingly, improved solutions for grouping for object identification and location, as well as for multi-frame object tracking, are desired.
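
By way of illustration only, the NMS grouping referred to above can be sketched as a greedy procedure over scored candidate windows. The sketch below is a generic, commonly used form of NMS and is not taken from any particular system. The window format (x1, y1, x2, y2), the use of intersection over union as the overlap measure, the 0.5 overlap threshold, and the names `iou` and `nms_group` are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two windows given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_group(windows, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the windows kept, highest score first.

    The threshold value is an assumption for illustration, not a value
    taken from the text.
    """
    # Visit candidate windows from highest to lowest classifier score.
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a window only if it does not overlap a kept window too much.
        if all(iou(windows[i], windows[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical candidate windows: two overlapping detections and one separate.
windows = [(10, 10, 50, 90), (12, 12, 52, 92), (200, 40, 240, 120)]
scores = [0.9, 0.8, 0.7]
print(nms_group(windows, scores))  # prints [0, 2]
```

In this sketch each frame is processed independently and the reported location is simply the highest-scoring window in each overlapping set. A small change in scores between frames can therefore switch which window is kept, which is one way to understand the jitter noted above for multi-frame use.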