The dimensions of a moving object as captured in a video image are difficult to estimate because many factors affect the distance between pixels representing the prominent features of the moving object. When a moving object is captured in a video image, information regarding the three-dimensional coordinates of each feature point in the physical world is reduced to information containing two-dimensional coordinates in the video image. Specifically, all points along a given line of sight from the video camera are reduced to a single pixel in the video image. As the mapping of the three-dimensional locations to the two-dimensional coordinates in the video image eliminates one dimension, multiple points in the physical world correspond to a single point in the video image. Thus, the distance between the video camera and any specific point of a moving object as captured in a video image must be estimated by other means.
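The loss of one dimension described above can be illustrated with a minimal sketch that assumes an ideal pinhole camera without lens distortion. The function name `project`, the focal length value, and the sample coordinates are hypothetical and chosen only for illustration.

```python
def project(point, focal_length_px):
    """Project a 3-D point (X, Y, Z) in camera coordinates to 2-D image coordinates.

    Assumes an ideal pinhole camera: Z is the distance along the optical axis,
    and lens distortion is ignored.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length_px * x / z, focal_length_px * y / z)


FOCAL_LENGTH_PX = 800.0  # hypothetical focal length, expressed in pixels

# Two different physical points on the same line of sight from the camera:
# the second point is the first point scaled by a factor of two.
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)

near_pixel = project(near_point, FOCAL_LENGTH_PX)
far_pixel = project(far_point, FOCAL_LENGTH_PX)

# Both points land on the same image location, so the image alone
# does not reveal how far either point is from the camera.
print(near_pixel, far_pixel)  # (200.0, 400.0) (200.0, 400.0)
```

Because the two points are indistinguishable in the image, the depth of a feature point cannot be recovered from a single pixel location.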
The difficulty in estimating the dimensions of the moving object in a time series of video images is compounded by the image distortion inherent in all video cameras and by variations in the distance at which the moving object may appear. For example, a video camera configured to monitor movement of vehicles on a road containing multiple lanes generates images in which a moving vehicle may be present in any lane.
Homography refers to taking measurements on the ground and transforming imagery taken from cameras in fixed positions to “real world” measurements. Referring to FIG. 1A, an example of a video image in a time series of video images as acquired by a video camera is shown. The distance between pixels representing a set of prominent features of the vehicle in a video image varies with the distance between the vehicle and the video camera. For example, a pixel-to-pixel distance of 10 pixels in FIG. 1A corresponds to different real-world distances depending on the location in the image, even for points on the road.
Referring to FIG. 1B, an example of a homographied image employing a homography reference plane that coincides with the plane of a surface of a road is shown. In a homographied image, all pixels in a video image are presumed to be a representation of an object in a homography reference plane, which is the plane of the road in this case. Thus, the distance between pixels representing physical points on the road is linearly proportional to the physical distance between the physical points on the road. However, the distance between pixels representing physical points on the vehicle is not proportional to the distance between the physical points on the vehicle. For example, a pixel-to-pixel distance of 10 pixels in FIG. 1B corresponds to a fixed physical distance provided that the pixels are selected in the ground plane, which is the plane of the road and also the homography reference plane of the homography transformation. Points outside the homography reference plane, i.e., out of the plane of the road, do not have a linear relationship between a pixel-to-pixel distance and the corresponding physical distance. Objects that extend out of the homography reference plane therefore appear somewhat distorted in homographied images.
Thus, estimation of the dimensions of features of the vehicle, such as the height of the vehicle, from homographied images requires additional knowledge of the distance between the camera and the moving vehicle, which cannot be extracted from the homographied images alone. There is considerable uncertainty in the accuracy of the data regarding the dimensions between features of the moving object as extracted from homographied images.
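The dependence on camera distance can be illustrated with a minimal geometric sketch. It assumes an ideal pinhole camera at a known position above a flat road (the plane z = 0), with no lens distortion, and models the homographied image as placing each physical point where its line of sight meets the road plane. The function names, the camera position, and the sample coordinates are hypothetical.

```python
import math


def ground_plane_position(point, camera):
    """Return the road-plane (z = 0) location at which a physical point appears.

    The point is carried along its line of sight from the camera until that
    line meets the road plane, which is how a homographied image with the road
    as reference plane presents every pixel.
    """
    px, py, pz = point
    cx, cy, cz = camera
    if not 0 <= pz < cz:
        raise ValueError("point must lie on or above the road and below the camera")
    t = cz / (cz - pz)  # ray parameter at which the line of sight reaches z = 0
    return (cx + t * (px - cx), cy + t * (py - cy))


def planar_distance(a, b):
    """Euclidean distance between two locations in the road plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


camera = (0.0, 0.0, 10.0)  # hypothetical camera mounted 10 m above the road

# Two points on the road, 2 m apart: their separation is preserved.
road_a = (0.0, 20.0, 0.0)
road_b = (0.0, 22.0, 0.0)
road_span = planar_distance(
    ground_plane_position(road_a, camera), ground_plane_position(road_b, camera)
)

# A roof point 2 m above the road, evaluated at two distances from the camera.
# The same physical height produces a different apparent displacement each time.
shift_near = planar_distance(
    ground_plane_position((0.0, 20.0, 2.0), camera), (0.0, 20.0)
)
shift_far = planar_distance(
    ground_plane_position((0.0, 40.0, 2.0), camera), (0.0, 40.0)
)

print(road_span, shift_near, shift_far)  # 2.0 5.0 10.0
```

In this sketch the same 2 m height appears as a 5 m displacement at one distance and a 10 m displacement at twice that distance, so a height cannot be read from the homographied image without knowing where the object is relative to the camera.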
In view of the above, there exists a need for a method for accurately determining the dimensions between features of a moving object from a time series of video images. Particularly, there exists a need for estimating the height of a moving object from a time series of video images.